
Project Report on Hepatitis Virus


Introduction

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins. Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis. The areas of sequence analysis include sequence alignment, sequence database searching, motif and pattern discovery, gene and promoter finding, reconstruction of evolutionary relationships, and genome assembly and comparison. Structural analyses include protein and nucleic acid structure analysis, comparison, classification and prediction. Functional analysis includes gene expression profiling, protein-protein interaction prediction, protein subcellular localization prediction, and metabolic pathway reconstruction and simulation.

The three aspects of bioinformatics analysis are not isolated but often interact to produce integrated results. For example, protein structure prediction depends on sequence alignment data; clustering of gene expression profiles requires the use of phylogenetic tree construction methods derived from sequence analysis; and sequence-based prediction is related to functional analysis of coexpressed genes.

The first major bioinformatics project was undertaken by Margaret Dayhoff in 1965, who developed the first protein sequence database, called the Atlas of Protein Sequence and Structure. Subsequently, in the early 1970s, Brookhaven National Laboratory established the Protein Data Bank for archiving three-dimensional protein structures. At its onset the database stored fewer than a dozen protein structures, compared to more than 30,000 structures at the time of writing. The first sequence alignment algorithm was developed by Needleman and Wunsch in 1970. This was a fundamental step in the development of the field of bioinformatics, and it paved the way for the routine sequence comparisons and database searching practiced by modern biologists.
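The dynamic-programming idea behind Needleman-Wunsch global alignment can be sketched in a few lines of Python. This is a minimal illustration, not the original 1970 formulation. The match, mismatch and linear gap scores are assumed values chosen for readability; practical tools use substitution matrices (such as PAM or BLOSUM) and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b by dynamic programming with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The scoring values are illustrative
    assumptions, not those used in the original paper.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,                            # gap in b
                score[i][j - 1] + gap,                            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns a score of 2 with the alignment `ACGT` / `A-GT`. When several cells tie, the traceback prefers the diagonal move, so only one of possibly several optimal alignments is reported.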

A recent advance in bioinformatics is molecular modeling. It aims at understanding structure-function and structure-property relationships in physico-chemical processes and pharmaceuticals, and it has therefore become increasingly important for finding and designing new drugs. Computers now play an important role in new drug discovery and drug design.

HEPATITIS

Hepatitis (plural: hepatitides) implies injury to the liver characterized by the presence of inflammatory cells in the liver tissue. The word derives from the ancient Greek hepar (combining form hepato-), meaning 'liver', and the suffix -itis, denoting 'inflammation'. The condition can be self-limiting, healing on its own, or it can progress to scarring of the liver.

Hepatitis is acute when it lasts less than 6 months and chronic when it persists longer. A group of viruses known as the hepatitis viruses cause most cases of liver damage worldwide. Hepatitis can also be due to toxins (notably alcohol), other infections or an autoimmune process.

Hepatitis may run a subclinical course, in which the affected person may not feel ill. The patient becomes unwell and symptomatic when the disease impairs liver functions. These functions include, among other things, screening of harmful substances, regulation of blood composition and production of bile to help digestion.

Causes

Acute hepatitis

Viral hepatitis: hepatitis A to E (more than 95% of viral causes), herpes simplex, cytomegalovirus, Epstein-Barr virus, yellow fever virus, adenoviruses

Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky Mountain spotted fever

Alcohol

Toxins: Amanita toxin in mushrooms, carbon tetrachloride, asafetida

Drugs: paracetamol, amoxicillin, antituberculosis medicines, minocycline and many others

Ischemic hepatitis (circulatory insufficiency) (1)

Pregnancy

Autoimmune conditions, e.g. systemic lupus erythematosus (SLE)

Metabolic diseases, e.g. Wilson's disease

Chronic hepatitis

Viral hepatitis: hepatitis B with or without hepatitis D, hepatitis C (hepatitis A and E do not lead to chronic disease)

Autoimmune: autoimmune hepatitis

Alcohol

Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole

Non-alcoholic steatohepatitis

Heredity: Wilson's disease, alpha-1-antitrypsin deficiency

Primary biliary cirrhosis and primary sclerosing cholangitis occasionally mimic chronic hepatitis [4]

Viral hepatitis

A virus is a particle, smaller than a bacterium, that contains genetic information in the form of DNA or RNA. This genetic material allows the virus to infect bacteria or living cells and to set up the machinery to reproduce itself, which leads to destruction of the cell in which it resides. To date, five viruses, labeled A through E, have been identified that appear to cause viral hepatitis. Viruses A and E can be contracted from contaminated water or food (by mouth). Viruses B, C and D are transmitted by direct injection into the bloodstream (through any method of injection under the skin). The term viral hepatitis describes any one of the illnesses caused by these five viruses. It consists of an infection of liver cells that damages the liver over days in some cases but over many years in others.

Thirty years ago none of the hepatitis viruses had been identified. In the 1960s transfusion-related viral hepatitis was extremely common, with 30% of patients receiving blood products becoming infected. By 1970 a blood test for the Australia antigen had been developed. It appeared to identify those infected with one hepatitis virus, which we now call hepatitis B. The Australia antigen is the protein that makes up the coat of the virus and is now called the hepatitis B surface antigen (HBsAg); the investigator who discovered it was awarded the Nobel Prize. Our understanding of viral hepatitis has grown tremendously since that discovery.

Currently 11 viruses are recognized as causing hepatitis. Two are herpesviruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and 9 are hepatotropic viruses.

EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent hepatic damage. Both viruses cause the typical infectious mononucleosis syndrome of fatigue, nausea and malaise.

Of the nine human hepatotropic viruses, only five are well characterized; hepatitis G and TTV (transfusion-transmitted virus) are newly discovered viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called enterically transmitted non-A, non-B hepatitis) are transmitted by fecal-oral contamination. The most important types include hepatitis B (sometimes called serum hepatitis), hepatitis C (formerly called non-A, non-B hepatitis) and hepatitis D (formerly called delta hepatitis).

Hepatitis A

Incubation period: 3-5 weeks (mean 28 days)

Milder disease than hepatitis B. Asymptomatic infections are very common, especially in children.

Adults, especially pregnant women, may develop more severe disease.

Although convalescence may be prolonged, there is no chronic form of the disease.

Fulminant hepatitis is rare (0.1% of cases). The virus enters via the gut, replicates in the alimentary tract and spreads to infect the liver, where it multiplies in hepatocytes. Viraemia is transient. Virus is excreted in the stools for two weeks preceding the onset of symptoms.

Distribution is worldwide, and the virus is endemic in most countries. The incidence in first-world countries is declining. There is an especially high incidence in developing countries and rural areas; in rural areas of South Africa the seroprevalence is 100%.

Hepatitis E

Incubation period: 30-40 days

Acute self-limiting hepatitis; no chronic carrier state.

Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in pregnant women, in whom the mortality rate is high (up to 40%). As with hepatitis A, the virus replicates initially in the gut before invading the liver, and it is shed in the stool before the onset of symptoms. Viraemia is transient. A large inoculum of virus is needed to establish infection. Little is known as yet; the incidence of infection appears to be low in first-world countries.

Hepatitis C

Originally described as a putative togavirus related to the flavi- and pestiviruses; it is now classified in the family Flaviviridae.

Enveloped, with a single-stranded RNA (ssRNA) genome.

Does not grow readily in cell culture, but can infect chimpanzees. Incubation period: 6-8 weeks.

Causes a milder form of acute hepatitis than hepatitis B does. However, about 50% of individuals develop chronic infection following exposure, which can lead to:

1) chronic liver disease

2) hepatocellular carcinoma

Incidence: endemic worldwide, with high incidence in Japan, Italy and Spain. In South Africa, 1% of blood donors have antibodies.

Hepatitis D

Defective virus which requires hepatitis B as a helper virus in order to replicate. Infection therefore only occurs in patients who are also infected with hepatitis B, and it increases the severity of liver disease in hepatitis B carriers. The virus particle is 36 nm in diameter and is encapsulated with HBsAg derived from HBV. The delta antigen is associated with the virus particles, which contain an ssRNA genome.

Identified in intravenous drug users.

Hepatitis G

A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally transmitted hepatitis, but it is no longer believed to be a major agent of liver disease. It has been classified as a flavivirus.

Hepatitis B

What is the Hepatitis B Virus?

The hepatitis B virus (HBV) is a DNA-containing virus. Once it gains access to the bloodstream, it can infect human liver cells and other cells in the body. One of the most interesting features of the hepatitis B virus is that the virus itself does not damage the liver. Instead, the damage is caused by the individual's own immune system attacking the virus-infected cells. Because liver damage from the virus may be minimal, many patients are called healthy carriers: although they may transmit the disease to others, they have normal-appearing livers and normal liver function tests. While many individuals remain healthy for many years or a lifetime, others develop chronic hepatitis, cirrhosis and occasionally liver cell cancer. These outcomes are linked to the virus and its effects, although it is unlikely that the virus directly causes cancer. Patients who develop hepatitis (damage to liver cells with inflammation) do so because of the body's normal inclination to attack foreign proteins in viruses and in the cells that harbor them. This process, called the immune response, determines the pace and severity of the liver cell injury in this condition and is described in more detail below.

Since the identification of the hepatitis B virus, several other nearly identical viruses have been identified in Eastern woodchucks, ground squirrels and Peking ducks. The members of this virus family, termed the hepadnaviruses, have life cycles similar to that observed in man. They can therefore serve as animal models for further study of these unique disease-causing agents.

Classification and general features

Family: Hepadnaviridae

Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g. duck hepatitis B virus)

Size: 42 nm. Virions (also known as Dane particles) contain a circular, partially double-stranded DNA genome.

Fig.: Hepatitis B virus structure

HBV Antigens

HBsAg = surface (coat) protein, produced in excess as small spheres and tubules

HBcAg = inner core protein

HBeAg = secreted protein, function unknown

Clinical Features

Incubation period: 2-5 months

Insidious onset of symptoms. Tends to cause a more severe disease than hepatitis A.

Asymptomatic infections occur frequently.

Pathogenesis

Infection is parenterally transmitted. The virus replicates in the liver, and virus particles and excess viral surface protein are shed in large amounts into the blood. Viraemia is prolonged, and the blood of infected individuals is highly infectious.

Complications

1) Persistent infection

Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected. Those at particular risk include:

babies and young children

immunocompromised patients

males more than females

The virus persists in the hepatocytes. Ongoing liver damage occurs because the host immune response attacks the infected liver cells. Chronic infection may take one of two forms:

Chronic persistent hepatitis - the virus persists but there is minimal liver damage.

Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure.

2) Hepatocellular carcinoma

Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC). HBV is thought to play a role in the development of this malignancy because:

a) 80% of patients with HCC are carriers of hepatitis B

b) virus DNA can be identified in hepatocellular carcinoma cells

c) virus DNA can integrate into the host chromosome

3) Fulminant hepatitis

Rare; accounts for 1% of infections.

Epidemiology

Prevalence of disease in Africa

Worldwide there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is parenterally transmitted. Routes of transmission include:

1) Blood

blood transfusions, serum products

sharing of needles or razors

tattooing, acupuncture

renal dialysis

organ donation

2) Sexual intercourse

3) Horizontal transmission in children, families and close personal contact

This is the major mode of transmission in South Africa, where the majority of individuals become infected between three and nine years of age. Horizontal transmission also occurs in children's institutions and mental homes.

4) Vertical transmission - perinatal transmission from a carrier mother to her baby:

transplacental (rare)

during delivery

postnatal: breastfeeding, close contact

(This is the major mode of transmission in South East Asia.)

Diagnosis: Serology

Acute infection with resolution

Viral antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg) is a secreted protein that is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg) is a core protein that is not found in the blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV, whether the infection has resolved or the person is a chronic carrier.

Fig.: Hepatitis B virus in serum
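The marker meanings listed above can be summarized as a simple lookup. The sketch below is a teaching aid only, not a diagnostic algorithm. The class and function names are hypothetical, and each marker is treated as simply present or absent. The combined "chronic carriage" rule is an assumption derived from the statements that carriers retain HBsAg and core IgG but do not develop anti-HBs. Requiring the absence of core IgM is a further assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HBVMarkers:
    """Hypothetical container: True means the serum marker was detected."""
    hbsag: bool = False         # surface antigen
    hbeag: bool = False         # e antigen
    anti_hbs: bool = False      # surface antibody
    anti_hbe: bool = False      # e antibody
    anti_hbc_igm: bool = False  # core IgM
    anti_hbc_igg: bool = False  # core IgG


def interpret(m: HBVMarkers) -> list[str]:
    """Map each detected marker to the meaning given in the text (not a clinical tool)."""
    notes = []
    if m.hbsag:
        notes.append("virus replication occurring in the liver")
    if m.hbeag:
        notes.append("high level of viral replication")
    if m.anti_hbs:
        notes.append("immunity following infection")
    if m.anti_hbe:
        notes.append("low infectivity in a carrier")
    if m.anti_hbc_igm:
        notes.append("recent infection")
    if m.anti_hbc_igg:
        notes.append("exposure to HBV")
    # Assumed combined rule: carriers keep HBsAg and core IgG but lack anti-HBs;
    # requiring no core IgM (i.e. no recent infection) is an extra assumption.
    if m.hbsag and m.anti_hbc_igg and not m.anti_hbs and not m.anti_hbc_igm:
        notes.append("pattern consistent with chronic carriage")
    return notes
```

For instance, `interpret(HBVMarkers(anti_hbs=True, anti_hbc_igg=True))` yields the notes for immunity following infection and exposure to HBV, which matches the resolved-infection profile described above.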

Prevention

1) Active immunization

Two types of vaccine are available:

Serum-derived - prepared from HBsAg purified from the serum of HBV carriers

Recombinant HBsAg - made by genetic engineering in yeasts

Both vaccines are equally safe and effective. Three doses induce protective levels of antibodies in 95% of vaccine recipients.

Universal immunization of infants was introduced in April 1995. Infants receive 3 doses at 6, 10 and 14 weeks of age.

Vaccine should be administered to people at high risk of infection with HBV:

1) health care workers

2) sexual partners of chronic carriers

3) infants of HBV carrier mothers

2) Passive antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single episode of exposure to HBV-infected blood, for example a needlestick injury.

What is Hepatitis B Infection Like?

Most individuals who become infected with the hepatitis B virus are unaware of the infection for several weeks. They then develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others who do not develop significant symptoms following exposure may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers or people with chronic hepatitis B are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

The genome is circular, about 3.2 kb in size and partially double-stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5' end. The other strand, the plus strand, is variable in length but always shorter than unit length, and it has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. These proteins are produced through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation and mark a specific transcript for encapsidation into the nucleocapsid.

Fig.: Hepatitis B virus genome
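Overlapping reading frames and multiple in-frame start codons can coexist on a compact circular genome. The sketch below illustrates this by scanning one strand of a circular sequence for ATG-initiated open reading frames, allowing a frame to run across the origin. It is a simplified example that makes several assumptions. It uses the standard genetic code's stop codons and scans only the forward frames, since the text notes that HBV's ORFs run in one direction. It reports every in-frame ATG, so nested ORFs that share one stop codon appear as separate entries, much as the surface proteins share a single gene. The function name and the `min_codons` cut-off are hypothetical choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def circular_orfs(genome: str, min_codons: int = 100) -> list:
    """Find ATG-initiated ORFs on one strand of a circular genome.

    Returns (start, end, frame) tuples: start is 0-based, end is exclusive and
    may exceed len(genome) when the ORF wraps past the origin, and frame is
    start % 3 relative to the origin. Frames that reach no stop codon within
    one full turn of the circle are ignored.
    """
    seq = genome.upper()
    n = len(seq)
    doubled = seq + seq  # lets codons and ORFs cross the origin of the circle
    orfs = []
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        # Walk codon by codon for at most one full turn of the circle
        for pos in range(start, start + n, 3):
            codon = doubled[pos:pos + 3]
            if len(codon) < 3:
                break
            if codon in STOP_CODONS:
                # (pos - start) // 3 counts codons before the stop, including ATG
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, start % 3))
                break
    return orfs
```

With a toy sequence, `circular_orfs("ATGATGTAA", min_codons=1)` reports `(0, 9, 0)` and `(3, 9, 0)`: two in-frame starts ending at the same stop codon. `circular_orfs("AAATAGATG", min_codons=1)` reports `(6, 15, 0)`, an ORF that begins near the end of the sequence and wraps past the origin.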

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Hepatocytes are known to be the most effective cell type for replicating HBV, but other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined. It is known, however, that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces, HBVP). The viral DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase II. This produces a longer-than-genome-length RNA, called the pregenome, and shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce a polymerase protein, P. This protein then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the DNA is recycled to the nucleus, where the process is repeated. This results in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al. 765).

Fig.: HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present and usually outnumber the virions by a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group named surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. Because these particles lack the hepatitis B core, polymerase and genome, they are non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. The non-infectious particles present the same antigenic sites as the virion, so they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces, HBVP).

Hepatitis B Antigens

There are three different hepatitis B antigens encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg) - There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope that may be responsible for triggering the immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears basically oblivious to their presence.

Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be detected directly by a blood test; it can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg) - The e antigen is named for its early appearance during an acute HBV infection. It is thought to be located in the core structure of the virus particle, and it can be detected by a blood test. If found, it usually indicates that complete virus particles are in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle, a 42 nm particle called the Dane particle (containing DNA and DNA polymerase), and tubular or filamentous particles that vary in length. The Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. The humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles. The cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. Neonatal tolerance probably plays an important role in viral persistence in patients infected at birth. By contrast, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to viral persistence in the setting of an ineffective immune response. So may the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create mutagenic and mitogenic stimuli. These stimuli drive the development of DNA damage that can cause hepatocellular carcinoma. Elucidating the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. The virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver. Studies of how it does so are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg) is a secreted protein that is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg) is a core protein that is not found in the blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV, whether the infection has resolved or the person is a chronic carrier [4].

Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure has several steps. First, possible templates that have a clear sequence relationship to the query are identified and the model is assembled. Next, regions of the structure likely to adopt conformations different from the templates (e.g. loops) are predicted. Finally, the structure is refined in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives. These initiatives assume that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
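Applying the 30% rule depends on how sequence identity is measured from an alignment. The sketch below assumes one common convention: identical residues divided by aligned columns in which neither sequence has a gap. Other conventions exist, such as dividing by the length of the shorter sequence, so the threshold check is a rule of thumb rather than a guarantee of model quality. Both function names are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Assumed convention: identical residues / columns where neither sequence has a gap.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences contain a residue
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template) if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(q == t for q, t in pairs)
    return 100.0 * identical / len(pairs)


def suitable_template(aligned_query: str, aligned_template: str, threshold: float = 30.0) -> bool:
    """Apply the ~30% identity rule of thumb described in the text."""
    return percent_identity(aligned_query, aligned_template) > threshold
```

For example, `percent_identity("ACDE-G", "ACDFHG")` compares the five gap-free columns, finds four identities and returns 80.0.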

The number of protein-ligand complexes available in the Protein Data Bank is constantly growing, and structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years. They aim to exploit as fully as possible the structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term guided docking is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, affecting scoring but not the orientation of the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches. These categories reflect the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. Current scoring functions have limitations. In view of these, making optimal use of chemical information was generally found to be an efficient knowledge-based strategy. It improves binding affinity estimations, ligand binding-mode predictions and the virtual screening enrichments obtained from protein-ligand docking [7].
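Virtual screening enrichment, mentioned above as one measure of docking performance, is often reported as an enrichment factor. This factor is the fraction of known actives among the top-ranked compounds divided by the fraction of actives in the whole library. The minimal sketch below makes three assumptions: lower docking scores rank better, active/inactive labels are known in advance (as in a retrospective benchmark), and the size of the top fraction is rounded to a whole number of compounds. The function name is hypothetical.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at a given fraction of a ranked screening library.

    EF = (actives in top fraction / size of top fraction) / (all actives / library size)
    Assumes lower scores are better, as with many energy-like docking scores.
    """
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and labels must be non-empty and the same length")
    total_actives = sum(bool(a) for a in is_active)
    if total_actives == 0:
        raise ValueError("at least one known active is required")
    top_n = max(1, round(fraction * n))  # number of top-ranked compounds examined
    ranked = sorted(range(n), key=lambda i: scores[i])  # best (lowest) score first
    hits = sum(bool(is_active[i]) for i in ranked[:top_n])
    return (hits / top_n) / (total_actives / n)
```

An EF of 1 means the ranking does no better than random selection. Values above 1 mean that actives are concentrated near the top of the ranked list.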

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. The application of docking and scoring has led to some remarkable successes, but major challenges remain, and these are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described [8].

Material and methods


1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

A curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.


2. Protein sequence of glycerate kinase (HBeAg-binding protein 4) from human: primary accession number Q8IVS8; EC 2.7.1.31; subcellular location: cytoplasm; catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985, in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved tools for biological sequence comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. Several programs in this package allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "fast-ay" and stands for "FAST-All", because it works with any alphabet; it is an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.
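The first, heuristic stage of a FASTA-style search can be sketched as follows: short exact words of length k (FASTA calls this parameter "ktup") shared by the query and a library sequence are located through a lookup table, and the hits are grouped by diagonal (library position minus query position). Diagonals that collect many hits mark the regions that later stages rescore and join into an alignment. This is a simplified illustration under our own assumptions (no scoring matrix, no region rescoring; `ktup_diagonals` is a hypothetical name), not code from the FASTA package.

```python
from collections import defaultdict


def ktup_diagonals(query: str, library: str, k: int = 2) -> dict[int, int]:
    """Count exact k-word matches on each diagonal (library_pos - query_pos)."""
    # Lookup table: every k-word of the query -> list of its start positions.
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)

    # One pass over the library; each shared word votes for its diagonal.
    counts: dict[int, int] = defaultdict(int)
    for j in range(len(library) - k + 1):
        for i in index.get(library[j:j + k], []):
            counts[j - i] += 1
    return dict(counts)


if __name__ == "__main__":
    diagonals = ktup_diagonals("MQLFHLCLIISC", "MYLFHLCLVFAC", k=2)
    best = max(diagonals, key=diagonals.get)
    print("best diagonal:", best, "hits:", diagonals[best])
```

A densely populated diagonal corresponds to an ungapped region of similarity; using a lookup table rather than comparing every pair of positions is what makes this stage fast.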

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
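The dynamic-programming core of the Smith-Waterman algorithm that SSEARCH implements can be sketched as below. For simplicity this sketch assumes a +3/-3 match/mismatch score and a linear gap penalty of -2, whereas real programs use a substitution matrix (e.g., BLOSUM50) with affine gap penalties. `smith_waterman` is a hypothetical name, and only the score is computed (no traceback).

```python
def smith_waterman(a: str, b: str, match: int = 3, mismatch: int = -3, gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]   # h[i][j]: best score of an alignment ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # a[i-1] aligned to a gap
            left = h[i][j - 1] + gap    # b[j-1] aligned to a gap
            # The floor at zero lets an alignment start anywhere: this makes it local.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman("TGTTACGG", "GGTTGACTA"))
```

With this scoring, TGTTACGG and GGTTGACTA have a best local score of 13 (GTT-AC aligned with GTTGAC). The running time grows with the product of the two sequence lengths, which is why FASTA and BLAST use heuristics for database searches.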

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
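The threshold in a BLAST search is usually expressed as an expect value (E-value). Under the Karlin-Altschul statistics used by BLAST, a raw score S for a query of length m against a database of total length n has E = K·m·n·e^(-λS). The bit score S' = (λS - ln K)/ln 2 gives the equivalent form E = m·n·2^(-S'). The sketch below checks that the two forms agree. The λ and K values in the example are placeholders, not BLAST's parameters for any particular scoring system, and the real program also corrects m and n for edge effects.

```python
import math


def evalue_raw(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expect value from a raw alignment score."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float, k: float) -> float:
    """Normalised score in bits, independent of the scoring system."""
    return (lam * score - math.log(k)) / math.log(2)


def evalue_bits(bits: float, m: int, n: int) -> float:
    """Expect value from a bit score: number of chance hits at least this good."""
    return m * n * 2.0 ** (-bits)


if __name__ == "__main__":
    lam, k = 0.3, 0.1          # placeholder statistical parameters
    m, n = 523, 10_000_000     # query length and a hypothetical database size
    s = 100                    # raw score
    print(evalue_raw(s, m, n, lam, k), evalue_bits(bit_score(s, lam, k), m, n))
```

A smaller E-value means the match is less likely to have arisen by chance; a larger database (bigger n) raises the E-value of the same score.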

5. Primary & secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be shown an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence is analyzed.

Among the parameters it calculates are:

extinction coefficient

half-life

instability index

aliphatic index
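As an example of how such indices are derived from the sequence alone, ProtParam's aliphatic index follows Ikai (1980): AI = X(Ala) + a·X(Val) + b·(X(Ile) + X(Leu)), where X are mole percents and a = 2.9 and b = 3.9 are the side-chain volumes of Val and of Ile/Leu relative to Ala. A minimal sketch (the function name is hypothetical):

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index (Ikai 1980) computed from a one-letter protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Mole percent of the four residues that enter the formula.
    mole_percent = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    a, b = 2.9, 3.9  # relative side-chain volumes of Val and of Ile/Leu vs Ala
    return (mole_percent["A"] + a * mole_percent["V"]
            + b * (mole_percent["I"] + mole_percent["L"]))


if __name__ == "__main__":
    print(round(aliphatic_index("MAAALQVLPRLARAPLHPLL"), 2))
```

A higher value indicates a larger relative volume of aliphatic side chains, which is often read as a hint of higher thermostability.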


Using SOPMA for secondary structure analysis

The self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. Its authors then reported improvements obtained by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural-network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a web page (http://www.ibcp.fr/predict.html).
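The accuracy figures quoted for SOPMA are three-state accuracies (Q3): the percentage of residues whose predicted state matches the state observed in the experimental structure. A minimal sketch, assuming both strings already use the same three-letter alphabet (here h = helix, e = extended/sheet, c = coil; SOPMA's "t" turns would first have to be mapped onto one of these) and have equal length; `q3` is a hypothetical name.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted and observed states agree."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    print(round(q3("hhhcce", "hhhcee"), 1))
```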

PROTOCOL FOLLOWED

Identified the receptor (target protein) for the HBV strain from literature references, journals available online and the PubMed literature.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor in PDB format.

Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).

Verified the model using different parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the structure of the drug molecule.
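The sequence retrieved in the second step is plain FASTA text: a header line starting with ">" followed by one or more sequence lines. A minimal standard-library reader is sketched below; `read_fasta` is a hypothetical helper, not part of any tool used in this work, and it keys each record by the first word of its header.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:                 # close the previous record
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""     # keep only the identifier
            chunks = []
        elif header is not None:
            chunks.append(line)                    # sequence may span many lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    demo = ">sp|Q8IVS8|GLCTK_HUMAN Glycerate kinase\nMAAALQVLPR\nLARAPLHPLL\n"
    print(read_fasta(demo))
```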


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence, or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and of the template structure. The approach can be complicated by alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the fold, to participate in binding a small molecule, or to foster association with another protein or nucleic acid.
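The "agreement between matched Cα atoms" is usually reported as a root-mean-square deviation (RMSD). The sketch below assumes that the model and the reference have already been superimposed and that the i-th coordinate in each list belongs to the same residue; a real comparison would first find the optimal superposition (for example with the Kabsch algorithm). `ca_rmsd` is a hypothetical name.

```python
import math

Coord = tuple[float, float, float]


def ca_rmsd(model: list[Coord], reference: list[Coord]) -> float:
    """RMSD (same units as the input, e.g. Å) over matched Cα positions."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty, equally long coordinate lists")
    squared = sum((mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
                  for (mx, my, mz), (rx, ry, rz) in zip(model, reference))
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
    print(ca_rmsd(model, reference))
```

Because the deviations are squared, a few badly placed residues (such as a long modeled loop) can dominate the value.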

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, yielding a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.
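The first filters described above (discard hits with a poor E-value, then prefer templates that cover more of the query) can be expressed as a small ranking function. The E-value cut-off of 1e-5, the field names, the template identifiers and the tie-breaking order are illustrative assumptions, not rules from any particular modelling server; functional and secondary-structure similarity would still need to be weighed separately.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template_id: str   # hypothetical identifier of a candidate template chain
    evalue: float
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    identity: float    # percent identity over the aligned region


def coverage(hit: Hit, query_length: int) -> float:
    """Fraction of the query covered by the aligned region."""
    return (hit.query_end - hit.query_start + 1) / query_length


def rank_templates(hits: list[Hit], query_length: int,
                   max_evalue: float = 1e-5) -> list[Hit]:
    """Drop hits with a poor E-value, then sort the rest, best first."""
    usable = [h for h in hits if h.evalue <= max_evalue]
    # Broad coverage first, then higher identity, then lower E-value as tie-breakers.
    return sorted(usable,
                  key=lambda h: (-coverage(h, query_length), -h.identity, h.evalue))


if __name__ == "__main__":
    hits = [Hit("TPL1", 1e-50, 1, 200, 40.0),
            Hit("TPL2", 1e-30, 1, 400, 30.0),
            Hit("TPL3", 5.0, 1, 500, 20.0)]
    print([h.template_id for h in rank_templates(hits, 500)])  # ['TPL2', 'TPL1']
```

TPL3 covers the whole query but is discarded because of its poor E-value, in line with the advice above.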

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces cannot alter their conformation to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interactions are said to be rigid.

Fig.: Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can use either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig.: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid ligand with a flexible receptor often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If these van der Waals clashes can be reduced, the energy penalty to the system is minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than a rigid receptor would.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: intrinsic movement allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
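In rigid-body docking, a ligand pose is fully described by six degrees of freedom: three for a rotation and three for a translation applied to the ligand's coordinates. A minimal sketch, assuming the rotation is given as angles about the fixed x, y and z axes applied in that order (one of several possible conventions) and that the ligand is rotated about its own centroid; the function names are hypothetical and no scoring of the pose is attempted.

```python
import math

Vec = tuple[float, float, float]


def _matmul(p: list[list[float]], q: list[list[float]]) -> list[list[float]]:
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(ax: float, ay: float, az: float) -> list[list[float]]:
    """Rotate by ax about x, then ay about y, then az about z (radians, fixed axes)."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return _matmul(rz, _matmul(ry, rx))   # Rz·Ry·Rx applies Rx first


def place_ligand(coords: list[Vec], angles: Vec, shift: Vec) -> list[Vec]:
    """Apply one rigid-body pose (3 rotations + 3 translations) to ligand atoms."""
    n = len(coords)
    centroid = [sum(c[i] for c in coords) / n for i in range(3)]
    r = rotation_matrix(*angles)
    placed: list[Vec] = []
    for c in coords:
        d = [c[i] - centroid[i] for i in range(3)]              # centroid to origin
        rot = [sum(r[i][k] * d[k] for k in range(3)) for i in range(3)]
        placed.append(tuple(rot[i] + centroid[i] + shift[i] for i in range(3)))
    return placed


if __name__ == "__main__":
    ligand = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(place_ligand(ligand, (0.0, 0.0, math.pi / 2), (5.0, 0.0, 0.0)))
```

A docking search samples many such (angles, shift) combinations and keeps the poses that a scoring function rates best; because the transform is rigid, all intramolecular distances of the ligand are preserved.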

Aim of docking

The aim of docking is to find new drugs for the target, which will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for hepatitis B infection, and may also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A molecule on the surface of the cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule in or on a cell to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

The molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through many weak noncovalent interactions formed with the binding site of a protein, tight binding of a ligand depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein (enzyme) is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this depends greatly on the type of function. With that in mind, below are the preferences of the 20 amino acids for lying within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The list below does not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between these two torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ, with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
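The φ and ψ values plotted for each residue are ordinary torsion angles: φ is defined by the backbone atoms C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). A minimal sketch of the torsion-angle calculation from four atomic positions, using only the standard library (`dihedral` is a hypothetical name). The sign is positive when the near bond must be turned clockwise to eclipse the far bond, viewed along the central bond.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)   # unit central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # For phi of residue i, pass C(i-1), N(i), CA(i), C(i) coordinates.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

Computing φ and ψ this way for every residue and plotting the pairs gives the Ramachandran plot; validation servers then count how many points fall in allowed regions.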

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The clean-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new; the new file has the atoms labelled in accordance with the IUPAC naming conventions.

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions. The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints on a residue, but it will not pass through high energy barriers, so it stops in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
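Gradient-based energy minimization can be illustrated on the smallest possible system: two atoms interacting through a Lennard-Jones potential, E(r) = 4ε[(σ/r)^12 - (σ/r)^6]. At each step both atoms are moved a small distance along the force acting on them (steepest descent) until the force is almost zero; the separation then settles at the potential minimum r = 2^(1/6)σ, where E = -ε. The reduced units (ε = σ = 1), the fixed step size and the tolerance are assumptions for illustration. Production minimizers use full force fields with adaptive line searches or conjugate-gradient methods.

```python
import math


def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """-dE/dr: positive pushes the atoms apart, negative pulls them together."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def minimize_pair(a, b, step=0.005, tol=1e-8, max_iter=10_000):
    """Steepest-descent minimization of a two-atom Lennard-Jones system."""
    a, b = list(a), list(b)
    for _ in range(max_iter):
        d = [bi - ai for ai, bi in zip(a, b)]
        r = math.sqrt(sum(x * x for x in d))
        f = lj_force(r)
        if abs(f) < tol:              # net force ~ 0: a (local) minimum is reached
            break
        u = [x / r for x in d]        # unit vector from atom a to atom b
        a = [ai - step * f * ui for ai, ui in zip(a, u)]
        b = [bi + step * f * ui for bi, ui in zip(b, u)]
    return a, b


if __name__ == "__main__":
    a, b = minimize_pair((0.0, 0.0, 0.0), (1.5, 0.0, 0.0))
    r = math.dist(a, b)
    print(f"r = {r:.6f} (expected {2 ** (1 / 6):.6f}), E = {lj_energy(r):.6f}")
```

Like any gradient method, this only walks downhill: started on the other side of an energy barrier, it would stop in whichever local minimum is nearest.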


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information: Entry name GLCTK_HUMAN; Primary accession number Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences:

Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5             206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina             197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp  27     60
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I             27     60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25

Amino acid composition:
Ala (A)   74  14.1%
Arg (R)   33   6.3%
Asn (N)   11   2.1%
Asp (D)   21   4.0%
Cys (C)    5   1.0%
Gln (Q)   32   6.1%
Glu (E)   28   5.4%
Gly (G)   51   9.8%
His (H)   16   3.1%
Ile (I)   15   2.9%
Leu (L)   81  15.5%
Lys (K)   10   1.9%
Met (M)   12   2.3%
Phe (F)   10   1.9%
Pro (P)   28   5.4%
Ser (S)   27   5.2%
Thr (T)   22   4.2%
Trp (W)    4   0.8%
Tyr (Y)    5   1.0%
Val (V)   38   7.3%
Pyl (O)    0   0.0%
Sec (U)    0   0.0%
(B)        0   0.0%
(Z)        0   0.0%
(X)        0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17
Total number of atoms: 7849

Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
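These extinction coefficients can be reproduced by hand. ProtParam sums the molar absorption at 280 nm of the chromophores using the values of Pace et al. (1995): 5500 M-1 cm-1 per Trp, 1490 per Tyr and 125 per cystine (a disulfide-bonded pair of Cys). For this entry (4 Trp, 5 Tyr, 5 Cys, so at most two cystines) this gives 4×5500 + 5×1490 + 2×125 = 29700, or 29450 with no cystines. Dividing by the molecular weight gives the absorbance of a 1 g/l solution. The function names below are hypothetical.

```python
TRP, TYR, CYSTINE = 5500, 1490, 125  # M^-1 cm^-1 at 280 nm (Pace et al. values)


def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int,
                           all_cys_paired: bool = True) -> int:
    """Molar extinction coefficient at 280 nm from residue counts."""
    cystines = n_cys // 2 if all_cys_paired else 0   # an odd Cys cannot pair
    return n_trp * TRP + n_tyr * TYR + cystines * CYSTINE


def abs_0_1_percent(ext: float, molecular_weight: float) -> float:
    """Absorbance of a 1 g/l solution: molar coefficient / molar mass."""
    return ext / molecular_weight


if __name__ == "__main__":
    mw = 55252.6  # from the ProtParam output above
    for paired in (True, False):
        e = extinction_coefficient(4, 5, 5, paired)
        print(e, round(abs_0_1_percent(e, mw), 3))   # 29700 0.538, then 29450 0.533
```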

Secondary structure prediction

By SOPMA: result for UNK_158250

(70 residues per row; prediction letters: h = alpha helix, e = extended strand, t = beta turn, c = random coil)

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%


Parameters: window width 17; similarity threshold 8; number of states 4
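The percentages in the SOPMA summary are simply the count of each state letter in the prediction string divided by the sequence length; for example, 235 helical residues out of 523 gives 44.93%. A minimal sketch (`state_percentages` is a hypothetical helper):

```python
from collections import Counter


def state_percentages(prediction: str) -> dict[str, float]:
    """Percentage of residues in each predicted state, rounded like SOPMA's summary."""
    counts = Counter(prediction)
    n = len(prediction)
    return {state: round(100.0 * c / n, 2) for state, c in sorted(counts.items())}


if __name__ == "__main__":
    print(state_percentages("hhhhcceett"))
```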

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65

(45)

2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98

49

6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97

7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
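The pairwise scores in this table are percent identities. A common definition, assumed in the sketch below, counts identical residues over the aligned columns in which neither sequence has a gap. ClustalW computes its score on its own pairwise alignments, so this sketch is an illustration of the measure rather than a reimplementation of the program, and `percent_identity` is a hypothetical name.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not compared:
        return 0.0
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


if __name__ == "__main__":
    print(percent_identity("AC-GT", "ACTGA"))
```

Because gap columns are excluded, two sequences of very different lengths (such as the 523-residue human protein and the 214-residue HBeAg sequences) can still only be compared over the region where both have residues.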

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

52

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
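The alignment above is ClustalW-style output. As a rough illustration of how a conservation line under such a block is built, the sketch below marks a column with `*` only when every sequence carries the same residue, and never marks a column containing a gap. Real ClustalW output also uses `:` and `.` for strongly and weakly similar residue groups, which this minimal sketch deliberately leaves out; the helper names `conservation_line` and `fraction_conserved` are our own, not part of any Clustal tool.

```python
def conservation_line(rows):
    """Return a ClustalW-like conservation string for equal-length aligned rows.

    Only fully conserved, gap-free columns get '*'; every other column gets ' '.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    marks = []
    for column in zip(*rows):
        residues = set(column)
        # A column counts as conserved only if it holds one residue type and no gap.
        if len(residues) == 1 and "-" not in residues:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


def fraction_conserved(rows):
    """Fraction of alignment columns that are fully conserved."""
    line = conservation_line(rows)
    return line.count("*") / len(line)


# First ten columns of one block for three HBV strains (HBVA4, HBVA6, HBVA5).
block = [
    "QAILCWGELM",
    "ETILCWGELM",
    "QAILCWGKLM",
]
for row in block:
    print(row)
print(conservation_line(block))
print(fraction_conserved(block))
```

Applied to whole blocks, the same loop makes the contrast between the near-identical HBV strains and the unrelated GLCTK_HUMAN row easy to see.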

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
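The guide tree is written in Newick format: nested parentheses group sequences, and the number after each colon is a branch length. The sketch below uses our own reconstruction of the tree string, in which the colons, commas and decimal points lost in extraction are restored and each branch length is read as a five-decimal value (ClustalW normally writes them that way, but this is an assumption). It parses the string with a small hand-written recursive-descent parser and sums branch lengths from the root to each leaf. The names `parse_newick` and `root_to_tip` are hypothetical helpers, and a guide tree is only a clustering aid for progressive alignment, not a rigorously inferred phylogeny.

```python
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,"
    "Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = "".join(text.split())  # drop all whitespace, including newlines
    pos = 0

    def read_until(stop_chars):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stop_chars:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            pos += 1  # consume ')'
        name = read_until("(),:;")
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ':'
            length = float(read_until("(),;"))
        return (name, length, children)

    return parse_node()


def root_to_tip(node, depth=0.0, out=None):
    """Map each leaf name to the summed branch length from the root."""
    if out is None:
        out = {}
    name, length, children = node
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        root_to_tip(child, depth, out)
    return out


distances = root_to_tip(parse_newick(GUIDE_TREE))
for leaf, dist in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{leaf:20s} {dist:.5f}")
```

Under this reading, the human glycerate kinase sequence (Q8IVS8) sits on by far the longest root-to-tip path, while the HBV genotype sequences cluster within about 0.015 of the root.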

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template, as it showed approximately 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

Open the template structure (PDB file) from File, then choose Swiss Model > load the raw target sequence.

Choose Fit > Fit raw sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss Model > Submit modeling request. (A new browser window opens with the PDB file loaded; enter the e-mail address at which the modeled structure should be received.)

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).

Open Swiss Model and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences.

Save the file as a project.

Select "Submit modeling request" under Swiss Model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill in these fields:

Your E-mail address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information:

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Click on the model bars.

Fig: Structure of template after modeling

Alignment:

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
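SWISS-MODEL reports an overall sequence identity of 34% for this target-template pair. As a rough sketch of how such a figure is obtained, the code below compares the aligned TARGET and 2b8nA rows position by position, ignoring case (the template is printed in lower case), removing the 10-residue grouping spaces and skipping gap columns. It uses only the block starting at target residue 155, so its result describes that block alone; it is not expected to reproduce the server's whole-alignment value, which may also be normalised differently. `percent_identity` is our own helper name.

```python
def percent_identity(target_row, template_row):
    """Percent identity over aligned columns where neither sequence has a gap.

    The 10-residue grouping spaces are removed first; case is ignored because
    the SWISS-MODEL listing writes the template in lower case.
    """
    a = target_row.replace(" ", "").upper()
    b = template_row.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("rows must have the same aligned length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Block starting at target residue 155 / template residue 101.
target = "ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA"
template = "trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk"
print(f"{percent_identity(target, template):.1f}% identity in this block")
```

In this block 20 of the 50 aligned positions are identical, which is in the same range as the 34% reported for the whole model.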

Model Validation

INTRODUCTION

The Structure Analysis and Verification Server (SAVES) greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure, and it shows which combinations of φ and ψ are possible for a polypeptide. Here φ is the torsion angle about the N–Cα bond and ψ the torsion angle about the Cα–C bond of each residue, so every residue contributes one (φ, ψ) point; residues that fall in disallowed regions of the plot usually indicate errors in the model.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: the target protein structure obtained after homology modeling using Deep View and Modeller is given as input to SAVS.
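To make the φ/ψ definition concrete: each angle is a torsion (dihedral) angle defined by four consecutive backbone atoms, C(i-1), N, Cα, C for φ and N, Cα, C, N(i+1) for ψ. The sketch below computes a signed dihedral angle from four 3-D points with the common atan2 formulation, using plain tuples so that no third-party package is needed. The function name and the toy coordinates are our own illustration, not output from SAVES or PROCHECK.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180..180) for points p0-p1-p2-p3.

    For phi use C(i-1), N(i), CA(i), C(i); for psi use N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalise the central bond so projections onto it are simple dot products.
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: a cis (0 deg), a trans (180 deg) and a -90 deg arrangement.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
```

Evaluating this for every residue of a model and plotting ψ against φ gives the raw scatter of a Ramachandran plot; tools such as PROCHECK then classify each point into the favoured, allowed and disallowed regions.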

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result

Plot statistics                                        SCORE   %-tage
Residues in most favoured regions [A,B,L]                990    85.6%
Residues in additional allowed regions [a,b,l,p]         104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]      11     1.0%
Residues in disallowed regions                            51     4.4%
                                                        ----   ------
Number of non-glycine and non-proline residues          1156   100.0%
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127
Number of proline residues                                60
                                                        ----
Total number of residues                                1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
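The percentages in the PROCHECK table are each region's count divided by the 1156 non-glycine, non-proline residues, and the quality criterion quoted above is a threshold on the first of them. A short arithmetic check follows; the counts are copied from the table, while the dictionary keys and the 90% constant are simply our encoding of the text above.

```python
counts = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}
end_residues, glycines, prolines = 8, 127, 60

# Percentages are taken over non-glycine, non-proline residues only.
non_gly_pro = sum(counts.values())
total = non_gly_pro + end_residues + glycines + prolines
percent = {region: round(100.0 * n / non_gly_pro, 1) for region, n in counts.items()}

for region, pct in percent.items():
    print(f"{region:34s} {counts[region]:5d} {pct:6.1f}%")
print("non-Gly/non-Pro residues:", non_gly_pro, " total residues:", total)

# PROCHECK guideline: a good model has over 90% of residues in the most favoured regions.
meets_guideline = percent["most favoured [A,B,L]"] > 90.0
print("over 90% in most favoured regions:", meets_guideline)
```

The counts add up to the 1351 residues reported, and by this criterion the model, at 85.6% in the most favoured regions, falls somewhat short of the 90% guideline.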

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models).
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP CG Radius = 1.40 Charge = 0.62, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52, THR CA Radius = 1.50 Charge = 0.27, THR C Radius = 1.40 Charge = 0.53, THR O Radius = 1.50 Charge = -0.50, THR CB Radius = 1.50 Charge = 0.21, THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N. Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 2635096, Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N. Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 2635096, Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N: Tag = 2B8N

Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range: Emin = 000, Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N 2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range: Emin = 000, Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair       RMS  Bumps
 ----  --------- --------- --------- --------- --------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100
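Several numbers in the Hex log are consistent with one another, which gives a quick sanity check on the run. The sketch below reproduces them under our own reading of the flattened `Iterate[278121]` and `FFT[642448]` fields as 27 x 812 x 1 and 64 x 24 x 48. That reading is an assumption, chosen because it reproduces both the 21924 3D FFTs and the 1616412672 orientations reported in the log. The 27 comes from the distance range 0.00 to 19.50 in steps of 0.75, and 812 is the receptor rotational increment count; all variable names are illustrative.

```python
def distance_steps(r_min, r_max, step):
    """Inclusive list of intermolecular distances scanned, e.g. 0.00, 0.75, ..."""
    n = int(round((r_max - r_min) / step)) + 1
    return [round(r_min + i * step, 2) for i in range(n)]


radii = distance_steps(0.0, 19.5, 0.75)
receptor_rotations = 812          # "Initial rotational increments (N=16) Receptor 812"
ligand_rotations = 1              # "... Ligand 1"
fft_grid = (64, 24, 48)           # assumed reading of "FFT[642448]"

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
n_ffts = len(radii) * receptor_rotations * ligand_rotations
orientations_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
orientations = n_ffts * orientations_per_fft

# Timing breakdown printed by Hex ("Hex ... GF ... FFT ... Scan ..."), in seconds.
timings = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
total_seconds = sum(timings.values())

print(len(radii), radii[:3], radii[-1])
print("3D FFTs:", n_ffts)
print("orientations:", orientations)
print(f"timed phases: {total_seconds:.2f} s (log reports 7 min 0 sec)")
```

The four timed phases add up to about 420 s, matching the "7 min 0 sec" reported for the FFT search.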

Conclusion

After analyzing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We have noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope that in the near future it will be possible to draw a conclusive picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we performed homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

As our study indicates that the HBeAg (glycerate kinase) protein coded by the gene is one of the secondary causes of the pathogenicity of HBV, we tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.

Future prospects

The work presented in this report may be a stepping stone towards such discoveries; it is a small finding within a much larger problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may open new avenues for finding other possible drug targets in HBV.

The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so that we can look forward to a better, disease-free world.

The work presented is just a small part of big issue and lots of work still needs to be done to

establish a good phylogenetic relationship and full fledged cure for bird flu But we are hoping

that these findings will go long way and will prove fruitful to any going in a similar area

BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B, Sheinerman FB, Honig B (2001). Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796–14801.

Aloy P, Querol E, Aviles FX, Sternberg MJ (2001). Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395–408.

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.

Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. (2000). InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romanokroemer@sanofi-aventis.com

Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Introduction

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins. Bioinformatics is limited to the sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems.

These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis. Sequence analysis includes sequence alignment, sequence database searching, motif and pattern discovery, gene and promoter finding, reconstruction of evolutionary relationships, and genome assembly and comparison. Structural analysis includes protein and nucleic acid structure analysis, comparison, classification and prediction. Functional analysis includes gene expression profiling, protein-protein interaction prediction, protein subcellular localization prediction, and metabolic pathway reconstruction and simulation. The three aspects of bioinformatics analysis are not isolated but often interact to produce integrated results. For example, protein structure prediction depends on sequence alignment data; clustering of gene expression profiles requires phylogenetic tree construction methods derived from sequence analysis; and sequence-based prediction is related to the functional analysis of co-expressed genes.

The first major bioinformatics project was undertaken by Margaret Dayhoff in 1965, who developed the first protein sequence database, called the Atlas of Protein Sequence and Structure. Subsequently, in the early 1970s, the Brookhaven National Laboratory established the Protein Data Bank for archiving three-dimensional protein structures. At its onset the database stored fewer than a dozen protein structures, compared with more than 30,000 structures today. The first sequence alignment algorithm was developed by Needleman and Wunsch in 1970. This was a fundamental step in the development of bioinformatics, as it paved the way for the routine sequence comparisons and database searching practised by modern biologists.
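The Needleman-Wunsch algorithm computes an optimal global alignment by dynamic programming. As a minimal sketch, the Python function below fills the score matrix and traces back one optimal alignment. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) and the function name are assumptions chosen for illustration; practical tools use substitution matrices such as BLOSUM and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment with a linear gap penalty.

    Returns (optimal score, aligned a, aligned b); '-' marks a gap.
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap            # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, aln_a, aln_b = needleman_wunsch("GATTACA", "GCATGCU")
print(best)   # 0 with these scoring parameters
print(aln_a)
print(aln_b)
```

Several alignments can share the optimal score; the traceback order (diagonal first, then gap in b, then gap in a) simply picks one of them.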

A recent advance in bioinformatics is molecular modelling, which aims at understanding structure-function and structure-property relationships in physico-chemical processes and pharmaceuticals, and has therefore become increasingly important for finding and designing new drugs. Indeed, computers play an important role in new drug discovery and drug design.

HEPATITIS

Hepatitis (plural hepatitides) implies injury to the liver characterized by the presence of inflammatory cells in the liver tissue. The word derives from the ancient Greek hepar (root hepato-), meaning liver, and the suffix -itis, denoting inflammation. The condition can be self-limiting, healing on its own, or can progress to scarring of the liver.

Hepatitis is acute when it lasts less than 6 months and chronic when it persists longer. A group of viruses known as the hepatitis viruses causes most cases of liver damage worldwide. Hepatitis can also be due to toxins (notably alcohol), other infections or an autoimmune process.

It may run a subclinical course, in which the affected person does not feel ill. The patient becomes unwell and symptomatic when the disease impairs liver functions, which include, among other things, the screening of harmful substances, regulation of blood composition and production of bile to help digestion.

Causes

Acute hepatitis

Viral hepatitis: hepatitis A to E (more than 95% of viral causes), herpes simplex virus, cytomegalovirus, Epstein-Barr virus, yellow fever virus, adenoviruses

Non-viral infections: Toxoplasma, Leptospira, Q fever, Rocky Mountain spotted fever

Alcohol

Toxins: Amanita toxin in mushrooms, carbon tetrachloride, asafetida

Drugs: paracetamol, amoxicillin, antituberculosis medicines, minocycline and many others

Ischemic hepatitis (circulatory insufficiency) (1)

Pregnancy

Autoimmune conditions, e.g. systemic lupus erythematosus (SLE)

Metabolic diseases, e.g. Wilson's disease

Chronic hepatitis

Viral hepatitis: hepatitis B with or without hepatitis D, hepatitis C (hepatitis A and E do not lead to chronic disease)

Autoimmune: autoimmune hepatitis

Alcohol

Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole

Non-alcoholic steatohepatitis

Heredity: Wilson's disease, alpha-1 antitrypsin deficiency

Primary biliary cirrhosis and primary sclerosing cholangitis occasionally mimic chronic hepatitis [4]

Viral hepatitis

A virus is a particle that is smaller than a bacterium and contains complex genetic information in the form of DNA or RNA. This genetic material allows the virus to infect bacteria or living cells and set up the machinery to reproduce itself, leading to destruction of the cell in which it resides. To date, five viruses, labelled A through E, have been identified that appear to cause viral hepatitis. Viruses A and E can be contracted from contaminated water or food (by mouth), while viruses B, C and D are transmitted by direct injection into the bloodstream (through any method of injection under the skin). The term viral hepatitis describes any one of the illnesses caused by these five viruses; it consists of an infection of liver cells that damages the liver over days in some cases but over many years in others.

Thirty years ago none of the hepatitis viruses had been identified. In the 1960s, transfusion-related viral hepatitis was extremely common, with 30% of patients receiving blood products becoming infected. By 1970 a blood test for the Australia antigen had been developed, which appeared to identify those infected with one hepatitis virus, which we now call hepatitis B. The investigator who discovered the Australia antigen (the protein that makes up the coat of the virus, now called the hepatitis B surface antigen, HBsAg) was awarded the Nobel Prize. Our understanding of viral hepatitis has grown tremendously since the discovery of the Australia antigen.

Currently 11 viruses are recognized as causing hepatitis. Two are herpes viruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and 9 are hepatotropic viruses.

EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent hepatic damage. Both viruses cause the typical infectious mononucleosis syndrome of fatigue, nausea and malaise.

Of the nine human hepatotropic viruses, only five are well characterized; hepatitis G and TTV (transfusion-transmitted virus) are newly discovered viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called enterically transmitted non-A, non-B hepatitis) are transmitted by fecal-oral contamination. The most important types include hepatitis B (sometimes called serum hepatitis), hepatitis C (formerly called non-A, non-B hepatitis) and hepatitis D (formerly called delta hepatitis).

Hepatitis A

Incubation period: 3-5 weeks (mean 28 days)

Milder disease than hepatitis B; asymptomatic infections are very common, especially in children.

Adults, especially pregnant women, may develop more severe disease.

Although convalescence may be prolonged, there is no chronic form of the disease.

Fulminant hepatitis is rare (0.1% of cases). The virus enters via the gut, replicates in the alimentary tract and spreads to infect the liver, where it multiplies in hepatocytes. Viraemia is transient. The virus is excreted in the stools for two weeks preceding the onset of symptoms.

World-wide distribution; endemic in most countries. The incidence in first-world countries is declining. There is an especially high incidence in developing countries and rural areas. In rural areas of South Africa the seroprevalence is 100%.

Hepatitis E

Incubation period: 30-40 days

Acute, self-limiting hepatitis; no chronic carrier state.

Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in pregnant women, in whom the mortality rate is high (up to 40%). Similar to hepatitis A, the virus initially replicates in the gut before invading the liver, and virus is shed in the stool prior to the onset of symptoms. Viraemia is transient. A large inoculum of virus is needed to establish infection. Little is known yet; the incidence of infection appears to be low in first-world countries.

Hepatitis C

Putative togavirus related to the flaviviruses and pestiviruses; thus probably enveloped. Has an ssRNA genome.

Does not grow in cell culture but can infect chimpanzees. Incubation period: 6-8 weeks.

Causes a milder form of acute hepatitis than does hepatitis B, but 50% of individuals develop chronic infection following exposure, which can lead to:

1) Chronic liver disease

2) Hepatocellular carcinoma

Incidence: endemic world-wide, with high incidence in Japan, Italy and Spain. In South Africa, 1% of blood donors have antibodies.

Hepatitis D

A defective virus that requires hepatitis B as a helper virus in order to replicate. Infection therefore only occurs in patients who are already infected with hepatitis B, and it increases the severity of liver disease in hepatitis B carriers. The virus particle is 36 nm in diameter and is encapsulated with HBsAg derived from HBV.

The delta antigen is associated with virus particles. The genome is ssRNA.

Identified in intravenous drug abusers.

Hepatitis G

A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally transmitted hepatitis but is no longer believed to be a major agent of liver disease. It has been classified as a flavivirus.

Hepatitis B

What is the Hepatitis B Virus?

The hepatitis B virus (HBV) is a DNA-containing virus that is capable of infecting human liver cells and other cells in the body once it gains access to the bloodstream. One of the most interesting features of the hepatitis B virus is that the virus itself does not damage the liver; the damage is caused by the individual's own immune system attacking the virus-infected cells. Since liver damage from the virus may be very slight, many patients are called healthy carriers. This means that although they may transmit the disease to others, they have normal-appearing livers and normal liver function tests.

While many individuals remain healthy for many years or a lifetime, others develop chronic hepatitis, cirrhosis and occasionally liver cell cancer. These outcomes are linked to the virus and its effects, although it is unlikely that the virus directly causes cancer. Those patients who develop hepatitis (damage to liver cells with inflammation) do so because of the body's normal inclination to attack the foreign proteins contained in viruses and in the cells in which the viruses are found. This process, called the immune response, determines the pace and severity of the liver cell injury in this condition and is described in more detail below.

Since the identification of the hepatitis B virus, several other viruses that are nearly identical have been identified in Eastern woodchucks, ground squirrels and Pekin ducks. The members of this virus family, termed the hepadnaviruses, have life cycles similar to that observed in man and can serve as animal models, allowing further study of these unique disease-causing agents.

Classification and general features

Family: Hepadnaviridae

Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g. duck hepatitis B virus)

Size: 42 nm. Virions (also known as Dane particles) contain a circular, partially double-stranded DNA genome.

Fig: Hepatitis B virus structure

HBV Antigens

HBsAg = surface (coat) protein; produced in excess as small spheres and tubules

HBcAg = inner core protein

HBeAg = secreted protein; function unknown

Clinical Features

Incubation period: 2-5 months

Insidious onset of symptoms. Tends to cause a more severe disease than hepatitis A.

Asymptomatic infections occur frequently.

Pathogenesis

Infection is parenterally transmitted. The virus replicates in the liver, and virus particles as well as excess viral surface protein are shed in large amounts into the blood. Viraemia is prolonged, and the blood of infected individuals is highly infectious.

Complications

1) Persistent infection

Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected.

Those at particular risk include:

babies and young children

immunocompromised patients

males (more often than females)

The virus persists in the hepatocytes, and ongoing liver damage occurs because of the host immune response against the infected liver cells.

Chronic infection may take one of two forms:

Chronic persistent hepatitis: the virus persists but there is minimal liver damage.

Chronic active hepatitis: there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure.

Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC). HBV is thought to play a role in the development of this malignancy because:

a) 80% of patients with HCC are carriers of hepatitis B

b) virus DNA can be identified in hepatocellular carcinoma cells

c) virus DNA can integrate into the host chromosome

2) Fulminant hepatitis

Rare; accounts for 1% of infections.

Epidemiology

Prevalence of disease in Africa

World-wide there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is parenterally transmitted through the following routes:

1) Blood

Blood transfusions and serum products

Sharing of needles and razors

Tattooing and acupuncture

Renal dialysis

Organ donation

2) Sexual intercourse

3) Horizontal transmission in children, within families and through close personal contact

This is the major mode of transmission in South Africa, where the majority of individuals become infected between three and nine years of age. Horizontal transmission also occurs in children's institutions and mental homes.

4) Vertical transmission: perinatal transmission from a carrier mother to her baby

Transplacental (rare)

During delivery

Postnatal: breastfeeding, close contact

(This is the major mode of transmission in South East Asia.)

Diagnosis: Serology

Acute infection with resolution

Viral antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV.

Fig: Hepatitis B virus in serum
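The marker meanings listed above can be combined into a simple rule set. The following Python sketch encodes only the statements in this section (HBsAg for ongoing replication, HBeAg versus anti-HBe for high or low infectivity, core IgM for recent infection, core IgG for exposure, anti-HBs for immunity). The class name, function name and returned labels are hypothetical, and the sketch ignores vaccination and other real-world patterns; it is an illustration of the logic, not a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class HBVMarkers:
    """Presence (True) or absence (False) of each serum marker (hypothetical model)."""
    hbsag: bool = False      # surface antigen
    hbeag: bool = False      # e antigen
    anti_hbs: bool = False   # surface antibody
    anti_hbe: bool = False   # e antibody
    core_igm: bool = False   # IgM against core antigen
    core_igg: bool = False   # IgG against core antigen


def interpret(m: HBVMarkers) -> list[str]:
    """Apply the marker descriptions from the text, in a fixed order."""
    notes = []
    if m.hbsag:
        notes.append("virus replicating in the liver")
        if m.hbeag:
            notes.append("high level of replication (high infectivity)")
        elif m.anti_hbe:
            notes.append("replication falling (low infectivity)")
    if m.core_igm:
        notes.append("recent infection")
    if m.core_igg:
        notes.append("exposure to HBV")
    if m.anti_hbs and not m.hbsag:
        notes.append("immune")
    if not notes:
        notes.append("no serological evidence of HBV infection")
    return notes


# A person who cleared the infection: anti-HBs and core IgG, no HBsAg
print(interpret(HBVMarkers(anti_hbs=True, core_igg=True)))
# ['exposure to HBV', 'immune']
```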

Prevention

1) Active Immunization

Two types of vaccine are available:

Serum-derived: prepared from HBsAg purified from the serum of HBV carriers

Recombinant HBsAg: made by genetic engineering in yeasts

Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.

Universal immunization of infants was introduced in April 1995. Infants receive 3 doses, at 6, 10 and 14 weeks of age.

The vaccine should be administered to people at high risk of infection with HBV:

1) Health care workers

2) Sexual partners of chronic carriers

3) Infants of HBV carrier mothers

2) Passive Antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example needlestick injuries.

What is Hepatitis B Infection Like?

When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others who do not develop significant symptoms following exposure may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

The genome is circular, about 3.2 kb in size and partially double-stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions.

Fig: Hepatitis B virus genome

The minus strand is unit length and has a protein covalently attached to its 5' end. The other strand, the plus strand, is variable in length but always shorter than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. Transcription and translation of these proteins are achieved through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation and mark a specific transcript for encapsidation into the nucleocapsid.
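Locating ORFs on a circular genome such as that of HBV requires scanning past the origin. As a minimal sketch, the Python function below scans one strand for ATG-initiated ORFs ending in a standard stop codon (TAA, TAG, TGA), allowing an ORF to wrap around the end of the sequence, and keeps the longest ORF for each stop codon. The function names, the minimum-length parameter and the toy sequence are hypothetical; the sketch ignores the complementary strand and non-ATG start codons, both of which real genome annotation would need.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_at(seq: str, pos: int) -> str:
    """Return the codon starting at pos, reading around the circle."""
    n = len(seq)
    return "".join(seq[(pos + k) % n] for k in range(3))


def circular_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return sorted (start, length_nt) pairs for ATG...stop ORFs on one strand.

    Start positions are 0-based on the circle; length includes the stop codon,
    and min_codons counts start and stop codons. ORFs are limited to less than
    one full turn of the circle.
    """
    seq = seq.upper()
    n = len(seq)
    best: dict[int, tuple[int, int]] = {}   # stop position (mod n) -> (start, length)
    for start in range(n):
        if codon_at(seq, start) != "ATG":
            continue
        pos = start
        while pos + 3 - start <= n:          # stay within one turn of the circle
            if codon_at(seq, pos) in STOP_CODONS:
                length = pos + 3 - start
                stop = pos % n
                # keep only the longest ORF ending at this stop codon
                if length // 3 >= min_codons and length > best.get(stop, (0, 0))[1]:
                    best[stop] = (start, length)
                break
            pos += 3
    return sorted(best.values())


# Toy sequence whose ORF (ATG AAA TAA) starts at position 7 and wraps past the origin
print(circular_orfs("AATAACCATGA"))  # [(7, 9)]
```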

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces, HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA, called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins destined to become HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce the polymerase protein P, which then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the DNA is transported back to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al. 765).

Fig: HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found in the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins arranged with icosahedral symmetry. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions by a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group named surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein.

The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces, HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigen encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg): There are three different hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope that may be responsible for triggering an immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears largely oblivious to their presence.

Hepatitis B core antigen (HBcAg): The only HBV antigen that cannot be detected directly by a blood test; this antigen can only be isolated by analysing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. Although it is thought to be located in the core structure of the virus, this antigen can be detected by a blood test. If found, it usually indicates that complete virus particles are in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigenic particles: a spherical 22 nm particle; a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle; and tubular or filamentous particles that vary in length. Of these, only the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV [4].

Homology or comparative modelling involves predicting the structure of a query sequence from the structures of one or more structural templates. The procedure involves identifying possible templates that have a clear sequence relationship to the query, assembling the model, predicting regions of the structure that are likely to have different conformations from the templates (e.g. loops), and ultimately refining the structure in an attempt to account for inherent differences between the template and query structures. Homology modelling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity (a structural explanation for this observation has been offered by Chung and Subbiah 1996). Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
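The 30% rule can be checked directly once a query-template alignment is available. As a minimal sketch, the Python functions below compute percent identity over aligned columns in which neither sequence has a gap, and flag templates at or below the threshold. Percent identity has several definitions in the literature (for example, dividing by the full alignment length or by the shorter sequence length); the definition used here, the function names and the example sequences are assumptions for illustration.

```python
def percent_identity(query_aln: str, template_aln: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(q == t for q, t in pairs)
    return 100.0 * identical / len(pairs)


def template_is_suitable(query_aln: str, template_aln: str, threshold: float = 30.0) -> bool:
    """Rule of thumb from the text: accurate models need more than `threshold` % identity."""
    return percent_identity(query_aln, template_aln) > threshold


# 4 identical residues out of 6 gap-free columns
print(percent_identity("MKT-LLV", "MKSALLI"))      # about 66.7
print(template_is_suitable("MKT-LLV", "MKSALLI"))  # True
```

Because the identity is computed only over gap-free columns, a short but well-matched local overlap can score highly; a real template search would also consider alignment coverage.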

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed over recent years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands, and from protein-ligand complexes. In this respect, the term "guided docking" is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, affecting scoring but not the orientation of the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based, and ligand-based approaches, reflecting the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information is an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions, and virtual screening enrichments obtained from protein-ligand docking [7].
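In the indirect scheme described above, chemical knowledge changes only how poses are scored, not how they are generated. A minimal sketch of that idea, assuming a hypothetical flat-bottom distance restraint between one ligand atom and a target point in the pocket; the restraint form, the atom name `O1`, the poses and their base scores are all made up for illustration and do not come from any particular docking program.

```python
import math

Point = tuple[float, float, float]


def restraint_penalty(pose: dict[str, Point], restraints: list[tuple[str, Point, float]]) -> float:
    """Flat-bottom quadratic penalty: zero while each restrained ligand atom lies
    within `tol` angstroms of its target point, growing quadratically beyond that."""
    penalty = 0.0
    for atom, target, tol in restraints:
        excess = math.dist(pose[atom], target) - tol
        penalty += max(0.0, excess) ** 2
    return penalty


def guided_score(base_score: float, pose: dict[str, Point], restraints, weight: float = 1.0) -> float:
    """Indirect guidance: the restraint only re-ranks poses, it never moves the ligand.
    Lower scores are better, as with most docking scores."""
    return base_score + weight * restraint_penalty(pose, restraints)


# Hypothetical restraint: atom O1 should sit within 1 A of a site point at the origin.
restraints = [("O1", (0.0, 0.0, 0.0), 1.0)]
# Hypothetical sampled poses: (base score, atom coordinates)
poses = {
    "pose_a": (-8.0, {"O1": (3.0, 0.0, 0.0)}),
    "pose_b": (-7.5, {"O1": (0.5, 0.0, 0.0)}),
}
ranked = sorted(poses, key=lambda name: guided_score(*poses[name], restraints))
print(ranked)
```

A direct approach would instead use the same chemical information inside the sampling step, for example by only generating poses that already satisfy the restraint.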

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed, and a number of successful applications of structure-based virtual screening are described [8].


Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins.

Bioinformatics is limited to sequence, structural, and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis, and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI (the National Center for Biotechnology Information) creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.


2. Protein sequence: glycerate kinase (HBeAg-binding protein 4) from human

Primary accession number: Q8IVS8; EC 2.7.1.31; subcellular location: cytoplasm; catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "fast-A" and stands for "FAST-All", because it works with any alphabet, an extension of "FAST-P" (protein) and "FAST-N" (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
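SSEARCH, mentioned above, finds the optimal local alignment with the Smith-Waterman dynamic-programming recurrence. A minimal pure-Python sketch of that recurrence, assuming a linear gap penalty and simple match/mismatch scores (the values 2, -1 and -2 are arbitrary); SSEARCH itself scores proteins with substitution matrices and affine gap penalties and is far faster.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Optimal local alignment of a and b with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b); '-' marks a gap.
    """
    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,                                        # start a new local alignment
                H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # match / mismatch
                H[i - 1][j] + gap,                        # gap in b
                H[i][j - 1] + gap,                        # gap in a
            )
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))
```

The `max(0, ...)` term is what makes the alignment local: a poorly matching prefix is simply dropped instead of dragging the score down.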

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and to identify library sequences that resemble the query sequence above a certain threshold.
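Much of BLAST's speed comes from first looking for short word matches ("seeds") between the query and each database sequence, and only then extending promising seeds into alignments. A minimal sketch of the seeding step, assuming exact word matches of length 3 and a simple grouping of seeds by diagonal; protein BLAST actually uses scored "neighbourhood" words above a threshold and then extends the seeds, neither of which is shown here.

```python
from collections import defaultdict


def seed_hits(query: str, subject: str, w: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a word of length w matches exactly."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)  # word -> positions in the query
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def hits_by_diagonal(hits: list[tuple[int, int]]) -> dict[int, list[tuple[int, int]]]:
    """Group seeds by diagonal (subject_pos - query_pos); seeds on the same
    diagonal are candidates for one ungapped alignment."""
    groups: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for i, j in hits:
        groups[j - i].append((i, j))
    return dict(groups)


hits = seed_hits("ACGTAC", "TACGT")
print(hits, hits_by_diagonal(hits))
```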

5. Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be shown an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence will be analyzed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
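Two of these parameters are simple functions of the amino acid composition. A minimal sketch, assuming the per-residue coefficients given in the ProtParam documentation (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1 at 280 nm) and the aliphatic-index formula X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent; half-life and instability index need lookup tables and are omitted.

```python
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125  # M-1 cm-1 at 280 nm, as documented for ProtParam


def extinction_coefficient(seq: str, all_cys_paired: bool = True) -> int:
    """Molar extinction coefficient at 280 nm in water.

    If all_cys_paired is True, every two Cys residues count as one cystine
    (all Cys as half cystines); otherwise cystines contribute nothing.
    """
    seq = seq.upper()
    cystines = seq.count("C") // 2 if all_cys_paired else 0
    return seq.count("W") * EXT_TRP + seq.count("Y") * EXT_TYR + cystines * EXT_CYSTINE


def aliphatic_index(seq: str) -> float:
    """X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), where X is the mole percent."""
    seq = seq.upper()

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (mole_percent("I") + mole_percent("L"))
```

With the composition reported in the Results (4 Trp, 5 Tyr, 5 Cys), this gives 4 x 5500 + 5 x 1490 + 2 x 125 = 29700 when the five cysteines are counted as two cystines, and 29450 without cystines, matching the ProtParam output; dividing each by the molecular weight (55252.6) gives the Abs 0.1% values 0.538 and 0.533.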


Using SOPMA for secondary structure analysis

A method called the self-optimized prediction method (SOPM) has been described to improve the success rate in the prediction of the secondary structure of proteins. Its authors later reported improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
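The 69.5% figure above is a per-residue three-state accuracy (often called Q3): the fraction of residues whose predicted state matches the observed one. A minimal sketch, assuming a common reduction of detailed state letters to helix (h), strand (e) and coil (c), with turns and bends counted as coil; reduction conventions differ between studies.

```python
# Assumed reduction of SOPMA/DSSP-style letters to three states; conventions vary.
TO_THREE_STATE = {"h": "h", "g": "h", "i": "h", "e": "e", "b": "e", "t": "c", "s": "c", "c": "c"}


def to_three_state(labels: str) -> list[str]:
    """Map each state letter to h, e or c (unknown letters are treated as coil)."""
    return [TO_THREE_STATE.get(ch.lower(), "c") for ch in labels]


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label is predicted correctly."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("labels must be non-empty and of equal length")
    pairs = zip(to_three_state(predicted), to_three_state(observed))
    correct = sum(p == o for p, o in pairs)
    return 100.0 * correct / len(observed)


print(q3_accuracy("hhhceec", "hhhcccc"))
```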

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and the PubMed literature.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) by a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor in PDB format.

Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).

Verified the model through different parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX to perform the docking and obtained the structure of the drug molecule.


6. Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and of the template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
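The Å figures above describe agreement between matched Cα atoms, which is usually reported as a root-mean-square deviation (RMSD) after the model and the reference structure have been optimally superimposed. A minimal NumPy sketch of that measurement using the Kabsch algorithm; it assumes NumPy is installed and that the two coordinate arrays are already matched atom-for-atom (for example through the target-template alignment), which the text does not specify.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    P_c = P - P.mean(axis=0)  # remove translation
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c           # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a mirror image.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

A rotated and translated copy of the same Cα trace gives an RMSD of essentially zero, while any genuine difference in shape remains after superposition.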

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, yielding a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.
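The selection logic described above (reject hits with a poor E-value, then weigh coverage and similarity) can be sketched as a small filter-and-rank function. This is only an illustration under stated assumptions: the E-value cut-off of 1e-5, the preference order (coverage first, identity as tie-breaker), the `Hit` record and the template names are all hypothetical choices, not rules from the text or from any modeling server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    name: str        # template identifier, e.g. a PDB chain
    evalue: float    # BLAST E-value of the hit
    identity: float  # percent identity over the aligned region
    coverage: float  # fraction of the query covered by the alignment (0-1)


def choose_template(hits: list[Hit], max_evalue: float = 1e-5) -> Optional[Hit]:
    """Discard hits with a poor E-value, then prefer coverage and, on ties, identity."""
    usable = [h for h in hits if h.evalue <= max_evalue]
    if not usable:
        return None  # no trustworthy template: fall back to fold recognition
    return max(usable, key=lambda h: (h.coverage, h.identity))


# Illustrative hits only, not results from this report
hits = [
    Hit("TPL1", 1e-50, 45.0, 0.80),
    Hit("TPL2", 1e-45, 60.0, 0.40),
    Hit("TPL3", 6.0, 25.0, 0.90),  # best coverage but poor E-value, so rejected
]
best = choose_template(hits)
print(best.name if best else "no template")
```

Returning `None` rather than the least-bad hit mirrors the advice above not to build a model on a template with a poor E-value.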

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and such interfaces are thus said to be rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand Docking

Protein receptor-ligand motifs fit together tightly and are often described as a "lock and key" mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor, or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid-ligand, flexible-receptor complex often maximizes the interface area between the molecules. The two molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than a rigid receptor would.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: there is intrinsic movement that allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
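The six rigid-body degrees of freedom mentioned above (three rotational, three translational) are exactly what a docking search varies for each trial pose. A minimal pure-Python sketch, assuming the rotation is given as an axis and an angle (Rodrigues' formula) applied about the molecule's centroid and followed by a translation; internal torsions of a flexible ligand are not modelled here.

```python
import math


def rotation_matrix(axis, angle):
    """3x3 rotation matrix (Rodrigues' formula) for `angle` radians about `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [t * x * x + c, t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c, t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]


def rigid_transform(coords, axis, angle, translation):
    """Rotate coordinates about their centroid, then translate them (6 rigid-body DOF)."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    R = rotation_matrix(axis, angle)
    moved = []
    for p in coords:
        d = [p[k] - centroid[k] for k in range(3)]
        rotated = [sum(R[i][k] * d[k] for k in range(3)) for i in range(3)]
        moved.append(tuple(rotated[i] + centroid[i] + translation[i] for i in range(3)))
    return moved


ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]  # hypothetical 3-atom ligand
print(rigid_transform(ligand, (0.0, 0.0, 1.0), math.radians(90), (0.5, -2.0, 3.0)))
```

Because the transform is rigid, all interatomic distances within the ligand are unchanged; only its position and orientation relative to the receptor change.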

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease hepatitis B, and will also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process.

Receptor

A structure on the surface of a cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. More generally, it is a molecule within or on the surface of a cell to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

The molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through the interaction of many weak noncovalent bonds formed with the binding site of a protein, tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein or enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The list below does not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His   0.360     Tyr  -0.040     Asp   0.045     Gly  -0.070
Trp  -0.140     Met   0.025     Val  -0.060     Asn   0.080
Leu  -0.180     Phe  -0.120     Gln   0.050     Cys   0.210
Ile  -0.005     Ala   0.025     Glu   0.050     Arg   0.055
Pro  -0.200     Lys   0.100     Thr   0.100     Ser   0.130
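A list like this can be used as a simple lookup table. A minimal sketch, assuming the values above are read as three-decimal numbers (His 0.360, Ile -0.005, and so on) and using a plain average over a hypothetical set of pocket-lining residues; averaging is an illustrative heuristic, not how the original propensities were derived or meant to be applied.

```python
# Values transcribed from the list above (positive = enriched at functional sites)
PROPENSITY = {
    "HIS": 0.360, "TYR": -0.040, "ASP": 0.045, "GLY": -0.070,
    "TRP": -0.140, "MET": 0.025, "VAL": -0.060, "ASN": 0.080,
    "LEU": -0.180, "PHE": -0.120, "GLN": 0.050, "CYS": 0.210,
    "ILE": -0.005, "ALA": 0.025, "GLU": 0.050, "ARG": 0.055,
    "PRO": -0.200, "LYS": 0.100, "THR": 0.100, "SER": 0.130,
}


def mean_propensity(residues: list[str]) -> float:
    """Average propensity of a set of residues given as three-letter codes (any case)."""
    if not residues:
        raise ValueError("need at least one residue")
    return sum(PROPENSITY[r.upper()] for r in residues) / len(residues)


# Hypothetical pocket-lining residues, for illustration only
pocket = ["His", "Cys", "Ser", "Leu"]
print(f"mean propensity: {mean_propensity(pocket):+.3f}")
```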

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between these two torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
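Each point on a Ramachandran plot is a pair of torsion angles computed from backbone coordinates: φ from the atoms C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). A minimal pure-Python sketch of the signed dihedral-angle calculation (IUPAC sign convention, result in degrees between -180 and 180); reading the coordinates from a PDB file is left out.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return tuple(x * s for x in a)


def dihedral(p0, p1, p2, p3) -> float:
    """Signed torsion angle in degrees defined by four points, rotating about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))          # projections perpendicular to the bond
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# phi for residue i would be dihedral(C_prev, N, CA, C); psi would be dihedral(N, CA, C, N_next)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```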

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the extension .new; the new file has its atoms labelled in accordance with the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but with different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles, and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good for releasing local constraints on a residue, but it will not pass through high energy barriers and stops in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
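The gradient optimization described above can be sketched as steepest descent: compute the force on every atom and move each atom a small step along it until the forces are negligible. A minimal pure-Python sketch, assuming a toy Lennard-Jones potential in reduced units (epsilon = sigma = 1), a fixed step size and a force tolerance; real minimizers use full force fields (bonds, angles, dihedrals, electrostatics) and usually smarter schemes such as conjugate gradients.

```python
def lj_energy_forces(coords, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy and per-atom forces (reduced units, no cutoff)."""
    n = len(coords)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rij = [coords[i][k] - coords[j][k] for k in range(3)]
            r2 = sum(c * c for c in rij)
            sr6 = (sigma * sigma / r2) ** 3
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # Force on i is -dE/dr along the unit vector from j to i; j gets the opposite.
            fscale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += fscale * rij[k]
                forces[j][k] -= fscale * rij[k]
    return energy, forces


def steepest_descent(coords, step=0.01, f_tol=1e-6, max_steps=10000):
    """Move atoms along their net forces until the largest force component is below f_tol."""
    x = [[float(c) for c in p] for p in coords]
    for _ in range(max_steps):
        _, forces = lj_energy_forces(x)
        if max(abs(f) for atom in forces for f in atom) < f_tol:
            break
        for atom, f in zip(x, forces):
            for k in range(3):
                atom[k] += step * f[k]  # downhill in energy
    energy, _ = lj_energy_forces(x)
    return x, energy


coords, energy = steepest_descent([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)])
print(coords, energy)
```

For two atoms the minimizer settles at the separation 2^(1/6) sigma with energy -epsilon, the bottom of the Lennard-Jones well; with many atoms it would stop in whichever local minimum is nearest, as the text warns.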


Results and discussion

1. Swiss-Prot entry: glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db    AC       Description                                                        Score   E-value
pdb   1QGT-C   Chain C (Hbcag), Human Hepatitis B Viral Capsid >gi|5                206   6e-54
pdb   2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina                197   2e-51
pdb   2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                       192   6e-50
pdb   1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp      27   60
pdb   1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I                 27   60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)
DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25

Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon     C   2435
Hydrogen   H   3967
Nitrogen   N    711
Oxygen     O    719
Sulfur     S     17

Formula: C2435H3967N711O719S17
Total number of atoms: 7849

Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming all Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming no Cys residues appear as half cystines

Secondary structure prediction

By SOPMA: result for UNK_158250 (70 residues per line, predicted state below each line)

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: window width 17, similarity threshold 8, number of states 4.
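The percentages in the SOPMA summary are just counts of each state letter divided by the sequence length. A minimal sketch of that bookkeeping, assuming only the four letters that occur in this prediction (h, e, t, c); the short string in the example is made up.

```python
from collections import Counter

# State letters that occur in the SOPMA prediction above; all other states were zero.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_composition(prediction: str) -> dict[str, tuple[int, float]]:
    """Count each state letter and express it as a percentage of the sequence length."""
    counts = Counter(prediction.lower())
    n = len(prediction)
    return {name: (counts[code], 100.0 * counts[code] / n) for code, name in STATE_NAMES.items()}


print(state_composition("hhhhcceett"))
```

Applied to the full prediction, 235 helix residues out of 523 gives 235 / 523 x 100 = 44.93%, as reported.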

Multiple sequence alignment

ClustalW2 results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores table (pairwise scores between sequences, numbered as follows)

 1   Q8IVS8|GLCTK_HUMAN   523 aa
 2   Q64896|HBEAG_ASHV    217 aa
 3   P03154|HBEAG_DHBV1   305 aa
 4   P0C6J9|HBEAG_DHBV3   305 aa
 5   P03153|HBEAG_GSHV    217 aa
 6   P0C692|HBEAG_HBVA2   214 aa
 7   P0C625|HBEAG_HBVA3   214 aa
 8   P17099|HBEAG_HBVA4   214 aa
 9   Q81105|HBEAG_HBVA5   214 aa
10   Q91C37|HBEAG_HBVA6   214 aa

Seq    1    2    3    4    5    6    7    8    9
  2    3
  3    3   21
  4    3   21   97
  5    3   91   24   25
  6    3   66   26   26   70
  7    3   65   27   27   69   98
  8    3   65   25   25   69   98   98
  9    3   65   26   26   69   98   97   97
 10    2   65   26   26   69   98   98   98   97
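ClustalW uses pairwise scores like these to build a guide tree that fixes the order in which sequences are aligned (by neighbour-joining by default). As an illustration only, here is a minimal sketch of a simpler clustering method, UPGMA, assuming the scores are percent identities, converting them to distances as 1 - identity/100, and using shortened labels for four of the sequences in the table; this is not the procedure ClustalW itself ran.

```python
def upgma(distances: dict[tuple[str, str], float]) -> str:
    """UPGMA clustering of pairwise distances; returns a Newick-style string without branch lengths."""
    def key(a: str, b: str) -> tuple[str, str]:
        return (a, b) if a < b else (b, a)

    d = {key(a, b): v for (a, b), v in distances.items()}
    sizes = {name: 1 for pair in d for name in pair}  # cluster label -> number of leaves
    while len(sizes) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        merged = f"({a},{b})"
        for c in sizes:
            if c not in (a, b):
                # size-weighted average distance from the merged cluster to c
                d[key(merged, c)] = (d[key(a, c)] * sizes[a] + d[key(b, c)] * sizes[b]) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
    return next(iter(sizes))


def identity_to_distance(scores: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Convert percent-identity scores into distances between 0 and 1."""
    return {pair: 1.0 - s / 100.0 for pair, s in scores.items()}


# Scores for four sequences, taken from the table above (labels shortened)
scores = {
    ("HBVA2", "HBVA3"): 98, ("ASHV", "HBVA2"): 66, ("ASHV", "HBVA3"): 65,
    ("DHBV1", "HBVA2"): 26, ("DHBV1", "HBVA3"): 27, ("ASHV", "DHBV1"): 21,
}
print(upgma(identity_to_distance(scores)))
```

The two human HBV genotypes (98% identical) join first, then the ground-squirrel-type sequence, and the duck HBV sequence last, which is consistent with the large score gaps in the table.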

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31

50

P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117

51

P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

52

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
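The alignment is in CLUSTAL's interleaved format: in each block, every line holds a sequence name, up to 60 aligned columns and, optionally, a running residue count. The minimal Python sketch below joins the blocks back into one gapped string per sequence. The function names are our own, and it assumes whitespace-separated fields and conservation lines that start with a space; the example uses toy data, not the sequences in this report.

```python
def parse_clustal(text):
    """Join interleaved CLUSTAL blocks into one gapped string per sequence name."""
    seqs = {}  # dicts keep insertion order (Python 3.7+), i.e. file order
    for line in text.splitlines():
        # Skip blank lines, the CLUSTAL header and conservation lines,
        # which start with whitespace.
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        name, chunk = fields[0], fields[1]  # a trailing residue count is ignored
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


def ungapped_length(gapped):
    """Number of residues, i.e. the gapped string without '-' characters."""
    return len(gapped.replace("-", ""))


# Toy two-block alignment (not real data).
example = """CLUSTAL multiple sequence alignment (toy example)

P17099|HBEAG_HBVA4 ----MQLF 4
Q64896|HBEAG_ASHV  ----MYLF 4
                       * **

P17099|HBEAG_HBVA4 HLC--L 8
Q64896|HBEAG_ASHV  HLCVFA 10
"""

aln = parse_clustal(example)
for name, gapped in aln.items():
    print(name, gapped, ungapped_length(gapped))
```

A quick consistency check on a real file is that all gapped strings have the same length and that `ungapped_length` reproduces each sequence's final running count, for example 214 for the HBVA sequences and 523 for Q8IVS8|GLCTK_HUMAN.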

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
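ClustalW writes its guide tree in Newick notation: nested parentheses group sequences, commas separate siblings, and `:number` gives a branch length. The minimal recursive-descent parser below is our own Python sketch; it handles only names and branch lengths (no quoted labels or comments), turns a Newick string into (name, length, children) tuples, and sums branch lengths from the root to each leaf.

```python
def parse_newick(newick):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = newick.strip().rstrip(";")
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos].strip()

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume '('
            children.append(node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ','
                children.append(node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ')'
        name = read_until(",():")
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ':'
            length = float(read_until(",()"))
        return name, length, children

    return node()


def leaf_depths(tree, offset=0.0):
    """Root-to-leaf distance (sum of branch lengths) for every leaf."""
    name, length, children = tree
    total = offset + length
    if not children:
        return {name: total}
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, total))
    return depths


print(leaf_depths(parse_newick("((A:0.1,B:0.2):0.3,C:0.4);")))
```

Leaf names such as `Q8IVS8|GLCTK_HUMAN` contain no Newick delimiters, so the guide tree can be passed to `parse_newick` unchanged.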

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template; it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
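Assuming "1QGT-C" denotes chain C of PDB entry 1QGT, using it as the template means keeping only that chain's coordinate records. In the fixed-column PDB format the chain identifier is column 22, i.e. index 21 in 0-based Python slicing. A minimal sketch follows; the helper name `extract_chain` is hypothetical and not part of Swiss-PdbViewer.

```python
def extract_chain(pdb_lines, chain_id):
    """Keep the ATOM, HETATM and TER records of one chain, then close with END.

    The chain identifier is read from column 22 (index 21) of each record.
    """
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM", "TER") and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    kept.append("END")
    return kept
```

With a locally saved copy of the entry (the file name `1qgt.pdb` is only an example), `extract_chain(open("1qgt.pdb").read().splitlines(), "C")` returns the chain C records, which can be written back out as a single-chain template file.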

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

Open the template structure (PDB file) from the File menu, then choose Swiss model > load raw target sequence.

Choose Fit > fit raw sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss model > Submit Modeling Request. A new browser window opens with the PDB file loaded; enter the e-mail address at which the modelled structure should be received.

Open the new structure received by e-mail and remove the template by selecting the target.

Choose File > Save > Layer (PDB).

Open Swiss model and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under the Fit menu, to fit the two sequences.

Save the file as a project.

Select “submit modeling request” under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields:

Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Click on the model bars.

Fig. Structure of the template after modeling

Alignment

TARGET  83                               LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss
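In each block of the SWISS-MODEL alignment, the number after TARGET or 2b8nA is the residue number of the first residue shown; gap characters ('-') and the spaces that group columns in tens do not advance the count (for example, TARGET 255 plus 50 residues gives the next block's 305). The minimal Python sketch below (the helper `aligned_pairs` is our own) lists which target residue is paired with which template residue, using the first 30 columns of the TARGET 255 / 2b8nA 201 block.

```python
def aligned_pairs(target, template, target_start, template_start):
    """Pair residue numbers column by column; '-' marks a gap.

    Spaces (used only to group columns in tens) are removed first.
    Returns (target_number, template_number) for every column where
    neither sequence has a gap.
    """
    target = target.replace(" ", "")
    template = template.replace(" ", "")
    if len(target) != len(template):
        raise ValueError("rows must cover the same columns")
    t_num, s_num = target_start, template_start
    pairs = []
    for t, s in zip(target, template):
        if t != "-" and s != "-":
            pairs.append((t_num, s_num))
        if t != "-":
            t_num += 1  # only residues advance the target numbering
        if s != "-":
            s_num += 1  # only residues advance the template numbering
    return pairs


pairs = aligned_pairs("GPTVASSHNV QDCLHILNRY GLRAALPRSV",
                      "gpawpdssts edalkvleky giets--esv", 255, 201)
print(pairs[:3], pairs[-3:])
```

Target residues 280 and 281 face the two-column template gap, so they are absent from the pairs; the next paired columns are (282, 226) to (284, 228).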

Model Validation

INTRODUCTION

Structure Analysis and Validation Server (SAVS) greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ (phi) against ψ (psi) for each residue in a protein structure. Because steric hindrance allows only certain combinations of φ and ψ, the plot shows which backbone conformations are favoured or allowed, and residues falling in disallowed regions point to likely errors in the model.

Software

SAVS: http://nihserver.mbi.ucla.edu/SAVS

Procedure

The target protein structure obtained after homology modeling using Deep View and Modeller was given as input to SAVS.
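The φ and ψ values on a Ramachandran plot are torsion (dihedral) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). A minimal pure-Python sketch of the standard torsion-angle formula follows; it illustrates the geometry and is not the code that SAVS or PROCHECK run. Coordinates are plain (x, y, z) tuples.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in [-180, 180], for the atom chain p0-p1-p2-p3."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = _dot(_cross(n1, n2), b2) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: looking down the p1->p2 bond, p3 is rotated +90 degrees from p0.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a real residue, pass the four atoms' coordinates from the model's PDB file in the order listed above. The sign follows the usual convention: viewed along the central bond, a clockwise rotation of the front bond onto the back bond is positive.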

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result

Plot statistics                                        Count       %
Residues in most favoured regions [A,B,L]                990   85.6%
Residues in additional allowed regions [a,b,l,p]         104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]      11    1.0%
Residues in disallowed regions                            51    4.4%
                                                        ----  ------
Number of non-glycine and non-proline residues          1156  100.0%
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127 (59)
Number of proline residues                                60
                                                        ----
Total number of residues                                1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
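PROCHECK expresses each region's count as a percentage of the non-glycine, non-proline residues (here 1156) and compares the most-favoured share with the guideline of over 90%. A minimal Python sketch reproducing that arithmetic from the counts in the summary:

```python
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues falling in each region."""
    total = sum(counts.values())  # 1156 non-glycine, non-proline residues
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


pct = region_percentages(REGION_COUNTS)
meets_guideline = pct["most favoured"] > 90.0  # PROCHECK's "over 90%" benchmark
print(pct, meets_guideline)
```

With 85.6% of residues in the most favoured regions and 51 residues (4.4%) in disallowed regions, the model falls short of the 90% guideline quoted above.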

Docking result (Hex software)

Fig. Ligand & Receptor (2B8N)

Fig. After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.


Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N. Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds.
Contoured 338888 triangles (169444 vertices) in 1.30 seconds.
Culled 128559 short edges in 6 cycles in 4.34 seconds. [71657,34961,15915,5330,685,11]
Surface traversal done in 0.23 seconds - Found 1 surface segments.
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds.
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices).
Total contouring time 6.14 seconds.

Contouring surface for molecule 2B8N. Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds.
Contoured 338888 triangles (169444 vertices) in 1.30 seconds.
Culled 128559 short edges in 6 cycles in 4.36 seconds. [71657,34961,15915,5330,685,11]
Surface traversal done in 0.22 seconds - Found 1 surface segments.
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds.
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices).
Total contouring time 6.14 seconds.

Sampling surface and interior volumes for molecule 2B8N.
Generated 201019 exterior and 216848 interior skin grid cells.
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds.
Sampling surface and interior volumes for molecule 2B8N.
Generated 201019 exterior and 216848 interior skin grid cells.
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds.

Calculating skin coefficients to N = 25.
Integration applied to 417867 cells, 4.64 per cent of the total grid volume.
Skin integration to N = 25 done in 43.95 seconds.

Docking will output a maximum of 500 solutions per pair.

------------------------------------------------------------------------------
Docking 1 pair of starting orientations.

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb).

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory.
Coefficient rotations done in 0.91 seconds.

Starting 3D FFT, N=16.
Using Kiss FFT for multi-dimensional DFTs.
3D FFT setup: 0.00 s, 66 Mb memory.
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00  R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess.
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s).
Best start orientation [alpha=0] (Energy=0.00) is at 1/1.
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling.
Working buffer for 3 orientations (1Mb).
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory.
Coefficient rotations done in 0.00 seconds.

Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess.
Main pass done in 0 min 0 sec (1761/s).

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search.

Docked structures 2B8N/2B8N in a total of 7 min 5 sec.

Best start orientation [alpha=0] (Energy=0.00) is at 1/1.
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln     Etotal     Eshape     Eforce       Eair               RMS  Bumps
----  ---------  ---------  ---------  ---------  ----------------  -----

------------------------------------------------------------------------------
Saving top 500 orientations.

Docking done in a total of 8 min 11 sec.

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds.

---- ---- ------- ------- ------- ------- ------- ------- ------- --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape  Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------- --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
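The size of the 6D search reported in the log can be checked by hand. We read the log's Iterate and FFT factor groups as 27 × 812 × 1 and 64 × 24 × 48; this is our assumption about how the stripped digits split, chosen because the products reproduce the logged totals. The 27 then matches the number of intermolecular distances sampled from 0.00 to 19.50 in steps of 0.75, the 812 and 1 match the receptor and ligand rotational increments, and each 3D FFT covers 64 × 24 × 48 grid points. A minimal Python sketch of that arithmetic:

```python
def distance_samples(start, stop, step):
    """Number of sampled distances from start to stop inclusive."""
    return int(round((stop - start) / step)) + 1


# Values copied from the Hex log; the 64 x 24 x 48 FFT grid and the
# 27 x 812 x 1 iteration split are our reading of the stripped digits.
n_distances = distance_samples(0.00, 19.50, 0.75)  # R = 0.00 ... 19.50
receptor_rotations = 812                            # "Receptor 812"
ligand_rotations = 1                                # "Ligand 1"
fft_grid = (64, 24, 48)

n_ffts = n_distances * receptor_rotations * ligand_rotations
points_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * points_per_fft
rate = total_orientations / (7 * 60)  # the search took "7 min 0 sec"

print(n_distances, n_ffts, total_orientations, round(rate))
```

The sketch gives 21924 3D FFTs and 1616412672 orientations, matching the log, and a rate of about 3.85 million orientations per second, within rounding of the logged 3848574/s.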


Conclusion


After analyzing the protein sequences of the hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We have noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw conclusions about the true picture of its evolution, and that the genes responsible for pathogenesis can also be identified.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we used homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

As our study indicated that the HBeAg (glycerate kinase) protein coded by the gene is one of the secondary causes of the pathogenicity of HBV, we tried to dock this protein with an appropriate ligand in order to inhibit its activity. This could form the basis on which drugs are developed.

Future prospects

The work presented in this report may be just a stepping stone toward such discoveries; it is a small finding on a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
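Distance-based tree building is one common way to turn pairwise sequence differences into a phylogenetic tree. The Python sketch below uses UPGMA (average-linkage clustering), which is one of the simplest such methods. It is shown for illustration only and is not necessarily the method used by the tools in this report. The strain labels and distances are made up. UPGMA also assumes a roughly constant rate of evolution (a molecular clock), which real sequence data may violate.

```python
from itertools import combinations


def upgma(labels, matrix):
    """Cluster taxa with UPGMA (average linkage) and return a Newick string.

    labels: taxon names (assumed free of commas, parentheses and semicolons)
    matrix: symmetric list of lists holding the pairwise distances
    Branch lengths are omitted to keep the sketch short.
    """
    # Number of taxa in each active cluster, keyed by its Newick text
    size = {name: 1 for name in labels}
    # Distance between every pair of active clusters
    dist = {frozenset((labels[i], labels[j])): matrix[i][j]
            for i, j in combinations(range(len(labels)), 2)}
    while len(size) > 1:
        # Merge the closest pair of clusters
        a, b = sorted(min(dist, key=dist.get))
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):
                # Average linkage: size-weighted mean of the two old distances
                dist[frozenset((merged, c))] = (
                    dist[frozenset((a, c))] * size[a]
                    + dist[frozenset((b, c))] * size[b]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        # Forget every distance that involves one of the merged clusters
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
    return next(iter(size)) + ";"


# Hypothetical strains S1..S4 with made-up pairwise distances
d = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]
print(upgma(["S1", "S2", "S3", "S4"], d))  # (((S1,S2),S3),S4);
```

S1 and S2 are closest, so they are joined first. S3 then joins that pair, and S4 ends up as the outgroup. Programs such as PHYLIP add branch lengths and also offer methods such as neighbour-joining, which drop the clock assumption.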

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and can hence aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.

BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed.

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B. and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801. [PubMed]

Aloy, P., Querol, E., Aviles, F.X. and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395-408. [PubMed]

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402. [PubMed]

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


Developed by Needleman and Wunsch in 1970. This was a fundamental step in the development of the field of bioinformatics, and it paved the way for the routine sequence comparisons and database searching practiced by modern biologists.
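The dynamic-programming idea behind the Needleman-Wunsch method can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular tool. It assumes a simple scoring scheme (match +1, mismatch -1, gap -1), whereas protein alignments normally use a substitution matrix such as BLOSUM62 with separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    def sub(x, y):
        # Substitution score for aligning residue x with residue y
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # f[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          f[i - 1][j] + gap,   # gap in b
                          f[i][j - 1] + gap)   # gap in a
    # Trace back one optimal path from the bottom-right cell
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and f[i][j] == f[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return f[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GATTACA", "GCATGCU")
print(score)  # 0 with these scores
print(top)
print(bottom)
```

Each cell f[i][j] holds the best score for aligning the first i residues of one sequence with the first j residues of the other. The bottom-right cell is therefore the global alignment score. The traceback returns one of possibly several optimal alignments.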

10. The most recent advance in bioinformatics is molecular modeling, which aims at understanding structure-function and structure-property relationships in physico-chemical processes and pharmaceuticals, and has thus become increasingly important for finding and designing new drugs. In fact, computers play an important role in new drug discovery and drug design.

HEPATITIS

Hepatitis (plural: hepatitides) implies injury to the liver characterized by the presence of inflammatory cells in the liver tissue. Etymologically, the word comes from the ancient Greek hepar (hepato-), meaning 'liver', and the suffix -itis, denoting 'inflammation'. The condition can be self-limiting, healing on its own, or can progress to scarring of the liver.

Hepatitis is acute when it lasts less than 6 months and chronic when it persists longer. A group of viruses known as the hepatitis viruses cause most cases of liver damage worldwide. Hepatitis can also be due to toxins (notably alcohol), other infections, or an autoimmune process.

It may run a subclinical course, in which the affected person may not feel ill. The patient becomes unwell and symptomatic when the disease impairs liver functions, which include, among other things, screening of harmful substances, regulation of blood composition and production of bile to help digestion.

Causes

Acute hepatitis

Viral hepatitis: hepatitis A to E (more than 95% of viral causes), herpes simplex, cytomegalovirus, Epstein-Barr virus, yellow fever virus, adenoviruses

Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky Mountain spotted fever

Alcohol

Toxins: Amanita toxin in mushrooms, carbon tetrachloride, asafetida

Drugs: paracetamol, amoxicillin, antituberculosis medicines, minocycline and many others

Ischemic hepatitis (circulatory insufficiency) (1)

Pregnancy

Autoimmune conditions, e.g. systemic lupus erythematosus (SLE)

Metabolic diseases, e.g. Wilson's disease

Chronic hepatitis

Viral hepatitis: hepatitis B with or without hepatitis D, hepatitis C (hepatitis A and E do not lead to chronic disease)

Autoimmune: autoimmune hepatitis

Alcohol

Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole

Non-alcoholic steatohepatitis

Heredity: Wilson's disease, alpha-1-antitrypsin deficiency

Primary biliary cirrhosis and primary sclerosing cholangitis occasionally mimic chronic hepatitis [4]

Viral hepatitis

A virus is a particle, smaller than a bacterium, that carries complex genetic information in the form of DNA or RNA. This genetic material allows the virus to infect bacteria or other living cells and to set up the machinery to reproduce itself, which can lead to destruction of the cell in which it resides. To date, five viruses, labeled A through E, have been identified that appear to cause viral hepatitis. Viruses A and E can be contracted from contaminated water or food (by mouth), while viruses B, C and D are transmitted by direct injection into the bloodstream (through any method of injection under the skin). The term viral hepatitis describes any one of the illnesses caused by these five viruses. It consists of an infection of liver cells that damages the liver over days in some cases but over many years in others.

Thirty years ago, none of the hepatitis viruses had been identified. In the 1960s, transfusion-related viral hepatitis was extremely common, with 30% of patients receiving blood products becoming infected. By 1970, a blood test for the so-called Australia antigen had been developed, which appeared to identify those infected with one hepatitis virus, the one we now call hepatitis B. The investigator who discovered the Australia antigen (the protein that makes up the coat of the virus, now called the hepatitis B surface antigen, HBsAg) was awarded the Nobel Prize. Our understanding of viral hepatitis has grown tremendously since the discovery of the Australia antigen.

Currently, 11 viruses are recognized as causing hepatitis. Two are herpesviruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and 9 are hepatotropic viruses.

EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent hepatic damage. Both viruses cause the typical infectious mononucleosis syndrome of fatigue, nausea and malaise.

Of the nine human hepatotropic viruses, only five are well characterized; hepatitis G and TTV (transfusion-transmitted virus) are newly discovered viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called enterically transmitted non-A, non-B hepatitis) are transmitted by fecal-oral contamination. The most important types include hepatitis B (sometimes called serum hepatitis), hepatitis C (formerly called non-A, non-B hepatitis) and hepatitis D (formerly called delta hepatitis).

Hepatitis A

Incubation period: 3-5 weeks (mean 28 days)

Milder disease than hepatitis B; asymptomatic infections are very common, especially in children.

Adults, especially pregnant women, may develop more severe disease.

Although convalescence may be prolonged, there is no chronic form of the disease.

Fulminant hepatitis is rare (0.1% of cases). The virus enters via the gut, replicates in the alimentary tract and spreads to infect the liver, where it multiplies in hepatocytes.

Viraemia is transient. The virus is excreted in the stools for two weeks preceding the onset of symptoms.

Worldwide distribution: endemic in most countries. The incidence in first-world countries is declining, and there is an especially high incidence in developing countries and rural areas. In rural areas of South Africa, the seroprevalence is 100%.

Hepatitis E

Incubation period: 30-40 days

Acute, self-limiting hepatitis; no chronic carrier state

Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in pregnant women, with a high mortality rate (up to 40%). As with hepatitis A, the virus replicates in the gut initially before invading the liver, and virus is shed in the stool prior to the onset of symptoms. Viraemia is transient. A large inoculum of virus is needed to establish infection. Little else is known yet. The incidence of infection appears to be low in first-world countries.

Hepatitis C

Classified in the family Flaviviridae, related to the flavi- and pestiviruses

Enveloped, with a single-stranded RNA (ssRNA) genome

Does not grow in cell culture but can infect chimpanzees. Incubation period: 6-8 weeks

Causes a milder form of acute hepatitis than does hepatitis B, but about 50% of individuals develop chronic infection following exposure, which can lead to:

1) Chronic liver disease

2) Hepatocellular carcinoma

Incidence: endemic worldwide, with high incidence in Japan, Italy and Spain. In South Africa, 1% of blood donors have antibodies.

Hepatitis D

Defective virus that requires hepatitis B as a helper virus in order to replicate. Infection therefore only occurs in patients who are also infected with hepatitis B.

Increased severity of liver disease in hepatitis B carriers

Virus particle 36 nm in diameter, encapsulated with HBsAg derived from HBV

Delta antigen is associated with virus particles; ssRNA genome

Identified in intravenous drug abusers

Hepatitis G

A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally transmitted hepatitis but is no longer believed to be a major agent of liver disease. It has been classified as a flavivirus.

Hepatitis B

What is the Hepatitis B Virus?

The hepatitis B virus (HBV) is a DNA-containing virus that can infect human liver cells, and other cells in the body, once it gains access to the bloodstream. One of the most interesting features of HBV is that the virus itself does not damage the liver. Instead, the damage is caused by the individual's own immune system attacking the virus-infected cells. Since liver damage from the virus may be very slight, many patients are called healthy carriers: although they may transmit the disease to others, they have normal-appearing livers and normal liver function tests.

While many individuals remain healthy for many years or a lifetime, others develop chronic hepatitis, cirrhosis and, occasionally, liver cell cancer. These outcomes are linked to the virus and its effects, although it is unlikely that the virus directly causes cancer. Patients who develop hepatitis (damage to liver cells with inflammation) do so because of the body's normal inclination to attack the foreign proteins contained in viruses and in the cells in which the viruses are found. This process, called the immune response, determines the pace and severity of liver cell injury in this condition and is described in more detail below.

Since the identification of the hepatitis B virus, several other nearly identical viruses have been identified in Eastern woodchucks, ground squirrels and Peking ducks. The members of this virus family, termed the hepadnaviruses, have life cycles similar to that observed in humans and can serve as animal models, allowing further study of these unique disease-causing agents.

Classification and general features

Family: Hepadnaviridae

Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g. duck hepatitis B virus)

Size: 42 nm. Virions (also known as Dane particles) contain a circular, partially double-stranded DNA genome.

Fig: Hepatitis B virus structure

HBV Antigens

HBsAg = surface (coat) protein, produced in excess as small spheres and tubules

HBcAg = inner core protein

HBeAg = secreted protein, function unknown

Clinical Features

Incubation period: 2-5 months

Insidious onset of symptoms; tends to cause a more severe disease than hepatitis A.

Asymptomatic infections occur frequently.

Pathogenesis

Infection is parenterally transmitted. The virus replicates in the liver, and virus particles as well as excess viral surface protein are shed in large amounts into the blood. Viraemia is prolonged, and the blood of infected individuals is highly infectious.

Complications

1) Persistent infection

Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected.

Those at particular risk include:

babies and young children

immunocompromised patients

males > females

The virus persists in the hepatocytes, and ongoing liver damage occurs because of the host immune response against the infected liver cells.

Chronic infection may take one of two forms:

Chronic persistent hepatitis: the virus persists, but there is minimal liver damage.

Chronic active hepatitis: there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure. Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC).

HBV is thought to play a role in the development of this malignancy because:

a) 80% of patients with HCC are carriers of hepatitis B

b) viral DNA can be identified in hepatocellular carcinoma cells

c) viral DNA can integrate into the host chromosome

3) Fulminant hepatitis

Rare; accounts for 1% of infections.

Epidemiology

Prevalence of disease in Africa

Worldwide there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas; in South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is parenterally transmitted through:

1) Blood

blood transfusions, serum products

sharing of needles, razors

tattooing, acupuncture

renal dialysis

organ donation

2) Sexual intercourse

3) Horizontal transmission in children, within families and through close personal contact

This is the major mode of transmission in South Africa, where the majority of individuals become infected between three and nine years of age. Horizontal transmission also occurs in children's institutions and mental homes.

4) Vertical transmission: perinatal transmission from a carrier mother to her baby

transplacental (rare)

during delivery

postnatal: breastfeeding, close contact

(This is the major mode of transmission in South-East Asia.)

Diagnosis: Serology

Acute infection with resolution

Viral antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV.

Fig: Hepatitis B virus in serum
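The marker-by-marker rules listed above can be summarized as a small lookup, sketched below in Python. This is an illustrative teaching aid under simplifying assumptions, not a diagnostic algorithm. The marker names are our own string labels, and the function names are hypothetical. Only the combinations described in the text are recognized, plus one standard pattern: vaccination with HBsAg-based vaccines produces anti-HBs without core antibody.

```python
# Meaning of each positive marker, paraphrasing the serology list above
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "high level of viral replication (high infectivity)",
    "anti-HBe": "viral replication is falling (low infectivity in a carrier)",
    "IgM anti-HBc": "recent infection",
    "IgG anti-HBc": "exposure to HBV (carrier or recovered)",
    "anti-HBs": "immunity (not found in chronic carriers)",
}
CORE_ANTIBODIES = {"IgM anti-HBc", "IgG anti-HBc"}


def interpret(positive):
    """Return (notes, pattern) for a collection of positive HBV serum markers.

    Illustrative only: the rules mirror the text and are not a clinical algorithm.
    """
    positive = set(positive)
    unknown = positive - set(MARKER_MEANING)
    if unknown:
        raise ValueError(f"unrecognised markers: {sorted(unknown)}")
    notes = [f"{m}: {MARKER_MEANING[m]}" for m in sorted(positive)]
    if not positive:
        pattern = "no marker of infection or immunity"
    elif "anti-HBs" in positive and "HBsAg" not in positive:
        # HBsAg-based vaccines give anti-HBs but no antibody to the core protein
        if positive & CORE_ANTIBODIES:
            pattern = "immune after natural infection"
        else:
            pattern = "immune after vaccination"
    elif "HBsAg" in positive and "IgM anti-HBc" in positive:
        pattern = "acute (recent) infection"
    elif "HBsAg" in positive and "IgG anti-HBc" in positive:
        pattern = "persisting infection without IgM: consistent with a chronic carrier"
    else:
        pattern = "combination not covered by this sketch"
    return notes, pattern


notes, pattern = interpret({"HBsAg", "HBeAg", "IgM anti-HBc"})
print(pattern)  # acute (recent) infection
```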

Prevention

1) Active immunization

Two types of vaccine are available:

Serum-derived: prepared from HBsAg purified from the serum of HBV carriers

Recombinant HBsAg: made by genetic engineering in yeasts

Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.

Universal immunization of infants was introduced in April 1995. Infants receive 3 doses, at 6, 10 and 14 weeks of age.

The vaccine should be administered to people at high risk of infection with HBV:

1) Health care workers

2) Sexual partners of chronic carriers

3) Infants of HBV carrier mothers

2) Passive antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example needlestick injuries.

What Is Hepatitis B Infection Like?

When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others, who do not develop significant symptoms following exposure, may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

Genome: circular, about 3.2 kb in size, and double-stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5' end. The other strand, the plus strand, is variable in length but shorter than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins, which is achieved through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation and mark a specific transcript for encapsidation into the nucleocapsid.

Fig: Hepatitis B virus genome
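Finding reading frames on a circular genome needs one extra step compared with a linear sequence, because a frame may run across the point where the circle was cut open. The Python sketch below scans the forward strand only and treats every ATG as a possible start. The toy sequences and the function name are made up. Real analyses of the roughly 3.2 kb HBV genome would normally use a dedicated tool such as EMBOSS getorf or NCBI ORFfinder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def circular_orfs(seq, min_codons=2):
    """Find forward-strand ORFs (ATG ... stop) in a circular DNA sequence.

    Returns (start, end, length_nt) tuples. start is the 0-based index of the
    ATG. end is the index just after the stop codon, taken modulo the genome
    length. length_nt counts the ORF including its stop codon.
    Every ATG is reported, so nested in-frame starts sharing one stop codon
    appear as separate, overlapping ORFs.
    """
    seq = seq.upper()
    n = len(seq)
    ext = seq * 3  # lets a codon be read across the origin of the circle
    orfs = []
    for start in range(n):
        if ext[start:start + 3] != "ATG":
            continue
        # Walk codon by codon for about one full turn round the circle
        for pos in range(start, start + n, 3):
            if ext[pos:pos + 3] in STOP_CODONS:
                length = pos + 3 - start
                if length // 3 >= min_codons:
                    orfs.append((start, (pos + 3) % n, length))
                break
    return orfs


print(circular_orfs("ATGATGTAA"))    # [(0, 0, 9), (3, 0, 6)]
print(circular_orfs("TGAAATAGCCA"))  # [(10, 8, 9)]: this ATG spans the origin
```

Because every ATG is reported, in-frame starts that share one stop codon show up as nested ORFs. This loosely mirrors the way HBV uses multiple in-frame start codons within a single ORF.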

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces, HBVP). The DNA then enters the nucleus, where it is converted into a covalently closed circular form called cccDNA.

The minus (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins destined to become HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the DNA is recycled to the nucleus, where the process is repeated. This results in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).
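The strand bookkeeping in this life cycle can be illustrated with simple base pairing: the minus strand is transcribed into pregenomic RNA, and the pregenome is then reverse-transcribed back into minus-strand DNA. The Python sketch below is a deliberate simplification. It ignores the longer-than-genome-length pregenome, priming by the P protein and plus-strand synthesis. The sequence and function names are made up.

```python
# Watson-Crick pairing used when copying one strand into another
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def transcribe(template_dna):
    """RNA copied from a DNA template strand (both written 5'->3').

    The new strand is antiparallel, so the template is read from its 3' end.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_dna.upper()))


def reverse_transcribe(rna):
    """DNA copied from an RNA template (both written 5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


minus_strand = "TTACGGATGC"  # made-up fragment standing in for the (-) strand
pregenome = transcribe(minus_strand)
print(pregenome)                      # GCAUCCGUAA
print(reverse_transcribe(pregenome))  # TTACGGATGC, the original (-) strand
```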

Fig: HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins arranged in an icosahedral structure. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions by a ratio of about 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group named surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same antigenic sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces, HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg): There are three different hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope, which may be responsible for triggering the immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears basically oblivious to their presence.

Hepatitis B core antigen (HBcAg): This is the only HBV antigen that cannot be detected directly by a blood test; it can only be isolated by analyzing an infected hepatocyte. It is a 185-amino-acid protein expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus, this antigen can be detected by a blood test. If found, it is usually indicative of complete virus particles in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains 3 distinct antigen particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle, and tubular or filamentous particles that vary in length. Of these, only the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response are still at an early stage. These include studies of prolonged transient infections with high-titer viremia and of lifelong infections with ongoing inflammation of the liver. The role of the virus in liver cancer also remains elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV [4].

Homology or comparative modeling involves the prediction of the structure of a query

sequence from the structures of one or more structural templates The procedure involves

the identification of possible templates that have a clear sequence relationship to the

query the assembly of the model the prediction of regions of the structure that are likely

to have different conformations than the templates (eg loops) and ultimately the

refinement of the structure in an attempt to account for inherent differences between the

template and query structures As mentioned above homology modeling figures heavily

as a rationale for structural genomics initiatives under the stated assumption that accurate

models can be built for query sequences that have a greater than 30 sequence identity

with their best template

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one source of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (Chung and Subbiah, 1996, offered a structural explanation for this observation.) Structure-based profile methods, such as those described above, are improving the accuracy of sequence alignments, and this should lead to continuing improvements in the quality of homology models [5, 6].
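The 30% rule can be made concrete by computing percent identity from a pairwise alignment and comparing it against a 30% cut-off. The sketch below is a minimal illustration. It assumes that columns containing a gap are excluded from the denominator, a choice on which tools differ. The function names are hypothetical and are not taken from any modeling package.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (q, t) for q, t in zip(aligned_query, aligned_template) if q != "-" and t != "-"
    ]
    if not columns:
        return 0.0
    identical = sum(1 for q, t in columns if q == t)
    return 100.0 * identical / len(columns)


def meets_30_percent_rule(identity: float, cutoff: float = 30.0) -> bool:
    """Rule of thumb: below ~30% identity, alignment (and model) quality drops sharply."""
    return identity >= cutoff


# Example: 3 of 4 gap-free columns are identical
identity = percent_identity("ACDE", "ACDQ")
print(identity, meets_30_percent_rule(identity))
```

The 30% cut-off is a rule of thumb rather than a hard boundary. In practice it is read together with the E-value and the alignment coverage.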

As the number of protein-ligand complexes in the Protein Data Bank keeps growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years. Their aim is to exploit as fully as possible the structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes.

In this respect, the term guided docking refers to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect this focus on chemical information, a classification scheme for guided docking approaches has been proposed. In general terms, guided docking approaches can be divided into two groups:

1) Indirect approaches incorporate chemical information implicitly. It affects scoring but not the orientation of the ligand during sampling.

2) Direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. They can be further divided into protein-based, mapping-based and ligand-based approaches, according to the source used to derive the features that capture the chemical information inside the protein cavity.

A representative list of docking approaches has been discussed within each category. Current scoring functions have known limitations. In view of these, making optimal use of chemical information was generally found to be an efficient knowledge-based strategy for improving three outcomes of protein-ligand docking: binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments [7].

The review in [8] introduces ligand-receptor docking, illustrates the basic underlying concepts and gives an overview of different approaches and algorithms. The application of docking and scoring has led to some remarkable successes, but some major challenges remain. The review also outlines these challenges, together with approaches that address some of them and the latest developments in the area. Finally, it discusses some aspects of the assessment of docking program performance and describes a number of successful applications of structure-based virtual screening.


Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

In a narrower sense, bioinformatics is limited to the sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields:

1) the development of computational tools and databases;
2) the application of these tools and databases to generate biological knowledge and better understand living systems.

These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI (National Center for Biotechnology Information)

NCBI was established in 1988 as a national resource for molecular biology information. It creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information. All of this work serves a better understanding of the molecular processes affecting human health and disease.

Swiss-Prot: a curated protein sequence database with three stated goals:

1) a high level of annotation, such as the description of the function of a protein, its domain structure, post-translational modifications and variants;
2) a minimal level of redundancy;
3) a high level of integration with other databases.


2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)

Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate

3. FASTA

FASTA is a software package for aligning DNA and protein sequences. David J. Lipman and William R. Pearson first described it, as FASTP, in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching.

FASTA was described in 1988 in "Improved tools for biological sequence comparison". It added DNA:DNA searches and translated protein:DNA searches, and provided a more sophisticated shuffling program for evaluating statistical significance. Several programs in the package align protein sequences and DNA sequences.

FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet. The name extends FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for four kinds of search: protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches.

Recent versions include special translated search algorithms for comparing nucleotide to protein sequence data. These algorithms correctly handle frameshift errors, which six-frame-translated searches do not handle very well.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics. These let biologists judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
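The Smith-Waterman recurrence that SSEARCH implements can be sketched in a few lines. This is a simplified illustration. It assumes a plain match/mismatch score and a linear gap penalty, with arbitrary values of 2, -1 and -2. SSEARCH itself scores with substitution matrices and affine gap penalties. The sketch also returns only the best local score, not the alignment.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    # h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            deletion = h[i - 1][j] + gap
            insertion = h[i][j - 1] + gap
            # The 0 floor makes the alignment local: a poorly scoring prefix is dropped.
            h[i][j] = max(0, diagonal, deletion, insertion)
            best = max(best, h[i][j])
    return best


print(smith_waterman_score("TTACGTTT", "ACGT"))  # local match "ACGT" scores 4 * 2 = 8
```

The zero floor in the recurrence is what distinguishes local (Smith-Waterman) from global (Needleman-Wunsch) alignment.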

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA sequences. A BLAST search compares a query sequence with a library or database of sequences. It then identifies the library sequences that resemble the query sequence above a certain threshold.
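The threshold mentioned above is usually expressed as an E-value. The sketch below shows the Karlin-Altschul statistics behind it, assuming the textbook formulas E = K * m * n * exp(-lambda * S) and bit score S' = (lambda * S - ln K) / ln 2. The lambda and K values in the example are illustrative assumptions. Real BLAST also corrects the query and database lengths (m and n) for edge effects.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw alignment score S into a normalized bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits: float, query_length: int, database_length: int) -> float:
    """Expected number of chance hits with at least this bit score: E = m * n * 2^(-S')."""
    return query_length * database_length * 2.0 ** (-bits)


# Illustrative parameters (assumed, not taken from a specific BLAST run)
lam, k = 0.267, 0.041
s_bits = bit_score(50, lam, k)
print(round(s_bits, 1), e_value(s_bits, query_length=523, database_length=10**8))
```

Because E grows linearly with database size, the same alignment score becomes less significant as the database grows.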

5. Primary & secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No other information about the protein is needed. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or as a raw sequence. White space and numbers are ignored.

If you provide the accession number of a Swiss-Prot/TrEMBL entry, an intermediary page lets you select the portion of the sequence to analyze. You can choose mature chains, peptides or domains from the Swiss-Prot feature table by clicking on their positions, or enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters, among others:

extinction coefficient

half-life

instability index

aliphatic index
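Of the parameters listed, the aliphatic index has a simple closed form documented by ProtParam (after Ikai, 1980): AI = X(Ala) + a * X(Val) + b * (X(Ile) + X(Leu)). Here the X values are mole percents, a = 2.9 and b = 3.9. The sketch below reproduces that formula only and is not ProtParam's code. The other parameters need lookup tables that are not shown here.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from the mole percents of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    # Side-chain volumes of Val (a) and of Ile/Leu (b) relative to Ala
    a, b = 2.9, 3.9
    return mole_percent("A") + a * mole_percent("V") + b * (mole_percent("I") + mole_percent("L"))


print(round(aliphatic_index("MAAALQVLPRLARAPL"), 2))
```

A higher aliphatic index is commonly read as an indicator of greater thermostability of globular proteins.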


Using SOPMA for secondary structure analysis

The self-optimized prediction method (SOPM) was developed to improve the success rate of protein secondary structure prediction. Its authors later reported a further improvement, obtained by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved method is called SOPMA.

SOPMA correctly predicts 69.5% of amino acids for a three-state description of secondary structure (α-helix, β-sheet and coil). This figure was measured on a database of 126 chains of non-homologous proteins (less than 25% identity). Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids.

Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
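The 69.5% figure above is a three-state accuracy, often called Q3: the percentage of residues whose predicted state matches the observed one. The sketch below assumes that predictions use SOPMA-style letters. It also assumes that turns ('t') and any other states are folded into coil ('c') before scoring. That reduction is a common convention, not necessarily the one used in the SOPMA study.

```python
def to_three_state(states: str) -> str:
    """Reduce to helix (h), extended strand (e) and coil (c); anything else becomes coil."""
    return "".join(s if s in "he" else "c" for s in states.lower())


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted three-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed strings must be non-empty and equal in length")
    matches = sum(p == o for p, o in zip(to_three_state(predicted), to_three_state(observed)))
    return 100.0 * matches / len(observed)


print(q3_accuracy("hhhtteec", "hhhcceee"))
```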

PROTOCOL FOLLOWED

1) Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

2) Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

3) Ran a BLAST similarity search to retrieve the PDB ID of the template structure (2B8N).

4) Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor in PDB format.

5) Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).

6) Verified the model with the parameters available in SAVS, such as the Ramachandran plot.

7) Selected the best ligand for HBV disease from the KEGG database.

8) Ran HEX to dock the ligand and obtained the structure of the drug molecule.

6. Homology modeling

In protein structure prediction, homology modeling (also called comparative modeling) is a class of methods for building an atomic-resolution model of a protein from its amino acid sequence, known as the query sequence or target.

Almost all homology modeling techniques rely on two things:

1) identifying one or more known protein structures (templates or parent structures) that are likely to resemble the structure of the query sequence;
2) producing an alignment that maps residues in the query sequence to residues in the template sequence.

The sequence alignment and template structure are then used to produce a structural model of the target. Protein structures are more conserved than protein sequences. As a result, detectable sequence similarity usually implies significant structural similarity.

The quality of a homology model depends on the quality of the sequence alignment and of the template structure. Two kinds of gap complicate the approach:

1) alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template;
2) structure gaps in the template, which arise from poor resolution in the experimental procedure used to solve the structure (usually X-ray crystallography).

Model quality declines with decreasing sequence identity. A typical model shows ~2 Å agreement between matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% identity. Regions built without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase as identity falls. Variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2].

Taken together, these atomic-position errors are significant. They impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction. Even the quaternary structure of a protein may be difficult to predict from homology models of its subunits.

Nevertheless, homology models can support qualitative conclusions about the biochemistry of the query sequence. They are especially useful for formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a residue is conserved to stabilize the fold, to bind a small molecule, or to foster association with another protein or nucleic acid.
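The "~2 Å agreement between matched Cα atoms" quoted above is normally measured as a root-mean-square deviation (RMSD). The sketch below assumes that model and reference Cα coordinates are already paired residue by residue and optimally superposed. A real comparison would first superpose the structures, for example with the Kabsch algorithm, which is omitted here.

```python
import math
from typing import Sequence, Tuple

Coordinate = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Coordinate], reference: Sequence[Coordinate]) -> float:
    """RMSD (in the input units, e.g. Å) between paired, pre-superposed Cα atoms."""
    if len(model) != len(reference) or not model:
        raise ValueError("need the same, non-zero number of atoms in both structures")
    squared = sum(
        (mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
        for (mx, my, mz), (rx, ry, rz) in zip(model, reference)
    )
    return math.sqrt(squared / len(model))


# Two Cα atoms, each displaced by 2 Å along x: RMSD is 2.0
print(ca_rmsd([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], [(2.0, 0.0, 0.0), (6.0, 0.0, 0.0)]))
```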

Figure: Homology modeling in three steps.

1) The known template 3D structures are aligned with the target sequence to be modelled.
2) Spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target. This yields a number of spatial restraints on its structure.
3) The 3D model is obtained by satisfying all the restraints as well as possible.

Homology modeling can produce high-quality structural models when the target and template are closely related. This has inspired the formation of a structural genomics consortium dedicated to producing representative experimental structures for all classes of protein folds.

The chief inaccuracies in homology modeling come from errors in the initial sequence alignment and from improper template selection, and both worsen as sequence identity falls. As with other structure prediction methods, current practice is assessed in a biennial large-scale experiment, the Critical Assessment of Techniques for Protein Structure Prediction (CASP).

Template selection and sequence alignment

The critical first step in homology modeling is identifying the best template structure, if indeed any is available.

The simplest method of template identification relies on serial pairwise sequence alignments, aided by database search techniques such as FASTA and BLAST. More sensitive methods are based on multiple sequence alignment; PSI-BLAST is the most common example. These methods iteratively update their position-specific scoring matrix to identify successively more distantly related homologs. They have been shown to produce more potential templates, and better templates for sequences that are only distantly related to any solved structure. Protein threading (also called fold recognition or 3D-1D alignment) can also be used to find templates for traditional homology modeling.

With a BLAST search, a reliable first approach is to look for hits with a sufficiently low E-value. Such hits are considered close enough in evolution to support a reliable homology model. Other factors may tip the balance in marginal cases. For example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon.

However, a template with a poor E-value should generally not be chosen, even if it is the only one available. Its structure may well be wrong, which would produce a misguided model. A better approach is to submit the primary sequence to fold-recognition servers. Consensus meta-servers are better still, because they improve on individual fold-recognition servers by identifying consensus among independent predictions.

These approaches often identify several candidate template structures. Some methods can generate hybrid models from multiple templates, but most rely on a single template. Choosing the best template among the candidates is therefore a key step, and it can significantly affect the final accuracy of the structure.

The choice is guided by the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are two further factors: the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Several homology models are therefore sometimes built for a single query sequence, and the most likely candidate is chosen only in the final step.
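The selection logic described above can be sketched as a filter followed by a ranking: first discard hits with a poor E-value, then prefer the template that covers most of the query. This is only an illustration under stated assumptions. The E-value cut-off of 1e-5, the 50% minimum coverage and the template names are hypothetical. Real template choice also weighs function, secondary structure and model plausibility.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemplateHit:
    name: str                    # hypothetical template identifier
    e_value: float               # E-value reported by the database search
    aligned_query_residues: int  # number of query residues covered by the alignment


def choose_template(
    hits: List[TemplateHit], query_length: int, max_e_value: float = 1e-5, min_coverage: float = 0.5
) -> Optional[TemplateHit]:
    """Return the acceptable hit with the widest query coverage (ties broken by lower E-value)."""
    acceptable = [
        h for h in hits
        if h.e_value <= max_e_value and h.aligned_query_residues / query_length >= min_coverage
    ]
    if not acceptable:
        return None  # better to try fold recognition than to model on a poor template
    return max(acceptable, key=lambda h: (h.aligned_query_residues, -h.e_value))


hits = [
    TemplateHit("tplA", 1e-40, 150),
    TemplateHit("tplB", 1e-50, 120),
    TemplateHit("tplC", 2.0, 200),  # widest coverage, but the E-value is far too poor
]
print(choose_template(hits, query_length=200))
```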

The sequence alignment generated by the database search can serve as the basis for building the model, but more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking Interactions

Protein-protein interactions occur between two proteins of similar size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions.

Protein-protein interactions are also usually more rigid: the interfaces cannot alter their conformation to improve binding and ease movement. Steric constraints limit conformational changes, and for this reason these interactions are described as rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand Docking

Protein receptor and ligand motifs fit together tightly, and their binding is often described as a lock-and-key mechanism. These interfaces show both high specificity and induced fit, and specificity increases with rigidity. Receptor-ligand docking can pair either a rigid ligand with a flexible receptor, or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

In the rigid-ligand/flexible-receptor case, the native structure of the complex often maximizes the interface area between the molecules. The molecules move relative to one another in a direction perpendicular to the interface. This lets the receptor bind a larger than usual ligand.

Normally, any ligand overlap in the docking interface incurs an energy penalty. Reducing the van der Waals repulsion minimizes this energy loss, and allowing flexibility in the receptor is one way to do so. A flexible receptor can therefore dock a larger ligand than a rigid receptor could.

Flexible Ligand with a Rigid Receptor

When the fit between ligand and receptor does not need to be induced, the receptor can stay rigid while the free energy of the system is maintained. For successful docking, the ligand parameters must be maintained and the ligand must be slightly smaller than the receptor interface.

No docking is completely rigid, however. Intrinsic movement allows small conformational adaptations for ligand binding. Once the six degrees of freedom for protein movement (three rotational, three translational) are taken into account, the receptor has even more inherent flexibility. This further offsets any energy penalty between receptor and ligand, making binding between the two easier and more energetically favorable.
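A docking search places a ligand by sampling the six rigid-body degrees of freedom mentioned above: three rotational and three translational. The sketch below assumes the rotation is given as an axis and an angle, applied about the ligand's centroid. Two parameters fix the axis direction and one the angle. The function names are hypothetical and do not represent HEX's interface.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def rotation_matrix(axis: Vec3, angle: float) -> List[List[float]]:
    """Rotation matrix for `angle` radians about `axis` (Rodrigues' formula)."""
    norm = math.sqrt(sum(v * v for v in axis))
    x, y, z = (v / norm for v in axis)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def place_ligand(atoms: List[Vec3], axis: Vec3, angle: float, shift: Vec3) -> List[Vec3]:
    """Rotate the ligand rigidly about its centroid, then translate it by `shift`."""
    n = len(atoms)
    centre = tuple(sum(a[i] for a in atoms) / n for i in range(3))
    r = rotation_matrix(axis, angle)
    placed = []
    for atom in atoms:
        d = [atom[i] - centre[i] for i in range(3)]
        rotated = [sum(r[row][col] * d[col] for col in range(3)) for row in range(3)]
        placed.append(tuple(rotated[i] + centre[i] + shift[i] for i in range(3)))
    return placed


print(place_ligand([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], (0.0, 0.0, 1.0), math.pi / 2, (0.0, 0.0, 5.0)))
```

A docking program would generate many such poses and score each one against the receptor.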

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for drug development. The findings of our docking may help in finding a cure for hepatitis B, and may open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can also be used to design new lead compounds, and so can aid the drug discovery process.

Receptor

A molecule on the surface of a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. More generally, it is a molecule on or within a cell to which a substance, such as a hormone or a drug, selectively binds, causing a change in the activity of the cell.

Ligand

A molecule that binds to a protein molecule (e.g. a receptor). A ligand binds through many weak noncovalent bonds formed with the binding site of the protein. Tight binding therefore depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein or enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly take part in making and breaking bonds; these are called the catalytic groups.

In essence, the interaction of enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction. This produces the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize about which amino acids are likely to be found in a protein's active or functional site, as this depends greatly on the type of function. With that caveat, the table below gives the preference of each of the 20 amino acids for lying within functional regions of proteins.

These preferences were derived by counting how often each amino acid was in contact with bound non-protein atoms in protein three-dimensional structures. A positive value means that the amino acid makes more contacts than expected by chance; a negative value means that it makes fewer.

The values do not cover protein-protein or protein-peptide interactions. In those interactions, many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

Amino acid   Preference
His           0.360
Tyr          -0.040
Asp           0.045
Gly          -0.070
Trp          -0.140
Met           0.025
Val          -0.060
Asn           0.080
Leu          -0.180
Phe          -0.120
Gln           0.050
Cys           0.210
Ile          -0.005
Ala           0.025
Glu           0.050
Arg           0.055
Pro          -0.200
Lys           0.100
Thr           0.100
Ser           0.130
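Because the preferences are one number per residue type, they fit naturally in a lookup table. The sketch below stores the values from the table above. It ranks them and separates residues that are enriched at functional sites (positive values) from those that are depleted (negative values). It is a convenience illustration only, not a method from the original analysis.

```python
ACTIVE_SITE_PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070, "Trp": -0.140,
    "Met": 0.025, "Val": -0.060, "Asn": 0.080, "Leu": -0.180, "Phe": -0.120,
    "Gln": 0.050, "Cys": 0.210, "Ile": -0.005, "Ala": 0.025, "Glu": 0.050,
    "Arg": 0.055, "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}

# Sort from most to least preferred at functional sites
ranked = sorted(ACTIVE_SITE_PREFERENCE.items(), key=lambda item: item[1], reverse=True)
enriched = [name for name, value in ranked if value > 0]
depleted = [name for name, value in ranked if value < 0]

print("Most enriched:", ranked[:3])
print("Enriched:", enriched)
print("Depleted:", depleted)
```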

RAMACHANDRAN PLOT

A Ramachandran plot (also called a Ramachandran map or Ramachandran diagram) was developed by Gopalasamudram Narayana Ramachandran. It plots the dihedral angle φ (phi) against ψ (psi) for each amino acid residue in a protein structure, showing the possible conformations of these two angles in a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and φ and ψ are the torsion angles about these two bonds.

Ramachandran used computer models of small polypeptides to vary φ and ψ systematically, with the objective of finding stable conformations. He examined each conformation for close contacts between atoms, treating atoms as hard spheres with dimensions corresponding to their van der Waals radii. Angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
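Each point on a Ramachandran plot is a pair of backbone dihedral angles computed from four consecutive atom positions: φ from C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). The sketch below uses the standard atan2-based dihedral formula. Reading the coordinates from a PDB file is left out, and the function names are illustrative.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Dihedral angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# phi uses C(i-1), N, CA, C; psi uses N, CA, C, N(i+1)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Using atan2 rather than an arccosine keeps the sign of the angle, which is what separates, for example, right-handed from left-handed helical regions on the plot.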

SAVS (Structure analysis and validation server)

SAVS is a server that analyzes protein structures for validity and assesses how correct they are. A run can take several minutes. The run time depends on how many programs are selected and on how many residues the submitted protein contains.

PROCHECK

PROCHECK assesses how normal, or conversely how unusual, the residue geometry of a given protein structure is. It compares this geometry with stereochemical parameters derived from well-refined, high-resolution structures. The checks also use 'ideal' bond lengths and bond angles taken from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. As a by-product, the first of the PROCHECK programs "cleans up" this coordinate file. The clean-up corrects any mislabelled atoms and writes a new coordinate file with the extension .new, in which the atoms are labelled according to the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK writes several output files to the default directory. They have the same name as the original PDB file but different extensions.

The residue-by-residue listing has the extension .out. It is a printable ASCII text file that lists all the computed stereochemical properties for each residue.

ENERGY MINIMIZATION

The energy of a molecule is a function of its degrees of freedom (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It works well for releasing local constraints on a residue, but it cannot cross high energy barriers, so it stops in a local minimum.

The potential energy is calculated by summing the energies of the various interactions, and gives a single number for one conformation. This number can be used to evaluate a conformation, but it may be misleading because a few bad interactions can dominate it. For instance, a large molecule whose conformation is excellent for nearly all atoms can still have a large overall energy because of one bad interaction. Two atoms that sit too close together in space, with a huge van der Waals repulsion energy, would be enough.

It is therefore often preferable to minimize the energy of a conformation to find the best nearby conformation. Minimization is usually done by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom, which makes it an excellent starting point for molecular dynamics simulations.
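Gradient-based minimization can be illustrated on the smallest possible system: two atoms interacting through a Lennard-Jones potential, V(r) = 4ε[(σ/r)^12 - (σ/r)^6]. This toy steepest-descent sketch works on a single coordinate, the interatomic distance. It assumes reduced units (ε = σ = 1) and a fixed step size. Real minimizers work on all atomic coordinates with a full force field. Starting from a slightly too-close contact, the distance relaxes to the potential minimum at r = 2^(1/6)σ, where V = -ε.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """dV/dr; the force along r is its negative."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def steepest_descent(r: float, step: float = 0.01, tolerance: float = 1e-8, max_iterations: int = 10_000) -> float:
    """Move downhill along -dV/dr until the force is (nearly) zero."""
    for _ in range(max_iterations):
        gradient = lj_gradient(r)
        if abs(gradient) < tolerance:
            break
        r -= step * gradient
    return r


r_min = steepest_descent(1.0)  # start at r = sigma, on the repulsive wall
print(r_min, lj_energy(r_min))
```

Like any local minimizer, this method only finds the minimum nearest the starting point. That is the limitation noted above: minimization does not cross energy barriers.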


Results and discussion

1. Swiss-Prot entry: glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences

Db    AC       Description                                                        Score   E-value
pdb   1QGT-C   Chain C (HBcAg), Human Hepatitis B Viral Capsid                    206     6e-54
pdb   2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina              197     2e-51
pdb   2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192     6e-50
pdb   1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27      6.0
pdb   1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I              27      6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon     C   2435
Hydrogen   H   3967
Nitrogen   N    711
Oxygen     O    719
Sulfur     S     17

Formula: C2435H3967N711O719S17
Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
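The extinction coefficients above can be checked by hand. The sketch below assumes the molar coefficients from Pace et al. (1995) that ProtParam documents: Trp 5500, Tyr 1490 and cystine 125 M⁻¹ cm⁻¹. It also assumes that, in the "all Cys as half cystines" case, Cys residues pair up into cystines, leaving any odd Cys unpaired. With the reported composition (4 Trp, 5 Tyr, 5 Cys) and molecular weight (55252.6), it reproduces both reported coefficients and both Abs 0.1% values.

```python
from typing import Tuple


def extinction_coefficients(n_trp: int, n_tyr: int, n_cys: int) -> Tuple[int, int]:
    """Return (all Cys as half cystines, no cystines) at 280 nm in M^-1 cm^-1."""
    no_cystines = n_trp * 5500 + n_tyr * 1490
    all_cystines = no_cystines + (n_cys // 2) * 125  # each cystine is formed by two Cys
    return all_cystines, no_cystines


def abs_01_percent(ext_coefficient: float, molecular_weight: float) -> float:
    """Absorbance of a 1 g/l solution: extinction coefficient / molecular weight."""
    return ext_coefficient / molecular_weight


with_cys, without_cys = extinction_coefficients(n_trp=4, n_tyr=5, n_cys=5)
mw = 55252.6  # molecular weight reported by ProtParam for Q8IVS8
print(with_cys, round(abs_01_percent(with_cys, mw), 3))        # 29700 0.538
print(without_cys, round(abs_01_percent(without_cys, mw), 3))  # 29450 0.533
```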

Secondary structure prediction

By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
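Each summary percentage is the count of a state letter divided by the sequence length. The sketch below assumes the prediction is available as a single string of SOPMA state letters (h, e, t, c). Its final lines re-derive the reported figures from the reported counts: 235, 80, 36 and 172 of 523 residues.

```python
from collections import Counter
from typing import Dict


def state_percentages(prediction: str, states: str = "hetc") -> Dict[str, float]:
    """Percentage of residues assigned to each state, rounded to two decimals."""
    counts = Counter(prediction.lower())
    return {s: round(100.0 * counts[s] / len(prediction), 2) for s in states}


print(state_percentages("hhhhettc"))  # {'h': 50.0, 'e': 12.5, 't': 25.0, 'c': 12.5}

# Cross-check of the reported summary: count / 523 residues
for label, count in [("alpha helix", 235), ("extended strand", 80), ("beta turn", 36), ("random coil", 172)]:
    print(f"{label}: {100.0 * count / 523:.2f}%")
```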

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65

(45)

2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98

49

6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97

7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
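The pairwise scores can be converted into distances for comparing sequences. The sketch below assumes each score is a percent identity and uses the simple distance d = 1 - score/100. It includes only four of the ten sequences, with scores copied from the table above. In this subset, the closest relative of HBEAG_ASHV is HBEAG_GSHV, while the human glycerate kinase is distant from all the others.

```python
from typing import Dict, FrozenSet

# Pairwise ClustalW2 scores copied from the Scores Table (subset of four sequences)
SCORES: Dict[FrozenSet[str], int] = {
    frozenset({"GLCTK_HUMAN", "HBEAG_ASHV"}): 3,
    frozenset({"GLCTK_HUMAN", "HBEAG_GSHV"}): 3,
    frozenset({"GLCTK_HUMAN", "HBEAG_HBVA2"}): 3,
    frozenset({"HBEAG_ASHV", "HBEAG_GSHV"}): 91,
    frozenset({"HBEAG_ASHV", "HBEAG_HBVA2"}): 66,
    frozenset({"HBEAG_GSHV", "HBEAG_HBVA2"}): 70,
}
NAMES = sorted({name for pair in SCORES for name in pair})


def distance(a: str, b: str) -> float:
    """Distance assuming the score is a percent identity: 1 - identity fraction."""
    return 1.0 - SCORES[frozenset({a, b})] / 100.0


def nearest(name: str) -> str:
    """Sequence in the subset with the smallest distance to `name`."""
    return min((other for other in NAMES if other != name), key=lambda other: distance(name, other))


for name in NAMES:
    print(name, "->", nearest(name))
```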

Alignment

CLUSTAL 205 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
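The guide tree is written in Newick notation, the format of ClustalW's .dnd files. The Python sketch below parses it into nested (name, branch length, children) tuples and sums the branch lengths from the printed root to each sequence. It assumes that the separators lost in the printed tree (colons before branch lengths, commas between siblings, decimal points and the final semicolon) follow ClustalW's usual layout; `GUIDE_TREE`, `parse` and `tip_distances` are illustrative names. Guide-tree branch lengths are meant to steer the progressive alignment, so these distances indicate relative divergence rather than a formal phylogeny.

```python
import re

# Guide tree from the report; separators assumed to follow ClustalW's .dnd layout.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844)"
    ":0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

Node = tuple  # (name, branch_length, children)


def parse(newick: str) -> Node:
    """Parse a Newick string into nested (name, length, children) tuples."""
    # Punctuation becomes single-character tokens; labels and numbers stay whole.
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", newick)
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ";"

    def node() -> Node:
        nonlocal pos
        children = []
        if peek() == "(":
            pos += 1  # consume '('
            children.append(node())
            while peek() == ",":
                pos += 1  # consume ','
                children.append(node())
            if peek() != ")":
                raise ValueError(f"expected ')' at token {pos}")
            pos += 1  # consume ')'
        name = ""
        if peek() not in "(),:;":
            name = tokens[pos]
            pos += 1
        length = 0.0
        if peek() == ":":
            length = float(tokens[pos + 1])
            pos += 2
        return (name, length, children)

    return node()


def tip_distances(tree: Node, start: float = 0.0) -> dict:
    """Sum branch lengths from the root to every leaf."""
    name, length, children = tree
    total = start + length
    if not children:
        return {name: total}
    out = {}
    for child in children:
        out.update(tip_distances(child, total))
    return out


for leaf, dist in sorted(tip_distances(parse(GUIDE_TREE)).items(), key=lambda kv: kv[1]):
    print(f"{leaf:20s} {dist:.5f}")
```

Sorted this way, the human HBV sequences (HBVA2 to HBVA6) sit closest to the printed root and Q8IVS8|GLCTK_HUMAN lies furthest from it, with the duck and squirrel/woodchuck hepadnavirus sequences in between.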

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template, since it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure (PDB file) from File, then choose Swiss model > Load the raw target sequence.

Choose Fit > Fit raw sequence, then Magic fit, then Iterative fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss model > Submit modeling request. A new browser window opens, loading the PDB file; enter the e-mail ID at which the modeled structure is to be received.

Open the new structure received by e-mail and remove the template by selecting the target.

Choose File > Save > Layer (PDB).


Open Swiss model and select the load raw sequence option to load the target molecule.

Perform Magic fit and Iterative fit, provided under Fit, to fit the two sequences.


Save the file as a project.

Select "Submit modeling request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill in these fields:

Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence Identity [%]: 34

E-value: 2.70e-52

Click on the model bars.

Fig: Structure of the template after modeling

Alignment

TARGET  83 LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative (homology) modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ (phi) against ψ (psi) for each amino acid residue in a protein structure, and so shows which combinations of φ and ψ are sterically possible for a polypeptide. The plot reports these torsion angles, not bond lengths or distances between successive α-carbon atoms; residues that fall outside the favoured and allowed regions point to strained or possibly incorrect parts of the model.

Software
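To make the φ/ψ definition concrete: φ of residue i is the torsion angle about the N-Cα bond, defined by the atoms C(i-1), N(i), Cα(i), C(i), and ψ is the torsion about the Cα-C bond, defined by N(i), Cα(i), C(i), N(i+1). The Python sketch below computes such torsions from Cartesian coordinates with the common atan2 formulation and the IUPAC sign convention (positive when the front bond turns clockwise onto the rear bond, viewed along the central bond). The function names are hypothetical, and the coordinates in practice would come from the ATOM records of the model's PDB file.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec3, k: float) -> Vec3:
    return (a[0] * k, a[1] * k, a[2] * k)


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project both outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec3, n: Vec3, ca: Vec3, c: Vec3, n_next: Vec3) -> tuple[float, float]:
    """Backbone (phi, psi) of residue i from C(i-1), N(i), CA(i), C(i), N(i+1)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy geometry: central bond along x, outer atoms placed by hand.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Each residue's (φ, ψ) pair becomes one point on the Ramachandran plot; validation servers then count how many points fall into each region.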

SAVS: http://nihserver.mbi.ucla.edu/SAVS

Procedure: The target protein structure obtained after homology modeling using DeepView and Modeler is given as input to SAVS.

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result

Plot statistics                                            Score    %-tage
Residues in most favoured regions [A,B,L]                    990     85.6%
Residues in additional allowed regions [a,b,l,p]             104      9.0%
Residues in generously allowed regions [~a,~b,~l,~p]          11      1.0%
Residues in disallowed regions                                51      4.4%
                                                            ----    ------
Number of non-glycine and non-proline residues              1156    100.0%

Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127 (59)
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
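The percentage column follows directly from the residue counts: each region's count is divided by the 1156 non-glycine, non-proline residues. The short Python check below (the function name is hypothetical) reproduces that column from the counts in the plot statistics and compares the most-favoured share with the 90% guideline quoted above.

```python
def ramachandran_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of non-Gly, non-Pro residues in each region, rounded to 0.1 %."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Counts from the PROCHECK plot statistics for the model.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
pct = ramachandran_percentages(counts)
meets_guideline = pct["most favoured"] > 90.0  # PROCHECK's ">90%" rule of thumb
print(pct)
print("meets 90% guideline:", meets_guideline)
```

With these counts the most favoured share comes out at 85.6%, below the 90% expected of a good-quality model.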

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP:N Radius = 1.40 Charge = -0.52; ASP:CA Radius = 1.50 Charge = 0.25; ASP:C Radius = 1.40 Charge = 0.53; ASP:O Radius = 1.50 Charge = -0.50; ASP:CB Radius = 1.70 Charge = -0.21; ASP:CG Radius = 1.40 Charge = 0.62; ASP:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP:N Radius = 1.40 Charge = -0.52; ASP:CA Radius = 1.50 Charge = 0.25; ASP:C Radius = 1.40 Charge = 0.53; ASP:O Radius = 1.50 Charge = -0.50; ASP:CB Radius = 1.70 Charge = -0.21; ASP:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR:N Radius = 1.40 Charge = -0.52; THR:CA Radius = 1.50 Charge = 0.27; THR:C Radius = 1.40 Charge = 0.53; THR:O Radius = 1.50 Charge = -0.50; THR:CB Radius = 1.50 Charge = 0.21; THR:H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP:N Radius = 1.40 Charge = -0.52; ASP:CA Radius = 1.50 Charge = 0.25; ASP:C Radius = 1.40 Charge = 0.53; ASP:O Radius = 1.50 Charge = -0.50; ASP:CB Radius = 1.70 Charge = -0.21; ASP:CG Radius = 1.40 Charge = 0.62; ASP:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP:N Radius = 1.40 Charge = -0.52; ASP:CA Radius = 1.50 Charge = 0.25; ASP:C Radius = 1.40 Charge = 0.53; ASP:O Radius = 1.50 Charge = -0.50; ASP:CB Radius = 1.70 Charge = -0.21; ASP:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR:N Radius = 1.40 Charge = -0.52; THR:CA Radius = 1.50 Charge = 0.27; THR:C Radius = 1.40 Charge = 0.53; THR:O Radius = 1.50 Charge = -0.50; THR:CB Radius = 1.50 Charge = 0.21; THR:H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined
Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
   1    1 000000 00 00 00 00 00 00 -1 -100
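Several figures in the Hex log can be cross-checked by hand. The Python sketch below uses hypothetical function names and assumes that the log's Iterate[...] and FFT[...] factors read as 27 x 812 x 1 and 64 x 24 x 48, the separators having been lost in the printed log. Under that reading, 27 is the number of intermolecular distances from R = 0.00 to 19.50 in steps of 0.75, 812 is the receptor's initial rotational increments, and the products reproduce the logged totals of 21924 3D FFTs and 1616412672 orientations. The sketch also sums the listed partial charges of an MSE residue, which should land near the reported fractional charge of 0.35; the per-atom values are printed to two decimals, so the sum can differ in the last digit.

```python
import math


def n_distance_steps(r_min: float, r_max: float, step: float) -> int:
    """Number of intermolecular distances sampled from r_min to r_max inclusive."""
    return round((r_max - r_min) / step) + 1


def search_space(iterate: tuple[int, ...], fft: tuple[int, ...]) -> tuple[int, int]:
    """Return (number of 3D FFTs, total orientations scored)."""
    n_ffts = math.prod(iterate)
    return n_ffts, n_ffts * math.prod(fft)


# Distance range and rotational increments as printed in the log; the
# factor split of Iterate[...] and FFT[...] is an assumption.
steps = n_distance_steps(0.00, 19.50, 0.75)
n_ffts, total = search_space((steps, 812, 1), (64, 24, 48))
print(steps, n_ffts, total)

# Partial charges listed for an MSE residue (N, CA, C, O, CB, CG, SE, CE, H).
mse_charges = [-0.52, 0.14, 0.53, -0.50, 0.04, 0.09, 0.32, 0.01, 0.25]
print(round(sum(mse_charges), 2))
```

Both the FFT count and the orientation total agree with the log, which supports the assumed factor split.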


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at this by studying the viruses at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope that it will be possible in the near future to draw firm conclusions about the true picture of its evolution, and that the genes responsible for pathogenesis can also be identified. Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

In order to determine the unknown structures of proteins present in the different species, we performed homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

As our study indicates, the HBeAg (glycerate kinase) protein is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone toward such discoveries; it is a small finding within a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
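Distance-based phylogenetic methods usually start from pairwise distances between aligned sequences. As a minimal sketch (not the procedure used in this report), the uncorrected p-distance can be computed as the fraction of aligned, gap-free sites at which two sequences differ. The strain names and toy sequences below are hypothetical, not HBV data.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, gap-free sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # pairwise deletion: skip columns with a gap in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return differences / compared


def distance_matrix(alignment: dict) -> dict:
    """Map each unordered pair of sequence names to its p-distance."""
    names = sorted(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical toy alignment (not real HBV sequences)
TOY_ALIGNMENT = {
    "strain_A": "MQLFHLCLII",
    "strain_B": "MQLFHLCLVI",
    "strain_C": "MQLF-LSLVV",
}

for (x, y), d in distance_matrix(TOY_ALIGNMENT).items():
    print(f"{x} vs {y}: {d:.3f}")
```

A tree-building method such as neighbour joining or UPGMA would then be applied to this matrix; for divergent sequences, a substitution-model correction is usually applied to the raw p-distances first.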

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.


[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B, Sheinerman FB and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796–14801. [PubMed]

Aloy P, Querol E, Aviles FX and Sternberg MJ. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395–408. [PubMed]


Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402. [PubMed]

Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com


Abbreviations


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


among other things, screening of harmful substances, regulation of blood composition and production of bile to help digestion.

Causes

Acute hepatitis

Viral hepatitis: hepatitis A to E (more than 95% of viral causes), herpes simplex, cytomegalovirus, Epstein-Barr virus, yellow fever virus, adenoviruses
Non-viral infection: Toxoplasma, Leptospira, Q fever, Rocky Mountain spotted fever
Alcohol
Toxins: Amanita toxin in mushrooms, carbon tetrachloride, asafetida
Drugs: paracetamol, amoxicillin, antituberculosis medicines, minocycline and many others
Ischemic hepatitis (circulatory insufficiency) (1)
Pregnancy
Autoimmune conditions, e.g. systemic lupus erythematosus (SLE)
Metabolic diseases, e.g. Wilson's disease

Chronic hepatitis

Viral hepatitis: hepatitis B with or without hepatitis D, hepatitis C (hepatitis A and E do not lead to chronic disease)
Autoimmune: autoimmune hepatitis
Alcohol
Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole
Non-alcoholic steatohepatitis
Heredity: Wilson's disease, alpha-1 antitrypsin deficiency
Primary biliary cirrhosis and primary sclerosing cholangitis occasionally mimic chronic hepatitis [4]

Viral hepatitis

A virus is a particle that is smaller than a bacterium and contains genetic information in the form of DNA or RNA. This genetic material allows the virus to infect bacteria or other living cells and set up the machinery to reproduce itself, leading to destruction of the cell in which it resides. To date, five viruses, labeled A through E, have been identified that appear to cause viral hepatitis. Viruses A and E can be contracted from contaminated water or food (by mouth), while viruses B, C and D are transmitted by direct entry into the bloodstream (for example, through any method of injection under the skin). The term viral hepatitis describes any one of the illnesses caused by these five viruses; each consists of an infection of liver cells that damages the liver over days in some cases, but over many years in others. Thirty years ago, none of the hepatitis viruses had been identified. In the 1960s, transfusion-related viral hepatitis was extremely common, with 30% of patients receiving blood products becoming infected. By 1970, a blood test for the Australia antigen had been developed, which appeared to identify those infected with one hepatitis virus, now called hepatitis B. The investigator who discovered the Australia antigen, the protein that makes up the coat of the virus and is now called the hepatitis B surface antigen (HBsAg), was awarded the Nobel Prize. Our understanding of viral hepatitis has grown tremendously since the discovery of the Australia antigen.


Currently, 11 viruses are recognized as causing hepatitis. Two are herpesviruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and 9 are hepatotropic viruses.

EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent hepatic damage. Both viruses cause the typical infectious mononucleosis syndrome of fatigue, nausea and malaise.

Of the nine human hepatotropic viruses, only five are well characterized; hepatitis G and TTV (transfusion-transmitted virus) are newly discovered viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called enterically transmitted non-A, non-B hepatitis) are transmitted by fecal-oral contamination. The most important types are hepatitis B (sometimes called serum hepatitis), hepatitis C (formerly called non-A, non-B hepatitis) and hepatitis D (formerly called delta hepatitis).

Hepatitis A

Incubation period: 3-5 weeks (mean 28 days).

Milder disease than hepatitis B; asymptomatic infections are very common, especially in children. Adults, especially pregnant women, may develop more severe disease. Although convalescence may be prolonged, there is no chronic form of the disease. Fulminant hepatitis is rare (0.1% of cases).

The virus enters via the gut, replicates in the alimentary tract and spreads to infect the liver, where it multiplies in hepatocytes. Viraemia is transient. Virus is excreted in the stools for two weeks preceding the onset of symptoms.

World-wide distribution; endemic in most countries. The incidence in first-world countries is declining. There is an especially high incidence in developing countries and rural areas; in rural areas of South Africa, the seroprevalence is 100%.

Hepatitis E

Incubation period: 30-40 days.

Acute, self-limiting hepatitis; no chronic carrier state.

Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in pregnant women, in whom the mortality rate is high (up to 40%). As with hepatitis A, the virus replicates in the gut initially before invading the liver, and virus is shed in the stool prior to the onset of symptoms. Viraemia is transient. A large inoculum of virus is needed to establish infection. Little is known yet; the incidence of infection appears to be low in first-world countries.

Hepatitis C

A member of the Flaviviridae (genus Hepacivirus), related to the flaviviruses and pestiviruses; it is enveloped and has a single-stranded RNA genome.

Does not grow readily in conventional cell culture, but can infect chimpanzees. Incubation period: 6-8 weeks.

Causes a milder form of acute hepatitis than does hepatitis B, but about 50% of individuals develop chronic infection following exposure, which can lead to:

1) Chronic liver disease

2) Hepatocellular carcinoma

Incidence: endemic world-wide, with high incidence in Japan, Italy and Spain. In South Africa, 1% of blood donors have antibodies.

Hepatitis D

Defective virus that requires hepatitis B as a helper virus in order to replicate. Infection therefore occurs only in patients who are also infected with hepatitis B, and it increases the severity of liver disease in hepatitis B carriers. The virus particle is 36 nm in diameter and is encapsulated with HBsAg derived from HBV. The delta antigen is associated with virus particles; the genome is single-stranded RNA.

Identified in intravenous drug users.

Hepatitis G

A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally transmitted hepatitis but is no longer believed to be a major agent of liver disease. It has been classified as a flavivirus.

Hepatitis B

What is the Hepatitis B Virus?

The hepatitis B virus (HBV) is a DNA-containing virus that is capable of infecting human liver cells and other cells in the body once it gains access to the bloodstream. One of the most interesting features of the hepatitis B virus is that the virus itself does not damage the liver; the damage is caused by the individual's own immune system attacking the virus-infected cells. Since liver damage from the virus may be minimal, many patients are called healthy carriers. This means that although they may transmit the disease to others, they have normal-appearing livers and normal liver function tests. While many individuals remain healthy for many years or a lifetime, others develop chronic hepatitis, cirrhosis and occasionally liver cell cancer. These outcomes are linked to the virus and its effects, although it is unlikely that the virus directly causes cancer. Patients who develop hepatitis (damage to liver cells with inflammation) do so on account of the body's normal inclination to attack the foreign proteins contained in viruses and in the cells in which the viruses are found. This process, called the immune response, determines the pace and the severity of the liver cell injury in this condition, and is described in more detail below.

Since the identification of the hepatitis B virus, several other closely related viruses have been identified in Eastern woodchucks, ground squirrels and Peking ducks. The members of this virus family, termed the hepadnaviruses, have life cycles similar to that observed in humans and can serve as animal models, allowing further study of these unique disease-causing agents.

Classification and general features

Family: Hepadnaviridae

Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g. duck hepatitis B virus)

Size: 42 nm. Virions (also known as Dane particles) contain a circular, partially double-stranded DNA genome.

Fig. Hepatitis B virus structure

HBV Antigens

HBsAg = surface (coat) protein, produced in excess as small spheres and tubules

HBcAg = inner core protein

HBeAg = secreted protein, function unknown

Clinical Features

Incubation period: 2-5 months.

Insidious onset of symptoms. Tends to cause a more severe disease than hepatitis A.

Asymptomatic infections occur frequently.

Pathogenesis

Infection is parenterally transmitted. The virus replicates in the liver, and virus particles as well as excess viral surface protein are shed in large amounts into the blood. Viraemia is prolonged, and the blood of infected individuals is highly infectious.

Complications

1) Persistent infection

Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected. Those at particular risk include:

babies and young children
immunocompromised patients
males (more often than females)

The virus persists in the hepatocytes, and ongoing liver damage occurs because of the host immune response against the infected liver cells.

Chronic infection may take one of two forms:

Chronic persistent hepatitis: the virus persists, but there is minimal liver damage.

Chronic active hepatitis: there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure.

Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC). HBV is thought to play a role in the development of this malignancy because:

a) 80% of patients with HCC are carriers of hepatitis B
b) virus DNA can be identified in hepatocellular carcinoma cells
c) virus DNA can integrate into the host chromosome

3) Fulminant hepatitis

Rare; accounts for 1% of infections.

Epidemiology

Prevalence of disease in Africa

World-wide there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas; in South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is parenterally transmitted through:

1) Blood

Blood transfusions, serum products
Sharing of needles, razors
Tattooing, acupuncture
Renal dialysis
Organ donation

2) Sexual intercourse

3) Horizontal transmission in children, families and close personal contact. This is the major mode of transmission in South Africa, where the majority of individuals become infected between three and nine years of age. Horizontal transmission also occurs in children's institutions and mental homes.

4) Vertical transmission: perinatal transmission from a carrier mother to her baby

Transplacental (rare)
During delivery
Postnatal: breast feeding, close contact

(This is the major mode of transmission in South East Asia.)

Diagnosis: serology

Acute infection with resolution

Viral antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in chronic carriers as well as in those who clear the infection. Its presence indicates exposure to HBV, whether the infection has cleared or persists.

Fig. Hepatitis B virus in serum
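The antigen and antibody descriptions above amount to a lookup from each detected marker to its meaning. The sketch below restates that lookup in Python only to make the logic explicit; the marker labels (for example "anti-HBc IgM" for core IgM) are conventional names chosen here, and the code is illustrative, not a diagnostic tool.

```python
# Marker meanings restated from the serology list above; illustrative only, not a clinical tool.
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "anti-HBc IgM": "recent infection",
    "anti-HBc IgG": "exposure to HBV (cleared infection or chronic carrier)",
}


def interpret(markers: set) -> list:
    """Return one interpretation line per detected marker, in a fixed order."""
    unknown = set(markers) - set(MARKER_MEANING)
    if unknown:
        raise ValueError(f"unrecognised markers: {sorted(unknown)}")
    # HBcAg is deliberately absent: the text notes it is not found in blood.
    return [
        f"{name}: {meaning}"
        for name, meaning in MARKER_MEANING.items()
        if name in markers
    ]


for line in interpret({"HBsAg", "HBeAg", "anti-HBc IgM"}):
    print(line)
```

Keeping the meanings in one table means the interpretation text can be checked line by line against the serology list it was copied from.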

Prevention

1) Active immunization

Two types of vaccine are available:

Serum-derived: prepared from HBsAg purified from the serum of HBV carriers
Recombinant HBsAg: made by genetic engineering in yeasts

Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.

Universal immunization of infants was introduced in April 1995. Infants receive 3 doses, at 6, 10 and 14 weeks of age.

Vaccine should be administered to people at high risk of infection with HBV:

1) Health care workers
2) Sexual partners of chronic carriers
3) Infants of HBV carrier mothers

2) Passive antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example needlestick injuries.

What is Hepatitis B Infection Like?

When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others who do not develop significant symptoms following exposure may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers or those with chronic hepatitis B are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

The genome is circular, about 3.2 kb in size and partially double-stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5′ end. The other strand, the plus strand, is variable in length but always less than unit length, and has an RNA oligonucleotide at its 5′ end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. Expression of these proteins relies on the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation and mark a specific transcript for encapsidation into the nucleocapsid.

Fig. Hepatitis B virus genome
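Because the genome is circular and its reading frames use multiple in-frame start codons, a naive linear scan misses frames that cross the origin and collapses nested start sites. The sketch below makes simplifying assumptions: forward strand only (the text notes the frames all run in one direction), the standard stop codons, an ORF defined as ATG to the next in-frame stop, and short toy sequences rather than a real HBV genome. The function names are hypothetical. Each in-frame ATG is reported as its own ORF, which is how nested products such as the small, middle and large surface proteins can arise from one frame.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def codon_at(genome: str, pos: int) -> str:
    """Three bases starting at pos, wrapping past the end of the circular genome."""
    n = len(genome)
    return "".join(genome[(pos + k) % n] for k in range(3))


def find_orfs_circular(genome: str, min_codons: int = 1) -> list:
    """Forward-strand ORFs (ATG ... stop) on a circular genome.

    Returns (start, length_nt, frame) tuples; length_nt includes the stop codon
    and frame is start % 3 relative to the sequence origin. Every in-frame ATG
    is reported separately, so nested start sites give nested ORFs.
    """
    genome = genome.upper()
    n = len(genome)
    orfs = []
    for start in range(n):
        if codon_at(genome, start) != "ATG":
            continue
        # Scan downstream codons; the stop codon must begin within one turn of the circle.
        for offset in range(3, n, 3):
            if codon_at(genome, start + offset) in STOP_CODONS:
                length = offset + 3
                if length >= 3 * min_codons:
                    orfs.append((start, length, start % 3))
                break
    return orfs


# Toy sequences (hypothetical, far shorter than the ~3.2 kb HBV genome)
print(find_orfs_circular("TAGCCCCCCATG"))  # ORF that wraps across the origin
print(find_orfs_circular("ATGATGAAATAA"))  # two in-frame starts sharing one stop
```

On a real genome, a much larger `min_codons` cutoff would be used to suppress short chance ORFs.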

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor (Garces HBVP); this receptor, unidentified when that source was written, has since been identified as the sodium taurocholate cotransporting polypeptide (NTCP). The viral DNA then enters the nucleus, where it is converted into a covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins destined to become HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce the polymerase protein P, which then binds to a specific site at the 3′ end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the newly made DNA is recycled to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al. 765).

Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins arranged with icosahedral symmetry. The nucleocapsid also contains at least one hepatitis B polymerase protein (P), along with the HBV genome.

In infected people, virions make up only a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, and they usually outnumber the virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to together as surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that the presence of high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg): There are three different types of hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope that may be responsible for triggering an immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears basically oblivious to their presence.

Hepatitis B core antigen (HBcAg): The only HBV antigen that cannot be detected directly by blood test; this antigen can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. It is encoded by the same gene as the core protein but is secreted, and it can be detected by blood test. If found, it usually indicates that complete virus particles are in circulation (Strauss 2002).


REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded, circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains 3 distinct antigen particles: a spherical 22 nm particle; a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle; and tubular or filamentous particles that vary in length. Of these, only the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or other body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with an ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life in chronic carriers as well as in those who clear the infection. Its presence indicates exposure to HBV, whether the infection has cleared or persists [4].


Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g. loops) and, ultimately, the refinement of the structure in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.
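Identifying a template with a clear sequence relationship to the query starts from a pairwise alignment. A minimal sketch of global (Needleman-Wunsch) alignment is shown below under simplified, assumed scoring (match +1, mismatch -1, linear gap -2); real homology modelling pipelines use substitution matrices such as BLOSUM62, affine gap penalties and profile methods, so this illustrates only the dynamic-programming idea. The function name is hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


print(needleman_wunsch("ACGT", "AGT"))
```

With this scoring, the example aligns `ACGT` against `A-GT` with a total score of 1.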

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity (a structural explanation for this observation has been offered by Chung and Subbiah 1996). Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
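The 30% rule of thumb depends on how percent identity is measured. A minimal sketch, assuming identity is counted over aligned columns where neither sequence has a gap (tools differ here: some divide by alignment length or by the shorter sequence), is shown below; the function names and the aligned pair are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Published tools differ in the denominator (alignment length, shorter
    sequence length, or gap-free columns); gap-free columns are assumed here.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("inputs must be rows of the same alignment")
    columns = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != "-" and t != "-"
    ]
    if not columns:
        raise ValueError("no gap-free columns")
    identical = sum(q == t for q, t in columns)
    return 100.0 * identical / len(columns)


def passes_30_percent_rule(aligned_query: str, aligned_template: str,
                           threshold: float = 30.0) -> bool:
    """Rule-of-thumb check from the text: identity above ~30% suggests a usable template."""
    return percent_identity(aligned_query, aligned_template) > threshold


# Hypothetical aligned pair
query, template = "MKV-LLAGT", "MKVALLSGS"
print(percent_identity(query, template))  # 6 identical of 8 gap-free columns
print(passes_30_percent_rule(query, template))
```

Passing the threshold is only a first filter; the text's point is that alignment quality, not the identity number alone, determines model quality.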

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years with the aim of making maximal use of all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term guided docking is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, affecting scoring but not the orientation of the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, reflecting the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information is an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described [8].


Material and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI (National Center for Biotechnology Information)

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)

Primary accession number: Q8IVS8; EC 2.7.1.31; from human; subcellular location: cytoplasm; catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "fast-aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.
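The speed of FASTA comes from first looking up short identical words (k-tuples, set by the ktup parameter) shared by the query and each library sequence, and grouping those hits by diagonal. The sketch below illustrates only that first step, under simplified assumptions; the real package goes on to rescore the best regions with a substitution matrix, join them and perform a banded alignment, none of which is reproduced here. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def ktuple_diagonals(query: str, library: str, ktup: int = 2) -> Counter:
    """Count shared k-tuples per diagonal (library position - query position)."""
    # Index every k-tuple of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    # Scan the library sequence and record the diagonal of every word hit.
    hits = Counter()
    for j in range(len(library) - ktup + 1):
        for i in index.get(library[j:j + ktup], ()):
            hits[j - i] += 1
    return hits


diagonals = ktuple_diagonals("MQLFHLCLIISCT", "XXMQLFHLCLIISCT", ktup=2)
best_offset, n_hits = diagonals.most_common(1)[0]
print(best_offset, n_hits)
```

Many hits on one diagonal indicate an ungapped stretch of similarity at that offset; in this toy case the best diagonal is offset 2, because the library sequence carries two extra leading residues.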

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
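To show what "optimal" local alignment means, here is a minimal score-only Smith-Waterman sketch. The match, mismatch and gap values are assumptions chosen for readability; SSEARCH itself scores proteins with a substitution matrix and affine gap penalties, and also reports the alignment and its statistics.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty (score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # The floor at zero is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("XXACGTXX", "ACGT"))  # 8: four matches at +2 each
```

Only two rows of the dynamic-programming matrix are kept, so memory grows with the length of one sequence; recovering the alignment itself would need the full matrix or a traceback scheme.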

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.

5. Primary and secondary structure analysis

Using ProtParam for primary structure analysis

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence will be analyzed.

It calculates the following parameters, among others:

extinction coefficient

half-life

instability index

aliphatic index
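Several of these parameters are simple functions of the amino acid composition. As one example, ProtParam documents the aliphatic index (Ikai, 1980) as X(Ala) + a·X(Val) + b·(X(Ile) + X(Leu)), where X are mole percents, a = 2.9 and b = 3.9. A minimal sketch of that formula follows; it assumes a plain one-letter sequence without ambiguity codes.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index = X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


print(aliphatic_index("AVIL"))
```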


Using SOPMA for secondary structure analysis

Recently, a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in the prediction of the secondary structure of proteins. In this paper, we report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor in PDB format.

Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).

Verified the model through different parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the structure of the drug molecule.

Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
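The Å figures above are root-mean-square deviations (RMSD) between matched Cα atoms after the model has been superimposed on the reference structure. A minimal NumPy sketch of that calculation, using the Kabsch algorithm for the superposition, is shown below; it assumes the two coordinate sets are already paired one-to-one (same residues in the same order), which in practice comes from the sequence alignment.

```python
import numpy as np


def kabsch_rmsd(p, q) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix (Kabsch).
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid an improper rotation (reflection)
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p_c @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_rot - q_c) ** 2, axis=1))))
```

A rotated and translated copy of the same coordinates should give an RMSD of essentially zero, while any change in shape leaves a positive residual.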

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, giving a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.

Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
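The E-value used in template selection follows Karlin-Altschul statistics: for a raw score S, E = K·m·n·e^(-λS), or equivalently E = m·n·2^(-S') for the bit score S' = (λS - ln K)/ln 2, where m and n are the (effective) query and database lengths. The sketch below encodes these relations. λ and K depend on the scoring matrix and gap penalties, and the 1e-5 cut-off in the filter is an assumed example value, not one taken from this report.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalised bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_raw(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """Equivalent form E = m * n * 2**(-S')."""
    return m * n * 2.0 ** (-bits)


def filter_templates(hits, max_evalue: float = 1e-5):
    """Names of (name, evalue) hits at or below the cut-off, best first."""
    return [name for name, e in sorted(hits, key=lambda h: h[1]) if e <= max_evalue]


print(filter_templates([("a", 1e-50), ("b", 2.0), ("c", 1e-10)]))
```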

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aiming at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interactions are said to be rigid.

Fig.: Protein-protein docking

Protein Receptor-Ligand Docking

Protein receptor-ligand motifs fit together tightly and are often referred to as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand and a flexible receptor, or a flexible ligand with a rigid receptor.

Fig.: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid ligand-flexible receptor complex often maximizes the interface area between the molecules. The two move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty in the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than would be possible with a rigid receptor.
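The van der Waals penalty described above is commonly modelled with a 12-6 Lennard-Jones term, E(r) = 4ε[(σ/r)^12 - (σ/r)^6], which becomes strongly positive when two atoms overlap. The sketch below sums this term over all ligand-receptor atom pairs for one pose. Using a single ε and σ for every pair, and the default values chosen, are simplifying assumptions; docking programs use per-atom-type parameters, distance cut-offs and additional energy terms.

```python
import math


def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """12-6 Lennard-Jones energy for one atom pair at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def vdw_interaction(ligand, receptor, epsilon: float = 0.2, sigma: float = 3.4) -> float:
    """Sum of LJ energies over all ligand-receptor atom pairs.

    ligand, receptor: lists of (x, y, z) coordinates in angstroms.
    The epsilon and sigma defaults are placeholder values.
    """
    return sum(lennard_jones(math.dist(a, b), epsilon, sigma)
               for a in ligand for b in receptor)


# Two atoms 1 angstrom apart overlap badly and give a large positive penalty.
print(vdw_interaction([(0.0, 0.0, 0.0)], [(0.0, 0.0, 1.0)]))
```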

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement, which allows for small conformational adaptations on ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
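During sampling, a rigid-body pose is defined by six numbers: three rotation angles and three translations. The sketch below applies such a pose to a list of ligand coordinates, rotating about the ligand centroid and then translating. The x-then-y-then-z Euler-angle convention is an assumption made here; docking programs differ, and many use quaternions to avoid the singularities of Euler angles.

```python
import math


def rotation_matrix(rx: float, ry: float, rz: float):
    """R = Rz @ Ry @ Rx (rotate about x, then y, then z; angles in radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def place_ligand(coords, rx, ry, rz, tx, ty, tz):
    """Apply a 6-degree-of-freedom pose to a list of (x, y, z) tuples."""
    n = len(coords)
    centroid = [sum(c[i] for c in coords) / n for i in range(3)]
    r = rotation_matrix(rx, ry, rz)
    shift = (tx, ty, tz)
    posed = []
    for c in coords:
        local = [c[i] - centroid[i] for i in range(3)]
        rotated = [sum(r[row][k] * local[k] for k in range(3)) for row in range(3)]
        posed.append(tuple(rotated[i] + centroid[i] + shift[i] for i in range(3)))
    return posed


ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
print(place_ligand(ligand, 0.3, -1.1, 2.0, 5.0, -2.0, 1.0))
```

Because the move is rigid, all interatomic distances inside the ligand are unchanged; a docking search scores many such poses and keeps the best ones.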

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking study will be useful in finding a cure for the infectious disease hepatitis B, and will also open new avenues for finding other possible drug targets in hepatitis B virus (HBV). The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process.

Receptor

A molecule, usually a protein, that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule within a cell or on the cell surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

The molecule which binds to a protein molecule (e.g., a receptor). Because a ligand binds through the interaction of many weak noncovalent bonds formed with the binding site of a protein, tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein or enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active or functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The list below does not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

Amino acid   Preference
His           0.360
Tyr          -0.040
Asp           0.045
Gly          -0.070
Trp          -0.140
Met           0.025
Val          -0.060
Asn           0.080
Leu          -0.180
Phe          -0.120
Gln           0.050
Cys           0.210
Ile          -0.005
Ala           0.025
Glu           0.050
Arg           0.055
Pro          -0.200
Lys           0.100
Thr           0.100
Ser           0.130
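One generic way to express such preferences is a log-odds score, log10(observed fraction at sites / fraction expected by chance), which is positive for over-represented and negative for under-represented residues. The sketch below implements that generic form with hypothetical toy counts; it is not necessarily the exact procedure used to derive the values listed above.

```python
import math


def contact_propensity(site_counts: dict, background_counts: dict) -> dict:
    """Generic log-odds preference of residue types for binding-site contacts.

    site_counts: residues seen in contact with bound non-protein atoms.
    background_counts: residues in the whole set of structures (chance level).
    """
    site_total = sum(site_counts.values())
    background_total = sum(background_counts.values())
    if site_total == 0 or background_total == 0:
        raise ValueError("counts must not be empty")
    scores = {}
    for residue, background in background_counts.items():
        observed = site_counts.get(residue, 0) / site_total
        expected = background / background_total
        # Residues never seen at a site are skipped: log10(0) is undefined.
        if observed > 0 and expected > 0:
            scores[residue] = math.log10(observed / expected)
    return scores


# Hypothetical toy counts, not the data behind the table above.
print(contact_propensity({"His": 2, "Leu": 1}, {"His": 1, "Leu": 3, "Gly": 4}))
```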

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the torsion angles about these bonds, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and angles which cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
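The φ and ψ angles on a Ramachandran map are torsion angles defined by four consecutive backbone atoms: C(i-1), N, Cα, C for φ, and N, Cα, C, N(i+1) for ψ. A minimal pure-Python sketch of the torsion-angle calculation is given below; reading the atom coordinates from a structure file is left out.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees, in (-180, 180], defined by four atom positions."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# A cis arrangement of the four points gives 0 degrees.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
```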

SAVS (Structure Analysis and Validation Server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new. This new file will have the atoms labelled in accordance with the IUPAC naming convention.
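Such coordinate files use the fixed-column PDB format. As a minimal sketch, the Cα atoms of a model can be extracted from the ATOM records by column position; alternate locations, insertion codes and multi-model files are deliberately ignored, and the file name in the commented usage is hypothetical.

```python
def read_ca_atoms(lines):
    """Yield (chain, residue_number, residue_name, (x, y, z)) for C-alpha ATOM records.

    Fixed PDB columns: atom name 13-16, residue name 18-20, chain 22,
    residue number 23-26, and x, y, z in 31-38, 39-46 and 47-54.
    """
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # HETATM records (e.g. calcium ions named CA) are skipped
        if line[12:16].strip() != "CA":
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        yield line[21], int(line[22:26]), line[17:20], xyz


# Usage with a file (hypothetical name):
#     with open("model.pdb") as fh:
#         ca_atoms = list(read_ca_atoms(fh))
record = "ATOM      2  CA  MET A   1      38.198  19.582  28.998  1.00 24.80           C"
print(list(read_ca_atoms([record])))
```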

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good for releasing local constraints on a residue, but it will not pass through high energy barriers and will stop in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
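Gradient optimization can be illustrated with its simplest form, steepest descent, on a one-dimensional model: two atoms interacting through a Lennard-Jones potential in reduced units (ε = σ = 1, an assumption). Each step moves the interatomic distance along the force until the force is close to zero. Real minimizers work on all 3N coordinates with a full force field and often switch to conjugate-gradient or quasi-Newton methods; this sketch only shows the principle.

```python
def lj_energy_and_force(r: float, epsilon: float = 1.0, sigma: float = 1.0):
    """Lennard-Jones pair energy and the force along r (force = -dE/dr)."""
    sr6 = (sigma / r) ** 6
    energy = 4.0 * epsilon * (sr6 * sr6 - sr6)
    force = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
    return energy, force


def steepest_descent(r: float, step: float = 0.01, tol: float = 1e-8,
                     max_iter: int = 10_000):
    """Move the distance along the force until |force| < tol.

    Returns (distance, energy, iterations used).
    """
    for iteration in range(max_iter):
        energy, force = lj_energy_and_force(r)
        if abs(force) < tol:
            return r, energy, iteration
        r += step * force  # a positive force pushes the atoms apart
    energy, _ = lj_energy_and_force(r)
    return r, energy, max_iter


r_min, e_min, n_steps = steepest_descent(1.5)
print(round(r_min, 4), round(e_min, 4), n_steps)
```

Starting from r = 1.5, the distance converges to the minimum of the potential at 2^(1/6) ≈ 1.1225 with energy -ε. Because the method only follows the local force, it stops in whichever minimum is nearest, as noted above.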


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db   AC       Description                                                       Score  E-value
pdb  1QGT-C   Chain C (HBcAg), Human Hepatitis B Viral Capsid                    206    6e-54
pdb  2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     60
pdb  1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I              27     60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon     C   2435
Hydrogen   H   3967
Nitrogen   N    711
Oxygen     O    719
Sulfur     S     17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
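Both coefficients can be checked by hand. ProtParam documents its 280 nm values as 5500 per Trp, 1490 per Tyr and 125 per cystine (Pace et al., 1995), with Abs 0.1% equal to the extinction coefficient divided by the molecular weight. The sketch below plugs in the Trp, Tyr and Cys counts (4, 5 and 5) and the molecular weight (55252.6) from the ProtParam output in this report; with five Cys residues, at most two cystines can form.

```python
# Molar absorptivities at 280 nm used by ProtParam (Pace et al., 1995), M^-1 cm^-1.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int,
                           all_cys_paired: bool = True) -> int:
    """Extinction coefficient at 280 nm; a cystine is a pair of Cys residues."""
    cystines = n_cys // 2 if all_cys_paired else 0
    return EXT_TRP * n_trp + EXT_TYR * n_tyr + EXT_CYSTINE * cystines


def abs_01_percent(ext_coeff: float, mol_weight: float) -> float:
    """Absorbance of a 1 g/l (0.1%) solution = extinction coefficient / MW."""
    return ext_coeff / mol_weight


for paired in (True, False):
    e = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, all_cys_paired=paired)
    print(e, round(abs_01_percent(e, 55252.6), 3))
# prints: 29700 0.538, then 29450 0.533
```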

Secondary structure prediction

By SOPMA (result for UNK_158250)

Sequence (upper case) and predicted state (lower case), 70 residues per line; h = alpha helix, e = extended strand, t = beta turn, c = random coil:

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
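The percentages in the SOPMA summary are the number of residues assigned to each state divided by the sequence length. The short sketch below recomputes them from a state string with one letter per residue; the mapping of letters to state names follows the summary labels, and whitespace is ignored so that wrapped prediction lines can be pasted in directly.

```python
from collections import Counter

# State letters as labelled in the SOPMA summary (Hh, Ee, Tt, Cc).
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand",
               "t": "Beta turn", "c": "Random coil"}


def state_composition(prediction: str) -> dict:
    """Return {state name: (count, percent)} for a per-residue state string."""
    states = [ch for ch in prediction.lower() if not ch.isspace()]
    if not states:
        raise ValueError("empty prediction")
    counts = Counter(states)
    return {STATE_NAMES.get(s, s): (n, round(100.0 * n / len(states), 2))
            for s, n in counts.items()}


# Example with a short made-up prediction string.
print(state_composition("hhhhcceet"))
```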

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65

(45)

2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98

49

6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97

7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
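ClustalW builds a guide tree from these pairwise scores before the progressive alignment. The sketch below assumes the Score column is a percent identity, converts it to a distance d = 1 - score/100, and clusters with UPGMA. ClustalW2 defaults to neighbour-joining for its guide tree, so this is a simpler substitute for illustration, not a reproduction of the .dnd file; the short labels in the usage line are shorthand for four of the sequences in the table.

```python
def distances_from_scores(scores: dict) -> dict:
    """Convert percent-identity scores to distances d = 1 - score / 100."""
    return {frozenset(pair): 1.0 - s / 100.0 for pair, s in scores.items()}


def upgma(dist: dict) -> str:
    """UPGMA clustering; dist maps frozenset({name1, name2}) -> distance.

    Every pair must be present. Returns the topology as a Newick string
    (branch lengths omitted).
    """
    size = {name: 1 for pair in dist for name in pair}
    d = dict(dist)
    while len(size) > 1:
        # Closest pair; ties broken alphabetically so the result is deterministic.
        closest = min(d, key=lambda p: (d[p], sorted(p)))
        a, b = sorted(closest)
        merged = f"({a},{b})"
        for k in size:
            if k not in (a, b):
                # Size-weighted average distance from the new cluster to k.
                d[frozenset((merged, k))] = (
                    size[a] * d[frozenset((a, k))] + size[b] * d[frozenset((b, k))]
                ) / (size[a] + size[b])
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"


scores = {("HBVA2", "HBVA3"): 98, ("ASHV", "GSHV"): 91, ("ASHV", "HBVA2"): 66,
          ("ASHV", "HBVA3"): 65, ("GSHV", "HBVA2"): 70, ("GSHV", "HBVA3"): 69}
print(upgma(distances_from_scores(scores)))
```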

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

52

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
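Pairwise identity between two rows of the alignment can be checked with a short script. The Python sketch below is illustrative only: it assumes identity is counted over columns where neither row has a gap, which is one of several conventions (others divide by the full alignment length or by the shorter sequence and give different numbers), and the function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity of two equal-length aligned rows.

    Columns where either row has a gap ('-') are skipped; this is one
    common convention, not the only one.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Residues 86-117 of HBEAG_HBVA4 and HBEAG_HBVA6, copied from the alignment block.
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY" + "-" * 28
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY" + "-" * 28
print(f"{percent_identity(hbva4, hbva6):.2f}")  # 30 of 32 ungapped columns match
```

Skipping gap columns keeps the score from being dominated by the long terminal gaps that this alignment contains.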

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
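The guide tree is a Newick string whose colons, commas and decimal points appear to have been lost in extraction. The Python sketch below parses the tree and sums branch lengths from the root to each sequence. It assumes the restored reading stored in `TREE`, and `parse_newick` and `leaf_distances` are hypothetical helpers written for illustration. In practice a library parser, such as Biopython's `Bio.Phylo`, would normally be used instead.

```python
# Guide tree with the punctuation lost in extraction restored (assumed reading).
TREE = ("((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
        "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
        "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
        ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
        ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);")


def parse_newick(text: str):
    """Recursive-descent parser for plain Newick (no quoted names or comments).

    Each node is returned as a (name, branch_length, children) tuple.
    """
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        name = read_until(":,();")
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_until(",();"))
        return (name, length, children)

    root = parse_node()
    if text[pos:].strip() != ";":
        raise ValueError("tree must end with a single ';'")
    return root


def leaf_distances(node, depth=0.0, out=None):
    """Map each leaf name to its summed branch length from the root."""
    out = {} if out is None else out
    name, length, children = node
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        leaf_distances(child, depth, out)
    return out


tree = parse_newick(TREE)
for name, dist in sorted(leaf_distances(tree).items(), key=lambda kv: kv[1]):
    print(f"{dist:.5f}  {name}")
```

Sorting by root-to-leaf distance places the closely related HBV genotypes first and the glycerate kinase sequence last, which matches the shape of the tree.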

Phylogram

Tertiary structure prediction

PDB 1QGT-C (chain C of entry 1QGT) was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.
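The template is a single chain (C) of a multi-chain PDB entry. If only that chain is wanted from the downloaded file, a minimal Python sketch is shown below. It assumes a standard fixed-column PDB file, and `extract_chain` and the demo records are hypothetical. Biopython's `Bio.PDB` module (`PDBIO` with a `Select` subclass) offers a more robust route.

```python
def extract_chain(pdb_text: str, chain_id: str) -> str:
    """Return the ATOM/HETATM/TER records of one chain, followed by END.

    In the fixed-column PDB format the chain identifier sits in column 22,
    i.e. index 21 of each line.
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in {"ATOM", "HETATM", "TER"} and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"


def demo_atom(serial: int, res_name: str, chain: str, res_seq: int) -> str:
    """Hypothetical minimal CA record with correct column positions (no coordinates)."""
    return f"ATOM  {serial:>5}  CA  {res_name:>3} {chain}{res_seq:>4}"


demo = "\n".join([
    demo_atom(1, "MET", "A", 1),
    demo_atom(2, "ASP", "C", 1),
    demo_atom(3, "ILE", "C", 2),
])
print(extract_chain(demo, "C"), end="")
```

To use it on a real entry, pass the full text of the downloaded PDB file and write the returned string to a new `.pdb` file.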

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

Open the template structure (PDB file) from File, then choose Swiss Model - Load Raw Target Sequence.

Choose Fit - Fit Raw Sequence, then Magic Fit, then Iterative Fit.

Choose File - Save - Layer (PDB).

Choose File - Save - Project (PDB).

Choose Swiss Model - Submit Modeling Request (a new browser window opens with the PDB file loaded; give the e-mail ID at which the modelled structure should be received).

Open the new structure received by e-mail and remove the template by selecting the target.

Choose File - Save - Layer (PDB).


Open Swiss Model and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under the Fit menu, to fit the two sequences.


Save the file as a project.

Select "Submit Modeling Request" under Swiss Model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields


Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044 Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52
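The reported residue range is inclusive, so it covers 514 - 83 + 1 residues of the target. The Python sketch below shows that proportion. It assumes the full Q8IVS8 sequence is the 523 residues counted at the end of its row in the multiple sequence alignment, and the helper name is hypothetical.

```python
def model_coverage(first: int, last: int, target_length: int) -> tuple:
    """Number of residues in an inclusive model range and its share of the target."""
    covered = last - first + 1
    return covered, 100.0 * covered / target_length


# Residue range from the SWISS-MODEL report; 523 is the length of Q8IVS8
# taken from the final residue count of its row in the multiple alignment.
covered, percent = model_coverage(83, 514, 523)
print(f"{covered} residues, {percent:.1f}% of the target")
```

The N-terminal 82 residues and the last few residues fall outside the modelled range.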


Click on the model bars.

Fig: Structure of template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of protein structure and sequence. Stereochemical validation of model structures is an important part of the comparative (homology) modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ and ψ of the amino acid residues in a protein structure: φ is plotted against ψ for every residue, and the plot shows which combinations of the two angles are sterically possible for a polypeptide. Because φ and ψ are the torsion angles about the N-Cα and Cα-C bonds, the plot reveals which residues of a model adopt favoured, allowed or disallowed backbone conformations.

Software


SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure

The target protein structure obtained after homology modeling using Deep View and modeler is given as input to SAVS.
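The φ and ψ values on a Ramachandran plot are torsion angles computed from backbone coordinates: φ(i) from C(i-1), N(i), CA(i), C(i), and ψ(i) from N(i), CA(i), C(i), N(i+1). A minimal pure-Python sketch of that calculation follows. The function names and the (N, CA, C) input layout are assumptions for illustration, not how SAVS or PROCHECK are implemented. The sign follows the usual convention, in which a clockwise rotation of the far bond, viewed along the central bond, is positive.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """backbone: list of (N, CA, C) coordinate triples in chain order.

    Returns one (phi, psi) pair per residue, with None where the angle is
    undefined (phi of the first residue, psi of the last).
    """
    angles = []
    for i, (n, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[i - 1][2], n, ca, c) if i > 0 else None
        psi = dihedral(n, ca, c, backbone[i + 1][0]) if i + 1 < len(backbone) else None
        angles.append((phi, psi))
    return angles


print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))  # -90.0
```

Each residue's (φ, ψ) pair becomes one point on the plot. Glycine and proline are usually counted separately because their allowed regions differ.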

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT


Result

Plot statistics (score, %-age)

Residues in most favoured regions [A,B,L]: 990 (85.6%)
Residues in additional allowed regions [a,b,l,p]: 104 (9.0%)
Residues in generously allowed regions [~a,~b,~l,~p]: 11 (1.0%)
Residues in disallowed regions: 51 (4.4%)
Number of non-glycine and non-proline residues: 1156 (100.0%)

Number of end-residues (excl. Gly and Pro): 8
Number of glycine residues (shown as triangles): 127 (59)
Number of proline residues: 60
Total number of residues: 1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
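The PROCHECK percentages are each region's count divided by the 1156 non-glycine, non-proline residues. The short Python sketch below uses the counts reported above to reproduce those percentages and compare the most-favoured fraction with the 90% benchmark quoted by PROCHECK. The dictionary keys and function name are hypothetical.

```python
# Residue counts from the PROCHECK "Plot statistics" block.
REGION_COUNTS = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}


def region_percentages(counts: dict) -> dict:
    """Percentage of non-Gly, non-Pro residues in each Ramachandran region."""
    total = sum(counts.values())  # 1156 non-glycine, non-proline residues
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


pct = region_percentages(REGION_COUNTS)
for region, value in pct.items():
    print(f"{region:<34}{value:>6.1f}%")

meets_benchmark = pct["most favoured [A,B,L]"] > 90.0
print("over 90% in most favoured regions:", meets_benchmark)
```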

Docking result by Hex software

Fig: Ligand & Receptor (2B8N)


Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)

Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N  Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C  Radius = 1.40 Charge = 0.53
  ASP O  Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N  Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C  Radius = 1.40 Charge = 0.53
  ASP O  Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N  Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C  Radius = 1.40 Charge = 0.53
  THR O  Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H  Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)

R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00
R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 0.00  R = 0.40  R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N 2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
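The size of the 6D search reported in the log can be checked by hand. Some digits lost their multiplication signs in extraction. The sketch below reads them as 27 distances x 812 receptor rotations x 1 ligand rotation, with an FFT grid of 64 x 24 x 48. That reading is an assumption, but it reproduces both the "21924 3D FFTs" and the "1616412672" totals printed by Hex. The Python code only redoes this bookkeeping; it does not implement Hex's spherical polar Fourier correlation, and the variable names are hypothetical.

```python
import math


def radial_steps(r_min: float, r_max: float, step: float) -> list:
    """Intermolecular distances scanned from r_min to r_max inclusive."""
    n = int(round((r_max - r_min) / step)) + 1
    return [round(r_min + i * step, 2) for i in range(n)]


distances = radial_steps(0.0, 19.5, 0.75)  # "distance range = 0.00 to 19.50 with steps of 0.75"
receptor_rotations = 812                    # "Initial rotational increments (N=16) Receptor 812"
ligand_rotations = 1                        # "... Ligand 1"
fft_grid = (64, 24, 48)                     # assumed reading of "FFT[642448]"

n_ffts = len(distances) * receptor_rotations * ligand_rotations
points_per_fft = math.prod(fft_grid)
print(len(distances), n_ffts, points_per_fft, n_ffts * points_per_fft)
```

The 27 distances correspond to the 27 "R =" values printed during the first search pass.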


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We have noticed that the same genes are present in all strains, which indicates that they evolved together.

With the completion of the ongoing gene-sequencing project on HBV, we hope it will be possible in the near future to draw a conclusive picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structure of the protein present in the different species, we performed homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

According to our study, the HBeAg (glycerate kinase) protein coded by this gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, on the basis of which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; the present work is a small finding on a big issue.


Phylogenetics is that field of biology which deals with identifying and understanding the

relationships between the many different kinds of life on earth This includes methods for

collecting and analysing data as well as interpretation of those results as new biological

information
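Distance-based phylogenetic methods start from pairwise distances between aligned sequences. The sketch below illustrates that first step, assuming the sequences are already aligned to equal length with "-" marking gaps: it computes the uncorrected p-distance and the Jukes-Cantor correction for nucleotides. It is an illustration only, not necessarily the procedure behind the trees in this report; the function names and toy sequences are hypothetical, and protein alignments would need a protein-specific correction (the 3/4 factor reflects the four nucleotides).

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def p_distance(s1: str, s2: str) -> float:
    """Fraction of differing sites between two aligned sequences, skipping gap columns."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no comparable (gap-free) sites")
    return sum(a != b for a, b in sites) / len(sites)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance for nucleotides: d = -3/4 * ln(1 - 4p/3)."""
    if p == 0.0:
        return 0.0
    if p >= 0.75:
        return math.inf  # too divergent for the correction to be defined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Symmetric matrix of Jukes-Cantor distances, keyed by (name, name)."""
    d = {(name, name): 0.0 for name in seqs}
    for a, b in combinations(seqs, 2):
        d[(a, b)] = d[(b, a)] = jukes_cantor(p_distance(seqs[a], seqs[b]))
    return d


# Hypothetical aligned fragments, for illustration only.
toy = {"strainA": "ACGTACGTAC", "strainB": "ACGTACGTTC", "strainC": "ACGAACG-TC"}
for pair, dist in sorted(distance_matrix(toy).items()):
    print(pair, round(dist, 4))
```

A matrix like this is the input that clustering methods such as UPGMA or neighbour-joining turn into a tree.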

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.

BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] plumbed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B., Sheinerman F.B. and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801. [PubMed]

Aloy P., Querol E., Aviles F.X. and Sternberg M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395-408. [PubMed]

Altschul S.F., Madden T.L., Schäffer A.A., Zhang J., Zhang Z., Miller W. and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402. [PubMed]

Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


Autoimmune: autoimmune hepatitis

Alcohol

Drugs: methyldopa, nitrofurantoin, isoniazid, ketoconazole

Non-alcoholic steatohepatitis

Heredity: Wilson's disease, alpha-1-antitrypsin deficiency

Primary biliary cirrhosis and primary sclerosing cholangitis occasionally mimic chronic hepatitis [4].

Viral hepatitis

A virus is a particle, smaller than a bacterium, that carries its genetic information as DNA or RNA. This genetic material allows the virus to infect bacteria or other living cells and set up the cell's machinery to reproduce itself, which can lead to destruction of the cell in which it resides. To date, five viruses, labeled A through E, have been identified that appear to cause viral hepatitis. Viruses A and E can be contracted from contaminated water or food (by mouth), while viruses B, C and D are transmitted parenterally, by direct entry into the bloodstream (for example, through any method of injection under the skin). The term viral hepatitis describes any one of the illnesses caused by these five viruses; it consists of an infection of liver cells that damages the liver over days in some cases but over many years in others. Thirty years ago, none of the hepatitis viruses had been identified. In the 1960s, transfusion-related viral hepatitis was extremely common, with 30% of patients receiving blood products becoming infected. By 1970 a blood test for the Australia antigen had been developed, which appeared to identify those infected with one hepatitis virus, the one we now call hepatitis B. The investigator who discovered the Australia antigen, the protein that makes up the coat of the virus and is now called hepatitis B surface antigen (HBsAg), was awarded the Nobel Prize. Our understanding of viral hepatitis has grown tremendously since the discovery of the Australia antigen.

Currently, 11 viruses are recognized as causing hepatitis. Two are herpesviruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and nine are hepatotropic viruses.

EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent hepatic damage. Both viruses cause the typical infectious mononucleosis syndrome of fatigue, nausea and malaise.

Of the nine human hepatotropic viruses, only five are well characterized; hepatitis G and TTV (transfusion-transmitted virus) are newly discovered viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called enterically transmitted NANB hepatitis) are transmitted by fecal-oral contamination. The most important types include hepatitis B (sometimes called serum hepatitis), hepatitis C (formerly called non-A, non-B hepatitis) and hepatitis D (formerly called delta hepatitis).

Hepatitis A

Incubation period: 3-5 weeks (mean 28 days).

Milder disease than hepatitis B; asymptomatic infections are very common, especially in children.

Adults, especially pregnant women, may develop more severe disease.

Although convalescence may be prolonged, there is no chronic form of the disease.

Fulminant hepatitis is rare (0.1% of cases). The virus enters via the gut, replicates in the alimentary tract and spreads to infect the liver, where it multiplies in hepatocytes.

Viraemia is transient. Virus is excreted in the stools for two weeks preceding the onset of symptoms.

Distribution: worldwide and endemic in most countries. The incidence in first-world countries is declining, and there is an especially high incidence in developing countries and rural areas. In rural areas of South Africa the seroprevalence is 100%.

Hepatitis E

Incubation period: 30-40 days.

Acute, self-limiting hepatitis; there is no chronic carrier state.

Age: predominantly young adults (15-40 years). Fulminant hepatitis occurs in pregnant women, in whom the mortality rate is high (up to 40%). Similar to hepatitis A, the virus replicates in the gut initially before invading the liver, and virus is shed in the stool prior to the onset of symptoms. Viraemia is transient. A large inoculum of virus is needed to establish infection.

Epidemiology: little is known as yet; the incidence of infection appears to be low in first-world countries.

Hepatitis C

Originally described as a putative togavirus; now classified in the family Flaviviridae (genus Hepacivirus), related to the flavi- and pestiviruses. It is enveloped and has an ssRNA genome.

Does not grow readily in cell culture but can infect chimpanzees. Incubation period: 6-8 weeks.

Causes a milder form of acute hepatitis than does hepatitis B, but at least 50% of individuals develop chronic infection following exposure, which can lead to:

1) Chronic liver disease

2) Hepatocellular carcinoma

Incidence: endemic worldwide, with high incidence in Japan, Italy and Spain. In South Africa, 1% of blood donors have antibodies.

Hepatitis D

Defective virus that requires hepatitis B as a helper virus in order to replicate. Infection therefore only occurs in patients who are also infected with hepatitis B, and it increases the severity of liver disease in hepatitis B carriers. The virus particle is 36 nm in diameter and encapsulated with HBsAg derived from HBV; the delta antigen is associated with the virus particles, and the genome is ssRNA.

Identified in intravenous drug users.

Hepatitis G

A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally transmitted hepatitis but is no longer believed to be a major agent of liver disease. It has been classified as a flavivirus.

Hepatitis B

What is the Hepatitis B Virus?

The hepatitis B virus (HBV) is a DNA-containing virus that is capable of infecting human liver cells and other cells in the body once it gains access to the bloodstream. One of the most interesting features of the hepatitis B virus is that the virus itself does not damage the liver; the damage is caused by the individual's own immune system attacking the virus-infected cells. Since liver damage from the virus may be very slight, many patients are called healthy carriers. This means that, although they may transmit the disease to others, they have normal-appearing livers and normal liver function tests. While many individuals remain healthy for many years or a lifetime, others develop chronic hepatitis, cirrhosis and occasionally liver cell cancer. These outcomes are linked to the virus and its effects, although it is unlikely that the virus directly causes cancer. Those patients who develop hepatitis (damage to liver cells with inflammation) do so on account of the body's normal inclination to attack the foreign proteins contained in viruses and in the cells in which the viruses are found. This process, called the immune response, determines the pace and the severity of the liver cell injury in this condition and will be described in more detail below.

Since the identification of the hepatitis B virus, several other closely related viruses have been identified in Eastern woodchucks, ground squirrels and Pekin ducks. The members of this virus family, termed the hepadnaviruses, have life cycles similar to that observed in humans and can serve as animal models, allowing further study of these unique disease-causing agents.

Classification and general features

Family: Hepadnaviridae

Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g. duck hepatitis B virus)

Size: 42 nm. Virions (also known as Dane particles) contain a circular, partially double-stranded DNA genome.

Fig. Hepatitis B virus structure

HBV Antigens

HBsAg = surface (coat) protein, produced in excess as small spheres and tubules

HBcAg = inner core protein

HBeAg = secreted protein, function unknown

Clinical Features

Incubation period: 2-5 months

Insidious onset of symptoms; tends to cause a more severe disease than hepatitis A.

Asymptomatic infections occur frequently.

Pathogenesis

Infection is parenterally transmitted. The virus replicates in the liver, and virus particles as well as excess viral surface protein are shed in large amounts into the blood. Viraemia is prolonged and the blood of infected individuals is highly infectious.

Complications

1) Persistent infection

Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected.

Those at particular risk include:

babies and young children

immunocompromised patients

males (more often than females)

The virus persists in the hepatocytes, and ongoing liver damage occurs because of the host immune response against the infected liver cells.

Chronic infection may take one of two forms:

Chronic persistent hepatitis: the virus persists but there is minimal liver damage.

Chronic active hepatitis: there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure. Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC).

HBV is thought to play a role in the development of this malignancy because:

a) 80% of patients with HCC are carriers of hepatitis B;

b) virus DNA can be identified in hepatocellular carcinoma cells;

c) virus DNA can integrate into the host chromosome.

3) Fulminant hepatitis

Rare; accounts for 1% of infections.

Epidemiology

Prevalence of disease in Africa

Worldwide there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas; in South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is transmitted by the following routes:

1) Blood

Blood transfusions, serum products

Sharing of needles, razors

Tattooing, acupuncture

Renal dialysis

Organ donation

2) Sexual intercourse

3) Horizontal transmission in children: families, close personal contact

This is the major mode of transmission in South Africa, where the majority of individuals become infected between three and nine years of age.

Horizontal transmission also occurs in children's institutions and mental homes.

4) Vertical transmission: perinatal transmission from a carrier mother to her baby

Transplacental (rare)

During delivery

Postnatal: breastfeeding, close contact

(This is the major mode of transmission in South East Asia.)

Diagnosis: Serology

Acute infection with resolution

Viral antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM (IgM anti-HBc) rises early in infection and indicates recent infection.

4) Core IgG (IgG anti-HBc) rises soon after IgM and remains present for life in chronic carriers as well as in those who clear the infection. Its presence indicates exposure to HBV, but on its own it does not distinguish resolved infection from the chronic carrier state.
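The marker-by-marker readings listed above amount to a small lookup table. The sketch below encodes exactly those readings and nothing more; it illustrates the rules as stated in this report and is not a clinical decision tool. The marker names used as dictionary keys and the function name are hypothetical labels.

```python
from typing import Dict, List

# Each serological marker paired with the interpretation given in the text.
HBV_MARKER_READINGS = [
    ("HBsAg", "virus replication is occurring in the liver"),
    ("HBeAg", "a high level of viral replication is occurring in the liver"),
    ("anti-HBs", "immunity following infection (not found in chronic carriers)"),
    ("anti-HBe", "viral replication has fallen; low infectivity in a carrier"),
    ("IgM anti-HBc", "recent infection"),
    ("IgG anti-HBc", "exposure to HBV (resolved infection or chronic carrier)"),
]


def interpret_hbv_serology(results: Dict[str, bool]) -> List[str]:
    """Return the stated reading for every marker reported as positive.

    HBcAg is deliberately absent: the text notes it is not found in blood.
    """
    known = {name for name, _ in HBV_MARKER_READINGS}
    unknown = set(results) - known
    if unknown:
        raise ValueError(f"unrecognised marker(s): {sorted(unknown)}")
    return [f"{name}: {reading}" for name, reading in HBV_MARKER_READINGS
            if results.get(name, False)]


# Hypothetical panel: surface antigen and core IgM positive, anti-HBs negative.
for line in interpret_hbv_serology({"HBsAg": True, "IgM anti-HBc": True, "anti-HBs": False}):
    print(line)
```

Keeping the readings in one ordered table means the output follows the same order as the lists in the text.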

Fig. Hepatitis B virus in serum

Prevention

1) Active Immunization

Two types of vaccine are available:

Serum-derived: prepared from HBsAg purified from the serum of HBV carriers

Recombinant HBsAg: made by genetic engineering in yeast

Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.

Universal immunization of infants was introduced in South Africa in April 1995. Infants receive three doses, at 6, 10 and 14 weeks of age.

Vaccine should be administered to people at high risk of infection with HBV:

1) Health care workers

2) Sexual partners of chronic carriers

3) Infants of HBV carrier mothers

2) Passive Antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example a needlestick injury.

What is Hepatitis B Infection Like?

When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others, who do not develop significant symptoms following exposure, may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

Genome: circular, 3.2 kb in size and partially double-stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5' end. The other strand, the plus strand, is variable in length but always shorter than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins; this is achieved through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation and mark a specific transcript for encapsidation into the nucleocapsid.

Fig. Hepatitis B virus genome
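Because the genome is circular and its reading frames overlap, an ORF can run across the arbitrary position chosen as nucleotide 1, and one frame can contain several in-frame start codons. The following minimal sketch scans one strand of a circular sequence for ATG-to-stop ORFs and reports every in-frame ATG separately, which mirrors the multiple-start-codon arrangement described above. It assumes the standard stop codons TAA, TAG and TGA and scans only the given strand (the text states that all four HBV ORFs run in one direction). The function name, the minimum length and the toy sequences are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def circular_orfs(genome: str, min_codons: int = 50) -> List[Tuple[int, int, bool]]:
    """Find ATG-initiated ORFs on one strand of a circular genome.

    Returns (start, n_codons, wraps) tuples: `start` is the 0-based position of
    the A in ATG, `n_codons` counts sense codons (the stop codon is excluded),
    and `wraps` is True when the ORF runs past the end of the sequence and
    continues from position 0.
    """
    seq = genome.upper()
    n = len(seq)
    doubled = seq + seq  # reading into the second copy emulates the circle
    orfs = []
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        for pos in range(start, start + n, 3):  # at most one full turn
            codon = doubled[pos:pos + 3]
            if len(codon) < 3:
                break
            if codon in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, n_codons, pos + 3 > n))
                break
    return orfs


# Toy sequences (not HBV): an ORF that crosses the origin, and two nested in-frame starts.
print(circular_orfs("AAATAGCCCATG", min_codons=1))  # [(9, 2, True)]
print(circular_orfs("ATGATGAAATAG", min_codons=1))  # [(0, 3, False), (3, 2, False)]
```

Doubling the sequence is a simple way to handle circularity; limiting each scan to one turn prevents an ORF from being counted as longer than the genome.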

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that had yet to be identified (Garces, HBVP). The DNA then enters the nucleus, where it is converted into a covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA, called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, where the proteins destined to become the HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a specific site at the 5' end of its own transcript, where viral DNA synthesis is eventually initiated. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the DNA is recycled to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., p. 765).

Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found in the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, and these usually outnumber the virions by about 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to collectively as surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same antigenic sites as the virion, they induce a significant immune response and might be expected to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces, HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigen encoded by the HBV genome:

Hepatitis B Surface antigen (HBsAg): There are three different hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be produced by the virus in high quantities. It also contains a highly antigenic epitope that may be responsible for triggering an immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears largely oblivious to their presence.

Hepatitis B Core Antigen (HBcAg): The only HBV antigen that cannot be detected directly by a blood test; this antigen can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e Antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus particle, this antigen can be detected by a blood test. If found, it is usually indicative of complete virus particles in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle; a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle; and tubular or filamentous particles that vary in length. Of these, the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in chronic carriers as well as in those who clear the infection. Its presence indicates exposure to HBV, but on its own it does not distinguish resolved infection from the chronic carrier state [4].

Homology, or comparative, modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations from the templates (e.g. loops), and ultimately the refinement of the structure in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
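The 30% figure refers to percent sequence identity between query and template in an alignment. A minimal sketch of that calculation, assuming the two sequences are already aligned to equal length with "-" marking gaps. Published tools differ in the denominator they use (aligned columns, full alignment length or the shorter sequence); this sketch counts only columns where neither sequence has a gap. The function names and the alignment fragment are illustrative.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Identical residues as a percentage of gap-free aligned columns."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    columns = [(q, t) for q, t in zip(aligned_query.upper(), aligned_template.upper())
               if q != "-" and t != "-"]
    if not columns:
        return 0.0
    identical = sum(q == t for q, t in columns)
    return 100.0 * identical / len(columns)


def above_identity_threshold(aligned_query: str, aligned_template: str,
                             threshold: float = 30.0) -> bool:
    """Rule-of-thumb check against the ~30% identity level discussed in the text."""
    return percent_identity(aligned_query, aligned_template) > threshold


# Hypothetical alignment fragment (not from this study).
print(round(percent_identity("ACDE-FG", "ACDQWFG"), 1))  # 83.3
```

The threshold is a rule of thumb rather than a guarantee; as the text notes, the underlying limiting factor is alignment quality.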

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term guided docking is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].
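Virtual screening enrichment, mentioned above, is commonly summarized as an enrichment factor: the fraction of known actives among the top-ranked compounds divided by the fraction of actives in the whole library. A minimal sketch, assuming lower docking scores are better (as with energy-like scores) and that active/inactive labels are known for a benchmark set. The function name and toy numbers are hypothetical and do not come from this study.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      top_fraction: float = 0.01) -> float:
    """EF at `top_fraction`: (actives in top / size of top) / (all actives / library size).

    Assumes lower scores rank better.
    """
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and labels must be non-empty and equal in length")
    total_actives = sum(bool(a) for a in is_active)
    if total_actives == 0:
        raise ValueError("benchmark set contains no actives")
    # Round to the nearest whole compound; ceil() can overshoot on float artefacts.
    n_top = max(1, round(top_fraction * n))
    ranked = sorted(range(n), key=lambda i: scores[i])
    actives_in_top = sum(bool(is_active[i]) for i in ranked[:n_top])
    return (actives_in_top / n_top) / (total_actives / n)


# Ten hypothetical compounds, two of them active.
scores = [-9.0, -8.0, -7.0, -6.0, -5.0, -4.0, -3.0, -2.0, -1.0, 0.0]
labels = [True, False, True] + [False] * 7
print(enrichment_factor(scores, labels, top_fraction=0.2))
```

An EF of 1 means the ranking is no better than random selection; values above 1 mean actives are concentrated near the top of the list.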

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed, and a number of successful applications of structure-based virtual screening are described [8].

Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

In the sense used here, bioinformatics is limited to the sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

A curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

27

2 Protein sequence of glycerate kinase (HBeAg-binding protein 4)

Primary accession number: Q8IVS8; EC 2.7.1.31; from human; subcellular location: cytoplasm; catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.

3 FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
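SSEARCH computes the optimal Smith-Waterman local alignment by dynamic programming. The Python sketch below shows only the score recurrence, as an illustration rather than the FASTA package's own code. It assumes a simple match/mismatch scheme and a linear gap penalty with illustrative default values; SSEARCH itself scores protein alignments with a substitution matrix and affine gap penalties, so its scores would differ.

```python
def smith_waterman_score(a: str, b: str, match: int = 3,
                         mismatch: int = -3, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty, score only)."""
    prev = [0] * (len(b) + 1)  # DP row i-1; row 0 is all zeros
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # column 0 stays zero
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # a[i-1] aligned against a gap
            left = curr[j - 1] + gap   # b[j-1] aligned against a gap
            # The floor at 0 is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TGTTACGG", "GGTTGACTA"))
```

With these defaults the two short DNA strings score 13, from the local alignment GTT-AC / GTTGAC (five matches and one gap).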

4 BLAST

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is an algorithm for comparing primary biological sequence information, such as the amino acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
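The threshold is normally applied to the expectation value (E-value) of each hit. For ungapped local alignments, BLAST relies on Karlin-Altschul statistics: the expected number of chance hits with raw score at least S, for a query of length m searched against a database of total length n, is given below, where K and λ depend on the scoring matrix and residue frequencies. The bit score S' normalizes K and λ away. This is a summary under those assumptions, not the full BLAST statistical model (gapped searches use empirically estimated K and λ, and lengths are corrected for edge effects).

```latex
E = K\,m\,n\,e^{-\lambda S}, \qquad
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad
E = m\,n\,2^{-S'}
```

A lower E-value therefore means that a match is less likely to have arisen by chance.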

5 Primary & secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence will be analyzed.

It calculates parameters including the following:

molecular weight and theoretical pI

amino acid and atomic composition

extinction coefficient

half-life

instability index

aliphatic index

grand average of hydropathicity (GRAVY)
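As an example of one of these parameters, the aliphatic index (as defined in the ProtParam documentation) is X(Ala) + a·X(Val) + b·(X(Ile) + X(Leu)), where X are mole percents and a = 2.9 and b = 3.9 are the side-chain volumes of Val and of Ile/Leu relative to Ala. The function below is a hypothetical illustration of that formula, not ProtParam's code; like ProtParam, it ignores white space and numbers in the input.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    residues = [ch for ch in seq.upper() if ch.isalpha()]  # drop spaces and numbers
    if not residues:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * residues.count(aa) / len(residues)

    a, b = 2.9, 3.9  # relative side-chain volumes of Val and of Ile/Leu (Ala = 1)
    return mole_percent("A") + a * mole_percent("V") + b * (mole_percent("I") + mole_percent("L"))


print(aliphatic_index("AVIL"))
```

For "AVIL" (25 mole percent of each residue) this is 25 + 2.9 x 25 + 3.9 x 50 = 292.5.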


Using SOPMA for secondary structure analysis

A method called the self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. The SOPMA paper reports improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a web page (http://www.ibcp.fr/predict.html).

PROTOCOL FOLLOWED

Obtained the receptor (target protein) from literature references, journals available online and PubMed literature for the HBV strain.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Retrieved the template structure (PDB ID 2B8N) by a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modelled the receptor (output in PDB format).

Validated the modelled receptor using the Structure Analysis Validation Server (SAVS).

Verified the model through different parameters, such as the Ramachandran plot and the other checks available in SAVS.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and found the structure of the drug molecule.

Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure.

Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
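Because model accuracy tracks sequence identity, the identity between target and template is usually the first number checked. A minimal Python sketch, assuming two already-aligned sequences of equal length with "-" for gaps, and assuming identity is counted only over columns where neither sequence has a gap (other conventions divide by the shorter sequence or the full alignment length and give different numbers):

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# 32-column segment of HBEAG_HBVA4 and HBEAG_HBVA2 from the ClustalW alignment in the Results.
print(percent_identity("QAILCWGELMTLATWVGNNLEDPASRDLVVNY",
                       "QAILCWGELMTLATWVGNNLQDPASRDLVVNY"))
```

The two segments differ at a single position (E versus Q), so 31 of 32 columns are identical (96.875%).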

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target. Thus, a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.

Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods.

When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.
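The template-selection criteria described in this section can be sketched as a simple filter-and-rank step. In this hypothetical Python example, each candidate carries a BLAST E-value, the fraction of the query covered by the alignment, and the sequence identity. The E-value cut-off of 1e-5, the ordering (coverage first, then identity) and the template names are illustrative assumptions, not rules from this report or from any particular server.

```python
from dataclasses import dataclass


@dataclass
class Template:
    pdb_id: str       # hypothetical template identifier
    evalue: float     # BLAST E-value of the hit
    coverage: float   # fraction of the query covered by the alignment (0-1)
    identity: float   # percent identity over the aligned region


def rank_templates(candidates, max_evalue=1e-5):
    """Discard hits with poor E-values, then rank by coverage, then identity."""
    usable = [t for t in candidates if t.evalue <= max_evalue]
    return sorted(usable, key=lambda t: (t.coverage, t.identity), reverse=True)


hits = [Template("TPL_A", 1e-30, 0.60, 40.0),
        Template("TPL_B", 1e-10, 0.90, 30.0),
        Template("TPL_C", 5.0, 1.00, 90.0)]  # poor E-value: rejected despite high identity
print([t.pdb_id for t in rank_templates(hits)])  # ['TPL_B', 'TPL_A']
```

TPL_C is dropped even though it looks best on identity, mirroring the advice above that a template with a poor E-value should generally not be chosen.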

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interfaces between the two molecules tend to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interactions are said to be rigid.

Fig: Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described by a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can use either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid-ligand/flexible-receptor complex often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow for docking of a larger ligand than would be allowed with a rigid receptor.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: there is intrinsic movement that allows for small conformational adaptations upon ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational and three translational), the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing for easier, more energetically favorable binding between the two.
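The six rigid-body degrees of freedom (three rotational, three translational) are what a rigid docking search samples for each ligand pose. A minimal Python sketch, assuming the rotation is given as angles about the fixed x, y and z axes (one of several common conventions, composed here as Rz·Ry·Rx) and applied about the ligand's centroid before the translation; the function names are illustrative, not taken from HEX or any other docking program.

```python
import math


def rotation_matrix(rx, ry, rz):
    """R = Rz(rz) @ Ry(ry) @ Rx(rx); angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [[cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
            [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
            [-sy, cy * sx, cy * cx]]


def transform_pose(coords, rx, ry, rz, tx, ty, tz):
    """Rotate a rigid ligand about its centroid (3 DOF), then translate it (3 DOF)."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    r = rotation_matrix(rx, ry, rz)
    shift = (tx, ty, tz)
    moved = []
    for p in coords:
        d = [p[k] - centroid[k] for k in range(3)]
        rotated = [sum(r[i][k] * d[k] for k in range(3)) for i in range(3)]
        moved.append(tuple(rotated[i] + centroid[i] + shift[i] for i in range(3)))
    return moved


# Quarter turn about z, then a 5 Å shift along z (illustrative values).
print(transform_pose([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], 0.0, 0.0, math.pi / 2, 0.0, 0.0, 5.0))
```

Because the ligand is treated as rigid, every pose keeps the same internal distances; only its position and orientation relative to the receptor change.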

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease hepatitis B, and will also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process.

Receptor

A molecule on the surface of the cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule in a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

The molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through many weak noncovalent interactions formed with the binding site of a protein, tight binding of a ligand depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

Residue   Preference     Residue   Preference
His        0.360         Tyr       -0.040
Asp        0.045         Gly       -0.070
Trp       -0.140         Met        0.025
Val       -0.060         Asn        0.080
Leu       -0.180         Phe       -0.120
Gln        0.050         Cys        0.210
Ile       -0.005         Ala        0.025
Glu        0.050         Arg        0.055
Pro       -0.200         Lys        0.100
Thr        0.100         Ser        0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the corresponding torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
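φ and ψ are ordinary torsion angles, each defined by four consecutive backbone atoms (φ: C of the previous residue, N, Cα, C; ψ: N, Cα, C, N of the next residue). A minimal pure-Python sketch of the standard torsion-angle calculation, returning degrees between -180 and 180 as plotted on a Ramachandran plot; the helper names are illustrative assumptions, not part of any validation server.

```python
import math


def _sub(u, v):
    return [u[k] - v[k] for k in range(3)]


def _dot(u, v):
    return sum(u[k] * v[k] for k in range(3))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond (IUPAC sign convention)."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal to the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal to the plane through p1, p2, p3
    b1_len = math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b1) / b1_len
    return math.degrees(math.atan2(y, x))


# For phi of residue i: dihedral(C[i-1], N[i], CA[i], C[i]); for psi: dihedral(N[i], CA[i], C[i], N[i+1]).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

In the toy example, the back bond is a clockwise quarter turn from the front bond when viewed along the central bond, so the angle is +90 degrees.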

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of "ideal" bond lengths and bond angles as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the file extension .new. The new file will have the atoms labelled in accordance with the IUPAC naming convention.
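PROCHECK reads standard PDB-format coordinate files. As a sketch of what such a file holds, the hypothetical Python helper below extracts ATOM records using the fixed column positions of the PDB format (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, x, y and z in 31-38, 39-46 and 47-54). It is not part of PROCHECK and, for brevity, ignores HETATM records, alternate locations and insertion codes.

```python
def parse_atoms(lines):
    """Return a list of dicts, one per ATOM record, using PDB fixed columns."""
    atoms = []
    for line in lines:
        if not line.startswith("ATOM  "):
            continue  # skip HETATM, HEADER, END, etc.
        atoms.append({
            "name": line[12:16].strip(),      # columns 13-16
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "res_seq": int(line[22:26]),      # columns 23-26
            "xyz": (float(line[30:38]),       # columns 31-38
                    float(line[38:46]),       # columns 39-46
                    float(line[46:54])),      # columns 47-54
        })
    return atoms
```

Typical use would be `with open("model.pdb") as fh: atoms = parse_atoms(fh)`, where the file name is only a placeholder.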

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but with different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good for releasing local constraints for a residue, but it will not pass through high energy barriers and stops in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
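Gradient optimization can be illustrated with steepest descent on a toy system. This is a minimal sketch, not the minimizer of any particular package. It assumes a Lennard-Jones pair potential with illustrative parameters (ε = 1, σ = 1, reduced units) and a fixed step size, and moves every atom a small step along its net force (the negative energy gradient) until the largest force falls below a tolerance. A fixed step is a simplification: on a badly clashing start the huge repulsive forces can overshoot, which is why practical minimizers adapt the step length.

```python
import math


def lj_energy_forces(coords, eps=1.0, sigma=1.0):
    """Lennard-Jones energy and per-atom forces (force = -gradient of energy)."""
    n = len(coords)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            sr6 = (sigma * sigma / r2) ** 3
            energy += 4.0 * eps * (sr6 * sr6 - sr6)
            # (-dE/dr) / r; multiplying by d gives the force on atom i.
            f_over_r = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += f_over_r * d[k]
                forces[j][k] -= f_over_r * d[k]
    return energy, forces


def steepest_descent(coords, step=0.01, f_tol=1e-6, max_iter=10000):
    """Move every atom along its net force until the largest force is below f_tol."""
    coords = [list(p) for p in coords]
    for _ in range(max_iter):
        _, forces = lj_energy_forces(coords)
        f_max = max(math.sqrt(sum(c * c for c in f)) for f in forces)
        if f_max < f_tol:
            break
        for p, f in zip(coords, forces):
            for k in range(3):
                p[k] += step * f[k]
    energy, _ = lj_energy_forces(coords)
    return coords, energy


start = [(0.0, 0.0, 0.0), (1.05, 0.0, 0.0)]  # slightly too close: repulsive start
final, e_min = steepest_descent(start)
print(final[1][0] - final[0][0], e_min)
```

For two atoms the run should settle near the Lennard-Jones minimum, a separation of 2^(1/6)σ (about 1.1225) with energy -ε.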


Results and discussion

1 Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information: entry name GLCTK_HUMAN; primary accession number Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db   AC       Description                                                          Score   E-value
pdb  1QGT-C   Chain C, Human Hepatitis B Viral Capsid (HBcAg)                      206     6e-54
pdb  2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina...             197     2e-51
pdb  2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad...                    192     6e-50
pdb  1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp...  27      60
pdb  1AW9-A   Chain A, Structure Of Glutathione S-Transferase III I...             27      60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:

Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:

Carbon     C  2435
Hydrogen   H  3967
Nitrogen   N   711
Oxygen     O   719
Sulfur     S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming all Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming no Cys residues appear as half cystines
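The two values follow from the formula given in the ProtParam documentation, which sums molar absorption values at 280 nm of 5500 per Trp, 1490 per Tyr and 125 per cystine (a disulfide-bonded pair of Cys). The short Python check below, whose function names are hypothetical, reproduces the reported numbers from the composition above (Trp 4, Tyr 5, Cys 5) and the reported molecular weight of 55252.6.

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, cystines=True):
    """Molar extinction coefficient at 280 nm (M-1 cm-1), ProtParam-style sum."""
    n_cystine = n_cys // 2 if cystines else 0  # only complete Cys pairs form cystines
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


def abs_01_percent(ext, mol_weight):
    """Absorbance of a 0.1% (1 g/l) solution: extinction coefficient / molecular weight."""
    return ext / mol_weight


mw = 55252.6  # from the ProtParam output above
for all_cystines in (True, False):
    ext = extinction_coefficient(4, 5, 5, all_cystines)
    print(ext, round(abs_01_percent(ext, mw), 3))
```

With five Cys residues at most two cystines can form, which is why the two coefficients differ by only 250.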

Secondary structure prediction

By SOPMA: result for UNK_158250

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV   70
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE   140
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ   210
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL   280
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL   350
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG   420
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH   490
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR   523
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: window width 17; similarity threshold 8; number of states 4
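The percentages in the SOPMA summary are each state's count divided by the sequence length. A minimal Python sketch, assuming a prediction string that uses the lower-case one-letter state codes of this 4-state run (h, e, t, c); the function and dictionary names are illustrative.

```python
from collections import Counter

# One-letter states appearing in the 4-state SOPMA output above.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand",
               "t": "Beta turn", "c": "Random coil"}


def state_summary(prediction: str) -> dict:
    """Map each state name to (count, percent of sequence length, 2 decimals)."""
    counts = Counter(prediction)
    n = len(prediction)
    return {name: (counts[code], round(100.0 * counts[code] / n, 2))
            for code, name in STATE_NAMES.items()}


print(state_summary("hhhhcceet"))
```

Feeding in a 523-letter string with the reported counts (235 h, 80 e, 36 t, 172 c) reproduces the percentages listed in the summary.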

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65

(45)

2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98

49

6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97

7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
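The guide tree is a Newick string written by the alignment program (ClustalW-style output is assumed here). As an illustrative sketch that is not part of the original workflow, the Python below parses it and sums the branch lengths from the root to each leaf, which is the horizontal distance a phylogram draws. The colons, commas and decimal points lost in extraction are assumed to read as `name:0.xxxxx`, and `parse_newick` and `leaf_depths` are hypothetical helper names, not a library API.

```python
from typing import Dict, List, Tuple


class Node:
    """One node of a rooted Newick tree; a leaf has no children."""

    def __init__(self) -> None:
        self.name = ""
        self.length = 0.0  # branch length to the parent node
        self.children: List["Node"] = []


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string (no quoted labels, no comments)."""
    text = "".join(text.split())  # drop whitespace and line breaks
    pos = 0

    def read_label() -> Tuple[str, float]:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length

    def read_subtree() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume "("
            node.children.append(read_subtree())
            while text[pos] == ",":
                pos += 1  # consume ","
                node.children.append(read_subtree())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        node.name, node.length = read_label()
        return node

    root = read_subtree()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return root


def leaf_depths(node: Node, depth: float = 0.0) -> Dict[str, float]:
    """Map each leaf name to its summed branch length from the root."""
    depth += node.length
    if not node.children:
        return {node.name: depth}
    depths: Dict[str, float] = {}
    for child in node.children:
        depths.update(leaf_depths(child, depth))
    return depths


# Guide tree from the alignment, with ':' and decimal points restored (assumed).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

depths = leaf_depths(parse_newick(GUIDE_TREE))
for name, d in sorted(depths.items(), key=lambda item: item[1]):
    print(f"{name:20s} {d:.5f}")
```

Sorted by depth, the five HBVA sequences come first and the human glycerate kinase sequence last, reflecting its long branch. For real Newick files with quoted labels or comments, a maintained parser such as Biopython's `Bio.Phylo` is the safer choice.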

Phylogram

Tertiary structure prediction

PDB entry 1QGT, chain C, was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
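Template choice in this step rests on the sequence identity between target and template. The following is a minimal Python sketch of one common definition: identical residues divided by the aligned columns that contain no gap in either sequence. BLAST, SWISS-MODEL and other tools may normalise by alignment length or sequence length instead, so their figures need not match this function, whose name is hypothetical. The example rows are the HBVA4 and HBVA6 segments from the second alignment block.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two equal-length aligned sequences.

    Only columns where neither sequence has a gap are compared
    (one convention among several).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
print(f"{percent_identity(hbva4, hbva6):.2f}% identity")
```

The two segments differ only at their first two positions, so 30 of 32 compared columns are identical.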

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

Open the template structure (PDB file) from File, then choose Swiss model - Load raw target sequence.

Choose Fit - Fit raw sequence, then Magic Fit, then Iterative Fit.

Choose File - Save - Layer (PDB).

Choose File - Save - Project (PDB).

Choose Swiss model - Submit modeling request (a new browser window opens with the PDB file loaded; give the e-mail ID at which the modeled structure should be received).

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File - Save - Layer (PDB).

Open Swiss model and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences.

Save the file as a project.

Select "Submit modeling request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill in these fields:

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044, Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Click on the model bars.

Fig: Structure of the template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of the amino acid residues in a protein structure; it shows which combinations of φ and ψ are possible for a polypeptide. Because every residue contributes one (φ, ψ) point, residues falling outside the sterically allowed regions flag likely errors in the backbone conformation of the model.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling using Deep View and Modeler is given as input to SAVS.
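Each point on a Ramachandran plot is one residue's pair of backbone torsion angles: φ is the dihedral C(i-1)-N(i)-CA(i)-C(i) and ψ is N(i)-CA(i)-C(i)-N(i+1). The Python sketch below computes them with the standard atan2 dihedral formula, assuming backbone coordinates have already been read from the model PDB file into dictionaries keyed "N", "CA" and "C". That layout and the function names are illustrative assumptions, not a SAVS or PROCHECK interface.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


Residue = Dict[str, Vec]  # backbone atoms "N", "CA", "C" -> (x, y, z)


def phi_psi(chain: List[Residue]) -> List[Tuple[Optional[float], Optional[float]]]:
    """(phi, psi) per residue; phi is undefined for the first residue, psi for the last."""
    angles = []
    for i, res in enumerate(chain):
        phi = dihedral(chain[i - 1]["C"], res["N"], res["CA"], res["C"]) if i > 0 else None
        psi = (dihedral(res["N"], res["CA"], res["C"], chain[i + 1]["N"])
               if i < len(chain) - 1 else None)
        angles.append((phi, psi))
    return angles


# Toy example: four points forming a +90 degree torsion.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

The chain termini have no φ (first residue) or ψ (last residue), so they come back as None; plotting the remaining pairs with φ on the x-axis and ψ on the y-axis gives the usual plot layout.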

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics                                          Score    %age
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
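The percentages in the PROCHECK summary are each region's residue count divided by the number of non-glycine, non-proline residues (990 + 104 + 11 + 51 = 1156); glycine, proline and end residues are left out of that denominator. A short Python check of that arithmetic against the 90% guideline, with hypothetical function names:

```python
from typing import Dict

# Counts copied from the PROCHECK summary (non-Gly, non-Pro residues only).
counts: Dict[str, int] = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(region_counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of residues per region, rounded to one decimal place."""
    total = sum(region_counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in region_counts.items()}


def meets_guideline(pct: Dict[str, float], threshold: float = 90.0) -> bool:
    """True if more than `threshold` per cent of residues lie in the most favoured regions."""
    return pct["most favoured"] > threshold


pct = region_percentages(counts)
print(sum(counts.values()), pct)
print("over 90% in most favoured regions:", meets_guideline(pct))
```

With 85.6% of residues in the most favoured regions, the model falls below the 90% guideline quoted in the summary.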

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS


Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025


Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050


MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050


LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI


VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052


MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005


LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025


ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds


gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A


Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds

Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)


R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1

3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040


R = 075Estart = 14350136 -gt rank 5

Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100

Conclusion

After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to look at this more closely at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene-sequencing project on HBV is finished, we hope it will be possible to draw conclusive inferences about the true picture of its evolution in the near future, and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of a protein present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

As we have studied, the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, on the basis of which drugs could be developed.

Future prospects

The work presented in this report may be just a stepping stone toward such discoveries; the present work is a small finding within a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for the infectious disease bird flu, and may also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens and other infectious diseases to identify possible therapeutic sites in them, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.

BIBLIOGRAPHY

[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses

80

[2] - F V Chisari C Ferrari

Department of Molecular and Experimental Medicine Scripps Research Institute La

Jolla California 92037 USA

[3] -C Seeger W S Mason

Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu

[4]- plumbed

[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular

Biophysics Columbia University New York New York 10032 USA

Reprint requests to Barry Honig Howard Hughes Medical Institute Department of

Biochemistry and Molecular Biophysics Columbia University New York NY 10032

USA

[6] Al-Lazikani B., Sheinerman F. B. and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801. [PubMed]

Aloy P., Querol E., Aviles F. X. and Sternberg M. J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408. [PubMed]

Altschul S. F., Madden T. L., Schaffer A. A., Zhang J., Zhang Z., Miller W. and Lipman D. J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402. [PubMed]

Apweiler R., Attwood T. K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M. D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics.

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


Currently, 11 viruses are recognized as causing hepatitis. Two are herpesviruses (cytomegalovirus [CMV] and Epstein-Barr virus [EBV]) and nine are hepatotropic viruses.

EBV and CMV cause mild, self-resolving forms of hepatitis with no permanent hepatic damage. Both viruses cause the typical infectious mononucleosis syndrome of fatigue, nausea and malaise.

Of the nine human hepatotropic viruses, only five are well characterized. Hepatitis G and TTV (transfusion-transmitted virus) are newly discovered viruses. Hepatitis A (sometimes called infectious hepatitis) and hepatitis E (formerly called enterically transmitted NANB hepatitis) are transmitted by fecal-oral contamination. The most important types include hepatitis B (sometimes called serum hepatitis), hepatitis C (formerly called non-A, non-B hepatitis) and hepatitis D (formerly called delta hepatitis).

Hepatitis A

Incubation period: 3-5 weeks (mean 28 days).

Milder disease than hepatitis B; asymptomatic infections are very common, especially in children.

Adults, especially pregnant women, may develop more severe disease.

Although convalescence may be prolonged, there is no chronic form of the disease.

Fulminant hepatitis is rare (0.1% of cases). The virus enters via the gut, replicates in the alimentary tract and spreads to infect the liver, where it multiplies in hepatocytes. Viraemia is transient. Virus is excreted in the stools for two weeks preceding the onset of symptoms.

World-wide distribution; endemic in most countries. The incidence in first world countries is declining. There is an especially high incidence in developing countries and rural areas. In rural areas of South Africa, the seroprevalence is 100%.

Hepatitis E

Incubation period: 30-40 days.

Acute, self-limiting hepatitis; no chronic carrier state.

Age: predominantly young adults, 15-40 years. Fulminant hepatitis occurs in pregnant women, in whom the mortality rate is high (up to 40%). Similar to hepatitis A, the virus replicates in the gut initially before invading the liver, and virus is shed in the stool prior to the onset of symptoms. Viraemia is transient. A large inoculum of virus is needed to establish infection. Little is known yet about its epidemiology; the incidence of infection appears to be low in first world countries.

Hepatitis C

Originally described as a putative togavirus related to the flaviviruses and pestiviruses; it is now classified in the family Flaviviridae (genus Hepacivirus).

It is enveloped and has an ssRNA genome.

Does not grow readily in cell culture, but can infect chimpanzees. Incubation period: 6-8 weeks.

Causes a milder form of acute hepatitis than does hepatitis B, but 50% of individuals develop chronic infection following exposure, which can lead to:

1) Chronic liver disease

2) Hepatocellular carcinoma

Incidence: endemic world-wide, with a high incidence in Japan, Italy and Spain. In South Africa, 1% of blood donors have antibodies.

Hepatitis D

Defective virus which requires hepatitis B as a helper virus in order to replicate. Infection therefore only occurs in patients who are also infected with hepatitis B. It increases the severity of liver disease in hepatitis B carriers. The virus particle is 36 nm in diameter and is encapsulated with HBsAg derived from HBV.

The delta antigen is associated with virus particles; the genome is ssRNA.

Identified in intravenous drug users.

Hepatitis G

A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally transmitted hepatitis but is no longer believed to be a major agent of liver disease. It has been classified as a flavivirus.

Hepatitis B

What is the Hepatitis B Virus?

The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting human liver cells and other cells in the body once it gains access to the bloodstream.

One of the most interesting features of the hepatitis B virus is that the virus itself does not damage the liver; the damage is caused by the individual's own immune system attacking the virus-infected cells. Since liver damage from the virus may be very little, many patients are called "healthy carriers". This means that although they may transmit the disease to others, they have normal-appearing livers and normal liver function tests.

While many individuals remain healthy for many years or a lifetime, others develop chronic hepatitis, cirrhosis and occasionally liver cell cancer. These outcomes are linked to the virus and its effects, although it is unlikely that the virus directly causes cancer.

Those patients who develop hepatitis (damage to liver cells with inflammation) do so on account of the body's normal inclination to attack the foreign proteins contained in viruses and in the cells in which the viruses are found. This process, called the immune response, determines the pace and the severity of the liver cell injury in this condition and will be described in more detail below.

Since the identification of the hepatitis B virus, several closely related viruses have been identified in Eastern woodchucks, ground squirrels and Peking ducks. The members of this virus family, termed the hepadnaviruses, have life cycles similar to that observed in man and can serve as animal models, allowing further study of these unique disease-causing agents.

Classification and general features

Family: Hepadnaviridae

Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g. duck hepatitis B virus)

Size: 42 nm. Virions (also known as Dane particles) contain a circular, partially double-stranded DNA genome.

Fig. Hepatitis B virus structure

HBV Antigens

HBsAg = surface (coat) protein, produced in excess as small spheres and tubules

HBcAg = inner core protein

HBeAg = secreted protein, function unknown

Clinical Features

Incubation period: 2-5 months.

Insidious onset of symptoms. Tends to cause a more severe disease than hepatitis A.

Asymptomatic infections occur frequently.

Pathogenesis

Infection is parenterally transmitted. The virus replicates in the liver, and virus particles as well as excess viral surface protein are shed in large amounts into the blood. Viraemia is prolonged, and the blood of infected individuals is highly infectious.

Complications

1) Persistent infection

Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected.

Those who are at particular risk include:

babies and young children

immunocompromised patients

males > females

The virus persists in the hepatocytes, and ongoing liver damage occurs because of the host immune response against the infected liver cells.

Chronic infection may take one of two forms:

Chronic persistent hepatitis: the virus persists but there is minimal liver damage.

Chronic active hepatitis: there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure.

2) Hepatocellular carcinoma

Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC). HBV is thought to play a role in the development of this malignancy because:

a) 80% of patients with HCC are carriers of hepatitis B

b) Virus DNA can be identified in hepatocellular carcinoma cells

c) Virus DNA can integrate into the host chromosome

3) Fulminant hepatitis

Rare; accounts for 1% of infections.

Epidemiology

Prevalence of disease in Africa

World-wide, there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is parenterally transmitted; the main routes of transmission are:

1) Blood

Blood transfusions, serum products

Sharing of needles, razors

Tattooing, acupuncture

Renal dialysis

Organ donation

2) Sexual intercourse

3) Horizontal transmission in children, in families and through close personal contact

This is the major mode of transmission in South Africa, where the majority of individuals become infected between three and nine years of age.

Horizontal transmission also occurs in children's institutions and mental homes.

4) Vertical transmission: perinatal transmission from a carrier mother to her baby

Transplacental (rare)

During delivery

Postnatal: breastfeeding, close contact

(This is the major mode of transmission in South East Asia.)

Diagnosis: Serology

Acute infection with resolution

Viral antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV (including in the chronic carrier).

Fig. Hepatitis B virus in serum
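The marker meanings listed above can be written down as a simple rule table. The sketch below is a hypothetical teaching example (the names `HBVMarkers` and `interpret` are ours, not from any library); it only restates, marker by marker, the interpretations given in this section and is not a diagnostic algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HBVMarkers:
    """Presence (True) or absence (False) of each serological marker."""
    hbsag: bool = False         # surface antigen
    hbeag: bool = False         # e antigen
    anti_hbs: bool = False      # surface antibody
    anti_hbe: bool = False      # e antibody
    anti_hbc_igm: bool = False  # core IgM
    anti_hbc_igg: bool = False  # core IgG


def interpret(m: HBVMarkers) -> List[str]:
    """Return the meaning of each positive marker, in the order listed in the text."""
    notes = []
    if m.hbsag:
        notes.append("virus replication is occurring in the liver")
    if m.hbeag:
        notes.append("high level of viral replication in the liver")
    if m.anti_hbs:
        notes.append("immunity following infection")
    if m.anti_hbe:
        notes.append("viral replication falling; low infectivity in a carrier")
    if m.anti_hbc_igm:
        notes.append("recent infection")
    if m.anti_hbc_igg:
        notes.append("exposure to HBV")
    return notes


# Example: a profile early in an acute infection
print(interpret(HBVMarkers(hbsag=True, hbeag=True, anti_hbc_igm=True)))
```

A resolved infection, for instance, would show anti-HBs together with core IgG, whereas the text notes that anti-HBs is not found in chronic carriers.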

Prevention

1) Active immunization

Two types of vaccine are available:

Serum-derived: prepared from HBsAg purified from the serum of HBV carriers

Recombinant HBsAg: made by genetic engineering in yeasts

Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.

Universal immunization of infants was introduced in South Africa in April 1995. Infants receive 3 doses at 6, 10 and 14 weeks of age.

Vaccine should be administered to people at high risk of infection with HBV:

1) Health care workers

2) Sexual partners of chronic carriers

3) Infants of HBV carrier mothers

2) Passive antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example needlestick injuries.

What is Hepatitis B Infection Like?

When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others who do not develop significant symptoms following exposure may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers or those with chronic hepatitis B are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

The genome is circular, about 3.2 kb in size and partially double-stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions.

Fig. Hepatitis B virus genome

The minus strand is unit length and has a protein covalently attached to its 5' end. The other strand, the plus strand, is variable in length but always less than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. These proteins are produced through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation and mark a specific transcript for encapsidation into the nucleocapsid.
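To make the idea of reading frames on a circular genome concrete, the following sketch scans the forward strand (the text notes that all four HBV ORFs run in one direction) for ATG-to-stop ORFs and lets an ORF wrap across the origin of the circle. It is a simplified illustration under our own assumptions: the function name `circular_orfs`, the standard stop codons and the `min_codons` cut-off are choices made here, not part of any cited tool. Every in-frame ATG is reported separately, which mirrors the use of multiple in-frame start codons described above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def circular_orfs(genome: str, min_codons: int = 50):
    """Find ATG...stop ORFs on the forward strand of a circular DNA genome.

    Returns (start, end, frame) tuples: 0-based start, exclusive end
    (which may exceed len(genome) when the ORF wraps past the origin),
    and frame = start % 3. Lengths are counted in codons, stop included.
    """
    seq = genome.upper()
    n = len(seq)
    doubled = seq + seq  # lets an ORF run past the end back to the start
    orfs = []
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        # Walk codon by codon for at most one full turn of the circle;
        # an ATG with no in-frame stop within one turn is skipped.
        for pos in range(start, start + n, 3):
            if doubled[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos + 3 - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, pos + 3, start % 3))
                break
    return orfs


# Toy example: the ATG at index 8 reads across the origin to a TAG.
print(circular_orfs("AAATAGGGATG", min_codons=1))  # [(8, 17, 2)]
```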

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell which is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor, which was unidentified at the time of the cited source (Garces, HBVP) and has since been identified as the sodium taurocholate cotransporting polypeptide (NTCP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the DNA is recycled to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).

Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, and these usually outnumber the virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group named surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be non-advantageous for the virus. However, it is also believed that the presence of high levels of non-infectious particles may allow the infectious viral particles to travel undetected by antibodies through the bloodstream (Garces, HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg): There are three different hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope which may be responsible for triggering an immune response. Regardless of the high antigenicity and prevalence of these particles, the immune system appears basically oblivious to their presence.

Hepatitis B core antigen (HBcAg): The only HBV antigen that cannot be detected directly by blood test; it can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is highly associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus particle, this antigen can be detected by blood test. If found, it is usually indicative of complete virus particles in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle, and tubular or filamentous particles that vary in length. Of these, only the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed, with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV (including in the chronic carrier) [4].

Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g. loops) and, ultimately, the refinement of the structure in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5,6].
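Because the 30% rule is stated in terms of sequence identity, it helps to see how that number is obtained from a pairwise alignment. The sketch below assumes the alignment is given as two equal-length strings with '-' for gaps and divides matches by the number of gap-free columns; this is only one of several denominators in use (others divide by the shorter sequence or by the full alignment length), so identities reported by different tools can differ. The function name is hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    matches = compared = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # columns containing a gap are not compared
        compared += 1
        if q == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


pid = percent_identity("MKV-LAAG", "MRVQLSAG")
print(f"{pid:.1f}% identity; above 30% threshold: {pid > 30.0}")
# 71.4% identity; above 30% threshold: True
```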

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term "guided docking" is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].
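Virtual screening enrichment is commonly summarized as an enrichment factor: the hit rate of known actives among the top-ranked fraction of a screened library divided by the hit rate in the whole library. A minimal sketch, assuming lower docking scores are better (as with energy-like scores) and that the active/inactive labels of the library are known in advance; the function name and the default 1% cut-off are illustrative choices.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """EF = (actives in top fraction / size of top fraction) / (actives / library size).

    Assumes lower scores are better, as with energy-like docking scores.
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("need one activity label per score")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_actives == 0:
        raise ValueError("library contains no known actives")
    n_top = max(1, round(fraction * n_total))
    ranked = sorted(range(n_total), key=lambda i: scores[i])  # best first
    hits = sum(1 for i in ranked[:n_top] if is_active[i])
    return (hits / n_top) / (n_actives / n_total)


scores = [-9.1, -8.7, -7.5, -6.0, -5.2, -4.8, -4.1, -3.3, -2.0, -1.5]
labels = [True, False, True, False, False, False, False, False, False, False]
print(enrichment_factor(scores, labels, fraction=0.2))  # 2.5
```

An enrichment factor of 1 means the ranking is no better than random selection; values above 1 mean actives are concentrated near the top of the list.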

The review in [8] gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts, with an overview of different approaches and algorithms. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead; the review outlines these, together with approaches to address some of them and the latest developments in the area. It also discusses aspects of the assessment of docking program performance and describes a number of successful applications of structure-based virtual screening [8].
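A common way to assess binding-mode predictions is the root-mean-square deviation (RMSD) between a docked ligand pose and a reference (e.g. crystallographic) pose, with about 2 Å often used as the success cut-off. The sketch assumes both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition is applied; it also ignores molecular symmetry, which evaluation tools usually correct for. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """RMSD between two ligand poses with atoms listed in the same order.

    No superposition is done: docked and reference poses are assumed to
    share the receptor's coordinate frame.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must list the same, non-empty set of atoms")
    total = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(pred, ref):
        total += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return math.sqrt(total / len(pred))


docked = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
crystal = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
rmsd = pose_rmsd(docked, crystal)
print(f"RMSD = {rmsd:.2f} Å, success (<= 2 Å): {rmsd <= 2.0}")
```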


Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

In the narrow sense used here, bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

2. Protein sequence: glycerate kinase (HBeAg-binding protein 4), primary accession number Q8IVS8, EC 2.7.1.31, from human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts) and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
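The Smith-Waterman algorithm that SSEARCH implements is a dynamic-programming recurrence in which any cell score that would fall below zero is reset to zero, so an alignment can start and end anywhere in the two sequences. The sketch below returns only the best local score and, to stay short, uses a simple match/mismatch scheme with a linear gap penalty; SSEARCH itself scores with substitution matrices and affine gap penalties and adds statistical estimates, so the parameter values here are illustrative assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            # Clamping at 0 is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8: the shared ACGT
```

Keeping only two rows of the matrix is enough for the score; recovering the alignment itself would require storing the full matrix and tracing back from the best cell.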

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and identify library sequences that resemble the query sequence above a certain threshold.
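BLAST gains its speed by first looking for short word matches (seeds) between the query and each database sequence and only then extending them into local alignments that are scored against the threshold. The sketch below shows just that seeding step under simplifying assumptions: it records exact matches of length `w`, which is close to nucleotide seeding, whereas protein BLAST also accepts similar "neighborhood" words that score above a cut-off before extending. The function name and default word size are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def word_hits(query: str, subject: str, w: int = 3) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)  # where each query word occurs
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


# Hits on the same diagonal (subject_pos - query_pos) mark a region worth extending.
print(word_hits("MKVLAT", "GMKVLAG"))  # [(0, 1), (1, 2), (2, 3)]
```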

5. Primary & secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
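Two of the listed parameters have simple closed forms that can be reproduced from the sequence alone, following the formulas given in the ProtParam documentation: the aliphatic index, X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)) with each X a mole percent, and the molar extinction coefficient at 280 nm in water, 5500·nTrp + 1490·nTyr + 125·nCystine (M⁻¹ cm⁻¹). The sketch below is our own re-implementation, not ProtParam's code; it assumes either that every pair of cysteines forms a cystine or that all cysteines are reduced, mirroring the two values ProtParam reports. Half-life and instability index need published lookup tables and are not reproduced here.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index = X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient_280(seq: str, all_cys_paired: bool = True) -> int:
    """Molar extinction coefficient at 280 nm in water (M^-1 cm^-1)."""
    seq = seq.upper()
    n_cystines = seq.count("C") // 2 if all_cys_paired else 0
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * n_cystines


peptide = "MAWYCLVCIK"  # hypothetical toy sequence
print(round(aliphatic_index(peptide), 1))          # 117.0
print(extinction_coefficient_280(peptide))         # 7115
print(extinction_coefficient_280(peptide, False))  # 6990
```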

29

Using SOPMA for secondry structure analysis

Recently, a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in predicting the secondary structure of proteins. Its authors report further improvements obtained by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a web page (http://www.ibcp.fr/predict.html).
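The 69.5% figure is a three-state (Q3) accuracy: the percentage of residues whose predicted state (helix, sheet or coil) matches the observed one. The following is a minimal sketch. It assumes both strings use the same one-letter state codes and are already the same length, and the function name is hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted three-state label matches the observed label."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("predicted and observed must be non-empty and equal in length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3_accuracy("HHHECC", "HHEECC"))  # 5 of 6 residues agree
```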

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) through a BLAST similarity search.
Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor in PDB format.
Validated the modeled receptor using the Structure Analysis Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the structure of the drug molecule.
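The protocol starts from a FASTA record retrieved from Swiss-Prot, whose headers begin with "sp|accession|entry name". Before pasting the sequence into SWISS-MODEL or ProtParam, it can be checked locally. The sketch below is a minimal FASTA reader: header lines start with ">", and the sequence lines that follow are concatenated. The parser is a hypothetical helper. The record is likewise hypothetical and truncated, holding only the first 70 residues of Q8IVS8 as printed in this report.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated, upper-cased sequence."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical, truncated record: only the first 70 residues of Q8IVS8 are included.
example = """>sp|Q8IVS8|GLCTK_HUMAN Glycerate kinase (truncated)
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQ
LFESAVGAVLPGPMLHRALSLDPGGRQLKV
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```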


Homology modeling

In protein structure prediction, homology modeling (also known as comparative modeling) is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on two things: identifying one or more known protein structures (called templates or parent structures) likely to resemble the structure of the query sequence, and producing an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.
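An aligned query/template pair defines which template residue each query residue takes its coordinates from. The same columns also give the sequence identity that governs model quality. The following is a minimal sketch. It assumes gaps are written as "-" and defines identity as identical columns divided by columns where both rows have a residue; tools differ in this choice, and the function names are hypothetical.

```python
def map_residues(query_aln: str, template_aln: str) -> list[tuple[int, int]]:
    """(query_index, template_index) pairs, 0-based, for columns where neither row has a gap."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned rows must have equal length")
    pairs = []
    qi = ti = 0
    for q, t in zip(query_aln, template_aln):
        if q != "-" and t != "-":
            pairs.append((qi, ti))
        # Advance each residue counter only when that row has a residue in this column.
        if q != "-":
            qi += 1
        if t != "-":
            ti += 1
    return pairs


def percent_identity(query_aln: str, template_aln: str) -> float:
    """Identical residues as a percentage of columns where both rows have a residue."""
    cols = [(q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(q.upper() == t.upper() for q, t in cols) / len(cols)


print(map_residues("AC-GT", "A-TGT"))  # C has no template partner; template T has no query partner
```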

The quality of the homology model depends on the quality of the sequence alignment and of the template structure. Two kinds of gap complicate the approach. Alignment gaps (commonly called indels) indicate a structural region present in the target but not in the template. Structure gaps in the template arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these atomic-position errors are significant. They impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence. They are especially useful for formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the fold, to bind some small molecule, or to foster association with another protein or nucleic acid.
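The "~2 Å agreement between the matched Cα atoms" is usually expressed as a root-mean-square deviation (RMSD). In practice the model and reference are first optimally superposed (for example with the Kabsch algorithm). This minimal sketch assumes that superposition has already been done and only computes the deviation; the function name is hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rmsd(model: list[Coord], reference: list[Coord]) -> float:
    """Root-mean-square deviation between matched atoms (coordinates assumed already superposed)."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two equally long, non-empty coordinate lists")
    total = sum(
        (mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
        for (mx, my, mz), (rx, ry, rz) in zip(model, reference)
    )
    return math.sqrt(total / len(model))


# Two matched C-alpha positions, each displaced by 2 Å along x.
print(rmsd([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)], [(2.0, 0.0, 0.0), (5.8, 0.0, 0.0)]))
```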

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, giving a set of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.
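The third step optimizes a score that measures how badly the restraints are violated. The sketch below uses a simple harmonic penalty summed over distance restraints, which is zero when every restraint is met. Programs such as MODELLER express restraints as probability density functions rather than this simple form. The restraint tuple format, the force constant and the example coordinates are illustrative assumptions.

```python
import math

Coord = tuple[float, float, float]


def restraint_violation(coords: list[Coord],
                        restraints: list[tuple[int, int, float]],
                        k: float = 1.0) -> float:
    """Sum of k*(d - d0)^2 over distance restraints (i, j, d0); zero when all are satisfied."""
    score = 0.0
    for i, j, d0 in restraints:
        d = math.dist(coords[i], coords[j])
        score += k * (d - d0) ** 2
    return score


# Hypothetical restraints: consecutive C-alpha atoms about 3.8 Å apart; the second pair is 0.6 Å short.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.0, 0.0, 0.0)]
print(restraint_violation(coords, [(0, 1, 3.8), (1, 2, 3.8)]))
```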


Homology modeling can produce high-quality structural models when the target and template are closely related. This has inspired the formation of a structural genomics consortium dedicated to producing representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, come from errors in the initial sequence alignment and from improper template selection. As with other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is identifying the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods are based on multiple sequence alignment, and PSI-BLAST is the most common example. These methods iteratively update their position-specific scoring matrix to identify successively more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates, and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading (also known as fold recognition or 3D-1D alignment) can also be used as a search technique for identifying templates for traditional homology modeling methods.

When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value; such hits are considered close enough in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases: for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available. It may well have a wrong structure, which leads to a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers. These improve on individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
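The E-value rule of thumb can be sketched as a filter over BLAST hits. The hit list below reuses the PDB chains, scores and E-values from this report's BLAST results. The two weak hits (score 27) are entered as 6.0, and their E-values are far above any usual cut-off either way. The 1e-5 cut-off and the function name are illustrative assumptions; the text only asks for a "sufficiently low" E-value.

```python
def select_templates(hits: list[tuple[str, float, float]],
                     max_evalue: float = 1e-5) -> list[str]:
    """Keep hits whose E-value is at or below max_evalue, lowest E-value (then highest score) first."""
    kept = [hit for hit in hits if hit[2] <= max_evalue]
    kept.sort(key=lambda hit: (hit[2], -hit[1]))
    return [pdb_chain for pdb_chain, _score, _evalue in kept]


# (PDB chain, score, E-value) as read from the BLAST hit list in the results.
blast_hits = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),
    ("1AW9-A", 27, 6.0),
]
print(select_templates(blast_hits))
```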

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most rely on a single template. Choosing the best template from among the candidates is therefore a key step and can significantly affect the final accuracy of the structure. The choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.
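Coverage is simply the fraction of query residues that the template alignment (and hence the model) spans. The following is a minimal sketch with a hypothetical function name. It assumes a single contiguous 1-based residue range, such as the range 83 to 514 of the 523-residue target that SWISS-MODEL reports in the results.

```python
def query_coverage(start: int, end: int, query_length: int) -> float:
    """Fraction of the query covered by a 1-based, inclusive residue range."""
    if not 1 <= start <= end <= query_length:
        raise ValueError("range must satisfy 1 <= start <= end <= query_length")
    return (end - start + 1) / query_length


print(round(query_coverage(83, 514, 523), 3))  # 432 of 523 residues
```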

The sequence alignment generated by the database search technique can serve as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking
2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins of similar size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: their interfaces cannot alter their conformation to improve binding and ease movement. Conformational changes are limited by steric constraints, and these interactions are therefore said to be rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. These interfaces show both high specificity and induced fit, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor, or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid-ligand/flexible-receptor complex often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger than usual ligand. Normally, ligand overlap in the docking interface incurs energy penalties. If the van der Waals repulsion can be decreased, the energy penalty of the system is minimized. This can be achieved by allowing flexibility in the receptor: flexible receptors can dock a larger ligand than a rigid receptor could.
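The overlap penalty comes from the steeply repulsive short-range part of the van der Waals interaction. A common model for it is the 12-6 Lennard-Jones potential, E(r) = 4ε[(σ/r)^12 - (σ/r)^6]. The following is a minimal sketch in reduced units (ε = σ = 1). These units are an assumption; they are not parameters taken from any docking program.

```python
def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """12-6 Lennard-Jones energy between two atoms separated by distance r."""
    if r <= 0:
        raise ValueError("distance must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Energy rises steeply once atoms overlap (r < sigma); the minimum (-epsilon) lies at 2**(1/6)*sigma.
for r in (0.8, 1.0, 2 ** (1 / 6), 2.0):
    print(f"r = {r:.3f}  E = {lennard_jones(r):+.3f}")
```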

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: intrinsic movement allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
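A rigid-body docking search samples the six degrees of freedom (three rotations, three translations) for each ligand pose. The following is a minimal sketch of applying one such pose. It assumes a Z-Y-X Euler-angle convention and rotates about the coordinate origin; a docking program would normally rotate about the ligand's centre and may use a different parameterization. The function names are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rotation_matrix(alpha: float, beta: float, gamma: float) -> list[list[float]]:
    """R = Rz(gamma) @ Ry(beta) @ Rx(alpha); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]


def place_ligand(atoms: list[Coord], angles: Coord, shift: Coord) -> list[Coord]:
    """Apply the six rigid-body degrees of freedom: 3 rotations about the origin, then 3 translations."""
    r = rotation_matrix(*angles)
    return [
        tuple(r[row][0] * x + r[row][1] * y + r[row][2] * z + shift[row] for row in range(3))
        for x, y, z in atoms
    ]


# Rotate one atom 90 degrees about z, then shift the pose by (1, 2, 3).
print(place_ligand([(1.0, 0.0, 0.0)], (0.0, 0.0, math.pi / 2), (1.0, 2.0, 3.0)))
```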

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will help in finding a cure for the infectious disease hepatitis B, and will open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Receptor

A receptor is a structure on the surface of the cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule within a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g. a receptor). A ligand binds through many weak noncovalent bonds formed with the binding site of a protein, so tight binding depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in making and breaking bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which produces the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize about which amino acids are likely to be in a protein active/functional site, as this depends greatly on the type of function. With that caveat, the values below give the preferences of the 20 amino acids for lying within functional regions of proteins. They were worked out by counting how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
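As a rough illustration of how such a table might be used, a putative pocket can be scored by the mean propensity of its lining residues. Averaging is an assumption of this sketch, not a published scoring method, and the table explicitly excludes protein-protein and protein-peptide interfaces. The dictionary and function names are hypothetical.

```python
# Propensities from the table above (positive = more contacts with bound non-protein atoms than expected).
PROPENSITY = {
    "His": 0.360, "Cys": 0.210, "Ser": 0.130, "Lys": 0.100, "Thr": 0.100,
    "Asn": 0.080, "Arg": 0.055, "Gln": 0.050, "Glu": 0.050, "Asp": 0.045,
    "Ala": 0.025, "Met": 0.025, "Ile": -0.005, "Tyr": -0.040, "Val": -0.060,
    "Gly": -0.070, "Phe": -0.120, "Trp": -0.140, "Leu": -0.180, "Pro": -0.200,
}


def site_score(residues: list[str]) -> float:
    """Mean propensity of the three-letter residue names lining a putative binding pocket."""
    if not residues:
        raise ValueError("no residues given")
    return sum(PROPENSITY[r] for r in residues) / len(residues)


print(site_score(["His", "Cys", "Ser"]))
```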

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, visualizes the dihedral angle φ (phi) against ψ (psi) for the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the corresponding torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to vary φ and ψ systematically, with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii. Angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
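Each point on the plot is a pair of torsion angles computed from backbone atom positions: φ from C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). The following is a minimal sketch of the standard four-atom dihedral calculation. The helper names are hypothetical, and the test geometry is a toy example rather than real backbone coordinates.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees (between -180 and 180) defined by atoms p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project b0 and b2 onto the plane perpendicular to the central bond p1-p2.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # trans arrangement
```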

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues the submitted protein has.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. As a by-product of running PROCHECK, the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the extension .new, in which the atoms are labelled according to the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory; they have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

The energy of a molecule is a function of its degrees of freedom (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints for a residue, but it will not pass through high energy barriers and will stop in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a single number for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because a few bad interactions can dominate it. For instance, a large molecule with an excellent conformation for nearly all atoms can still have a large overall energy because of a single bad interaction, such as two atoms so close in space that they have a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
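The following is a minimal steepest-descent sketch of gradient optimization: every atom is moved along its net force (the negative energy gradient) until all forces are small. It makes several simplifying assumptions. The force field is Lennard-Jones only, in reduced units, with a fixed step size and tolerance. Real force fields add bond, angle, dihedral and electrostatic terms, and production minimizers use adaptive steps or conjugate gradients. Like any such method, this one stops in the nearest local minimum.

```python
import math


def lj_energy_and_forces(coords, epsilon=1.0, sigma=1.0):
    """Total 12-6 Lennard-Jones energy and per-atom forces (force = -gradient)."""
    n = len(coords)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # Force on atom i along d = r_i - r_j; a positive value pushes the pair apart.
            f_scalar = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / (r * r)
            for k in range(3):
                forces[i][k] += f_scalar * d[k]
                forces[j][k] -= f_scalar * d[k]
    return energy, forces


def steepest_descent(coords, step=0.01, max_steps=10_000, force_tol=1e-6):
    """Move atoms along the net force until the largest force component is below force_tol."""
    coords = [list(c) for c in coords]
    energy, forces = lj_energy_and_forces(coords)
    for _ in range(max_steps):
        if max(abs(f) for atom in forces for f in atom) < force_tol:
            break
        for atom, force in zip(coords, forces):
            for k in range(3):
                atom[k] += step * force[k]
        energy, forces = lj_energy_and_forces(coords)
    return coords, energy


# Two atoms starting 1.5 sigma apart relax towards the minimum at 2**(1/6) sigma (about 1.122).
final, e_min = steepest_descent([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)])
print(round(math.dist(final[0], final[1]), 3), round(e_min, 3))
```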


Results and discussion

1 Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences

Db   AC       Description                                                       Score  E-value
pdb  1QGT-C   Chain C, (HBcAg) Human Hepatitis B Viral Capsid                   206    6e-54
pdb  2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina             197    2e-51
pdb  2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                    192    6e-50
pdb  1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp  27     6.0
pdb  1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I             27     6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
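The composition table and the charge counts are simple tallies over the sequence; ProtParam defines negative residues as Asp + Glu and positive residues as Arg + Lys. The following is a minimal sketch with hypothetical function names. To keep the example short, it uses the last 33 residues of Q8IVS8 as printed in the SOPMA output rather than the full 523-residue sequence.

```python
from collections import Counter


def composition(sequence: str) -> dict[str, tuple[int, float]]:
    """Count and percentage of each residue letter in the sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    return {aa: (n, 100.0 * n / len(seq)) for aa, n in sorted(counts.items())}


def charged_residues(sequence: str) -> tuple[int, int]:
    """(negative, positive) counts: Asp + Glu and Arg + Lys."""
    seq = sequence.upper()
    return seq.count("D") + seq.count("E"), seq.count("R") + seq.count("K")


fragment = "TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR"  # C-terminal 33 residues of Q8IVS8
print(composition(fragment)["L"], charged_residues(fragment))
```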

Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
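Both values follow from the Trp, Tyr and Cys counts. ProtParam's documented coefficients are 5500 per Trp, 1490 per Tyr and 125 per cystine (pair of Cys). The absorbance of a 0.1% (1 g/l) solution is the extinction coefficient divided by the molecular weight. The following is a minimal sketch with hypothetical function names. It reproduces the two values reported above from the counts in the composition table (Trp 4, Tyr 5, Cys 5) and the molecular weight of 55252.6.

```python
def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int,
                           all_cys_as_cystines: bool = True) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from Trp, Tyr and cystine counts."""
    n_cystines = n_cys // 2 if all_cys_as_cystines else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystines


def abs_01_percent(ext_coeff: float, molecular_weight: float) -> float:
    """Absorbance of a 0.1% (1 g/l) solution: extinction coefficient / molecular weight."""
    return ext_coeff / molecular_weight


for paired in (True, False):
    e = extinction_coefficient(4, 5, 5, all_cys_as_cystines=paired)
    print(e, round(abs_01_percent(e, 55252.6), 3))
```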

Secondary structure prediction

By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
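Each summary percentage is the count of a state letter divided by the sequence length; for example, 235 helix residues out of 523 gives 44.93%. The following is a minimal sketch for the four states used in this run, with hypothetical names. The input is a hypothetical ten-residue state string rather than the full prediction.

```python
from collections import Counter

STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def summarize_states(states: str) -> dict[str, tuple[int, float]]:
    """Count each predicted state letter and express it as a percentage of the sequence length."""
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states.lower())
    return {name: (counts[code], 100.0 * counts[code] / len(states))
            for code, name in STATE_NAMES.items()}


print(summarize_states("hhhhccceet"))
```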

Multiple sequence alignment

ClustalW2 Results

1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65

(45)

2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98

49

6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97

7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
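Clustal marks fully conserved columns with "*". The same check can be run directly on the aligned rows: a column is conserved when every sequence has the same residue and none has a gap. The following is a minimal sketch with a hypothetical function name. For brevity, the rows are the last twelve columns of the fourth alignment block for HBVA4, ASHV and GSHV.

```python
def conserved_columns(rows: list[str]) -> list[int]:
    """0-based alignment columns where every sequence has the same residue and no gap."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    return [i for i, column in enumerate(zip(*rows))
            if column[0] != "-" and all(c == column[0] for c in column)]


rows = ["PEHCSPHHTALR",  # P17099|HBEAG_HBVA4
        "REHCSPHHTAIR",  # Q64896|HBEAG_ASHV
        "REHCSPHHTAIR"]  # P03153|HBEAG_GSHV
print(conserved_columns(rows))
```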

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as the template, showing around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure (pdb file) from File, then choose Swiss model - load the raw target sequence.
Choose Fit - fit raw sequence, then magic fit, then iterative fit.
Choose File - save - layer (pdb).
Choose File - save - project (pdb).
Choose Swiss model - submit modeling request (a new browser window opens, loading the pdb file; enter the e-mail ID at which to receive the modeled structure).
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - save - layer (pdb).

Open Swiss model and select the load raw sequence option to load the target molecule

Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences

Save the file as the project

Select "submit modeling request" under Swiss model to submit it for modeling

Homology modeling

Optimise Mode Request submission form

Please fill these fields


Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52
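The modelled range can be turned into a coverage figure with one line of arithmetic. This sketch assumes the full length of Q8IVS8 is 523 residues, the last position shown for GLCTK_HUMAN in the multiple alignment; the helper name `coverage` is hypothetical.

```python
def coverage(first: int, last: int, length: int) -> float:
    """Percent of a sequence covered by an inclusive residue range."""
    return 100.0 * (last - first + 1) / length


# Modelled residues 83-514 of an assumed 523-residue target
print(last_minus_first := 514 - 83 + 1)      # 432 residues modelled
print(round(coverage(83, 514, 523), 1))      # 82.6
```

So the model covers roughly four fifths of the target; the N-terminal 82 residues and the last few C-terminal residues have no template coordinates.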


Click on the model bars.

Fig: Structure of the template after modeling

Alignment

TARGET  83   LV GFGKAVLGMA AAAEELLGQH
2b8nA    4   peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105   LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET 155   ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205   RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET 255   GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET 305   LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355   ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405   LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455   DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394   ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ (phi) against ψ (psi) for each amino acid residue in a protein structure, showing which φ/ψ combinations are possible for a polypeptide. φ is the torsion angle about the N-Cα bond and ψ the torsion angle about the Cα-C bond, so residues that fall outside the allowed regions of the plot point to strained or unusual backbone conformations in the model.

Software
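The φ and ψ values on a Ramachandran plot are ordinary torsion angles computed from four backbone atom positions. The sketch below is a minimal Python illustration using the common atan2 formulation, not the code used by PROCHECK or SAVS; the function name `dihedral` and the toy coordinates are assumptions for illustration.

```python
import math
from typing import Sequence

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> list:
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> list:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180].

    phi of residue i = dihedral(C[i-1], N[i], CA[i], C[i])
    psi of residue i = dihedral(N[i], CA[i], C[i], N[i+1])
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)            # normal to the plane (p0, p1, p2)
    n2 = _cross(b2, b3)            # normal to the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the central bond runs along z from (0,0,0) to (0,0,1)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```

With real data the four points would be taken from consecutive backbone atoms in the PDB file; collecting (φ, ψ) for every residue and scattering them gives the Ramachandran plot.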


SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure

The target protein structure obtained after homology modeling with DeepView (Swiss-PdbViewer) and SWISS-MODEL is given as input to SAVS.

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT


Result

Plot statistics                                          Residues   Percentage
Residues in most favoured regions [A,B,L]                     990        85.6%
Residues in additional allowed regions [a,b,l,p]              104         9.0%
Residues in generously allowed regions [~a,~b,~l,~p]           11         1.0%
Residues in disallowed regions                                 51         4.4%
                                                             ----       ------
Number of non-glycine and non-proline residues               1156       100.0%
Number of end-residues (excl. Gly and Pro)                      8
Number of glycine residues (shown as triangles)               127 (59)
Number of proline residues                                     60
                                                             ----
Total number of residues                                     1351


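Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions. With 85.6% of its non-glycine, non-proline residues in the most favoured regions and 4.4% in disallowed regions, this model falls below that benchmark.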
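The percentages in the PROCHECK statistics follow directly from the residue counts: each region's count is divided by the 1156 non-glycine, non-proline residues. A minimal Python sketch of that arithmetic, together with PROCHECK's 90% rule of thumb; the helper name `ramachandran_summary` is hypothetical.

```python
from typing import Dict


def ramachandran_summary(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage (one decimal place) of non-Gly/non-Pro residues per region."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Residue counts from the PROCHECK plot statistics for this model
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
summary = ramachandran_summary(counts)
print(summary)
# {'most favoured': 85.6, 'additional allowed': 9.0, 'generously allowed': 1.0, 'disallowed': 4.4}

# Non-Gly/non-Pro residues plus end residues, glycines and prolines
print(sum(counts.values()) + 8 + 127 + 60)  # 1351

# Rule of thumb: a good model has more than 90% in the most favoured regions
print(summary["most favoured"] > 90.0)  # False
```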

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residues A 253 LYS, A 316 HIS, A 318 LYS, A 393 LYS, B 8 LYS, B 9 LYS, B 17 LYS, B 34 LYS, B 36 ASN, B 62 LYS, B 65 ARG, B 66 LYS, B 316 HIS, B 370 LYS, B 380 TYR, B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charges for non-terminal residues:
  MSE A 52, A 291, A 375, B 52, B 291, B 375: 0.35
  ASP A 82: 0.41; ASP B 81: -0.21
  LYS A 318, B 17, B 66, B 370: 0.34; LYS B 8, B 9, B 62: 0.12
  THR B 404: 0.23
Atom radius (Å) / charge printed for these residues:
  MSE: N 1.40/-0.52, CA 1.50/0.14, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.09, SE 1.90/0.32, CE 1.90/0.01, H 0.00/0.25
  ASP A 82: N 1.40/-0.52, CA 1.50/0.25, C 1.40/0.53, O 1.50/-0.50, CB 1.70/-0.21, CG 1.40/0.62, H 0.00/0.25
  ASP B 81: as ASP A 82, without CG
  LYS A 318, B 17, B 66, B 370: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, CE 1.70/0.22, H 0.00/0.25
  LYS B 8, B 9, B 62: as above, without CE
  THR B 404: N 1.40/-0.52, CA 1.50/0.27, C 1.40/0.53, O 1.50/-0.50, CB 1.50/0.21, H 0.00/0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
(Second copy of 2B8N, loaded as the ligand: the same hydrogen, symmetry, fractional-charge and CONECT messages and the same chain A and B sequences are printed as for the first copy.)

Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60 Å

Contouring surface for molecule 2B8N. Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N. Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00, 0.75, 1.50, 2.25, 3.00, 3.75, 4.50, 5.25, 6.00, 6.75, 7.50, 8.25, 9.00, 9.75, 10.50, 11.25, 12.00, 12.75, 13.50, 14.25, 15.00, 15.75, 16.50, 17.25, 18.00, 18.75, 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00, 0.40, 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair       RMS   Bumps
----  --------  --------  --------  --------  --------   -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape  Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
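The distance scan and timing figures in the Hex log can be cross-checked with a little arithmetic. This sketch assumes the decimal points lost in extraction sit where two-decimal formatting would put them: a distance range of 0.00 to 19.50 in steps of 0.75, and phase timings of Hex 53.66 s, GF 73.01 s, FFT 277.87 s and Scan 15.45 s. It only checks internal consistency; it says nothing about the quality of the docking itself.

```python
# Figures read from the Hex 5.0 log, decimal points restored (assumption)
r_min, r_max, r_step = 0.00, 19.50, 0.75
timings_s = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
orientations = 1_616_412_672
logged_rate = 3_848_574          # orientations per second, as logged

# Number of intermolecular distances scanned (both ends inclusive)
n_steps = round((r_max - r_min) / r_step) + 1

# The four phases should add up to the reported "7 min 0 sec"
total_s = sum(timings_s.values())

# Overall throughput implied by the timings
rate = orientations / total_s

print(n_steps)                                        # 27
print(round(total_s))                                 # 420
print(abs(rate - logged_rate) / logged_rate < 1e-3)   # True
```

The 27 distances match the R values listed in the log, and the four phases add up to about 420 s, so the decimal placement assumed here is at least self-consistent.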


Conclusion


After analyzing the protein sequences of hepatitis B virus strains, we conclude that although they are all closely related, they play an important role in survival in different host species. It would be interesting to look more closely at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene-sequencing projects on HBV, we hope it will be possible in the near future to draw conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of a protein present in the different species, we performed homology modelling, and as a further step we present a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein coded by this gene is one of the secondary factors in HBV pathogenicity, so we tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be a stepping stone towards such discoveries; it is a small finding within a much larger problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, new vistas will open for further drug development. The findings of our docking may be useful in finding a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in the hepatitis B virus.

The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B, Sheinerman FB and Honig B (2001). Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796-14801. [PubMed]

Aloy P, Querol E, Aviles FX and Sternberg MJ (2001). Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395-408. [PubMed]

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402. [PubMed]

Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. (2000). InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


World-wide distribution: endemic in most countries. The incidence in first-world countries is declining. There is an especially high incidence in developing countries and rural areas; in rural areas of South Africa the seroprevalence is 100%.

Hepatitis E

Incubation period: 30-40 days

Acute, self-limiting hepatitis; no chronic carrier state.

Age: predominantly young adults (15-40 years). Fulminant hepatitis occurs in pregnant women, in whom the mortality rate is high (up to 40%). Similar to hepatitis A, the virus replicates in the gut initially before invading the liver, and virus is shed in the stool prior to the onset of symptoms. Viraemia is transient. A large inoculum of virus is needed to establish infection. Little is known yet; the incidence of infection appears to be low in first-world countries.

Hepatitis C

Putative togavirus related to the flavi- and pestiviruses, and thus probably enveloped. Has an ssRNA genome.

Does not grow in cell culture but can infect chimpanzees. Incubation period: 6-8 weeks.

Causes a milder form of acute hepatitis than does hepatitis B, but 50% of individuals develop chronic infection following exposure, which can lead to:

1) Chronic liver disease

2) Hepatocellular carcinoma

Incidence: endemic world-wide, with high incidence in Japan, Italy and Spain. In South Africa, 1% of blood donors have antibodies.

Hepatitis D


Defective virus which requires hepatitis B as a helper virus in order to replicate. Infection therefore only occurs in patients who are also infected with hepatitis B, and it increases the severity of liver disease in hepatitis B carriers. The virus particle is 36 nm in diameter and encapsulated with HBsAg derived from HBV; the delta antigen is associated with virus particles. ssRNA genome.

Identified in intravenous drug abusers.

Hepatitis G

A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C hepatitis has been called hepatitis G virus. It was implicated as a cause of parenterally transmitted hepatitis but is no longer believed to be a major agent of liver disease. It has been classified as a flavivirus.

Hepatitis B

What is the hepatitis B virus?

The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting human liver cells, and other cells in the body, once it gains access to the bloodstream. One of the most interesting features of the hepatitis B virus is that the virus itself does not damage the liver; the damage is caused by the individual's own immune system attacking the virus-infected cells. Since liver damage from the virus may be minimal, many patients are called "healthy carriers". This means that although they may transmit the disease to others, they have normal-appearing livers and normal liver function tests.

While many individuals remain healthy for many years or a lifetime, others develop chronic hepatitis, cirrhosis and occasionally liver cell cancer. These outcomes are linked to the virus and its effects, although it is unlikely that the virus directly causes cancer. Those patients who develop hepatitis (damage to liver cells with inflammation) do so on account of the body's normal inclination to attack the foreign proteins contained in viruses and in the cells in which the viruses are found. This process, called the immune response, determines the pace and the severity of the liver cell injury in this condition and will be described in more detail below.

Since the identification of the hepatitis B virus, several other closely related viruses have been identified in Eastern woodchucks, ground squirrels and Peking ducks. The members of this virus family, termed the hepadnaviruses, have life cycles similar to that observed in man and can serve as animal models, allowing further study of these unique disease-causing agents.

Classification and general features

Family: Hepadnaviridae

Genera: Orthohepadnavirus (e.g., human hepatitis B virus [HBV]); Avihepadnavirus (e.g., duck hepatitis B virus)

Size: 42 nm. Virions (also known as Dane particles) contain a circular, partially double-stranded DNA genome.

Fig. Hepatitis B virus structure

HBV Antigens

HBsAg = surface (coat) protein produced in excess as small spheres and tubules

HBcAg = inner core protein

HBeAg = secreted protein; function unknown

Clinical Features

Incubation period: 2-5 months.

Insidious onset of symptoms. Tends to cause a more severe disease than hepatitis A.

Asymptomatic infections occur frequently

Pathogenesis

Infection is parenterally transmitted. The virus replicates in the liver, and virus particles as well as excess viral surface protein are shed in large amounts into the blood. Viraemia is prolonged and the blood of infected individuals is highly infectious.

Complications

1) Persistent infection

Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected.

Those who are at particular risk include:

babies and young children

immunocompromised patients

males > females

The virus persists in the hepatocytes, and ongoing liver damage occurs because of the host immune response against the infected liver cells.

Chronic infection may take one of two forms:

Chronic persistent hepatitis: the virus persists but there is minimal liver damage.

Chronic active hepatitis: there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure.

2) Hepatocellular carcinoma

Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC).

HBV is thought to play a role in the development of this malignancy because:

a) 80% of patients with HCC are carriers of hepatitis B

b) Virus DNA can be identified in hepatocellular carcinoma cells

c) Virus DNA can integrate into the host chromosome

3) Fulminant hepatitis

Rare; accounts for about 1% of infections.

Epidemiology

Prevalence of disease in Africa

World-wide there are an estimated 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is much more common in rural communities than in the cities.

Hepatitis B is transmitted by the following routes:

1) Blood

Blood transfusions, serum products

Sharing of needles, razors

Tattooing, acupuncture

Renal dialysis

Organ donation

2) Sexual intercourse


3) Horizontal transmission in children, families and through close personal contact

This is the major mode of transmission in South Africa, where the majority of individuals become infected between three and nine years of age. Horizontal transmission also occurs in children's institutions and mental homes.

4) Vertical transmission: perinatal transmission from a carrier mother to her baby

Transplacental (rare)

During delivery

Postnatal: breastfeeding, close contact

(This is the major mode of transmission in South East Asia.)

Diagnosis: serology

Acute infection with resolution

Viral antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg): this secreted protein is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg): the core protein is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV, whether the infection has resolved or the patient has become a chronic carrier.
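The marker meanings listed above can be collected into a small lookup that reads a serology panel. This is a minimal sketch, assuming each marker is reported simply as detected or not detected; the function name and marker keys are hypothetical, and the two summary patterns (chronic carrier versus resolved infection) are a simplified reading of the notes above, not a diagnostic algorithm.

```python
# Hypothetical helper for illustration only (not a clinical tool).
# Marker meanings are paraphrased from the serology notes in this report.
MARKER_MEANINGS = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "anti-HBc IgM": "recent infection",
    "anti-HBc IgG": "exposure to HBV",
}


def interpret_hbv_serology(markers: dict) -> list:
    """Return one note per detected marker, followed by a summary line."""

    def has(name: str) -> bool:
        return bool(markers.get(name, False))

    notes = [f"{name}: {meaning}" for name, meaning in MARKER_MEANINGS.items()
             if has(name)]
    # Core IgG persists in carriers and in recovered patients alike, while
    # anti-HBs is not found in chronic carriers; together they separate the groups.
    if has("HBsAg") and has("anti-HBc IgG") and not (has("anti-HBs") or has("anti-HBc IgM")):
        notes.append("Summary: consistent with a chronic carrier")
    elif has("anti-HBs") and has("anti-HBc IgG") and not has("HBsAg"):
        notes.append("Summary: consistent with resolved infection and immunity")
    else:
        notes.append("Summary: pattern not classified by this sketch")
    return notes


if __name__ == "__main__":
    for note in interpret_hbv_serology({"HBsAg": True, "HBeAg": True, "anti-HBc IgG": True}):
        print(note)
```

Keeping the meanings in one dictionary means the notes stay in the same order as the list above and can be edited in one place.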

Fig. Hepatitis B virus in serum

Prevention

1) Active Immunization

Two types of vaccine are available:

Serum-derived: prepared from HBsAg purified from the serum of HBV carriers

Recombinant HBsAg: made by genetic engineering in yeast

Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.

In South Africa, universal immunization of infants was introduced in April 1995. Infants receive 3 doses, at 6, 10 and 14 weeks of age.

Vaccine should be administered to people at high risk of infection with HBV:

1) Health care workers

2) Sexual partners of chronic carriers

3) Infants of HBV carrier mothers


2) Passive Antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example needlestick injuries.

What is hepatitis B infection like?

When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others who do not develop significant symptoms following exposure may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

The genome is circular, about 3.2 kb in size, and partially double-stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5′ end. The other strand, the plus strand, is variable in length but always less than unit length, and has an RNA oligonucleotide at its 5′ end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. The translation of these proteins is achieved through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation, and mark a specific transcript for encapsidation into the nucleocapsid.

Fig. Hepatitis B virus genome
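The use of multiple in-frame start codons can be illustrated with a short scan of one reading frame. In this minimal sketch, every ATG met before the first in-frame stop codon opens a protein that ends at that same stop, so the products are nested and share their C-terminus; this is how the large, middle and small surface proteins relate to one another. The sequence is a hypothetical toy, not HBV DNA, the function names are illustrative, and the scan treats the sequence as linear even though the HBV genome is circular.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def nested_in_frame_starts(seq: str, frame_start: int) -> list:
    """Scan one reading frame from frame_start and return (start, end) spans.

    Each ATG before the first in-frame stop starts a protein ending at that
    stop; `end` is exclusive and includes the stop codon. Returns [] if the
    frame contains no stop codon (linear scan only, no wrap-around).
    """
    seq = seq.upper()
    starts = []
    for i in range(frame_start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return [(start, i + 3) for start in starts]
        if codon == "ATG":
            starts.append(i)
    return []


def protein_lengths(spans: list) -> list:
    """Amino-acid length of each span, excluding the stop codon."""
    return [(end - start) // 3 - 1 for start, end in spans]


toy = "CCATGAAAATGCCCATGTTTTAACC"  # hypothetical toy sequence, not HBV DNA
spans = nested_in_frame_starts(toy, frame_start=2)
print(spans, protein_lengths(spans))
```

The three spans end at the same stop codon and differ only in how far upstream they begin, giving proteins of decreasing length from a single frame.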

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell which is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps of HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell-surface receptor that had yet to be identified when the cited source was written (Garces HBVP); the sodium taurocholate cotransporting polypeptide (NTCP) has since been identified as an HBV entry receptor. The viral DNA then enters the nucleus, where it is converted into a covalently closed circular form called cccDNA.

The minus (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA, called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, where the proteins destined to become HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce the polymerase protein (P), which then binds to a specific site at the 3′ end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the newly made viral DNA is recycled to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., p. 765).

Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins arranged with icosahedral symmetry. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group named surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same antigenic sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that the presence of high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg): There are three different hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope which may be responsible for triggering an immune response. Despite the high antigenicity and abundance of these particles, the immune system appears largely unresponsive to their presence.

Hepatitis B core antigen (HBcAg): The only HBV antigen that cannot be detected directly by blood test; this antigen can only be found by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells, where it is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. It is translated from the precore/core region of the genome and, unlike the core protein, is secreted from infected cells rather than built into virus particles, so this antigen can be detected by blood test. If found, it usually indicates that complete virus particles are in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded circular DNA virus of complex structure. HBV is classified in the genus Orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle; a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle; and tubular or filamentous particles that vary in length. Of these, only the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or other body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The review cited here therefore surveys the state of knowledge in this very active field, with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg): this secreted protein is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg): the core protein is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV, whether the infection has resolved or the patient has become a chronic carrier [4].

Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g., loops), and ultimately the refinement of the structure in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the "30% rule", because alignment quality usually decreases dramatically below about 30% sequence identity (a structural explanation for this observation has been offered by Chung and Subbiah, 1996). Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
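The 30% rule depends on how percent identity is computed from the query-template alignment. This is a minimal sketch, assuming gaps are written as "-" and identity is counted only over columns where neither sequence has a gap; that is one of several conventions in use, and the function names and toy alignment are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Other conventions divide by the full alignment length or by the length
    of the shorter sequence, and give different numbers for the same alignment.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for q, t in pairs if q == t)
    return 100.0 * identical / len(pairs)


def above_30_percent_rule(identity: float) -> bool:
    """Rule of thumb from the text: accurate models are expected above 30%."""
    return identity > 30.0


pid = percent_identity("ACDEF-GH", "ACDQFWG-")  # hypothetical toy alignment
print(round(pid, 1), above_30_percent_rule(pid))
```

Because the denominator changes with the convention, the same pair of sequences can fall on either side of the 30% threshold, so the convention should be stated alongside the number.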

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed over recent years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the cited review introduces the term "guided docking" to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect this focus on the use of chemical information, it proposes a classification scheme for guided docking approaches. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, the review found that making optimal use of chemical information is generally an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].

A further review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts, with an overview of different approaches and algorithms. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which that review also outlines. It presents approaches to address some of these challenges and the latest developments in the area, discusses some aspects of the assessment of docking program performance, and describes a number of successful applications of structure-based virtual screening [8].


Material and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

As used here, bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products; in this sense it is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI (the National Center for Biotechnology Information) creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

2. Protein sequence

Glycerate kinase (HBeAg-binding protein 4), from human. Primary accession number: Q8IVS8. EC 2.7.1.31. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985, in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved tools for biological sequence comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.
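FASTA's speed comes from a word-lookup first stage: it finds short identical words (k-tuples, whose length is set by the ktup parameter) shared by the query and a library sequence and counts them along each diagonal, so that diagonals with many hits mark candidate similar regions for later rescoring and alignment. The sketch below shows only that first stage, under simplifying assumptions (no substitution matrix, no joining of regions, no final alignment); the function name is hypothetical.

```python
from collections import Counter, defaultdict


def ktup_diagonal_counts(query: str, library: str, ktup: int = 2) -> Counter:
    """Count identical ktup-word matches on each diagonal (query_pos - library_pos)."""
    # Index every word of length ktup in the query by its start positions.
    word_index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        word_index[query[i:i + ktup]].append(i)
    # Each library word that also occurs in the query votes for one diagonal.
    counts = Counter()
    for j in range(len(library) - ktup + 1):
        for i in word_index.get(library[j:j + ktup], []):
            counts[i - j] += 1
    return counts


counts = ktup_diagonal_counts("HEAGAWGHEE", "PAWHEAE", ktup=2)
print(counts.most_common(3))
```

Indexing the query once and then streaming each library sequence past the index is what keeps this stage fast when searching a whole database.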

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
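SSEARCH computes the optimal local alignment with the Smith-Waterman dynamic-programming algorithm. This is a minimal score-only sketch, assuming a simple match/mismatch score and a linear gap penalty; SSEARCH itself uses substitution matrices and affine gap penalties, and the parameter values here are illustrative.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty (score only)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere, which makes it local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Keeping only two rows of the matrix gives the score in linear memory; recovering the alignment itself would need the full matrix or a traceback-friendly variant.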

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
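BLAST usually expresses its threshold as an E-value: the number of hits with at least a given score expected by chance in a search of that size. Under Karlin-Altschul statistics, E = K * m * n * exp(-lambda * S) for raw score S, query length m and database length n, and the bit score S' = (lambda * S - ln K) / ln 2 gives the equivalent form E = m * n * 2^(-S'). This is a minimal sketch; the lambda and K values and the lengths are illustrative assumptions, since BLAST reports its own parameters for each scoring system and applies edge corrections to m and n.

```python
import math


def blast_evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, so E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameters only; real runs use the values BLAST reports.
LAMBDA, K = 0.267, 0.041
m, n = 300, 50_000_000  # hypothetical query length and database size (residues)
for s in (40, 80, 120):
    print(s, f"{blast_evalue(s, m, n, LAMBDA, K):.2e}", round(bit_score(s, LAMBDA, K), 1))
```

The E-value falls exponentially as the score rises and grows linearly with database size, which is why the same alignment score is less significant in a larger database.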

5. Primary & secondary structure analysis

Using ProtParam for primary structure: ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence will be analyzed.

It calculates the following parameters, among others:

extinction coefficient

half-life

instability index

aliphatic index
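Two of the listed parameters follow simple published formulas described in ProtParam's documentation: the aliphatic index of Ikai (1980), and the extinction coefficient at 280 nm computed from the numbers of Trp and Tyr residues and cystines. This is a minimal sketch under that assumption; the instability index and half-life need lookup tables and are omitted, and the function names and example sequences are hypothetical.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)).

    X(...) are the mole percents of each residue in the sequence.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficients(seq: str) -> tuple:
    """Molar extinction coefficients at 280 nm in water (M^-1 cm^-1).

    Returns (all Cys paired as cystines, all Cys reduced), using
    Trp = 5500, Tyr = 1490 and cystine = 125.
    """
    seq = seq.upper()
    n_trp, n_tyr, n_cys = seq.count("W"), seq.count("Y"), seq.count("C")
    reduced = 5500 * n_trp + 1490 * n_tyr
    return reduced + 125 * (n_cys // 2), reduced


print(round(aliphatic_index("MAVILKAVLE"), 1), extinction_coefficients("MWYCCK"))
```

Two extinction values are returned because the cystine term only applies if the cysteines are paired in disulfide bonds, which cannot be known from the sequence alone.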

Using SOPMA for secondary structure analysis

The self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. Its authors then reported improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
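The 69.5% figure is a per-residue three-state accuracy (often called Q3): the fraction of residues whose predicted helix/sheet/coil class matches the observed one. This is a minimal sketch, assuming both strings have already been reduced to the letters H, E and C (SOPMA's own output also marks turns and may use other letters); the function name and example strings are hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percent of residues whose three-state class (H, E or C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have the same length")
    if not observed:
        raise ValueError("empty sequence")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


print(round(q3_accuracy("HHHHCCEEEC", "HHHCCCEEEE"), 1))
```

Averaged over every residue of a test database, this is the kind of score that the 69.5% and 82.2% figures report.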

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Identified a template structure by a BLAST similarity search (PDB ID: 2B8N).

Submitted the target sequence to SWISS-MODEL as a raw sequence and modeled the receptor.

Validated the modeled receptor using the Structure Analysis Validation Server (SAVS).

Verified the model through different parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the docked structure of the drug molecule.
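The protocol starts from a sequence in FASTA format: a header line beginning with ">" followed by one or more sequence lines. This is a minimal parser sketch; the function name is hypothetical, and the record shown is a placeholder, not the Swiss-Prot entry used in this work.

```python
def parse_fasta(text: str) -> dict:
    """Map each record's first header word to its concatenated sequence."""
    records = {}
    name = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records[name] = "".join(chunks)
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = """>toy_receptor hypothetical example record
MKTAYIAKQR
QISFVKSHFS
"""
seqs = parse_fasta(example)
print({name: len(seq) for name, seq in seqs.items()})
```

Joining the wrapped sequence lines gives the raw sequence that can be pasted into tools such as SWISS-MODEL or ProtParam.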


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the "query" sequence or "target"). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as "templates" or "parent structures") likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement (root-mean-square deviation) between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
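The Å figures above are usually computed as a root-mean-square deviation over matched Cα atoms. This is a minimal sketch of that calculation, assuming the model and reference coordinates are already optimally superposed (in practice a superposition step, such as the Kabsch algorithm, comes first); the function name and coordinates are hypothetical.

```python
import math


def ca_rmsd(model: list, reference: list) -> float:
    """RMSD over matched C-alpha atoms, assuming the structures are already superposed."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
                  for (mx, my, mz), (rx, ry, rz) in zip(model, reference))
    return math.sqrt(squared / len(model))


# Hypothetical coordinates in angstroms: the second atom is off by 2 A in y.
model = [(0.0, 0.0, 0.0), (3.8, 2.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
print(round(ca_rmsd(model, reference), 3))
```

Because the deviations are squared before averaging, a few badly placed atoms (for example in a long modeled loop) can raise the RMSD of an otherwise accurate model.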

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target; thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.

Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods.

When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
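A minimal sketch of the E-value rule described above. The cutoff of 1e-5 is an assumed value (the text only says "sufficiently low"), and the hit list is illustrative, loosely modelled on the BLAST hits listed in the Results section.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    pdb_id: str
    score: float
    evalue: float


def candidate_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first."""
    good = [h for h in hits if h.evalue <= max_evalue]
    return sorted(good, key=lambda h: h.evalue)


hits = [
    Hit("1QGT-C", 206, 6e-54),
    Hit("2QIJ-C", 197, 2e-51),
    Hit("2G33-C", 192, 6e-50),
    Hit("1TA3-B", 27, 6.0),   # weak hit: likely chance similarity
    Hit("1AW9-A", 27, 6.0),
]
for h in candidate_templates(hits):
    print(h.pdb_id, h.score, h.evalue)
```

The two weak hits are discarded rather than used as a last resort, in line with the advice that a template with a poor E-value should not be chosen even if it is the only one available.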

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.
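The coverage and sequence-identity criteria can be made concrete with a short sketch. The two toy alignments are made up, and ranking by coverage first and identity second is one hypothetical rule among many; the text above lists further factors (function, secondary structure, model plausibility) that this sketch ignores.

```python
def identity_and_coverage(query_aln, template_aln):
    """Return (identity, coverage) for one gapped pairwise alignment.

    identity = identical positions / positions where neither sequence has a gap
    coverage = query residues aligned to a template residue / all query residues
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = 0
    for q, t in zip(query_aln, template_aln):
        if q != "-" and t != "-":
            aligned += 1
            if q == t:
                identical += 1
    query_len = sum(1 for q in query_aln if q != "-")
    identity = identical / aligned if aligned else 0.0
    coverage = aligned / query_len if query_len else 0.0
    return identity, coverage


def best_template(alignments):
    """Hypothetical rule: prefer higher coverage, break ties by higher identity."""
    scores = {name: identity_and_coverage(q, t) for name, (q, t) in alignments.items()}
    return max(scores, key=lambda name: (scores[name][1], scores[name][0]))


candidates = {  # toy (query, template) alignments
    "templateA": ("MKTAYIAKQR--", "--TAYLAKQRGG"),
    "templateB": ("MKTAYIAKQR", "MRTAYLSKQR"),
}
print(identity_and_coverage(*candidates["templateA"]))
print(best_template(candidates))
```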

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interfaces between the two molecules tend to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interactions are said to be rigid.

Fig.: Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described by a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig.: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid-ligand/flexible-receptor complex often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger-than-usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than would be possible with a rigid receptor.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement that allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
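The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be sketched as a transform applied to ligand coordinates. The ZYX Euler-angle convention is an assumption; docking programs parametrize rotations in different ways.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(alpha, beta, gamma):
    """R = Rz(gamma) Ry(beta) Rx(alpha), angles in radians (ZYX convention assumed)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    rx = [[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]]
    ry = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    rz = [[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]]
    return matmul(rz, matmul(ry, rx))


def rigid_transform(coords, angles, translation):
    """Apply the six rigid-body degrees of freedom: rotate (3 angles), then translate (3 shifts)."""
    r = rotation_matrix(*angles)
    return [tuple(sum(r[i][k] * p[k] for k in range(3)) + translation[i] for i in range(3))
            for p in coords]


ligand = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
moved = rigid_transform(ligand, (0.0, 0.0, math.pi / 2), (1.0, 2.0, 3.0))
print(moved)
print(math.dist(*ligand), math.dist(*moved))  # internal distances are preserved
```

A rigid-body search samples only these six numbers; any receptor or ligand flexibility has to be added on top of them.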

Aim of docking

The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.

Receptor

A molecule (usually a protein) on the surface of the cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule in a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

The molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through many weak noncovalent bonds formed with the binding site of a protein, tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
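The link between a lower activation free energy and rate enhancement can be quantified with transition-state theory. Assuming the pre-exponential factor is the same for the catalysed and uncatalysed reactions, the ratio of rate constants is exp(ΔΔG‡/RT); the sketch below simply evaluates that expression.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def rate_enhancement(delta_delta_g, temperature=298.15):
    """Fold increase in rate when the activation free energy is lowered by
    delta_delta_g (J/mol), assuming an unchanged pre-exponential factor:
    k_cat / k_uncat = exp(ddG / (R T))."""
    return math.exp(delta_delta_g / (R * temperature))


# Lowering the barrier by about 5.7 kJ/mol at 25 C gives roughly a 10-fold speed-up.
print(rate_enhancement(5708.0))
```

Because the dependence is exponential, each further drop of about 5.7 kJ/mol at room temperature multiplies the rate by roughly another factor of ten.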

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The list below does not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
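The table can be used, for example, to rank residues or to score a candidate site. The averaging rule in `mean_site_propensity` is a hypothetical illustration, not a published scoring method; the values are those listed above.

```python
PROPENSITY = {  # functional-site propensities from the table above
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070, "Trp": -0.140,
    "Met": 0.025, "Val": -0.060, "Asn": 0.080, "Leu": -0.180, "Phe": -0.120,
    "Gln": 0.050, "Cys": 0.210, "Ile": -0.005, "Ala": 0.025, "Glu": 0.050,
    "Arg": 0.055, "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def mean_site_propensity(residues):
    """Average propensity of a candidate set of residues (hypothetical score)."""
    return sum(PROPENSITY[r] for r in residues) / len(residues)


ranked = sorted(PROPENSITY, key=PROPENSITY.get, reverse=True)
print(ranked[:3], ranked[-1])
print(round(mean_site_propensity(["His", "Cys", "Ser"]), 3))
```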

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the torsion angles about these bonds, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
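The two ingredients of Ramachandran's analysis, a torsion angle and a hard-sphere contact test, can be sketched in a few lines. The dihedral function uses a standard atan2 formulation; the van der Waals radii (Bondi values) and the simple "distance below the sum of radii" criterion are assumptions made for illustration.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180..180] about the p1-p2 bond.

    phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    n = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # project the outer bonds onto the plane perpendicular to the central bond
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52}  # Bondi radii in angstroms (assumed)


def clashes(elem_a, pos_a, elem_b, pos_b):
    """True if two non-bonded atoms overlap when treated as hard spheres."""
    return math.dist(pos_a, pos_b) < VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b]


print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
print(clashes("C", (0, 0, 0), "C", (3.0, 0, 0)))
```

Scanning φ and ψ, rebuilding the backbone for each pair and recording whether any non-bonded pair clashes reproduces, in outline, the allowed and disallowed regions of the plot.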

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of "ideal" bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the file extension .new. The new file will have the atoms labelled in accordance with the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good for releasing local constraints on a residue, but it will not pass through high energy barriers and will stop in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
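Gradient-based minimization can be illustrated with the simplest such method, steepest descent, applied to a single Lennard-Jones atom pair that starts too close together. The parameters (ε = 0.238 kcal/mol, σ = 3.4 Å, roughly argon-like), the step size and the per-step move cap are assumptions chosen for illustration, not values from any force field used in this report.

```python
def lj_energy(r, epsilon, sigma):
    """Lennard-Jones pair energy: 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon, sigma):
    """Force along r, F = -dE/dr; positive values push the two atoms apart."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def steepest_descent(r, epsilon=0.238, sigma=3.4, step=0.05, max_move=0.1,
                     tol=1e-6, max_iter=10_000):
    """Move along the force until it (almost) vanishes; returns (r, energy, steps)."""
    for n_steps in range(max_iter):
        force = lj_force(r, epsilon, sigma)
        if abs(force) < tol:
            break
        # follow the force, but never move more than max_move in one step
        r += max(-max_move, min(max_move, step * force))
    return r, lj_energy(r, epsilon, sigma), n_steps


r_min, e_min, n = steepest_descent(3.0)
print(f"start E = {lj_energy(3.0, 0.238, 3.4):.3f}, final r = {r_min:.4f}, E = {e_min:.4f}")
```

The single too-close contact gives a large positive starting energy; the minimizer relaxes it to the bottom of the well at r = 2^(1/6)σ, where the force is essentially zero.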


Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information: Entry name GLCTK_HUMAN; Primary accession number Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910

From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C (Hbcag), Human Hepatitis B Viral Capsid >gi|5             206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina             197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27    6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27    6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
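The percentages and charged-residue totals above follow directly from the residue counts; a short sketch reproduces them from the counts reported in this ProtParam output. The function names are illustrative, not part of ProtParam.

```python
COUNTS = {  # residue counts for Q8IVS8 from the composition table above
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28, "G": 51,
    "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10, "P": 28, "S": 27,
    "T": 22, "W": 4, "Y": 5, "V": 38,
}


def composition_percent(counts):
    """Percentage of each residue type, rounded to one decimal like ProtParam."""
    total = sum(counts.values())
    return {aa: round(100 * n / total, 1) for aa, n in counts.items()}


def charged_residues(counts):
    """(negative, positive) totals: Asp + Glu and Arg + Lys."""
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


pct = composition_percent(COUNTS)
print(sum(COUNTS.values()), pct["A"], pct["L"], charged_residues(COUNTS))
```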

Atomic composition

Carbon C 2435
Hydrogen H 3967
Nitrogen N 711
Oxygen O 719
Sulfur S 17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming all Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming no Cys residues appear as half cystines
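The two extinction coefficients can be reproduced from the Trp, Tyr and Cys counts using the residue coefficients that the ProtParam documentation gives (5500 for Trp, 1490 for Tyr and 125 per cystine, in M-1 cm-1). The sketch assumes every two Cys residues form one cystine, and divides by the molecular weight to obtain the Abs 0.1% value.

```python
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125  # M^-1 cm^-1 at 280 nm


def extinction_coefficient(n_trp, n_tyr, n_cys, cys_paired=True):
    """Molar extinction coefficient; unpaired Cys residues contribute nothing."""
    cystines = n_cys // 2 if cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + cystines * EXT_CYSTINE


def abs_01_percent(ext_coeff, mol_weight):
    """Absorbance of a 1 g/l (0.1%) solution = extinction coefficient / molecular weight."""
    return ext_coeff / mol_weight


mw = 55252.6  # Da, from the ProtParam output above
for paired in (True, False):
    e = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, cys_paired=paired)
    print(e, round(abs_01_percent(e, mw), 3))
```

With five Cys residues at most two cystines can form, which is why the two coefficients differ by only 250 M-1 cm-1.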

Secondary structure prediction

By SOPMA: result for UNK_158250

Sequence and predicted states (70 residues per line; h = alpha helix, e = extended strand, t = beta turn, c = random coil):

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: window width 17; similarity threshold 8; number of states 4
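The SOPMA percentages are simply the fraction of each state letter in the prediction string. A minimal sketch; the function name is hypothetical and not part of SOPMA.

```python
from collections import Counter


def state_percentages(states):
    """Percentage of each predicted state letter (e.g. h, e, t, c), to two decimals."""
    counts = Counter(states)
    n = len(states)
    return {state: round(100 * k / n, 2) for state, k in counts.items()}


# A string with the same state counts as the SOPMA result above (235 h, 80 e, 36 t, 172 c).
prediction = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
print(state_percentages(prediction))
```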

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name               Len(aa) SeqB Name               Len(aa) Score
===========================================================================
1    Q8IVS8|GLCTK_HUMAN 523     2    Q64896|HBEAG_ASHV  217     3
1    Q8IVS8|GLCTK_HUMAN 523     3    P03154|HBEAG_DHBV1 305     3
1    Q8IVS8|GLCTK_HUMAN 523     4    P0C6J9|HBEAG_DHBV3 305     3
1    Q8IVS8|GLCTK_HUMAN 523     5    P03153|HBEAG_GSHV  217     3
1    Q8IVS8|GLCTK_HUMAN 523     6    P0C692|HBEAG_HBVA2 214     3
1    Q8IVS8|GLCTK_HUMAN 523     7    P0C625|HBEAG_HBVA3 214     3
1    Q8IVS8|GLCTK_HUMAN 523     8    P17099|HBEAG_HBVA4 214     3
1    Q8IVS8|GLCTK_HUMAN 523     9    Q81105|HBEAG_HBVA5 214     3
1    Q8IVS8|GLCTK_HUMAN 523     10   Q91C37|HBEAG_HBVA6 214     2
2    Q64896|HBEAG_ASHV  217     3    P03154|HBEAG_DHBV1 305     21
2    Q64896|HBEAG_ASHV  217     4    P0C6J9|HBEAG_DHBV3 305     21
2    Q64896|HBEAG_ASHV  217     5    P03153|HBEAG_GSHV  217     91
2    Q64896|HBEAG_ASHV  217     6    P0C692|HBEAG_HBVA2 214     66
2    Q64896|HBEAG_ASHV  217     7    P0C625|HBEAG_HBVA3 214     65
2    Q64896|HBEAG_ASHV  217     8    P17099|HBEAG_HBVA4 214     65
2    Q64896|HBEAG_ASHV  217     9    Q81105|HBEAG_HBVA5 214     65
2    Q64896|HBEAG_ASHV  217     10   Q91C37|HBEAG_HBVA6 214     65
3    P03154|HBEAG_DHBV1 305     4    P0C6J9|HBEAG_DHBV3 305     97
3    P03154|HBEAG_DHBV1 305     5    P03153|HBEAG_GSHV  217     24
3    P03154|HBEAG_DHBV1 305     6    P0C692|HBEAG_HBVA2 214     26
3    P03154|HBEAG_DHBV1 305     7    P0C625|HBEAG_HBVA3 214     27
3    P03154|HBEAG_DHBV1 305     8    P17099|HBEAG_HBVA4 214     25
3    P03154|HBEAG_DHBV1 305     9    Q81105|HBEAG_HBVA5 214     26
3    P03154|HBEAG_DHBV1 305     10   Q91C37|HBEAG_HBVA6 214     26
4    P0C6J9|HBEAG_DHBV3 305     5    P03153|HBEAG_GSHV  217     25
4    P0C6J9|HBEAG_DHBV3 305     6    P0C692|HBEAG_HBVA2 214     26
4    P0C6J9|HBEAG_DHBV3 305     7    P0C625|HBEAG_HBVA3 214     27
4    P0C6J9|HBEAG_DHBV3 305     8    P17099|HBEAG_HBVA4 214     25
4    P0C6J9|HBEAG_DHBV3 305     9    Q81105|HBEAG_HBVA5 214     26
4    P0C6J9|HBEAG_DHBV3 305     10   Q91C37|HBEAG_HBVA6 214     26
5    P03153|HBEAG_GSHV  217     6    P0C692|HBEAG_HBVA2 214     70
5    P03153|HBEAG_GSHV  217     7    P0C625|HBEAG_HBVA3 214     69
5    P03153|HBEAG_GSHV  217     8    P17099|HBEAG_HBVA4 214     69
5    P03153|HBEAG_GSHV  217     9    Q81105|HBEAG_HBVA5 214     69
5    P03153|HBEAG_GSHV  217     10   Q91C37|HBEAG_HBVA6 214     69
6    P0C692|HBEAG_HBVA2 214     7    P0C625|HBEAG_HBVA3 214     98
6    P0C692|HBEAG_HBVA2 214     8    P17099|HBEAG_HBVA4 214     98
6    P0C692|HBEAG_HBVA2 214     9    Q81105|HBEAG_HBVA5 214     98
6    P0C692|HBEAG_HBVA2 214     10   Q91C37|HBEAG_HBVA6 214     98
7    P0C625|HBEAG_HBVA3 214     8    P17099|HBEAG_HBVA4 214     98
7    P0C625|HBEAG_HBVA3 214     9    Q81105|HBEAG_HBVA5 214     97
7    P0C625|HBEAG_HBVA3 214     10   Q91C37|HBEAG_HBVA6 214     98
8    P17099|HBEAG_HBVA4 214     9    Q81105|HBEAG_HBVA5 214     97
8    P17099|HBEAG_HBVA4 214     10   Q91C37|HBEAG_HBVA6 214     98
9    Q81105|HBEAG_HBVA5 214     10   Q91C37|HBEAG_HBVA6 214     97
===========================================================================

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide tree:

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);

Phylogram

Tertiary structure prediction

PDB entry 1QGT-C was selected as the template, showing around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV:

Open the template structure from File (PDB file), then choose Swiss model - Load raw target sequence.
Choose Fit - Fit raw sequence, then Magic fit, then Iterative fit.
Choose File - Save - Layer (PDB).
Choose File - Save - Project (PDB).
Choose Swiss model - Submit modeling request (a new browser window opens with the PDB file loaded; give the e-mail ID for receiving the modeled structure).
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File - Save - Layer (PDB).

Open Swiss model and select the "load raw sequence" option to load the target molecule.

Perform Magic fit and Iterative fit (provided under Fit) in order to fit the two sequences.

Save the file as a project.

Select "Submit modeling request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Your e-mail address: Gunjan300@gmail.com

Your name: Gunjan

Request title: Gunjan project

Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Fig.: Structure of template after modeling

Alignment:

TARGET  83  LV GFGKAVLGMA AAAEELLGQH
2b8nA    4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET 155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET 255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET 305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394  ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure; it shows the possible conformations of φ and ψ for a polypeptide by displaying these two backbone angles for each residue in the protein. Together, these two angles describe the conformation of the backbone chain around each alpha-carbon atom.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure: The target protein structure obtained after homology modeling using Deep View and Modeller is given as input to SAVS.

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result:

Plot statistics                                          Score       %
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good-quality model would be expected to have over 90% of its residues in the most favoured regions.
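A short sketch of the arithmetic behind the plot statistics, using the counts reported above; the 90% threshold is the PROCHECK guideline quoted in the text, and the function name is illustrative.

```python
def ramachandran_summary(favoured, allowed, generous, disallowed, threshold=90.0):
    """Region percentages over non-Gly, non-Pro residues, and whether the
    most-favoured fraction meets the quality threshold."""
    total = favoured + allowed + generous + disallowed
    pct = {
        "most favoured": round(100 * favoured / total, 1),
        "additional allowed": round(100 * allowed / total, 1),
        "generously allowed": round(100 * generous / total, 1),
        "disallowed": round(100 * disallowed / total, 1),
    }
    return pct, 100 * favoured / total >= threshold


pct, good = ramachandran_summary(990, 104, 11, 51)
print(pct, good)
```

With the counts in this report, 85.6% of the non-glycine, non-proline residues fall in the most favoured regions, which is below that 90% guideline.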

Docking result (Hex software)

Fig.: Ligand & receptor (2B8N)

Fig.: After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)

Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25

Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 2.34 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal   Eshape   Eforce   Eair   RMS   Bumps
----   ------   ------   ------   ----   ---   -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------ ------ ------ ------ ------ ------ ------ --- -----
Clst Soln Models Etotal Eshape Eforce   Eair Vshape Vclash Bmp   RMS
---- ---- ------ ------ ------ ------ ------ ------ ------ --- -----
   1    1 000000    0.0    0.0    0.0    0.0    0.0    0.0  -1 -1.00
---------------------------------------------------------------------
   1    1 000000    0.0    0.0    0.0    0.0    0.0    0.0  -1 -1.00
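The search-space figures in the Hex log are mutually consistent, which helps in reading it. The minimal Python sketch below rests on our own assumption (not documented Hex behaviour) that one 3D FFT is run per combination of intermolecular distance step and receptor/ligand rotational increment. The constants are copied from the log; the function names are hypothetical.

```python
# Values read from the Hex 5.0 docking log above.
R_MIN, R_MAX, R_STEP = 0.00, 19.50, 0.75  # "distance range = 0.00 to 19.50 with steps of 0.75"
RECEPTOR_ROTATIONS = 812                  # "Initial rotational increments (N=16) Receptor 812"
LIGAND_ROTATIONS = 1                      # "... Ligand 1"
TOTAL_ORIENTATIONS = 1_616_412_672        # "Total 6D space ... = 1616412672"
WALL_CLOCK_S = 7 * 60 + 0                 # "in 7 min 0 sec"


def distance_steps(r_min: float, r_max: float, step: float) -> int:
    """Number of intermolecular distances sampled, both endpoints included."""
    return round((r_max - r_min) / step) + 1


def fft_count(n_dist: int, n_rec: int, n_lig: int) -> int:
    """Assumption: one 3D FFT per (distance, receptor rotation, ligand rotation)."""
    return n_dist * n_rec * n_lig


if __name__ == "__main__":
    n_dist = distance_steps(R_MIN, R_MAX, R_STEP)
    n_fft = fft_count(n_dist, RECEPTOR_ROTATIONS, LIGAND_ROTATIONS)
    per_fft, remainder = divmod(TOTAL_ORIENTATIONS, n_fft)
    print("distance steps:", n_dist)
    print("3D FFTs:", n_fft)
    print("orientations per FFT:", per_fft, "remainder:", remainder)
    print(f"throughput ~ {TOTAL_ORIENTATIONS / WALL_CLOCK_S:,.0f} orientations/s")
```

Under this reading, 27 distance steps multiplied by 812 receptor increments gives the 21924 FFTs that the log reports. The division leaves no remainder (73728 orientations per FFT), which supports this interpretation but does not prove it. The overall rate is roughly 3.85 million orientations per second.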


Conclusion


After analysing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they have an important role in survival in different species. It would be interesting to take a closer look at this by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains; this shows that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein coded by the gene is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting those results as new biological information.
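The distance step at the start of many phylogenetic analyses can be sketched in a few lines. This is an illustration only. It uses the simple p-distance with a Poisson correction, d = -ln(1 - p), which may differ from the substitution model used by the tools in this report (for example PHYLIP). The function names and the toy alignment are hypothetical, and the toy sequences are not HBV data.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y  # True counts as 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def poisson_distance(p: float) -> float:
    """Poisson correction for multiple substitutions: d = -ln(1 - p)."""
    if p >= 1.0:
        return math.inf
    return -math.log(1.0 - p)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Poisson-corrected distance for every unordered pair of named sequences."""
    return {
        (n1, n2): poisson_distance(p_distance(seqs[n1], seqs[n2]))
        for n1, n2 in combinations(seqs, 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy = {"strainA": "MKTAYIAKQR", "strainB": "MKTAYLAKQR", "strainC": "MKSAYLAR-R"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```

A matrix like this is what distance-based tree methods (for example UPGMA or neighbour joining) take as input to build a guide tree or phylogram.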

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. A similar method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B. and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.

Aloy, P., Querol, E., Aviles, F.X. and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviation


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Defective virus which requires Hepatitis B as a helper virus in order to replicate.

Infection therefore only occurs in patients who are already infected with Hepatitis B.

Increased severity of liver disease in Hepatitis B carriers.

Virus particle 36 nm in diameter, encapsulated with HBsAg derived from HBV.

Delta antigen is associated with virus particles.

ssRNA genome.

Identified in intravenous drug abusers.

Hepatitis G

A virus originally cloned from the serum of a surgeon with non-A, non-B, non-C hepatitis has been called Hepatitis G virus. It was implicated as a cause of parenterally transmitted hepatitis, but is no longer believed to be a major agent of liver disease. It has been classified as a Flavivirus.

Hepatitis B

What is the Hepatitis B Virus?

The hepatitis B virus (HBV) is a DNA-containing virus which is capable of infecting human liver cells and other cells in the body once it gains access to the blood stream.

One of the most interesting features of the hepatitis B virus is that the virus itself does not damage the liver; the damage is caused by the individual's own immune system attacking the virus-infected cells. Since liver damage from the virus may be very little, many patients are called healthy carriers. This means that although they may transmit the disease to others, they have normal-appearing livers and normal liver function tests.

While many individuals remain healthy for many years or a lifetime, others develop chronic hepatitis, cirrhosis and occasionally liver cell cancer. These outcomes are linked to the virus and its effects, although it is unlikely that the virus directly causes cancer.

Those patients who develop hepatitis (damage to liver cells with inflammation) do so on account of the body's normal inclination to attack the foreign proteins contained in viruses and in the cells in which the viruses are found. This process, called the immune response, determines the pace and the severity of the liver cell injury in this condition, and will be described in more detail below.


Since the identification of the hepatitis B virus, several other nearly identical viruses have been identified in Eastern woodchucks, ground squirrels and Peking ducks. The members of this virus family, termed the hepadnaviruses, have life cycles similar to that observed in man and can serve as animal models, allowing further study of these unique disease-causing agents.

Classification and general features

Family: Hepadnaviridae

Genera: Orthohepadnavirus (e.g. human hepatitis B virus [HBV]), Avihepadnavirus (e.g. duck hepatitis B virus)

Size: 42 nm. Virions (also known as Dane particles) contain a circular, partially double-stranded DNA genome.


Fig: Hepatitis B virus structure

HBV Antigens

HBsAg = surface (coat) protein produced in excess as small spheres and tubules

HBcAg = inner core protein

HBeAg = secreted protein; function unknown

Clinical Features

Incubation period 2 - 5 months

Insidious onset of symptoms. Tends to cause a more severe disease than Hepatitis A.

Asymptomatic infections occur frequently

Pathogenesis

Infection is parenterally transmitted. The virus replicates in the liver, and virus particles as well as excess viral surface protein are shed in large amounts into the blood. Viraemia is prolonged and the blood of infected individuals is highly infectious.

Complications

1) Persistent infection

Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected.

Those who are at particular risk include:

babies, young children

immunocompromised patients

males > females


The virus persists in the hepatocytes, and ongoing liver damage occurs because of the host immune response against the infected liver cells.

Chronic infection may take one of two forms:

Chronic persistent hepatitis - the virus persists but there is minimal liver damage.

Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure.

Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC).

HBV is thought to play a role in the development of this malignancy because:

a) 80% of patients with HCC are carriers of hepatitis B

b) Virus DNA can be identified in hepatocellular carcinoma cells

c) Virus DNA can integrate into the host chromosome

3) Fulminant Hepatitis

Rare; accounts for 1% of infections.

Epidemiology

Prevalence of disease in Africa

Worldwide, there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is parenterally transmitted.

1) Blood

Blood transfusions, serum products

Sharing of needles, razors

Tattooing, acupuncture

Renal dialysis

Organ donation

2) Sexual intercourse


3) Horizontal transmission: in children, families, close personal contact

This is the major mode of transmission in South Africa, where the majority of individuals become infected between three and nine years of age.

Horizontal transmission also occurs in children's institutions and mental homes.

4) Vertical transmission - perinatal transmission from a carrier mother to her baby

Transplacental (rare)

During delivery

Postnatal: breast feeding, close contact

(This is the major mode of transmission in South East Asia)

Diagnosis: Serology

Acute infection with resolution

Viral antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg): a secreted protein that is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg): the core protein is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV.

... of the chronic carrier
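The marker meanings listed above can be collected into a simple lookup. The minimal sketch below uses hypothetical class and field names. It only restates, marker by marker, the rules given in this section and does not combine them into a diagnosis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Markers:
    """Presence (True) or absence (False) of each serological marker."""
    hbsag: bool = False     # surface antigen
    hbeag: bool = False     # e antigen
    anti_hbs: bool = False  # surface antibody
    anti_hbe: bool = False  # e antibody
    core_igm: bool = False  # core IgM
    core_igg: bool = False  # core IgG


def interpret(m: Markers) -> list[str]:
    """Return the meaning of each detected marker, as stated in the text above."""
    notes = []
    if m.hbsag:
        notes.append("virus replication is occurring in the liver")
    if m.hbeag:
        notes.append("a high level of viral replication is occurring")
    if m.anti_hbs:
        notes.append("immunity following infection")
    if m.anti_hbe:
        notes.append("low infectivity in a carrier")
    if m.core_igm:
        notes.append("recent infection")
    if m.core_igg:
        notes.append("exposure to HBV")
    return notes


if __name__ == "__main__":
    print(interpret(Markers(hbsag=True, hbeag=True, core_igm=True)))
```

Each marker is read independently, in the same order as the lists above. Combining markers (for example, separating acute infection from a chronic carrier) would need the time course shown in the serology figures.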

Fig: Hepatitis B virus in serum

Prevention

1) Active Immunization

Two types of vaccine are available:

Serum derived - prepared from HBsAg purified from the serum of HBV carriers

Recombinant HBsAg - made by genetic engineering in yeasts

Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.

Universal immunization of infants was introduced in April 1995. Infants receive 3 doses at 6, 10 and 14 weeks of age.

The vaccine should be administered to people at high risk of infection with HBV:

1) Health care workers

2) Sexual partners of chronic carriers

3) Infants of HBV carrier mothers


2) Passive Antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example needlestick injuries.

What is Hepatitis B Infection Like?

When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others, who do not develop significant symptoms following exposure, may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

Genome: circular, partially double-stranded DNA about 3.2 kb in size. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5' end. The other strand, the plus strand, is variable in length but shorter than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins, which are produced through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation and mark a specific transcript for encapsidation into the nucleocapsid.

Fig. Hepatitis B virus genome
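The following Python sketch makes the idea of overlapping reading frames on a circular genome concrete. It scans the forward strand of a circular DNA string for ATG-to-stop open reading frames and lets a frame run across the origin. It is a toy illustration under simplifying assumptions: the standard genetic code, the forward strand only, and a hypothetical minimum length. It is not how HBV ORFs were originally annotated. Because every ATG is treated as a start, nested in-frame ORFs that share one stop codon are reported separately. This mirrors the use of multiple in-frame start codons described above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def circular_orfs(genome: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Find ATG-initiated ORFs on the forward strand of a circular genome.

    Returns (start, length) pairs: `start` is the 0-based position of the A
    of ATG and `length` is in nucleotides, including the stop codon. An ORF
    may run past the end of the string and continue from position 0.
    `min_codons` (counting the stop codon) is a hypothetical length filter.
    """
    seq = genome.upper()
    n = len(seq)

    def codon_at(i: int) -> str:
        # Modular indexing makes the sequence behave as a circle.
        return "".join(seq[(i + k) % n] for k in range(3))

    orfs = []
    for start in range(n):
        if codon_at(start) != "ATG":
            continue
        # Walk codon by codon for at most one lap around the genome.
        for offset in range(0, n, 3):
            if codon_at(start + offset) in STOP_CODONS:
                length = offset + 3
                if length // 3 >= min_codons:
                    orfs.append((start, length))
                break
    return orfs


print(circular_orfs("ATGATGTAA"))     # [(0, 9), (3, 6)]  nested in-frame starts
print(circular_orfs("GAAATAACCCAT"))  # [(10, 9)]  ORF wrapping across the origin
```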

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV surface antigens in the viral envelope are assembled there.

The pregenome RNA is translated to produce the polymerase protein P, which then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.


At early times after infection, the DNA is transported back to the nucleus, where the process is repeated. This results in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al. 765).


Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions in a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to together as surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg): There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope, which may be responsible for triggering an immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears largely oblivious to their presence.


Hepatitis B core antigen (HBcAg): This is the only HBV antigen that cannot be detected directly by blood test; it can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus, this antigen can be detected by blood test. If found, it usually indicates complete virus particles in circulation (Strauss 2002).


REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle; a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle; and tubular or filamentous particles that vary in length. The Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. The humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, while the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. Neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, but the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may further contribute to viral persistence in the setting of an ineffective immune response. The incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues may contribute in the same way. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidating the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response are still at an early stage. This includes how it causes prolonged transient infections with high-titer viremia and lifelong infections with ongoing inflammation of the liver. The role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg): a secreted protein that is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg): the core protein is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV, whether the infection was cleared or persists in a chronic carrier [4].


Homology or comparative modeling involves predicting the structure of a query sequence from the structures of one or more structural templates. The procedure involves four stages:

- identifying possible templates that have a clear sequence relationship to the query;
- assembling the model;
- predicting regions of the structure that are likely to have different conformations than the templates (e.g., loops);
- refining the structure to account for inherent differences between the template and query structures.

Homology modeling figures heavily as a rationale for structural genomics initiatives. These rest on the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
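The 30% sequence-identity rule of thumb depends on how identity is measured. Here is a minimal Python sketch, assuming that identity is counted over aligned columns where neither sequence has a gap. Other common definitions divide by the full alignment length or by the length of the shorter sequence, and they give different numbers. The toy sequences are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment ('-' marks a gap)."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [(q, t) for q, t in zip(aligned_query, aligned_template)
               if q != "-" and t != "-"]
    if not columns:
        return 0.0
    identical = sum(q == t for q, t in columns)
    return 100.0 * identical / len(columns)


pid = percent_identity("MKT-LLV", "MKSALLI")
print(f"{pid:.1f}% identity; above the 30% rule of thumb: {pid > 30.0}")
```

In this example, 4 of the 6 gap-free columns match, so the pair clears the 30% threshold.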

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years. Their aim is to make maximal use of all the structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term "guided docking" is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, which affects scoring but not the orientation of the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches. This division reflects the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information is an efficient knowledge-based strategy for improving protein-ligand docking results: binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments [7].

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described [8].


Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information. All of this work serves a better understanding of molecular processes affecting human health and disease.

Swiss-Prot

A curated protein sequence database that strives to provide:

- a high level of annotation (such as the description of the function of a protein, its domains structure, post-translational modifications, variants, etc.);
- a minimal level of redundancy;
- a high level of integration with other databases.


2. Protein sequence: Glycerate kinase (HBeAg-binding protein 4)

Primary accession number: Q8IVS8

EC number: 2.7.1.31

Source organism: human

Subcellular location: cytoplasm

Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.

3. FASTA

FASTA is a DNA and protein sequence alignment software package. It was first described (as FASTP) by David J. Lipman and William R. Pearson in 1985, in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches. It also provided a more sophisticated shuffling program for evaluating statistical significance. Several programs in this package allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, extending FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms for comparing nucleotide to protein sequence data. These algorithms correctly handle frameshift errors, which six-frame-translated searches do not handle very well.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics. These let biologists judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
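SSEARCH's optimal local alignment rests on the Smith-Waterman dynamic-programming recurrence. The Python sketch below computes only the best local alignment score, with no traceback. It assumes a simple match/mismatch scheme with a linear gap penalty, and the default values are hypothetical. SSEARCH itself uses substitution matrices, affine gap penalties and statistical estimates, all of which are omitted here.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gaps)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 floor is what makes the alignment local: a new one may start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8: the shared ACGT
```

Keeping only two rows of the matrix is enough for the score. Recovering the alignment itself would need the full matrix, or a traceback scheme.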

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences. It identifies library sequences that resemble the query sequence above a certain threshold.
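BLAST's speed comes from first finding short word matches (seeds) between the query and a database sequence, and only then extending them into alignments. The Python sketch below shows only the seeding step, using exact k-letter word matches on hypothetical toy sequences. It is a simplification of real protein BLAST. BLAST also accepts "neighborhood" words that score above a threshold with a substitution matrix, and it then extends and scores the seeds.

```python
from collections import defaultdict


def find_word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where both share an exact k-letter word."""
    # Index every k-letter word of the query by its start position(s).
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    # Scan the subject and look each of its words up in the index.
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


hits = find_word_hits("MKTAYIAK", "GGMKTAYQ")
print(hits)                      # [(0, 2), (1, 3), (2, 4)]
print({j - i for i, j in hits})  # {2}: all hits lie on one diagonal
```

Hits that fall on the same diagonal (the same offset between query and subject) are the natural candidates for extension into a local alignment.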

5. Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or as a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence to analyze. The choice includes mature chains, peptides and domains from the Swiss-Prot feature table, which can be chosen by clicking on the positions. Alternatively, you can enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
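Two of the ProtParam parameters listed above have simple closed-form definitions in the ProtParam documentation. The Python sketch below re-implements those two published formulas for illustration only; it is not ProtParam's own code. The instability index and half-life are left out because they depend on lookup tables that this report does not reproduce: the DIWV dipeptide table and the N-end rule. The sequences in the example are hypothetical.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index (Ikai 1980): X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)).

    X(...) are mole percents (100 * count / length) of each residue.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(seq: str, cys_as_cystines: bool = True) -> int:
    """Molar extinction coefficient at 280 nm in water, in M^-1 cm^-1.

    Uses 5500 per Trp, 1490 per Tyr and 125 per cystine (a disulfide-bonded
    Cys pair). Set cys_as_cystines=False to assume all Cys are reduced.
    """
    seq = seq.upper()
    eps = 5500 * seq.count("W") + 1490 * seq.count("Y")
    if cys_as_cystines:
        eps += 125 * (seq.count("C") // 2)
    return eps


print(round(aliphatic_index("MAVILK"), 2))
print(extinction_coefficient("MWYCCK"))  # 7115
```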


Using SOPMA for secondary structure analysis

A method called the self-optimized prediction method (SOPM) has been described to improve the success rate in predicting the secondary structure of proteins. Its authors reported further improvements obtained by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil). This figure was measured on a whole database of 126 chains of non-homologous proteins (less than 25% identity). Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
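The three-state accuracy quoted for SOPMA is a per-residue score, usually called Q3. Here is a minimal Python sketch of how it is computed, assuming that predicted and observed states are given as equal-length strings over H (helix), E (extended/sheet) and C (coil). SOPMA's own output uses lowercase letters and an extra turn state, which would need mapping to these three classes first. The example strings are hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state class (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty state strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3_accuracy("HHHECCCC", "HHHEECCC"))  # 87.5
```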

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor in PDB format.

Validated the modeled receptor using the Structure Analysis and Verification Server (SAVES).

Verified the model using different parameters, such as the Ramachandran plot and other checks available in SAVES.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the structure of the drug molecule.
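The protocol step that retrieves the target sequence produces a FASTA-format file. The Python sketch below is a minimal parser for such a file, written for illustration; in practice a library such as Biopython's SeqIO module is the usual choice. The records in the example are hypothetical and are not the sequence used in this work.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (the text after '>') to its sequence, joined across lines."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>target_example hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>second_example
GGSGG
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```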


6. Homology modeling

In protein structure prediction, homology modeling (also known as comparative modeling) is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on identifying one or more known protein structures (known as templates or parent structures) that are likely to resemble the structure of the query sequence. They also rely on producing an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and of the template structure. Two kinds of gaps complicate the approach. Alignment gaps (commonly called indels) indicate a structural region present in the target but not in the template. Structure gaps in the template arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity. A typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity. Variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2].

Taken together, these atomic-position errors are significant. They impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions. Even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence. They are especially useful for formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest why a particular residue is conserved: to stabilize the fold, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
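The Cα agreement figures quoted above are normally reported as a root-mean-square deviation (RMSD). The Python sketch below computes it for matched Cα coordinates, assuming that the model and reference have already been superposed. Real comparisons first find the optimal superposition, for example with the Kabsch algorithm, and that step is omitted here. The coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b) -> float:
    """RMSD (in Å) between two equal-length lists of matched, pre-superposed (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
print(ca_rmsd(model, reference))  # 2.0
```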

Figure: Homology modeling by satisfaction of spatial restraints.

1. The known template 3D structures are aligned with the target sequence to be modelled.
2. Spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target. This yields a number of spatial restraints on the target structure.
3. The 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related. This has inspired the formation of a structural genomics consortium dedicated to producing representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods are based on multiple sequence alignment; PSI-BLAST is the most common example. These methods iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates. It also identifies better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates for traditional homology modeling methods.

When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases. For example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure and lead to a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers. Meta-servers improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Choosing the best template from among the candidates is therefore a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.
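The template-selection criteria above (a low E-value first, then coverage and similarity) can be expressed as a filter followed by a sort. The Python sketch below is only an illustration. Several parts of it are assumptions, not rules taken from the text or from any particular modeling server: the `TemplateHit` record, the cut-off values and the ordering of the sort keys. The hits in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    template_id: str   # hypothetical identifier
    e_value: float     # from the database search (e.g. BLAST)
    identity: float    # percent identity to the query
    coverage: float    # fraction of the query covered by the alignment, 0 to 1


def rank_templates(hits, max_e_value=1e-5, min_coverage=0.5):
    """Drop hits with poor E-values or low coverage, then rank the rest."""
    usable = [h for h in hits if h.e_value <= max_e_value and h.coverage >= min_coverage]
    # Higher coverage first, then higher identity, then lower E-value.
    return sorted(usable, key=lambda h: (-h.coverage, -h.identity, h.e_value))


hits = [
    TemplateHit("TPL1", 1e-30, 45.0, 0.90),
    TemplateHit("TPL2", 1e-40, 60.0, 0.60),
    TemplateHit("TPL3", 0.5, 80.0, 0.95),  # rejected: E-value too high
]
print([h.template_id for h in rank_templates(hits)])  # ['TPL1', 'TPL2']
```

TPL3 is discarded despite its high identity, in line with the advice that a template with a poor E-value should generally not be chosen.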

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: their interfaces cannot alter their conformation to improve binding and ease movement. Conformational changes are limited by steric constraints, and these interactions are therefore said to be rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand Docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. These interfaces show both high specificity and induced fit, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor, or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid ligand-flexible receptor complex often maximizes the interface area between the molecules. The two move with respect to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger-than-usual ligand. Normally, when the ligand overlaps the receptor at the docking interface, energy penalties are incurred. If the van der Waals overlap can be decreased, the energy penalty for the system is minimized. This can be accomplished by allowing flexibility in the receptor: a flexible receptor allows docking of a larger ligand than a rigid receptor would.
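The energy penalty for ligand-receptor overlap described above is commonly modelled with a Lennard-Jones (12-6) van der Waals term. This term becomes steeply repulsive when two atoms are pushed closer than their contact distance. The Python sketch below assumes a single generic atom type with hypothetical parameters (epsilon = 0.2 kcal/mol, sigma = 3.4 Å). Real docking programs use atom-type-specific parameters and additional energy terms.

```python
import math


def lennard_jones(r: float, epsilon: float = 0.2, sigma: float = 3.4) -> float:
    """12-6 Lennard-Jones energy (kcal/mol) for two atoms r Å apart."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def vdw_energy(ligand, receptor, epsilon: float = 0.2, sigma: float = 3.4) -> float:
    """Sum of pairwise LJ energies between ligand and receptor atom coordinates."""
    return sum(lennard_jones(math.dist(a, b), epsilon, sigma)
               for a in ligand for b in receptor)


ligand = [(0.0, 0.0, 0.0)]
crowded_receptor = [(2.5, 0.0, 0.0)]   # atom closer than sigma: overlap penalty
relaxed_receptor = [(3.8, 0.0, 0.0)]   # same atom after a small receptor movement
print(vdw_energy(ligand, crowded_receptor) > 0 > vdw_energy(ligand, relaxed_receptor))  # True
```

The example mirrors the argument in the text. Letting a receptor atom move slightly away turns a repulsive (positive) contact into a weakly attractive (negative) one.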

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: intrinsic movement allows small conformational adaptations for ligand binding. The six degrees of freedom for protein movement (three rotational, three translational) make the inherent flexibility allowed to the receptor even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
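The six rigid-body degrees of freedom mentioned above (three rotations and three translations) are what a docking search varies when it moves a ligand as a whole. The Python sketch below applies one such pose to a list of atom coordinates. It assumes rotations given as angles in radians about the x, y and z axes, applied in that order about the ligand's centroid. This angle convention is an illustrative choice; docking programs differ, and many use quaternions.

```python
import math


def rotation_matrix(rx: float, ry: float, rz: float):
    """3x3 matrix for rotating about x, then y, then z (R = Rz @ Ry @ Rx); angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, rotation, translation):
    """Rotate coordinates about their centroid, then translate (6 rigid-body DOF)."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    R = rotation_matrix(*rotation)
    moved = []
    for c in coords:
        d = [c[k] - centroid[k] for k in range(3)]  # position relative to centroid
        moved.append(tuple(
            sum(R[i][k] * d[k] for k in range(3)) + centroid[i] + translation[i]
            for i in range(3)
        ))
    return moved


ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
pose = apply_pose(ligand, rotation=(0.0, 0.0, math.pi / 2), translation=(0.0, 0.0, 5.0))
print(round(math.dist(*pose), 6))  # 2.0: a rigid move preserves internal distances
```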

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking study may help in the search for a cure for hepatitis B. They may also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A molecule on the surface of a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. More generally, a receptor is a molecule in or on a cell to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A molecule that binds to a protein molecule (e.g., a receptor). A ligand binds through many weak noncovalent bonds formed with the binding site of a protein. Tight binding therefore depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction. This lowering produces the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize about which amino acids are likely to be found in a protein's active or functional site, because this depends greatly on the type of function. With that in mind, the preferences of the 20 amino acids for lying within functional regions of proteins are listed below. They were worked out by counting how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values do not cover protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

Residue  Preference
His       0.360
Tyr      -0.040
Asp       0.045
Gly      -0.070
Trp      -0.140
Met       0.025
Val      -0.060
Asn       0.080
Leu      -0.180
Phe      -0.120
Gln       0.050
Cys       0.210
Ile      -0.005
Ala       0.025
Glu       0.050
Arg       0.055
Pro      -0.200
Lys       0.100
Thr       0.100
Ser       0.130
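The Python sketch below shows one way such a table might be used. It ranks residues by preference and averages the preference over a set of candidate site residues. The averaging score and the candidate residue list are hypothetical illustrations. They are not a method from this report or from the study the values come from.

```python
# Functional-site preferences transcribed from the table above.
PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070, "Trp": -0.140,
    "Met": 0.025, "Val": -0.060, "Asn": 0.080, "Leu": -0.180, "Phe": -0.120,
    "Gln": 0.050, "Cys": 0.210, "Ile": -0.005, "Ala": 0.025, "Glu": 0.050,
    "Arg": 0.055, "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}

def ranked():
    """Residues ordered from most to least enriched at functional sites."""
    return sorted(PREFERENCE, key=PREFERENCE.get, reverse=True)

def mean_preference(residues):
    """Average preference of a candidate site (three-letter codes); hypothetical score."""
    return sum(PREFERENCE[r] for r in residues) / len(residues)

print(ranked()[:3])
print(mean_preference(["His", "Cys", "Ser", "Asp"]))  # hypothetical candidate site
```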

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate. The plot is drawn between the corresponding torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to vary φ and ψ systematically, with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii. The angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
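The torsion angles on a Ramachandran plot are ordinary dihedral angles, each computed from four consecutive backbone atoms. φ uses C(i-1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1). The pure-Python sketch below computes a signed dihedral angle in degrees from four (x, y, z) coordinates. The helper names are hypothetical. In practice a structure-parsing library would supply the atom coordinates.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = tuple(a - d0 * u for a, u in zip(b0, b1))
    w = tuple(a - d2 * u for a, u in zip(b2, b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone phi and psi of one residue from its own and neighbouring atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Each residue then contributes one (φ, ψ) point to the plot. Points falling where the hard-sphere atoms would collide mark sterically disallowed conformations.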

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run. The run time also depends on the number of residues in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is. The comparison is made against stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).
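A comparison against 'ideal' bond lengths can be sketched in a few lines of Python. The ideal values below are assumed to be the commonly quoted Engh and Huber (1991) backbone lengths; PROCHECK's own parameter set may differ. The 0.05 Å tolerance is a hypothetical cut-off chosen only for illustration.

```python
import math

# Approximate ideal backbone bond lengths in angstroms (assumed Engh & Huber
# values; PROCHECK's own parameter file may differ).
IDEAL_BOND = {
    ("N", "CA"): 1.458,
    ("CA", "C"): 1.525,
    ("C", "O"): 1.231,
    ("C", "N"): 1.329,  # peptide bond to the next residue
}

def check_bond(atom1, atom2, xyz1, xyz2, tolerance=0.05):
    """Compare one observed bond with its ideal length.

    Returns (observed length, deviation, flagged). The tolerance is a
    hypothetical cut-off for illustration only.
    """
    observed = math.dist(xyz1, xyz2)
    deviation = observed - IDEAL_BOND[(atom1, atom2)]
    return observed, deviation, abs(deviation) > tolerance

# A hypothetical CA-C bond stretched to 1.60 A (about 0.075 A too long).
print(check_bond("CA", "C", (0.0, 0.0, 0.0), (1.60, 0.0, 0.0)))
```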

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the first of its programs "cleans up" the coordinate file. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the extension .new. In the new file, the atoms are labelled in accordance with the IUPAC naming convention.

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory. These have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension. It lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints for a residue, but it will not pass over high energy barriers and may stop in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation. It may not be a useful measure, however, because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
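Gradient optimization can be illustrated on the simplest possible case: two atoms interacting through a Lennard-Jones potential, in reduced units (ε = σ = 1). The Python sketch below applies steepest descent, repeatedly moving the separation r a small step against the gradient (i.e., along the force) until the force is essentially zero. The potential, step size and starting distance are illustrative assumptions. A real force field sums many such terms over all atoms.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr of the Lennard-Jones energy; the force along r is its negative."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def steepest_descent(r, step=0.01, tol=1e-8, max_iter=10000):
    """Move r downhill along the force until the gradient is ~zero."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r

r_min = steepest_descent(1.5)
print(r_min, lj_energy(r_min))
```

Starting from r = 1.5, the loop converges near r = 2^(1/6)σ ≈ 1.122, where the energy is -ε. This potential has a single minimum. On a real energy surface the same procedure would stop in whichever local minimum lies downhill from the start, as noted above.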


Result and discussion

1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)

Entry information: Entry name GLCTK_HUMAN; Primary accession number Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase; Synonyms: EC 2.7.1.31, HBeAg-binding protein 4

Gene name: GLYCTK; Synonyms: HBEBP4; ORFNames: LP5910

From: Homo sapiens (Human) [TaxID: 9606]

Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2 (Evidence at transcript level)

Blat result-

List of potentially matching sequences

Db   AC       Description                                                        Score  E-value
pdb  1QGT-C   Chain C, (Hbcag) Human Hepatitis B Viral Capsid                    206    6e-54
pdb  2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     60
pdb  1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I              27     60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition (count, %):
Ala (A) 74 14.1%
Arg (R) 33 6.3%
Asn (N) 11 2.1%
Asp (D) 21 4.0%
Cys (C) 5 1.0%
Gln (Q) 32 6.1%
Glu (E) 28 5.4%
Gly (G) 51 9.8%
His (H) 16 3.1%
Ile (I) 15 2.9%
Leu (L) 81 15.5%
Lys (K) 10 1.9%
Met (M) 12 2.3%
Phe (F) 10 1.9%
Pro (P) 28 5.4%
Ser (S) 27 5.2%
Thr (T) 22 4.2%
Trp (W) 4 0.8%
Tyr (Y) 5 1.0%
Val (V) 38 7.3%
Pyl (O) 0 0.0%
Sec (U) 0 0.0%
(B) 0 0.0%
(Z) 0 0.0%
(X) 0 0.0%

Total number of negatively charged residues (Asp + Glu): 49

Total number of positively charged residues (Arg + Lys): 43
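The composition and charge counts above are simple tallies over the sequence. The minimal Python sketch below assumes a plain one-letter sequence string. It is an illustration of the arithmetic, not ProtParam's own code (ProtParam is a web tool). The example fragment is the N-terminal 60 residues of Q8IVS8 as shown in the ClustalW alignment later in this report.

```python
from collections import Counter

def composition(seq):
    """Count and percentage (one decimal) of each residue in a one-letter sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: (n, round(100.0 * n / len(seq), 1)) for aa, n in sorted(counts.items())}

def charged_residues(seq):
    """Negatively (Asp + Glu) and positively (Arg + Lys) charged residue counts."""
    seq = seq.upper()
    return seq.count("D") + seq.count("E"), seq.count("R") + seq.count("K")

# N-terminal 60 residues of Q8IVS8, taken from the alignment in this report.
fragment = "MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS"
print(charged_residues(fragment))  # (2, 6)
```

Run on the complete 523-residue sequence, the same tallies give the counts and percentages listed above.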

Atomic composition

Carbon C 2435
Hydrogen H 3967
Nitrogen N 711 (41)
Oxygen O 719
Sulfur S 17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M-1 cm-1, at 280 nm, measured in water.

Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
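Both values can be reproduced from the composition above (4 Trp, 5 Tyr, 5 Cys) and the molecular weight of 55252.6. The sketch assumes the per-chromophore coefficients ProtParam documents (Pace et al.): 5500 for Trp, 1490 for Tyr and 125 per cystine, in M-1 cm-1 at 280 nm. With 5 Cys, at most two cystines can form.

```python
# Per-chromophore molar extinction coefficients at 280 nm (M-1 cm-1),
# assumed here to be the Pace et al. values that ProtParam documents.
TRP, TYR, CYSTINE = 5500, 1490, 125

def extinction(n_trp, n_tyr, n_cys, all_cys_paired=True):
    """Molar extinction coefficient at 280 nm."""
    cystines = n_cys // 2 if all_cys_paired else 0
    return n_trp * TRP + n_tyr * TYR + cystines * CYSTINE

def abs_01_percent(epsilon, molecular_weight):
    """Absorbance of a 1 g/l (0.1 %) solution in a 1 cm cell."""
    return epsilon / molecular_weight

eps = extinction(4, 5, 5)
print(eps, round(abs_01_percent(eps, 55252.6), 3))  # 29700 0.538
```

With `all_cys_paired=False` the cystine term drops out, giving 29450 and 0.533 as in the second line above.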

Secondary structure prediction

By SOPMA: result for UNK_158250

         10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee (42)
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00% (43)

Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
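The percentages in the SOPMA summary follow directly from counting the state letters in the prediction string. The minimal Python sketch below assumes the four-state alphabet (h, e, t, c) used in this run. It is not SOPMA's own code.

```python
from collections import Counter

# Four-state alphabet used in this SOPMA run (number of states = 4).
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}

def state_summary(states):
    """Count each predicted state and express it as a percentage (two decimals)."""
    counts = Counter(states.lower())
    n = len(states)
    return {name: (counts[s], round(100.0 * counts[s] / n, 2))
            for s, name in STATE_NAMES.items()}

print(state_summary("hhhhcceett"))
```

Given the full 523-letter state string above, the function yields the reported counts: 235 helix, 80 extended, 36 turn and 172 coil.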

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
(45)
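ClustalW describes its pairwise scores as the number of identities in the best alignment divided by the number of residues compared, with gap positions excluded. The Python sketch below computes that quantity for two already-aligned strings. It is an illustration, not ClustalW's code. ClustalW derives its scores from separate pairwise alignments, so applying this function to rows cut out of the multiple alignment will not necessarily reproduce the table. The example rows are the HBVA4 and HBVA5 segments from the alignment below.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identities divided by residues compared, excluding gap positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap positions are not compared
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0

hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"
print(percent_identity(hbva4, hbva5))  # 29 identities over 30 compared residues
```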

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
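The guide tree is written in Newick format: nested parentheses group the sequences, and each label is followed by a colon and a branch length. The regular-expression sketch below extracts leaf names and their branch lengths. It is a simplification. It assumes leaf labels contain none of the characters ( ) : , ; and it ignores internal-node lengths. A full Newick reader, for example Biopython's Bio.Phylo, would be the more robust choice.

```python
import re

# A leaf label directly follows "(" or "," and is followed by ":length".
LEAF = re.compile(r"[(,]\s*([^():,;]+):([0-9.eE+-]+)")

def newick_leaves(newick):
    """Return (leaf name, branch length) pairs from a Newick string."""
    return [(name.strip(), float(length)) for name, length in LEAF.findall(newick)]

print(newick_leaves("((A:0.1,B:0.2):0.05,C:0.3);"))
```

Applied to the guide tree above, the function returns ten leaves. The longest leaf branch (0.59519) belongs to Q8IVS8|GLCTK_HUMAN, consistent with the low pairwise scores between GLCTK_HUMAN and the HBeAg sequences in the scores table.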

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

Open the template structure (PDB file) from File.

Choose Swiss Model > Load Raw Target Sequence.

Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss Model > Submit Modeling Request (a new browser window opens with the PDB file loaded; give the e-mail ID at which to receive the modeled structure).

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).


Open Swiss Model and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit (under the Fit menu) to fit the two sequences.


Save the file as a project.

Select "Submit Modeling Request" under Swiss Model to submit it for modeling.

Homology modeling

Optimise Mode request submission form:

Your Email address: Gunjan300@gmail.com

Your Name: Gunjan

Request title: Gunjan project (added to the results header)

Your SWISS-MODEL project file can be found in C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044; Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52


Fig: Structure of template after modeling

Alignment

TARGET 83   LV GFGKAVLGMA AAAEELLGQH
2b8nA 4     peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET 155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET 255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET 305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394   ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model protein structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It shows the possible conformations of φ and ψ for a polypeptide, with one point per residue. From the plot one can judge whether each residue's backbone torsion angles, the rotations about its N-Cα and Cα-C bonds, fall in sterically favourable regions.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure: The target protein structure obtained after homology modeling using Deep View and Modeller is given as input to SAVS.

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics                                          SCORE   %-tage
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions.
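The PROCHECK percentages are computed over the non-glycine, non-proline residues only (1156 here), so they can be checked directly against the 90% benchmark. The minimal Python sketch below takes the four region counts from the table above. The function name is hypothetical.

```python
def ramachandran_summary(core, allowed, generous, disallowed, benchmark=90.0):
    """Region percentages over non-Gly, non-Pro residues and the >90% core check."""
    total = core + allowed + generous + disallowed
    regions = {
        "most favoured": core,
        "additional allowed": allowed,
        "generously allowed": generous,
        "disallowed": disallowed,
    }
    pct = {name: round(100.0 * n / total, 1) for name, n in regions.items()}
    return total, pct, 100.0 * core / total > benchmark

total, pct, meets = ramachandran_summary(990, 104, 11, 51)
print(total, pct["most favoured"], meets)  # 1156 85.6 False
```

With the counts reported for this model, 85.6% of residues lie in the most favoured regions. This is below the 90% expected of a good quality model.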

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS


Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP CG Radius = 1.40 Charge = 0.62
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52

LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25


Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05

LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50


MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50


LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52
THR CA Radius = 1.50 Charge = 0.27
THR C Radius = 1.40 Charge = 0.53
THR O Radius = 1.50 Charge = -0.50
THR CB Radius = 1.50 Charge = 0.21
THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N, Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096, Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N, Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096, Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N, Tag = 2B8N
Ligand 2B8N, Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255, Eforce=000)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1
3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000, Emax = 000
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136, Eforce=000)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000, Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100
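Two figures in the Hex log above can be cross-checked by hand. The sketch below is a minimal illustration, not part of the authors' workflow. It sums the per-atom partial charges that Hex printed for the incomplete residue B 62 LYS, whose listed atoms stop at CD. A non-integer total is presumably what triggers the "Fractional charge" warning, since a complete lysine carries a formal charge of +1. The sketch also regenerates the translational distances implied by "distance range = 0.00 to 19.50 with steps of 0.75". The helper names `residue_charge` and `distance_steps` are hypothetical.

```python
# Hypothetical helpers; the numbers are copied from the Hex log above.

# Partial charges (e) that Hex listed for incomplete residue B 62 LYS.
# The side-chain atoms beyond CD are absent from the log entry.
B62_LYS = {"N": -0.52, "CA": 0.23, "C": 0.53, "O": -0.50,
           "CB": 0.04, "CG": 0.05, "CD": 0.05, "H": 0.25}


def residue_charge(atom_charges: dict) -> float:
    """Sum the per-atom partial charges of one residue."""
    return sum(atom_charges.values())


def distance_steps(r_min: float, r_max: float, step: float) -> list:
    """Distances implied by 'distance range = r_min to r_max with steps of step'."""
    n = round((r_max - r_min) / step) + 1
    return [round(r_min + i * step, 2) for i in range(n)]


if __name__ == "__main__":
    # Sum of the two-decimal values printed in the log (Hex reported 0.12).
    print(f"B 62 LYS charge: {residue_charge(B62_LYS):.2f}")
    steps = distance_steps(0.00, 19.50, 0.75)
    print(len(steps), "distance steps:", steps[0], "...", steps[-1])
```

Run as a script, it prints a charge of 0.13 for B 62 LYS. The log reports 0.12, and the gap is consistent with the per-atom values being rounded to two decimals. The script also prints 27 distance steps from 0.0 to 19.5, which matches the 27 "R =" values printed during the first search.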


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw conclusive decisions about the true picture of its evolution. We also hope that the genes responsible for pathogenesis can then be identified.

Complete inferences can only be drawn on the basis of a comprehensive list of the gene products and their functions.

To find the unknown structures of the proteins present in the different species, we carried out homology modelling. As a next step, we present a theoretical model built with the available online modelling tools.

As our study indicates, the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could form the basis for developing drugs.


Future prospects

The work presented in this report might be just a stepping stone towards such discoveries. The present work is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
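The sketch below is a small illustration of the "analysing data" step. It computes percent identity and the corresponding p-distance between pairs of already aligned sequences. The p-distance is the proportion of aligned positions that differ, and distance-based tree methods take such a matrix as input. This reflects an assumed typical workflow and does not describe the tools used in this report. The strain names and fragments in the usage block are toy placeholders, not HBV data.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences; columns with a gap ('-') are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def p_distance_matrix(aligned: dict) -> dict:
    """Symmetric p-distances (1 - identity/100) keyed by (name_i, name_j)."""
    dist = {}
    for i, j in combinations(sorted(aligned), 2):
        d = 1.0 - percent_identity(aligned[i], aligned[j]) / 100.0
        dist[(i, j)] = dist[(j, i)] = d
    return dist


if __name__ == "__main__":
    # Toy aligned fragments, purely illustrative (not HBV data).
    toy = {"strain_1": "MENIT-SGFL", "strain_2": "MENITSSGFL", "strain_3": "MENVTSGFLG"}
    for (i, j), d in sorted(p_distance_matrix(toy).items()):
        if i < j:
            print(f"{i} vs {j}: p = {d:.3f}")
```

Skipping gap columns is only one convention. Other programs count gaps as differences or use model-corrected distances, so the values are not interchangeable between tools.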

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and they will also open new avenues for finding other possible drug targets in influenza A virus.

The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue. A lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. However, we hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Chapter: Human diseases caused by viruses.


[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B, Sheinerman FB and Honig B (2001). Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796–14801.

Aloy P, Querol E, Aviles FX and Sternberg MJ (2001). Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395–408.

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.


Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. (2000). InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com


Abbreviation


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

Phylip Phylogeny Inference Package


Since the identification of the hepatitis B virus, several other nearly identical viruses have been identified in Eastern woodchucks, ground squirrels and Peking ducks. The members of this virus family, termed the hepadnaviruses, have life cycles similar to that observed in humans and can serve as animal models, allowing further study of these unique disease-causing agents.

Classification and general features

Family: Hepadnaviridae

Genera: Orthohepadnavirus (e.g. hepatitis B virus [HBV] of humans), Avihepadnavirus (e.g. duck hepatitis B virus)

Size: 42 nm. Virions (also known as Dane particles) contain a circular, partially double-stranded DNA genome.


Fig. Hepatitis B virus structure

HBV Antigens

HBsAg = surface (coat) protein produced in excess as small spheres and tubules

HBcAg = inner core protein

HBeAg = secreted protein; function unknown

Clinical Features

Incubation period: 2–5 months

Insidious onset of symptoms; tends to cause more severe disease than hepatitis A

Asymptomatic infections occur frequently

Pathogenesis

Infection is parenterally transmitted. The virus replicates in the liver, and virus particles as well as excess viral surface protein are shed in large amounts into the blood. Viraemia is prolonged, and the blood of infected individuals is highly infectious.

Complications

1) Persistent infection

Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected.

Those who are at particular risk include:

babies and young children

immunocompromised patients

males > females


The virus persists in the hepatocytes, and ongoing liver damage occurs because of the host immune response against the infected liver cells.

Chronic infection may take one of two forms:

Chronic persistent hepatitis: the virus persists, but there is minimal liver damage.

Chronic active hepatitis: there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure.

Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC).

HBV is thought to play a role in the development of this malignancy because

a) 80% of patients with HCC are carriers of hepatitis B

b) Virus DNA can be identified in hepatocellular carcinoma cells

c) Virus DNA can integrate into the host chromosome

3) Fulminant hepatitis

Rare; accounts for 1% of infections

Epidemiology

Prevalence of disease in Africa

Worldwide, there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is parenterally transmitted:

1) Blood

Blood transfusions, serum products

Sharing of needles, razors

Tattooing, acupuncture

Renal dialysis

Organ donation

2) Sexual intercourse


3) Horizontal transmission: in children, within families and through close personal contact

This is the major mode of transmission in South Africa, where the majority of individuals become infected between three and nine years of age. Horizontal transmission also occurs in children's institutions and mental homes.

4) Vertical transmission: perinatal transmission from a carrier mother to her baby

Transplacental (rare)

During delivery

Postnatal: breastfeeding, close contact

(This is the major mode of transmission in South East Asia.)

Diagnosis: Serology

Acute infection with resolution

Viral antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV.
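The marker meanings listed above can be written as a simple lookup, which makes explicit that the text reads each marker independently. This is a minimal sketch. The marker labels used as dictionary keys and the function name `interpret` are hypothetical choices. The sketch only restates the text and is not a clinical decision tool.

```python
# Meaning of each serological marker, as summarised in the text above.
# Keys are hypothetical labels chosen for this sketch.
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "core IgM": "recent infection",
    "core IgG": "exposure to HBV",
}


def interpret(positive_markers: set) -> list:
    """Return the text's reading of each positive marker, in the order listed above."""
    unknown = set(positive_markers) - set(MARKER_MEANING)
    if unknown:
        raise ValueError(f"unrecognised markers: {sorted(unknown)}")
    return [f"{m}: {meaning}" for m, meaning in MARKER_MEANING.items()
            if m in positive_markers]


if __name__ == "__main__":
    for line in interpret({"HBsAg", "HBeAg", "core IgM"}):
        print(line)
```

Combinations of markers, such as those used to tell acute infection from chronic carriage, would need extra rules that this section does not state. For that reason, the sketch leaves them out.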

Fig. Hepatitis B virus in serum

Prevention

1) Active Immunization

Two types of vaccine are available

Serum derived - prepared from HBsAg purified from the serum of HBV carriers

Recombinant HBsAg - made by genetic engineering in yeasts

Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.

Universal immunization of infants was introduced in April 1995. Infants receive 3 doses at 6, 10 and 14 weeks of age.

Vaccine should be administered to people at high risk of infection with HBV

1) Health care workers

2) Sexual partners of chronic carriers

3) Infants of HBV carrier mothers


2) Passive Antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example needlestick injuries.

What Is Hepatitis B Infection Like?

When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others, who do not develop significant symptoms following exposure, may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers or those with chronic hepatitis B are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

Genome: circular, 3.2 kb in size, double-stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5′ end. The other strand, the plus strand, is variable in length but shorter than unit length, and has an RNA oligonucleotide at its 5′ end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. These proteins are transcribed and translated through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation and mark a specific transcript for encapsidation into the nucleocapsid.

Fig. Hepatitis B virus genome
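A small ORF scanner can illustrate the combination of a circular genome, overlapping ORFs and multiple in-frame start codons. The sketch below assumes the standard genetic code, with ATG as the start codon and TAA, TAG and TGA as stops, and it scans a single strand. `circular_orfs` is a hypothetical helper, and the inputs are toy sequences rather than the HBV genome. The scanner reads codons modulo the sequence length, so an ORF can run across the point where the circle was opened. It reports every in-frame ATG, so nested ORFs that share one stop codon appear as separate entries. This mirrors the way multiple in-frame starts produce proteins of different lengths.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_at(seq: str, i: int) -> str:
    """Three bases starting at position i, wrapping around a circular sequence."""
    n = len(seq)
    return "".join(seq[(i + k) % n] for k in range(3))


def circular_orfs(genome: str, min_codons: int = 1) -> list:
    """(start, length_nt) for every ATG-initiated ORF on one strand of a circular genome.

    Each in-frame ATG is reported separately. Lengths include the stop codon,
    and an ORF may continue past the end of the sequence from position 0.
    """
    seq = genome.upper()
    n = len(seq)
    orfs = []
    for start in range(n):
        if codon_at(seq, start) != "ATG":
            continue
        for offset in range(0, n, 3):          # read at most one full turn
            if codon_at(seq, start + offset) in STOP_CODONS:
                if offset // 3 >= min_codons:  # sense codons before the stop
                    orfs.append((start, offset + 3))
                break
    return orfs


if __name__ == "__main__":
    print(circular_orfs("ATGATGCCCTAA"))  # [(0, 12), (3, 9)]: nested starts, one stop
    print(circular_orfs("CCCTAAGATG"))    # [(7, 9)]: ORF wraps past the end
```

For a real genome, one would pass the full sequence and raise `min_codons` to filter out short chance ORFs. To cover the other strand, one would repeat the scan on the reverse complement.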

Life cycle

To reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined. It is known, however, that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase II. This produces a longer-than-genome-length RNA, called the pregenome, and shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, where the proteins destined to become HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce the polymerase protein P. This protein then binds to a specific site at the 3′ end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.
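The base-pairing at the heart of reverse transcription can be sketched in a few lines. The minus-strand DNA is the reverse complement of the pregenomic RNA, and the plus strand is in turn complementary to the minus strand. This is a simplification that assumes ideal copying. It ignores priming by protein P, degradation of the RNA template and the incomplete plus strand described earlier. The function names are hypothetical.

```python
# Watson-Crick pairing used when copying RNA into DNA and DNA into DNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Minus-strand DNA copied from an RNA template, written 5'->3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(rna.upper()))


def second_strand(minus_dna: str) -> str:
    """Plus-strand DNA complementary to the minus strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(minus_dna.upper()))


if __name__ == "__main__":
    pregenome = "AUGGCU"                  # toy RNA, not an HBV sequence
    minus = reverse_transcribe(pregenome)
    plus = second_strand(minus)
    print(minus, plus)                    # the plus strand equals the RNA with U -> T
```

Both strands are written 5′ to 3′, which is why each step reverses the template before complementing it.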


At early times after infection, the DNA is recycled to the nucleus, where the process is repeated. This results in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al. 765).


Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.
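The figure of 180 core proteins is consistent with the Caspar–Klug description of icosahedral shells. In that description, a shell with triangulation number T is built from 60T subunits. The worked relation below shows that 180 subunits correspond to T = 3. The same rule gives 240 subunits for T = 4, a second capsid size that HBV core protein is also known to form.

```latex
\begin{aligned}
N &= 60\,T, \qquad T = h^{2} + hk + k^{2}, \quad h, k \in \{0, 1, 2, \dots\} \\
(h, k) = (1, 1) &: \quad T = 1 + 1 + 1 = 3 \;\Rightarrow\; N = 60 \times 3 = 180 \\
(h, k) = (2, 0) &: \quad T = 4 + 0 + 0 = 4 \;\Rightarrow\; N = 60 \times 4 = 240
\end{aligned}
```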

In infected people, virions actually make up a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions in a ratio of 100:1. These two kinds of subviral particle, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group named surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that the presence of high levels of non-infectious particles may allow the infectious viral particles to travel undetected by antibodies through the bloodstream (Garces HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg): There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope, which may be responsible for triggering the immune response. Regardless of the high antigenicity and prevalence of these particles, the immune system appears basically oblivious to their presence.
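The description of small HBsAg as very hydrophobic, with four transmembrane-spanning regions, is the kind of feature a hydropathy plot reveals from sequence alone. The sketch below implements the classic Kyte–Doolittle sliding-window profile. It flags 19-residue windows whose mean hydropathy exceeds 1.6, the cut-off Kyte and Doolittle suggested for membrane-spanning segments. It is an illustration, not the analysis used in this report. No HBsAg sequence is given here, so the usage example runs on a toy sequence, and the function names are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]  # KeyError for non-standard letters
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """Start indices of windows whose mean hydropathy exceeds the threshold."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


if __name__ == "__main__":
    toy = "K" * 5 + "L" * 19 + "K" * 5   # hydrophobic core flanked by lysines
    print(candidate_tm_segments(toy))
```

Overlapping flagged windows usually merge into one predicted segment. Counting the distinct runs of flagged windows in a real HBsAg sequence is therefore how a plot like this would be compared with the four regions described above.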


Hepatitis B core antigen (HBcAg): This is the only HBV antigen that cannot be detected directly by a blood test. It can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. This antigen is thought to be located in the core structure of the virus and can be detected by a blood test. If found, it is usually indicative of complete virus particles in circulation (Strauss 2002).


REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles. These are a spherical 22 nm particle, a 42 nm particle called the Dane particle (containing DNA and DNA polymerase), and tubular or filamentous particles that vary in length. The Dane particles are the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. The humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles. The cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. Neonatal tolerance probably plays an important role in viral persistence in patients infected at birth. However, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response. The incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues may also contribute. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

Antigen

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in the blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence therefore indicates exposure to HBV, whether the person has recovered or remains a chronic carrier [4].

Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g. loops), and ultimately the refinement of the structure in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity (a structural explanation for this observation has been offered by Chung and Subbiah, 1996). Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
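
The 30% rule depends on how sequence identity is counted. The sketch below computes percent identity from a query-template alignment that is assumed to be given as two equal-length strings with "-" for gaps. It counts identities only over columns where neither sequence has a gap, which is one of several conventions in use (others divide by the length of the shorter sequence or by the full alignment length). The function names and the toy alignment are hypothetical.

```python
def percent_identity(query_aln: str, template_aln: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both sequences have a residue.
    columns = [(q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for q, t in columns if q == t)
    return 100.0 * identical / len(columns)


def above_identity_threshold(query_aln: str, template_aln: str, threshold: float = 30.0) -> bool:
    """True if the template clears the rule-of-thumb identity threshold."""
    return percent_identity(query_aln, template_aln) > threshold


# Hypothetical toy alignment: 5 identities over 6 gap-free columns.
print(round(percent_identity("ACDE-FG", "ACDKLFG"), 1))  # 83.3
print(above_identity_threshold("ACDE-FG", "ACDKLFG"))    # True
```

Because the denominator convention changes the number, identities quoted from different tools are only comparable when they use the same definition.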

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands, and from protein-ligand complexes. In this respect, the term guided docking is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based, and ligand-based approaches, reflecting the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions, and virtual screening enrichments obtained from protein-ligand docking [7].

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed, and a number of successful applications of structure-based virtual screening are described [8].

Material and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins.

Bioinformatics is limited to sequence, structural, and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis, and molecular functional analysis.

1 NCBI

Established in 1988 as a national resource for molecular biology information, the National Center for Biotechnology Information (NCBI) creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.

2 Protein sequence

Glycerate kinase (HBeAg-binding protein 4) from human. Primary accession number: Q8IVS8. EC 2.7.1.31. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.

3 FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved tools for biological sequence comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
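
The optimal local alignment computed by SSEARCH can be illustrated with a bare-bones Smith-Waterman scorer. This is a minimal sketch, not SSEARCH itself: it assumes a toy scoring scheme (match +2, mismatch -1, linear gap penalty -2) instead of the substitution matrices and affine gap penalties the FASTA package actually uses, and it returns only the best score, not the alignment or its statistical significance. The function name is hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # a[i-1] aligned to a gap in b
            left = H[i][j - 1] + gap  # b[j-1] aligned to a gap in a
            # The 0 floor is what makes the alignment local: a negative
            # running score is discarded and a new alignment can start here.
            H[i][j] = max(0, diagonal, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8, from the shared "ACGT"
```

The quadratic table is why SSEARCH is slower than the heuristic FASTA programs, which first look for short identical words and only then align the promising regions.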

4 BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
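
The first stage of a BLAST search can be illustrated by looking for short words that the query shares with each library sequence. The sketch below is a deliberate simplification under stated assumptions: it uses exact k-letter word matches (k = 3) and a hypothetical minimum-hit cutoff, whereas BLAST itself also scores neighbouring words with a substitution matrix, extends the seeds into local alignments, and ranks hits by E-value. All function names and sequences are hypothetical.

```python
from collections import defaultdict


def word_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word in seq to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """(query_position, subject_position) pairs where both share a k-letter word."""
    query_words = word_index(query, k)
    hits = []
    for j in range(len(subject) - k + 1):
        for i in query_words.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


def screen_library(query: str, library: dict[str, str], k: int = 3, min_hits: int = 2) -> dict[str, int]:
    """Library sequences with at least min_hits word hits, mapped to their hit counts."""
    selected = {}
    for name, subject in library.items():
        n = len(word_hits(query, subject, k))
        if n >= min_hits:
            selected[name] = n
    return selected


library = {"seq1": "XXMKVLYY", "seq2": "CCCCCCCC"}
print(screen_library("MKVLAAGIV", library))  # {'seq1': 2}
```

Indexing the query once and scanning each subject is the same design idea that lets BLAST avoid a full dynamic-programming alignment against every database entry.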

5 Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be shown an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters, among others:

extinction coefficient

half-life

instability index

aliphatic index
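
Two of these parameters have simple closed forms, which makes the ProtParam output easy to check by hand. The sketch below assumes the formulas given in the ProtParam documentation: the extinction coefficient at 280 nm is estimated as 5500 × (number of Trp) + 1490 × (number of Tyr) + 125 × (number of cystines), and the aliphatic index is X(Ala) + 2.9 × X(Val) + 3.9 × (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names are hypothetical; the instability index and half-life need lookup tables and are left out.

```python
def extinction_coefficients(seq: str) -> tuple[int, int]:
    """(all Cys paired as cystines, no cystines), in M^-1 cm^-1 at 280 nm in water."""
    seq = seq.upper()
    trp, tyr, cys = seq.count("W"), seq.count("Y"), seq.count("C")
    without_cystines = 5500 * trp + 1490 * tyr
    with_cystines = without_cystines + 125 * (cys // 2)  # two Cys form one cystine
    return with_cystines, without_cystines


def absorbance_0_1_percent(ext_coefficient: float, molecular_weight: float) -> float:
    """Absorbance of a 1 g/l (0.1%) solution: extinction coefficient / molecular weight."""
    return ext_coefficient / molecular_weight


def aliphatic_index(seq: str) -> float:
    """Relative volume occupied by the aliphatic side chains of Ala, Val, Ile and Leu."""
    seq = seq.upper()

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (mole_percent("I") + mole_percent("L"))


# Stand-in string with the Trp/Tyr/Cys counts reported for glycerate kinase (4, 5, 5).
ext_all, ext_none = extinction_coefficients("W" * 4 + "Y" * 5 + "C" * 5)
print(ext_all, ext_none)  # 29700 29450
print(round(absorbance_0_1_percent(ext_all, 55252.6), 3),
      round(absorbance_0_1_percent(ext_none, 55252.6), 3))  # 0.538 0.533
```

With the residue counts and the molecular weight (55252.6) reported in the results, these formulas reproduce the ProtParam extinction coefficients and Abs 0.1% values for glycerate kinase.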

Using SOPMA for secondary structure analysis

The self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. The SOPMA authors report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and the PubMed literature.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (PDB ID 2B8N) by a BLAST similarity search.

Submitted the target sequence to SWISS-MODEL as a raw sequence and modeled the receptor (output in PDB format).

Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).

Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the docked structure of the drug molecule.

Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s).

Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the fold, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
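
The "~2 Å agreement between the matched Cα atoms" quoted above is usually expressed as a root-mean-square deviation (RMSD). A minimal sketch, assuming the model and reference Cα coordinates are already matched one-to-one and already superposed; a real comparison would first find the optimal superposition (for example with the Kabsch algorithm), which is omitted here. The function name and coordinates are hypothetical.

```python
import math


def ca_rmsd(model: list[tuple[float, float, float]],
            reference: list[tuple[float, float, float]]) -> float:
    """RMSD in Å between matched, pre-superposed C-alpha coordinates."""
    if not model or len(model) != len(reference):
        raise ValueError("need two equal-length, non-empty coordinate lists")
    squared = sum((mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
                  for (mx, my, mz), (rx, ry, rz) in zip(model, reference))
    return math.sqrt(squared / len(model))


# Hypothetical coordinates: every model atom sits 2 Å from its reference atom.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
print(ca_rmsd(model, reference))  # 2.0
```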

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, giving a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.

Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods.

When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
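
The E-value screening described above can be sketched as a simple filter-and-sort over search hits. The cutoff of 1e-3 and the tie-break on percent identity are assumptions for illustration only, not values prescribed by the text, and the hit records and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template_id: str
    evalue: float
    identity: float  # percent identity to the query


def rank_templates(hits: list[Hit], max_evalue: float = 1e-3) -> list[str]:
    """Drop hits with a poor E-value, then order the rest best-first."""
    usable = [h for h in hits if h.evalue <= max_evalue]
    # Lower E-value first; among equal E-values prefer higher identity.
    usable.sort(key=lambda h: (h.evalue, -h.identity))
    return [h.template_id for h in usable]


hits = [Hit("templateA", 1e-40, 45.0), Hit("templateB", 2.0, 20.0), Hit("templateC", 1e-10, 32.0)]
print(rank_templates(hits))  # ['templateA', 'templateC']
```

An empty result is informative in itself: following the reasoning in the text, it points towards fold-recognition servers or meta-servers rather than forcing a weak template.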

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interfaces are said to be rigid.

Fig: Protein-protein docking

Protein receptor-ligand docking

Protein receptor-ligand motifs fit together tightly and are often referred to as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand and a flexible receptor, or a flexible ligand and a rigid receptor.

Fig: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid-ligand/flexible-receptor complex often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, this energy penalty is minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than a rigid receptor would allow.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement which allows for small conformational adaptation for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
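
The six rigid-body degrees of freedom mentioned above (three rotations, three translations) are what a docking search samples when it places a ligand. A minimal sketch, assuming a z-y-x rotation convention applied about the ligand's centroid followed by a translation; docking programs use various parameterizations, and the function names here are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def rotation_matrix(alpha: float, beta: float, gamma: float) -> list[list[float]]:
    """R = Rz(alpha) @ Ry(beta) @ Rx(gamma); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    rz = [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]
    ry = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    rx = [[1.0, 0.0, 0.0], [0.0, cg, -sg], [0.0, sg, cg]]

    def matmul(p, q):
        return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

    return matmul(matmul(rz, ry), rx)


def apply_pose(coords: list[Vec], pose: tuple[float, ...]) -> list[Vec]:
    """pose = (alpha, beta, gamma, tx, ty, tz): rotate about the centroid, then translate."""
    alpha, beta, gamma, tx, ty, tz = pose
    r = rotation_matrix(alpha, beta, gamma)
    n = len(coords)
    centroid = [sum(c[axis] for c in coords) / n for axis in range(3)]
    shift = (tx, ty, tz)
    placed = []
    for c in coords:
        v = [c[axis] - centroid[axis] for axis in range(3)]
        placed.append(tuple(
            sum(r[axis][k] * v[k] for k in range(3)) + centroid[axis] + shift[axis]
            for axis in range(3)))
    return placed


ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
print(apply_pose(ligand, (math.pi / 2, 0.0, 0.0, 0.0, 0.0, 5.0)))  # ≈ [(0, 1, 5), (0, -1, 5)]
```

Because the pose is rigid, internal distances in the ligand are unchanged; flexible-ligand docking adds torsion angles on top of these six variables.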

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease hepatitis B; they will also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.

Receptor

A molecule on the surface of a cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule within a cell or on the cell surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through the interaction of many weak noncovalent bonds formed to the binding site of a protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are the preferences of the 20 amino acids to lie within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between these two torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
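
The φ and ψ values on a Ramachandran plot are ordinary torsion angles: φ is defined by the backbone atoms C(i-1), N, Cα, C of residue i, and ψ by N, Cα, C, N(i+1). A minimal sketch of the torsion calculation from four atom positions, in plain Python with no structure library assumed; the function names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))   # 90.0
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)), 1))  # 180.0
```

Applying this to every residue's (φ, ψ) atom quartets gives the points that validation tools then compare with the allowed regions of the plot.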

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of "ideal" bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new, in which the atoms are labelled in accordance with the IUPAC naming convention.
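
Because the PROCHECK input is a plain PDB-format coordinate file, its contents are easy to inspect directly. The sketch below is not part of PROCHECK. It reads the Cα atoms from ATOM records using the fixed column positions of the PDB format: atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, and x, y, z in 31-38, 39-46 and 47-54. The function name and the example records are hypothetical.

```python
def read_ca_atoms(pdb_text: str) -> list[tuple[str, int, str, tuple[float, float, float]]]:
    """Return (chain, residue number, residue name, (x, y, z)) for each C-alpha ATOM record."""
    atoms = []
    for line in pdb_text.splitlines():
        # PDB columns are 1-based; the slices below are their 0-based equivalents.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        res_num = int(line[22:26])
        res_name = line[17:20].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((chain, res_num, res_name, xyz))
    return atoms


example = (
    "ATOM      1  N   MET A   1      10.000   5.000  -6.000\n"
    "ATOM      2  CA  MET A   1      11.104   6.134  -6.504\n"
)
print(read_ca_atoms(example))  # [('A', 1, 'MET', (11.104, 6.134, -6.504))]
```

Restricting the match to ATOM records avoids confusing a calcium ion (HETATM, atom name CA) with a Cα atom.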

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions. The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles, and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good for releasing local constraints on a residue, but it cannot cross high energy barriers and therefore stops in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
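
The idea of moving atoms so as to reduce the net forces on them can be shown on the smallest possible system: two atoms interacting through a Lennard-Jones potential, E(r) = 4ε[(σ/r)^12 - (σ/r)^6], with one atom held fixed. This is a toy sketch under stated assumptions (reduced units with ε = σ = 1, a fixed step size, plain steepest descent); real minimizers work on full force fields with many terms and often use conjugate-gradient or quasi-Newton methods. The function names are hypothetical.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """dE/dr; the force along the separation is the negative of this."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def steepest_descent(r: float, step: float = 0.01, tol: float = 1e-8, max_iter: int = 10_000) -> float:
    """Move the free atom downhill until the force on it is (almost) zero."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g  # step against the gradient, i.e. along the force
    return r


r_min = steepest_descent(1.5)
print(round(r_min, 4), round(lj_energy(r_min), 4))  # 1.1225 -1.0
```

The minimizer stops at r = 2^(1/6)σ with E = -ε, the only minimum of this potential; on a real energy surface the same procedure would stop at whichever local minimum lies downhill from the starting conformation, as noted above.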

Result and discussion

1 Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences:

Db   AC      Description                                                     Score  E-value
pdb  1QGT-C  Chain C (HBcAg), Human Hepatitis B Viral Capsid                 206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina           197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                  192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp  27   60
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I           27     60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines

Secondary structure prediction

By SOPMA (result for UNK_158250)

Predicted states for each stretch of sequence (h = alpha helix, e = extended strand, t = beta turn, c = random coil); each sequence line is followed by its state line:

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix      (Hh): 235 is 44.93%
3-10 helix       (Gg):   0 is  0.00%
Pi helix         (Ii):   0 is  0.00%
Beta bridge      (Bb):   0 is  0.00%
Extended strand  (Ee):  80 is 15.30%
Beta turn        (Tt):  36 is  6.88%
Bend region      (Ss):   0 is  0.00%
Random coil      (Cc): 172 is 32.89%
Ambiguous states (?):    0 is  0.00%
Other states:            0 is  0.00%


Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
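Each percentage in the SOPMA summary is the count of one state letter in the prediction lines divided by the sequence length (523). The minimal Python sketch below assumes the four-state alphabet h/e/t/c used in the prediction lines (Number of states: 4). `state_composition` is a hypothetical helper, not part of SOPMA.

```python
from collections import Counter

# Four-state SOPMA codes as they appear in the prediction lines.
STATES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_composition(prediction: str) -> dict[str, tuple[int, float]]:
    """Return (count, percentage of the full length) for each state in a prediction string."""
    counts = Counter(prediction.lower())
    total = len(prediction)
    return {name: (counts[code], 100.0 * counts[code] / total) for code, name in STATES.items()}


# Synthetic string with the reported state counts (235 h, 80 e, 36 t, 172 c; 523 in total).
synthetic = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
for name, (n, pct) in state_composition(synthetic).items():
    print(f"{name}: {n} is {pct:.2f}%")
```

The synthetic string reproduces the reported 44.93%, 15.30%, 6.88% and 32.89%. Joining the eight prediction lines above and passing them to `state_composition` should give the same result, provided the extracted strings are complete.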

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
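The scores in this table are percent identities. The sketch below shows one common definition: identical positions divided by the alignment columns in which neither sequence has a gap. This is an assumption. ClustalW2 scores each pair from its own pairwise alignment, and its denominator may differ, so values computed from the multiple alignment need not match the table exactly. `percent_identity` is a hypothetical helper. The example rows are copied from the second block of the alignment below, where HBEAG_HBVA4 and HBEAG_HBVA5 differ only at one V/F position.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper()) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Second alignment block for HBEAG_HBVA4 and HBEAG_HBVA5.
hbva4 = "----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
hbva5 = "----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"
print(round(percent_identity(hbva4, hbva5), 2))
```

The block has 30 gap-free columns and 29 identities, so the sketch prints 96.67.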

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420


P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
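A CLUSTAL alignment is written in interleaved blocks, so the full gapped sequence of each entry is obtained by concatenating its chunks across blocks. The minimal Python sketch below makes these assumptions: `parse_clustal` is a hypothetical helper, and the two-sequence example uses made-up names (seqA, seqB). For real files, Biopython's `Bio.AlignIO` reader with the "clustal" format is the more robust choice.

```python
def parse_clustal(text: str) -> dict[str, str]:
    """Join the blocks of an interleaved CLUSTAL alignment into one gapped string per sequence."""
    sequences: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("CLUSTAL"):
            continue  # blank separator lines and the header line
        if line[0].isspace():
            continue  # conservation line (*, :, .) is indented under the residues
        fields = line.split()
        name, chunk = fields[0], fields[1]  # a trailing residue count, if present, is ignored
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


example = """CLUSTAL 2.0.5 multiple sequence alignment

seqA      MQLF-HL 6
seqB      MYLF-HL 6
          * ** **

seqA      CLII 10
seqB      CLVF 10
"""
print(parse_clustal(example))
```

The example prints `{'seqA': 'MQLF-HLCLII', 'seqB': 'MYLF-HLCLVF'}`. Applied to the alignment above, each of the ten entries would yield a 523-column gapped string.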

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
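The guide tree is in Newick format: each leaf is written as `name:branch_length`, and each closing parenthesis can carry the length of an internal branch. In the sketch below, the colon, comma, decimal-point and semicolon punctuation lost in the extracted text has been restored following the ClustalW .dnd convention, which is an assumption. The helper is a hypothetical regex shortcut, not a full Newick parser; Biopython's `Bio.Phylo` is the usual tool for that. The sketch lists the terminal branch lengths and picks the longest.

```python
import re

# ClustalW2 guide tree for the ten sequences, with Newick punctuation restored (assumption).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf label (letters, digits, '_' or '|') immediately followed by ':length'.
# Internal branches start with ')' and therefore never match.
LEAF = re.compile(r"([A-Za-z0-9_|]+):([0-9.]+)")


def leaf_branch_lengths(newick: str) -> dict[str, float]:
    """Terminal branch lengths keyed by leaf name."""
    return {name: float(length) for name, length in LEAF.findall(newick)}


lengths = leaf_branch_lengths(GUIDE_TREE)
longest = max(lengths, key=lengths.get)
print(len(lengths), longest, lengths[longest])
```

This prints `10 Q8IVS8|GLCTK_HUMAN 0.59519`. Q8IVS8|GLCTK_HUMAN has by far the longest terminal branch, which is in line with its pairwise scores of only 2 to 3 against every HBeAg sequence.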

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template because it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

1. Open the template structure (PDB file) from File, then choose Swiss model > load the raw target sequence.
2. Choose Fit > fit raw sequence, then Magic Fit, then Iterative Fit.
3. Choose File > Save > Layer (PDB).
4. Choose File > Save > Project (PDB).
5. Choose Swiss model > Submit Modeling Request. A new browser window opens with the project file loaded; enter the e-mail address at which the modeled structure should be received.
6. Open the new structure received by e-mail and remove the template by selecting the target.
7. Choose File > Save > Layer (PDB).

Open Swiss model and select the "load raw sequence" option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences.


Save the file as a project.

Select "submit modeling request" under Swiss model to submit the project for modeling.

Homology modeling

Optimise Mode request submission form:

Email address: Gunjan300@gmail.com

Name: Gunjan

Request title: Gunjan project

Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044; Title: Q8IVS8

SWISS-MODEL Workspace

Model information:
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52
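The modelled range shows how much of the 523-residue target is covered by the 2b8nA template. The short Python sketch below does this arithmetic. `model_coverage` is a hypothetical helper, and the target length of 523 comes from the ProtParam and SOPMA results.

```python
def model_coverage(first: int, last: int, sequence_length: int) -> tuple[int, float]:
    """Number of modelled residues (inclusive range) and their share of the full sequence in %."""
    if not 1 <= first <= last <= sequence_length:
        raise ValueError("residue range must lie within the sequence")
    modelled = last - first + 1
    return modelled, 100.0 * modelled / sequence_length


modelled, percent = model_coverage(83, 514, 523)
print(modelled, round(percent, 1))
```

This prints `432 82.6`. Residues 1 to 82 and 515 to 523 of Q8IVS8 are therefore not covered by the model.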



Fig.: Structure of the template after modeling

Alignment:

TARGET  83  LV GFGKAVLGMA AAAEELLGQH
2b8nA    4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET 155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS

2b8nA  151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss


TARGET 255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET 305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394  ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the structure and sequence of a protein. Stereochemical validation of model structures is an important part of the comparative (homology) modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ (phi) and ψ (psi) of every residue in a protein structure by plotting φ against ψ. Because steric clashes rule out many combinations, the plot shows which φ/ψ conformations are possible for a polypeptide, and residues that fall outside the allowed regions point to possible errors in the model.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
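Each point on a Ramachandran plot is a pair of backbone torsion angles. φ is defined by the atoms C(i-1), N(i), CA(i), C(i), and ψ by N(i), CA(i), C(i), N(i+1). The pure-Python sketch below shows the standard four-point torsion-angle calculation. `dihedral` and its vector helpers are hypothetical, and the coordinates in the example are toy points, not atoms from the model.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees (between -180 and 180) about the p1-p2 bond, IUPAC sign convention.

    phi: p0..p3 = C(i-1), N(i), CA(i), C(i); psi: p0..p3 = N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: the last atom sits a quarter turn away from the first around the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

The example gives about +90 degrees. Putting the fourth point at (1, 0, 1) gives 0 (cis), and putting it at (-1, 0, 1) gives 180 (trans). Validation tools apply the same calculation to every residue's backbone atoms before placing them on the plot.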

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result
-----
Plot statistics                                          Score   %-age
Residues in most favoured regions [A,B,L]                 990    85.6%
Residues in additional allowed regions [a,b,l,p]          104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                             51     4.4%
                                                         ----   ------
Number of non-glycine and non-proline residues           1156   100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351


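Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions. With 85.6% of residues in the most favoured regions, this model falls below that benchmark.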
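PROCHECK computes the region percentages over non-glycine, non-proline residues only (1156 here). That is why end, glycine and proline residues are listed separately and do not enter the percentages. The minimal Python sketch below reproduces the percentages from the counts in the table and compares the most-favoured share with the 90% benchmark. `ramachandran_summary` is a hypothetical helper, not part of PROCHECK.

```python
def ramachandran_summary(
    favoured: int, additional: int, generous: int, disallowed: int
) -> tuple[int, dict[str, float]]:
    """Percentage of non-Gly, non-Pro residues in each PROCHECK Ramachandran region."""
    total = favoured + additional + generous + disallowed
    percentages = {
        "most favoured": 100.0 * favoured / total,
        "additional allowed": 100.0 * additional / total,
        "generously allowed": 100.0 * generous / total,
        "disallowed": 100.0 * disallowed / total,
    }
    return total, percentages


total, pct = ramachandran_summary(990, 104, 11, 51)
print(total, {region: round(p, 1) for region, p in pct.items()})
print("meets 90% benchmark:", pct["most favoured"] > 90.0)
```

This prints 1156 and the rounded values 85.6, 9.0, 1.0 and 4.4, matching the table, and then `meets 90% benchmark: False`.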

Docking result (Hex software)

Fig.: Ligand & Receptor (2B8N)

Fig.: After docking
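The Hex log reports a 6D search in which fast Fourier transforms (FFTs) speed up the scan over orientations and distances. Hex itself uses spherical polar Fourier correlations. The sketch below instead shows the simpler Cartesian-grid FFT correlation used in Katchalski-Katzir-style docking, a different but related technique, to illustrate why an FFT can score every translation at once instead of one shift at a time. It assumes NumPy is installed. The grids and both helper functions are hypothetical toy examples, not Hex code.

```python
import numpy as np


def fft_correlation(receptor: np.ndarray, ligand: np.ndarray) -> np.ndarray:
    """Score every circular translation s at once: c[s] = sum_x receptor[x] * ligand[x + s].

    Correlation theorem for real grids: FFT(c) = conj(FFT(receptor)) * FFT(ligand).
    """
    return np.real(np.fft.ifftn(np.conj(np.fft.fftn(receptor)) * np.fft.fftn(ligand)))


def brute_force_correlation(receptor: np.ndarray, ligand: np.ndarray) -> np.ndarray:
    """The same scores computed one translation at a time (quadratic in the number of cells)."""
    scores = np.empty(receptor.shape)
    for shift in np.ndindex(receptor.shape):
        # Rolling by -shift lines up ligand[x + shift] with receptor[x], with wrap-around.
        shifted = np.roll(ligand, tuple(-k for k in shift), axis=tuple(range(receptor.ndim)))
        scores[shift] = np.sum(receptor * shifted)
    return scores


rng = np.random.default_rng(0)
receptor_grid = rng.random((6, 6, 6))  # toy "receptor" grid
ligand_grid = rng.random((6, 6, 6))    # toy "ligand" grid
print(np.allclose(fft_correlation(receptor_grid, ligand_grid),
                  brute_force_correlation(receptor_grid, ligand_grid)))
```

This prints `True`. The FFT route costs O(N log N) for all N translations, against O(N^2) for the loop, which is the kind of saving the log's "3D FFT" steps rely on.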

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.


Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N. Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N. Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds

Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)

73

R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1

3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040

74

R = 075Estart = 14350136 -gt rank 5

Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
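The size of the 6D search space printed in the docking log can be checked by multiplication. The Python sketch below assumes one reading of the log. The 3D FFTs come from 27 intermolecular distances (0.00 to 19.50 Å in 0.75 Å steps), times 812 receptor rotations, times 1 ligand rotation. Each FFT then covers a 64 × 24 × 48 grid of orientations. These factors are our interpretation of the printed values, not documented Hex internals.

```python
# Check the search-space size printed in the Hex log.
# Assumed interpretation: Iterate[27*812*1] x FFT[64*24*48].
d_min, d_max, d_step = 0.00, 19.50, 0.75

# Distance samples R = 0.00, 0.75, ..., 19.50 (both ends included)
n_distances = round((d_max - d_min) / d_step) + 1

receptor_rotations = 812   # "Receptor 812" in the log
ligand_rotations = 1       # "Ligand 1" in the log
fft_grid = 64 * 24 * 48    # assumed factorisation of the FFT term

n_ffts = n_distances * receptor_rotations * ligand_rotations
total_orientations = n_ffts * fft_grid

print(n_distances, n_ffts, total_orientations)  # 27 21924 1616412672
```

The printed values match the "21924 3D FFTs" and "1616412672 orientations" reported in the log. This supports, but does not prove, the assumed factorisation.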


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which indicates that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope that in the near future it will be possible to draw a conclusive picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we used homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg-binding protein 4 (glycerate kinase) is one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand to inhibit its activity, which could serve as a basis for developing drugs.

Future prospects

The work presented in this report may be just a stepping stone towards such discoveries. It is a small finding on a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting those results as new biological information.
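Distance-based phylogenetic methods start from pairwise distances between aligned sequences. The sketch below computes the simplest such measure, the uncorrected p-distance: the fraction of aligned sites at which two sequences differ, skipping gapped columns. The toy sequences and function names are hypothetical. Real analyses (for example with PHYLIP) usually apply an evolutionary correction on top of this raw measure before building a tree.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore any column where either sequence has a gap
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    diffs = sum(1 for x, y in sites if x != y)
    return diffs / len(sites)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every pair of named sequences (OTUs)."""
    return {(n1, n2): p_distance(s1, s2)
            for (n1, s1), (n2, s2) in combinations(seqs.items(), 2)}


# Hypothetical aligned toy sequences, not data from this report
aligned = {
    "strain1": "MQLFHLCL-IS",
    "strain2": "MQLFHLCLVIS",
    "strain3": "MQWFHLTLVIS",
}
for pair, d in distance_matrix(aligned).items():
    print(pair, round(d, 3))
```

The resulting matrix is the kind of input that distance methods such as UPGMA or neighbour joining turn into a tree.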

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. However, we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.

BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B. and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801. [PubMed]

Aloy, P., Querol, E., Aviles, F.X. and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408. [PubMed]

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402. [PubMed]

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics.

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com


Abbreviation


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Fig. Hepatitis B virus structure

HBV Antigens

HBsAg = surface (coat) protein, produced in excess as small spheres and tubules

HBcAg = inner core protein

HBeAg = secreted protein, function unknown

Clinical Features

Incubation period: 2-5 months.

Insidious onset of symptoms. Tends to cause a more severe disease than hepatitis A.

Asymptomatic infections occur frequently.

Pathogenesis

Infection is parenterally transmitted. The virus replicates in the liver, and virus particles as well as excess viral surface protein are shed in large amounts into the blood. Viraemia is prolonged, and the blood of infected individuals is highly infectious.

Complications

1) Persistent infection

Following acute infection, approximately 5% of infected individuals fail to eliminate the virus completely and become persistently infected.

Those at particular risk include:

babies and young children

immunocompromised patients

males > females

The virus persists in the hepatocytes, and ongoing liver damage occurs because of the host immune response against the infected liver cells.

Chronic infection may take one of two forms:

Chronic persistent hepatitis - the virus persists but there is minimal liver damage.

Chronic active hepatitis - there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure. Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC).

HBV is thought to play a role in the development of this malignancy because:

a) 80% of patients with HCC are carriers of hepatitis B

b) virus DNA can be identified in hepatocellular carcinoma cells

c) virus DNA can integrate into the host chromosome

3) Fulminant hepatitis

Rare; accounts for 1% of infections.

Epidemiology

Prevalence of disease in Africa

Worldwide there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is parenterally transmitted.

1) Blood

blood transfusions, serum products

sharing of needles, razors

tattooing, acupuncture

renal dialysis

organ donation

2) Sexual intercourse

3) Horizontal transmission in children, families and close personal contact

This is the major mode of transmission in South Africa, where the majority of individuals become infected between three and nine years of age. Horizontal transmission also occurs in children's institutions and mental homes.

4) Vertical transmission - perinatal transmission from a carrier mother to her baby:

transplacental (rare)

during delivery

postnatal: breast feeding, close contact

(This is the major mode of transmission in South East Asia.)

Diagnosis: Serology

Acute infection with resolution

Viral antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life both in chronic carriers and in those who clear the infection. Its presence indicates exposure to HBV, as in the chronic carrier.

Fig. Hepatitis B virus in serum

Prevention

1) Active Immunization

Two types of vaccine are available:

Serum-derived - prepared from HBsAg purified from the serum of HBV carriers

Recombinant HBsAg - made by genetic engineering in yeasts

Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.

Universal immunization of infants was introduced in April 1995. Infants receive 3 doses, at 6, 10 and 14 weeks of age.

Vaccine should be administered to people at high risk of infection with HBV:

1) Health care workers

2) Sexual partners of chronic carriers

3) Infants of HBV carrier mothers

2) Passive Antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example needlestick injuries.

What Is Hepatitis B Infection Like?

When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others, who do not develop significant symptoms following exposure, may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

The genome is circular, double stranded and 3.2 kb in size. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5' end. The other strand, the plus strand, is variable in length but always less than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. These proteins are transcribed and translated through the use of multiple in-frame start codons. The HBV genome also contains regions that regulate transcription, determine the site of polyadenylation and select a specific transcript for encapsidation into the nucleocapsid.

Fig. Hepatitis B virus genome

Life cycle

To reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other cell types in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces, HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase II. It yields a longer-than-genome-length RNA, called the pregenome, and shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, where the proteins destined to become HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce the polymerase protein P, which then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the DNA is recycled to the nucleus, where the process is repeated. This results in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).

Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up a small minority of HBV-derived particles.

Large numbers of smaller subviral particles are also present, usually outnumbering the virions by a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group named surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces, HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg) - There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope, which may be responsible for triggering the immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears basically oblivious to their presence.

Hepatitis B core antigen (HBcAg) - This is the only HBV antigen that cannot be detected directly by a blood test; it can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg) - The e antigen is named for its early appearance during an acute HBV infection. Although it is thought to be located in the core structure of the virus, this antigen can be detected by a blood test. If found, it is usually indicative of complete virus particles in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle, and tubular or filamentous particles that vary in length. Of these, only the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response. So may the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidating the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response are still at an early stage. Such evasion causes prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver. The role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed, with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life both in chronic carriers and in those who clear the infection. Its presence indicates exposure to HBV, as in the chronic carrier [4].

Homology or comparative modeling predicts the structure of a query sequence from the structures of one or more structural templates. The procedure has four steps:

1) identifying possible templates that have a clear sequence relationship to the query;

2) assembling the model;

3) predicting regions of the structure that are likely to have different conformations from the templates (e.g. loops);

4) refining the structure in an attempt to account for inherent differences between the template and query structures.

As mentioned above, homology modeling figures heavily as a rationale for structural genomics initiatives. These rest on the stated assumption that accurate models can be built for query sequences with greater than 30% sequence identity to their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods, such as those described above, should result in continuing improvements in the quality of homology models [5, 6].
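Applying the 30% rule requires a sequence identity figure for the query-template alignment. The sketch below assumes identity is counted over aligned columns where neither sequence has a gap. Other conventions divide by the shorter sequence length or the full alignment length, and give different numbers. The sequences and function names are hypothetical.

```python
def percent_identity(query_aln: str, template_aln: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(q, t) for q, t in zip(query_aln, template_aln)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for q, t in pairs if q == t)
    return 100.0 * identical / len(pairs)


def meets_30_percent_rule(identity: float, threshold: float = 30.0) -> bool:
    """Rule of thumb from the text: above ~30% identity, alignments (and models) tend to be reliable."""
    return identity > threshold


# Hypothetical query/template alignment (the '-' marks a gap in the query)
query = "MKTAYIAKQR-QISFVKSHFSRQ"
template = "MKSAYLAKQRGQISWVESHLSKQ"
pid = percent_identity(query, template)
print(f"{pid:.1f}% identity, above 30%: {meets_30_percent_rule(pid)}")
```

Here 16 of the 22 ungapped columns are identical, so the script prints "72.7% identity, above 30%: True". The 30% threshold is only a heuristic, not a guarantee of model quality.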

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years. They aim to make maximal use of all the structural and chemical information that can be derived from proteins, ligands and protein-ligand complexes. In this respect, the term "guided docking" is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, which affects scoring but not the orientation of the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches. This split reflects the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, making optimal use of chemical information was generally found to be an efficient knowledge-based strategy. It improves binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, some major challenges remain, and these are outlined as well. Approaches to address some of these challenges, and the latest developments in the area, are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described [8].

Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information. All of this serves a better understanding of molecular processes affecting human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation (such as the description of a protein's function, its domain structure, post-translational modifications and variants), a minimal level of redundancy and a high level of integration with other databases.

2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)

Primary accession number: Q8IVS8

EC 2.7.1.31

Organism: human

Subcellular location: cytoplasm

Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.

3. FASTA

FASTA is a DNA and protein sequence alignment software package. It was first described (as FASTP) by David J. Lipman and William R. Pearson in 1985, in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA was described in 1988 ("Improved Tools for Biological Sequence Comparison"). It added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. Several programs in this package allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet. The name extends FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms for comparing nucleotide to protein sequence data. These correctly handle frameshift errors, which six-frame-translated searches do not handle very well.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics. These let biologists judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
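SSEARCH is built on the Smith-Waterman local alignment algorithm. The sketch below shows only its dynamic-programming recurrence and returns the best local score. It assumes a simple match/mismatch scheme and a linear gap penalty, and these scoring values are illustrative choices. SSEARCH itself uses substitution matrices (such as BLOSUM50) and affine gap penalties, and also reports the alignment and its statistics.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty (O(len(a)*len(b)) time)."""
    prev = [0] * (len(b) + 1)              # DP row i-1; row 0 is all zeros
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)          # column 0 stays 0 (local alignment)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap             # a[i-1] aligned to a gap
            left = curr[j - 1] + gap       # b[j-1] aligned to a gap
            # Flooring at 0 lets a new local alignment start anywhere
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "ACGT"))  # 8: ACGT matches exactly
print(smith_waterman_score("ACGT", "ACTT"))      # 5: three matches, one mismatch
```

Because scores are floored at zero, unrelated flanking residues (the T's around ACGT in the first example) do not lower the local score. This is what distinguishes Smith-Waterman from global alignment.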

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
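The threshold is usually expressed as an expect value (E-value). Under the Karlin-Altschul statistics that BLAST reports, a raw score S is converted to a bit score S' = (λS - ln K)/ln 2, and the number of hits expected by chance for a query of length m searched against a database of total length n is E = m·n·2^(-S'). A small sketch of those two formulas; λ and K depend on the scoring matrix and gap penalties and are taken as inputs, and BLAST's effective-length corrections are ignored, so real BLAST E-values will differ somewhat.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits E = m * n * 2**(-S'), without edge corrections."""
    return query_len * db_len * 2.0 ** (-bits)
```

For example, a 40-bit hit for a 300-residue query against a database of 10^8 residues gives `expect_value(40, 300, 10**8)` of about 0.027; every additional bit halves E, which is why lower E-values indicate more significant matches.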

5. Primary & secondary structure analysis

Using ProtParam for primary structure: ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence is analyzed.

In addition to the molecular weight, theoretical pI, and amino acid and atomic composition, it calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
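As an illustration of one of these parameters, the aliphatic index is the relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu), computed from their mole percentages as X(Ala) + a·X(Val) + b·(X(Ile) + X(Leu)) with a = 2.9 and b = 3.9 (Ikai, 1980), the formula given in the ProtParam documentation. The sketch below is an illustration of that formula, not ProtParam's own code, and the function name is hypothetical.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    # Mole percent of each aliphatic residue in the sequence.
    pct = {aa: 100.0 * seq.count(aa) / n for aa in "AVIL"}
    # Relative side-chain volumes of Val (a) and Ile/Leu (b) compared with Ala.
    a, b = 2.9, 3.9
    return pct["A"] + a * pct["V"] + b * (pct["I"] + pct["L"])
```

A poly-alanine sequence scores 100 and poly-valine 290; higher values indicate a larger aliphatic side-chain volume.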


Using SOPMA for secondary structure analysis

A method called the self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. In the SOPMA paper, the authors report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
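The 69.5% figure is a three-state accuracy, often called Q3: the percentage of residues whose predicted state (helix, sheet or coil) matches the observed one. A minimal sketch, assuming the simple reduction in which only h/H counts as helix, e/E as strand and every other letter as coil; published evaluations use several slightly different reduction rules, and the function names are hypothetical.

```python
def three_state(states: str) -> str:
    """Reduce state letters to H (helix), E (strand) or C (everything else)."""
    return "".join("H" if s in "Hh" else "E" if s in "Ee" else "C" for s in states)


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted three-state class matches the observed one."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("need two non-empty state strings of equal length")
    pairs = zip(three_state(predicted), three_state(observed))
    correct = sum(p == o for p, o in pairs)
    return 100.0 * correct / len(observed)
```

Here a predicted turn (t) is scored as coil, so `q3_accuracy("hhhceet", "HHHCEEC")` is 100.0.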

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Performed a BLAST similarity search to retrieve the PDB ID of the template structure (PDB ID 2B8N).
Loaded the template structure (in PDB format) and the target sequence (as a raw sequence) into SWISS-MODEL and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the structure of the drug molecule.


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
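The "~2 Å agreement between matched Cα atoms" is commonly measured as a root-mean-square deviation, RMSD = sqrt((1/N) Σ d_i²), where d_i is the distance between the i-th pair of matched Cα atoms. The sketch below assumes the model and reference coordinates are already superposed and listed in the same residue order; a real comparison would first find the optimal superposition (for example with the Kabsch algorithm), which is not shown, and the function name is hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def ca_rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD between matched C-alpha atoms (in Å if the inputs are in Å).

    Assumes both lists are already superposed and in the same residue order;
    no fitting is performed here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))
```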

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, yielding a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
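Sequence similarity between query and template is most often summarized as percent identity over an alignment. Tools normalize it differently (by alignment length, by the length of the shorter sequence, or by the gap-free aligned columns); the sketch below uses gap-free columns, which is only one of these conventions, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x.upper() == y.upper() for x, y in pairs)
    return 100.0 * identical / len(pairs)
```

For the aligned pair `GTT-AC` / `GTTGAC`, the gap column is skipped and the identity is 100%, even though the two segments differ in length; a length-normalized convention would report a lower value.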

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interfaces between the two molecules tend to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interactions are said to be rigid.

Fig.: Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often referred to as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand and a flexible receptor, or a flexible ligand and a rigid receptor.

Fig.: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

In the rigid-ligand/flexible-receptor case, the native structure often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, when the ligand overlaps the receptor at the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty in the system will be minimized, and this can be accomplished by allowing flexibility in the receptor. Flexible receptors therefore allow a larger ligand to dock than a rigid receptor would.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement, which allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
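The six rigid-body degrees of freedom mentioned above (three rotations and three translations) are what a rigid-body docking search varies when it places a ligand against a receptor. A minimal sketch of applying one such pose to ligand coordinates; rotating about the ligand centroid and using z-y-x Euler angles in degrees are assumptions made for illustration, since docking programs parameterize orientations in different ways, and the function names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def rotation_matrix(rx: float, ry: float, rz: float) -> list[list[float]]:
    """Rotation R = Rz(rz) @ Ry(ry) @ Rx(rx); angles in degrees."""
    ax, ay, az = (math.radians(a) for a in (rx, ry, rz))
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def rigid_body_move(coords: list[Vec], rx: float, ry: float, rz: float,
                    tx: float, ty: float, tz: float) -> list[Vec]:
    """Rotate a ligand about its centroid (3 DOF), then translate it (3 DOF)."""
    n = len(coords)
    gx = sum(p[0] for p in coords) / n  # centroid of the ligand
    gy = sum(p[1] for p in coords) / n
    gz = sum(p[2] for p in coords) / n
    r = rotation_matrix(rx, ry, rz)
    moved = []
    for x, y, z in coords:
        dx, dy, dz = x - gx, y - gy, z - gz  # position relative to the centroid
        moved.append((
            r[0][0] * dx + r[0][1] * dy + r[0][2] * dz + gx + tx,
            r[1][0] * dx + r[1][1] * dy + r[1][2] * dz + gy + ty,
            r[2][0] * dx + r[2][1] * dy + r[2][2] * dz + gz + tz,
        ))
    return moved
```

Because the transformation is rigid, distances between atoms inside the ligand are unchanged; only its position and orientation relative to the receptor change.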

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking study will be useful in finding a cure for the infectious disease hepatitis B, and will also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.

Receptor

A receptor is a molecule or site on the surface of a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule in a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through the interaction of many weak noncovalent bonds formed with the binding site of a protein, tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein or enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The list below does not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate; the torsion angles about these two bonds are φ and ψ, respectively, and the plot is drawn between them. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
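The φ and ψ values plotted on a Ramachandran plot are ordinary torsion angles computed from four consecutive backbone atoms: φ from C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). A minimal pure-Python sketch of the torsion-angle calculation (the helper names are hypothetical); the sign follows the usual convention in which a clockwise rotation, viewed along the central bond from p1 to p2, is positive.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees (-180..180) defined by four atoms p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

An eclipsed (cis) arrangement gives 0° and an anti (trans) arrangement gives ±180°; each residue contributes one (φ, ψ) point to the plot.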

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the file extension .new. The new file will have the atoms labelled in accordance with the IUPAC naming convention.
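In the PDB format used for this coordinate file, atom data are stored in fixed columns of ATOM records: the atom name in columns 13-16 and the x, y and z coordinates in columns 31-38, 39-46 and 47-54. A minimal sketch that extracts the Cα coordinates for further geometric analysis; the function name is hypothetical, and alternate locations, insertion codes and multiple models are not handled.

```python
from typing import Iterable


def ca_coordinates(pdb_lines: Iterable[str]) -> list[tuple[float, float, float]]:
    """Collect (x, y, z) of C-alpha atoms from PDB-format ATOM records."""
    coords = []
    for line in pdb_lines:
        # Only ATOM records: a HETATM named "CA" is usually a calcium ion.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x = float(line[30:38])  # columns 31-38
            y = float(line[38:46])  # columns 39-46
            z = float(line[46:54])  # columns 47-54
            coords.append((x, y, z))
    return coords
```

The function accepts any iterable of lines, so an open file handle can be passed directly, e.g. `ca_coordinates(open("model.pdb"))`, where `model.pdb` is a placeholder file name.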

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good at releasing local constraints for a residue, but it will not pass through high energy barriers and will stop in a local minimum. The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
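A toy illustration of gradient-based energy minimization: two atoms interacting only through a Lennard-Jones (van der Waals) term, with their separation moved downhill along the force until the force is nearly zero. Reduced units (ε = σ = 1), the fixed step size and the function names are assumptions for illustration; real force fields sum bonded and non-bonded terms over all atoms, and production minimizers usually use more efficient schemes than fixed-step steepest descent.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones 12-6 pair energy, a common van der Waals model."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """dE/dr of the Lennard-Jones potential (the force is its negative)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_distance(r0: float, step: float = 0.01, tol: float = 1e-8,
                      max_iter: int = 10000) -> float:
    """Steepest descent on the pair distance until the force is ~0."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g  # move downhill, i.e. along the force
    return r
```

Starting from either side of the well (for example r = 1.05 or r = 1.5), the loop settles at the analytical minimum r = 2^(1/6)σ ≈ 1.122, where E = -ε. With many atoms, the same procedure simply stops in the nearest local minimum.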


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences:

Db   AC      Description                                                          Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5...              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina...               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad...                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp...   27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I...               27     6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:

Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
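The molecular weight, percentages and charged-residue totals above follow directly from the residue counts. A sketch that recomputes them, assuming the commonly tabulated average residue masses (free amino acid minus one water) plus one water for the chain termini; these masses are assumed, not confirmed, to be the exact values ProtParam uses, and the function and class names are hypothetical.

```python
from dataclasses import dataclass

# Average residue masses in Da (amino acid minus one water); assumed values.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "Q": 128.1307, "E": 129.1155, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


@dataclass
class CompositionSummary:
    length: int
    molecular_weight: float
    percent: dict[str, float]
    negative: int  # Asp + Glu
    positive: int  # Arg + Lys


def summarize(counts: dict[str, int]) -> CompositionSummary:
    """Summarize a protein from one-letter amino acid counts."""
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no residues")
    mw = sum(RESIDUE_MASS[aa] * k for aa, k in counts.items()) + WATER
    percent = {aa: 100.0 * k / n for aa, k in counts.items()}
    return CompositionSummary(
        length=n,
        molecular_weight=mw,
        percent=percent,
        negative=counts.get("D", 0) + counts.get("E", 0),
        positive=counts.get("R", 0) + counts.get("K", 0),
    )


# Counts copied from the ProtParam composition table for GLCTK_HUMAN.
GLCTK_COUNTS = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28, "G": 51,
    "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10, "P": 28, "S": 27,
    "T": 22, "W": 4, "Y": 5, "V": 38,
}

summary = summarize(GLCTK_COUNTS)
print(summary.length, round(summary.molecular_weight, 1),
      round(summary.percent["A"], 1), summary.negative, summary.positive)
```

With these masses the script prints `523 55252.6 14.1 49 43`, consistent with the ProtParam values reported above.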

Atomic composition

Carbon      C  2435
Hydrogen    H  3967
Nitrogen    N   711
Oxygen      O   719
Sulfur      S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
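ProtParam bases these estimates on the method of Pace et al. (1995), in which only Trp (5500), Tyr (1490) and cystine (125 M-1 cm-1 each) contribute at 280 nm, and the 0.1% absorbance is the extinction coefficient divided by the molecular weight. A short sketch of that arithmetic, applied to the Trp (4), Tyr (5) and Cys (5) counts and the molecular weight reported above; five Cys residues can form at most two cystines, which is why the two values differ by 250. The function names are hypothetical.

```python
def extinction_coefficients(n_trp: int, n_tyr: int, n_cys: int) -> tuple[int, int]:
    """Molar extinction coefficients at 280 nm in water, in M-1 cm-1.

    Returns (value if all Cys pairs form cystines, value with no cystines).
    """
    base = 5500 * n_trp + 1490 * n_tyr
    cystines = n_cys // 2  # a cystine needs two Cys residues
    return base + 125 * cystines, base


def absorbance_01_percent(ext_coefficient: float, mol_weight: float) -> float:
    """Absorbance of a 0.1% (1 g/l) solution over a 1 cm path."""
    return ext_coefficient / mol_weight


with_cystines, without_cystines = extinction_coefficients(n_trp=4, n_tyr=5, n_cys=5)
print(with_cystines, round(absorbance_01_percent(with_cystines, 55252.6), 3))
print(without_cystines, round(absorbance_01_percent(without_cystines, 55252.6), 3))
```

This prints `29700 0.538` and `29450 0.533`, reproducing both ProtParam lines.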

Secondary structure prediction

By SOPMA: result for UNK_158250


Predicted secondary structure (upper line: sequence; lower line: SOPMA state, with h = alpha helix, e = extended strand, t = beta turn, c = random coil; 70 residues per line):

  1  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
     hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
 71  RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
     ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
141  GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
     eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
211  ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
     hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
281  PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
     chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
351  LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
     hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
421  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
     ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
491  TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
     hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: window width: 17; similarity threshold: 8; number of states: 4
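The percentages in the SOPMA summary are each state's count divided by the sequence length; for example, 235/523 = 44.93% for alpha helix. A small sketch that recomputes such a summary from a state string like the h/e/t/c lines printed under each sequence block; the function name is hypothetical, and only the four states used in this four-state run are counted.

```python
from collections import Counter

# The four states used in this SOPMA run (number of states: 4).
STATES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_summary(prediction: str) -> dict[str, tuple[int, float]]:
    """Count each SOPMA state letter and express it as a percentage of the length."""
    if not prediction:
        raise ValueError("empty prediction")
    counts = Counter(prediction.lower())
    n = len(prediction)
    return {name: (counts[code], round(100.0 * counts[code] / n, 2))
            for code, name in STATES.items()}
```

Any string with 235 h, 80 e, 36 t and 172 c letters yields the same 44.93%, 15.30%, 6.88% and 32.89% as the summary above.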

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65

(45)

2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98

49

6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97

7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31

50

P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117

51

P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

52

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV -------------------------------------------
P03153|HBEAG_GSHV -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
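The alignment above is printed in interleaved blocks, so each sequence is split across several blocks. A minimal sketch in Python (standard library only) of how the blocks can be stitched back into one gapped string per identifier. The function names `join_blocks` and `ungapped_length` are illustrative, and the row layout ("identifier with a | character, segment, optional count") is an assumption based on the rows shown here. The excerpt reuses two rows from the last two blocks, with the gap runs written as `"-" * n`.

```python
def join_blocks(text: str) -> dict[str, str]:
    """Stitch interleaved alignment rows back into one gapped string per identifier."""
    seqs: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        # Assumed row layout: "DB_ID|ENTRY_NAME SEGMENT [CUMULATIVE_COUNT]".
        if len(parts) < 2 or "|" not in parts[0]:
            continue  # blank separator lines and anything else are ignored
        ident, segment = parts[0], parts[1]
        seqs[ident] = seqs.get(ident, "") + segment
    return seqs


def ungapped_length(aligned: str) -> int:
    """Number of residues, i.e. columns that are not the gap character '-'."""
    return sum(1 for ch in aligned if ch != "-")


# Two sequences from the last two alignment blocks.
excerpt = "\n".join([
    "P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC" + "-" * 41 + " 214",
    "Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480",
    "",
    "P17099|HBEAG_HBVA4 " + "-" * 43,
    "Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523",
])

alignment = join_blocks(excerpt)
for ident, seq in alignment.items():
    print(ident, "aligned length:", len(seq), "residues:", ungapped_length(seq))
```

The number at the end of a row is the cumulative residue count for that sequence. This is why GLCTK_HUMAN goes from 480 to 523 across a block that adds 43 residues, and why an all-gap row carries no number.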

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
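The guide tree is written in Newick format. Nested parentheses group sequences into clades, and each ":" is followed by a branch length. Below is a minimal parser sketch in Python (standard library only) that sums branch lengths from the tree's root, as written, down to every leaf. `parse_newick`, `root_distances` and the tuple representation are illustrative and are not part of ClustalW. The sketch assumes no quoted labels, comments or whitespace inside the tree.

```python
def parse_newick(text: str):
    """Parse a Newick string into nested (children, name, branch_length) tuples."""
    text = text.strip()
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(parse_node())
            pos += 1                      # consume ")"
        name = read_until(",():;")        # empty for internal nodes
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1                      # consume ":"
            length = float(read_until(",();"))
        return (children, name, length)

    return parse_node()


def root_distances(node, acc: float = 0.0, out=None) -> dict:
    """Sum branch lengths from the root down to every leaf."""
    if out is None:
        out = {}
    children, name, length = node
    total = acc + length
    if not children:
        out[name] = total
    for child in children:
        root_distances(child, total, out)
    return out


GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

distances = root_distances(parse_newick(GUIDE_TREE))
for leaf, dist in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{leaf:<20} {dist:.5f}")
```

In this tree, GLCTK_HUMAN lies about 0.955 from the root. By contrast, the five HBVA sequences all lie within about 0.015, which matches their near-identical rows in the alignment. For routine work, Biopython's Bio.Phylo module can read Newick trees directly.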

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template. It showed about 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV:

1) Open the template structure (PDB file) from the File menu, then choose Swiss Model - load the raw target sequence.

2) Choose Fit - fit raw sequence, then Magic Fit, then Iterative Fit.

3) Choose File - Save - Layer (PDB).

4) Choose File - Save - Project (PDB).

5) Choose Swiss Model - submit modeling request. A new browser window opens with the project file loaded; enter the e-mail address at which the modeled structure should be received.

6) Open the new structure received by e-mail and remove the template by selecting the target.

7) Choose File - Save - Layer (PDB).


Open Swiss Model and select the "load raw sequence" option to load the target molecule

Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences


Save the file as the project

Select "submit modeling request" under Swiss Model to submit it for modeling

Homology modeling

Optimise Mode Request submission form

Please fill in these fields:

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS-MODEL Workspace

Model information:

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52
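SWISS-MODEL reports target-template identity as a single percentage, but tools define identity in different ways. The sketch below assumes one common definition: identical residues divided by the aligned columns where neither row has a gap. Whether SWISS-MODEL uses exactly this denominator is not stated here. The example uses two HBeAg rows from the first block of the multiple alignment above, with the leading gap run written as `"-" * 13`. The function name `percent_identity` is illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither row has a gap ('-').

    Assumes a and b are rows of the same alignment. Other tools may instead
    divide by the full alignment length or by the shorter sequence.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


hbva4 = "-" * 13 + "VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP"
hbva6 = "-" * 13 + "VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP"
print(f"{percent_identity(hbva4, hbva6):.1f}%")
```

These two rows differ at only two of their 47 shared columns (R/H and Y/C). The 34% reported for the target against 2b8nA therefore indicates a much more distant template.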


Click on the model bars.

Fig. Structure of the template after modeling

Alignment

TARGET  83 LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative modeling process. A Ramachandran plot shows the backbone dihedral angles φ (phi) against ψ (psi) for each amino acid residue in a protein structure. It therefore shows which combinations of φ and ψ the polypeptide backbone adopts. Only certain regions of the plot are sterically allowed, so residues that fall outside them point to parts of the model whose backbone geometry is doubtful.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained by homology modeling with DeepView (Swiss-PdbViewer) and SWISS-MODEL is given as input to SAVS.
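The φ and ψ angles on a Ramachandran plot are ordinary torsion (dihedral) angles, each defined by four consecutive backbone atoms. φ uses C(i-1), N, Cα and C; ψ uses N, Cα, C and N(i+1). Below is a minimal sketch in Python (standard library only) of how such an angle is computed from atomic coordinates. The function names are illustrative and are not taken from SAVES or PROCHECK. The toy coordinates in the usage lines do not come from the model.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(v: Vec, k: float) -> Vec:
    return (v[0] * k, v[1] * k, v[2] * k)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec, n: Vec, ca: Vec, c: Vec, n_next: Vec) -> tuple[float, float]:
    """phi = C(i-1)-N(i)-CA(i)-C(i); psi = N(i)-CA(i)-C(i)-N(i+1)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy coordinates: moving the last atom around the central bond gives the
# cis, +90 degree and trans reference cases.
for p3 in [(1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (-1.0, 0.0, 1.0)]:
    print(p3, round(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), p3), 1))
```

A validation server applies this calculation to every residue and then checks the resulting (φ, ψ) pair against empirically derived allowed regions.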

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT


Result

Plot statistics
                                                      No. of residues      %
Residues in most favoured regions [A,B,L]                         990    85.6
Residues in additional allowed regions [a,b,l,p]                  104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]               11     1.0
Residues in disallowed regions                                     51     4.4
                                                                 ----   -----
Number of non-glycine and non-proline residues                   1156   100.0
Number of end-residues (excl. Gly and Pro)                          8
Number of glycine residues (shown as triangles)                   127
Number of proline residues                                         60
                                                                 ----
Total number of residues                                         1351

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good-quality model would be expected to have over 90% of its residues in the most favoured regions.
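The percentages in the PROCHECK table are each region's count divided by the 1156 non-glycine, non-proline residues. Below is a short Python sketch that recomputes them from the reported counts and applies the 90% guideline quoted above. The names are illustrative. The sketch only re-derives the summary and does not reproduce PROCHECK's assignment of residues to regions.

```python
def region_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of non-Gly, non-Pro residues falling in each Ramachandran region."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


# Counts copied from the PROCHECK summary above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

# Consistency check: non-Gly/Pro + end residues + Gly + Pro = total residues reported.
assert sum(counts.values()) + 8 + 127 + 60 == 1351

pct = region_percentages(counts)
for region, value in pct.items():
    print(f"{region:>20}: {value:5.1f}%")

# PROCHECK guideline: a good-quality model has over 90% in the most favoured regions.
print("meets 90% guideline:", pct["most favoured"] > 90.0)
```

With 85.6% in the most favoured regions and 4.4% in disallowed regions, the model falls short of the 90% guideline.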

Docking result (Hex software)

Fig. Ligand & Receptor (2B8N)

Fig. After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)

Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV


Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 46.4 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27*812*1] x FFT[64*24*48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255, Eforce=0.00)


R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00  R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136, Eforce=0.00)
R = 0.00  R = 0.40


R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N/2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
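The "Total 6D space" line of the log can be checked by hand. Reading the log's own labels, the iteration count appears to be the number of distance steps (0.00 to 19.50 in steps of 0.75, i.e. 27) multiplied by the 812 receptor rotations and the 1 ligand rotation. Each iteration then runs one 3D FFT whose grid dimensions the log reports as 64, 24 and 48. This reading of the factors is our assumption, not something taken from the Hex documentation. Below is a short Python sketch of the arithmetic; the variable names are ours.

```python
# Values read from the Hex 5.0 log above; the variable names are illustrative.
r_min, r_max, r_step = 0.00, 19.50, 0.75
distance_steps = round((r_max - r_min) / r_step) + 1   # R = 0.00, 0.75, ..., 19.50
receptor_rotations = 812
ligand_rotations = 1
fft_dims = (64, 24, 48)

iterations = distance_steps * receptor_rotations * ligand_rotations
points_per_fft = fft_dims[0] * fft_dims[1] * fft_dims[2]
total_orientations = iterations * points_per_fft

print("3D FFTs:", iterations)                 # the log reports "Done 21924 3D FFTs"
print("orientations:", total_orientations)   # the log reports 1616412672
print("rate over 7 min:", round(total_orientations / (7 * 60)), "per second")
```

The rate computed over a rounded 7 minutes is close to the 3848574/s reported in the log. The small difference is presumably due to the log rounding the elapsed time to whole seconds.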


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to look at the matter more closely at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene-sequencing projects on HBV are finished, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution, and to identify the genes responsible for pathogenesis. A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structures of proteins present in the different species, we used homology modeling. As a further step, we present a theoretical model built with available online modeling tools.

From our study, the HBeAg (glycerate kinase) protein coded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report might be just a stepping stone toward such discoveries; it is a small finding within a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as interpreting those results as new biological information.

The purpose of modeling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modeled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in the hepatitis B virus.

The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a reliable phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

Phylip Phylogeny Inference Package


The virus persists in the hepatocytes, and ongoing liver damage occurs because of the host immune response against the infected liver cells.

Chronic infection may take one of two forms:

Chronic persistent hepatitis: the virus persists, but there is minimal liver damage.

Chronic active hepatitis: there is aggressive destruction of liver tissue and rapid progression to cirrhosis or liver failure. Patients who become persistently infected are at risk of developing hepatocellular carcinoma (HCC).

HBV is thought to play a role in the development of this malignancy because:

a) 80% of patients with HCC are carriers of hepatitis B;

b) virus DNA can be identified in hepatocellular carcinoma cells;

c) virus DNA can integrate into the host chromosome.

3) Fulminant Hepatitis

Rare: accounts for 1% of infections.

Epidemiology

Prevalence of disease in Africa

Worldwide there are 450 million persistent carriers of hepatitis B, 50 million of whom are in Africa. Carriage rates vary markedly in different areas. In South Africa, infection is much more common in rural communities than in the cities. Hepatitis B is parenterally transmitted:

1) Blood

Blood transfusions, serum products

Sharing of needles, razors

Tattooing, acupuncture

Renal dialysis

Organ donation

2) Sexual intercourse


3) Horizontal transmission in children and families, through close personal contact

This is the major mode of transmission in South Africa, where the majority of individuals become infected between three and nine years of age.

Horizontal transmission also occurs in children's institutions and mental homes.

4) Vertical transmission - perinatal transmission from a carrier mother to her baby:

Transplacental (rare)

During delivery

Postnatal: breastfeeding, close contact

(This is the major mode of transmission in Southeast Asia.)

Diagnosis: Serology

Acute infection with resolution

Viral antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg): the core protein is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV.
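The marker meanings listed above can be read as a simple lookup from a serology panel to interpretations. The sketch below is a minimal illustration, assuming a panel given as marker name mapped to positive/negative; the marker labels and the `interpret` function are hypothetical, the meanings are copied from the text, and it is not a clinical decision tool. HBcAg is deliberately left out because, as stated above, it is not found in blood.

```python
# Minimal sketch: map each positive serum marker to the meaning given in the
# text above. Marker labels and function names are hypothetical; not for clinical use.
MARKER_MEANINGS = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "low infectivity in a carrier",
    "anti-HBc IgM": "recent infection",
    "anti-HBc IgG": "exposure to HBV",
}


def interpret(panel: dict[str, bool]) -> list[str]:
    """Return '<marker>: <meaning>' for every marker reported as positive."""
    unknown = set(panel) - set(MARKER_MEANINGS)
    if unknown:
        # e.g. HBcAg is rejected here because it is not a serum marker
        raise ValueError(f"unknown serum markers: {sorted(unknown)}")
    return [f"{marker}: {MARKER_MEANINGS[marker]}"
            for marker, positive in panel.items() if positive]


# Hypothetical panel: HBsAg and core IgM positive, anti-HBs negative
for line in interpret({"HBsAg": True, "anti-HBc IgM": True, "anti-HBs": False}):
    print(line)
```

The lookup only restates single-marker meanings; combining markers into a full diagnosis would need additional rules that the text does not give.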

Fig. Hepatitis B virus in serum

Prevention

1) Active Immunization

Two types of vaccine are available:

Serum-derived - prepared from HBsAg purified from the serum of HBV carriers

Recombinant HBsAg - made by genetic engineering in yeast

Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.

Universal immunization of infants was introduced in April 1995. Infants receive three doses at 6, 10 and 14 weeks of age.

The vaccine should be administered to people at high risk of infection with HBV, such as:

1) Health care workers

2) Sexual partners of chronic carriers

3) Infants of HBV carrier mothers


2) Passive Antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example needlestick injuries.

What Is Hepatitis B Infection Like?

Most individuals who become infected with the hepatitis B virus are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others, who do not develop significant symptoms following exposure, may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

The genome is circular, partially double-stranded and 3.2 kb in size. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5' end. The other strand, the plus strand, is variable in length but shorter than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. These proteins are produced through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation, and mark a specific transcript for encapsidation into the nucleocapsid.

Fig. Hepatitis B virus genome
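How one ORF can yield several proteins through multiple in-frame start codons can be illustrated with a toy example. In the sketch below, each in-frame ATG upstream of the first in-frame stop codon starts a product that shares its C-terminal end with the longer ones; this is the arrangement by which the large, middle and small surface proteins described later share a common C-terminal region. The toy sequence and the `nested_products` function are hypothetical and are not HBV sequence or any published tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def nested_products(orf: str) -> list[str]:
    """Return the coding sequences that begin at each in-frame ATG of `orf`
    and end at the first in-frame stop codon (stop codon included)."""
    seq = orf.upper()
    # split into complete codons of the single reading frame starting at 0
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    stop = next((i for i, codon in enumerate(codons) if codon in STOP_CODONS), None)
    if stop is None:
        raise ValueError("no in-frame stop codon found")
    starts = [i for i, codon in enumerate(codons[:stop]) if codon == "ATG"]
    return ["".join(codons[s:stop + 1]) for s in starts]


toy_orf = "ATGGCTATGAAAATGTTTTAA"  # hypothetical toy ORF, not an HBV sequence
for product in nested_products(toy_orf):
    # subtract 1 for the stop codon to get the number of amino acids
    print(f"{len(product) // 3 - 1} aa: {product}")
```

Each shorter product is a suffix of the longer ones, so the proteins differ only in their N-terminal extensions.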

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA, called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the endoplasmic reticulum of the cell, and the proteins destined to become HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the DNA is recycled to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al. 765).

Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found in the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions by a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to together as surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same antigenic sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that the presence of high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg) - There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope that may be responsible for triggering the immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears largely oblivious to their presence.

Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be detected directly by a blood test; it can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg) - The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus, this antigen can be detected by a blood test. If found, it usually indicates that complete virus particles are in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle; a 42 nm particle containing DNA and DNA polymerase, called the Dane particle; and tubular or filamentous particles that vary in length. The Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg): the core protein is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV [4].

Homology or comparative modeling involves predicting the structure of a query sequence from the structures of one or more structural templates. The procedure involves identifying possible templates that have a clear sequence relationship to the query, assembling the model, predicting regions of the structure that are likely to have different conformations than the templates (e.g., loops), and ultimately refining the structure in an attempt to account for inherent differences between the template and query structures. As mentioned above, homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods, such as those described above, should result in continuing improvements in the quality of homology models [5, 6].
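The 30% rule is stated in terms of the sequence identity between the query and its template. Definitions of percent identity differ between tools; the sketch below assumes one common choice, the number of identical residues divided by the number of aligned columns in which neither sequence has a gap. The alignment used is hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(q == t for q, t in pairs)
    return 100.0 * identical / len(pairs)


# Hypothetical pairwise alignment of a query with a candidate template
query = "MKT-AYIAKQ"
template = "MKTLAHIAR-"
pid = percent_identity(query, template)
print(f"{pid:.1f}% identity; above the 30% threshold: {pid > 30}")
```

Dividing by the full alignment length or by the shorter sequence instead would give a lower value for the same alignment, so the definition should be reported alongside the number.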

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years with the aim of making maximal use of all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term guided docking is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, affecting scoring but not the orientation of the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, reflecting the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described [8].

Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

2. Protein sequence: glycerate kinase (HBeAg-binding protein 4); primary accession number Q8IVS8; EC 2.7.1.31; from human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
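SSEARCH is described above as an implementation of the Smith-Waterman algorithm. The following is a minimal, score-only sketch of that local-alignment recurrence, assuming a simple match/mismatch scheme and a linear gap penalty; the score values are illustrative assumptions, whereas production tools typically use a substitution matrix (such as BLOSUM) and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    # h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # the 0 floor lets a new local alignment start at any cell
            h[i][j] = max(0, diagonal, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(smith_waterman("TTACGTTT", "GGACGTGG"))  # shared local segment "ACGT"
```

The zero floor is what makes the alignment local: poorly matching flanks are dropped instead of dragging the score down, which is why the two sequences above still score the full value of their shared segment.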

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
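BLAST gains its speed by first looking for short "words" shared by the query and a database sequence before extending them into alignments. The sketch below shows only that seeding idea, under a simplifying assumption: it records exact word matches of length `w` (3 here, the usual word size for protein searches), whereas BLAST itself also accepts similar "neighbourhood" words that score above a threshold and then extends and scores each hit. The function name and sequences are hypothetical.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, w: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    # index every word of the query by its start positions
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


# Hits lying on the same diagonal (query_pos - subject_pos) suggest a local match
print(word_hits("MKVLA", "AMKVL"))  # [(0, 1), (1, 2)]
```

Both hits share the diagonal -1, which is the kind of signal that BLAST then tries to extend into a high-scoring local alignment.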

5. Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence is analyzed.

The parameters it calculates include:

extinction coefficient

half-life

instability index

aliphatic index
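Two of the listed parameters follow closed-form formulas documented for ProtParam: the extinction coefficient at 280 nm is computed from the numbers of Trp, Tyr and cystine residues (5500, 1490 and 125 M^-1 cm^-1 respectively), and the aliphatic index is X(Ala) + 2.9 x X(Val) + 3.9 x (X(Ile) + X(Leu)), where X is the mole percent. The sketch below reimplements only these two under those assumptions; it is not ProtParam's own code, and the instability index and half-life are omitted because they depend on lookup tables. Function names and test sequences are hypothetical.

```python
# Molar extinction coefficients at 280 nm in water (M^-1 cm^-1)
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def extinction_coefficients(seq: str) -> tuple[int, int]:
    """Return (all Cys paired as cystines, all Cys reduced)."""
    s = seq.upper()
    reduced = s.count("W") * EXT_TRP + s.count("Y") * EXT_TYR
    oxidized = reduced + (s.count("C") // 2) * EXT_CYSTINE
    return oxidized, reduced


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    mole_pct = {aa: 100.0 * s.count(aa) / len(s) for aa in "AVIL"}
    return mole_pct["A"] + 2.9 * mole_pct["V"] + 3.9 * (mole_pct["I"] + mole_pct["L"])


print(extinction_coefficients("WYCC"))  # (7115, 6990)
print(aliphatic_index("AVIL"))
```

Two extinction values are returned because the contribution of cysteines depends on whether they form disulfide bonds, which the sequence alone cannot tell.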

Using SOPMA for secondary structure analysis

Recently, a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in the prediction of the secondary structure of proteins. Its authors report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr or on a web page (http://www.ibcp.fr/predict.html).
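The 69.5% figure quoted for SOPMA is a three-state accuracy, often called Q3: the percentage of residues whose predicted class, helix (H), sheet (E) or coil (C), matches the observed one. A minimal sketch of that calculation follows; the predicted and observed strings are hypothetical.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state class (H, E, C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    if not set(predicted + observed) <= set("HEC"):
        raise ValueError("only the states H, E and C are allowed")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3("HHHECCCC", "HHHHCCCE"))  # 6 of 8 residues agree
```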

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

Retrieved the FASTA sequence of the target protein, HBeAg-binding protein 4 (Q8IVS8), from the Swiss-Prot database.

Identified a template structure (PDB ID 2B8N) by a BLAST similarity search against the PDB.

Submitted the target sequence to SWISS-MODEL as a raw sequence and modeled the receptor in PDB format.

Validated the modeled receptor using the Structure Analysis and Verification Server (SAVES).

Verified the model using several parameters available in SAVES, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the structure of the docked drug molecule.
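The Ramachandran plot used in the validation step places each residue by its backbone dihedral angles φ (C(i-1)-N-Cα-C) and ψ (N-Cα-C-N(i+1)). The sketch below computes a signed dihedral angle from four atom positions with plain vector algebra; the coordinates in the usage line are hypothetical, and a validation server additionally compares the angles against allowed regions of the plot, which is not shown here.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)  # unit vector along the central bond
    # project b0 and b2 onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical atoms arranged so that the angle is a quarter turn
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Using atan2 on the projected vectors gives the full signed range of -180 to 180 degrees, which is what a Ramachandran plot needs.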


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
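The ~2 Å and 4-5 Å figures above describe agreement between matched Cα atoms, usually reported as a root-mean-square deviation (RMSD). A minimal sketch follows, assuming the model and reference Cα coordinates are already matched residue by residue and already superimposed; the optimal superposition step (for example the Kabsch algorithm) is not shown, and the coordinates are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def ca_rmsd(model: list[Vec], reference: list[Vec]) -> float:
    """RMSD between matched, pre-superimposed C-alpha atoms (same units as input, e.g. Å)."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(m, r) ** 2 for m, r in zip(model, reference))
    return math.sqrt(squared / len(model))


model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
print(ca_rmsd(model, reference))  # 2.0
```

Because the coordinates are not re-fitted, a rigid shift like the one above shows up as deviation; real comparisons superimpose the structures first so that only genuine structural differences remain.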

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and side chain dihedral angles are transferred from the templates to the target. Thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.

Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
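For a BLAST hit with bit score S', the expected number of chance hits is approximately E = m x n x 2^(-S'), where m is the query length and n the total length of the database. The sketch below applies that relation to filter candidate templates by E-value; it ignores BLAST's effective-length corrections, and the cutoff of 1e-5, the lengths and the hit names are hypothetical choices, not values from this study.

```python
def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def reliable_templates(hits: list[tuple[str, float]], query_len: int, db_len: int,
                       cutoff: float = 1e-5) -> list[str]:
    """Keep the names of hits whose E-value is at or below the cutoff."""
    return [name for name, bits in hits
            if evalue_from_bits(bits, query_len, db_len) <= cutoff]


# hypothetical hits: (template name, bit score)
hits = [("template_A", 80.0), ("template_B", 30.0)]
print(reliable_templates(hits, query_len=300, db_len=10**8))  # ['template_A']
```

Because E grows with database size, the same bit score becomes less convincing as databases grow, which is why E-values rather than raw scores are used to judge hits.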

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.
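The selection factors above can be turned into a simple ranking. In the sketch below, candidates with an E-value above a cutoff are discarded and the rest are sorted by alignment coverage and then by sequence identity; that ordering, the cutoff and the candidate records are illustrative assumptions, since the text names the factors but not how to weigh them, and the plausibility of the resulting model still has to be judged separately.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str        # hypothetical template identifier
    identity: float  # percent sequence identity to the query
    coverage: float  # fraction of the query covered by the alignment (0-1)
    evalue: float    # E-value of the database-search hit


def rank_templates(candidates: list[Candidate], max_evalue: float = 1e-5) -> list[Candidate]:
    """Drop poor E-values, then prefer higher coverage, then higher identity."""
    usable = [c for c in candidates if c.evalue <= max_evalue]
    return sorted(usable, key=lambda c: (c.coverage, c.identity), reverse=True)


candidates = [
    Candidate("tmpl1", identity=45.0, coverage=0.60, evalue=1e-20),
    Candidate("tmpl2", identity=35.0, coverage=0.95, evalue=1e-30),
    Candidate("tmpl3", identity=80.0, coverage=0.99, evalue=0.5),  # poor E-value
]
print([c.name for c in rank_templates(candidates)])  # ['tmpl2', 'tmpl1']
```

Note that tmpl3 is rejected despite its high identity, following the advice above that a template with a poor E-value should generally not be chosen.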

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins of similar size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces cannot alter their conformation to improve binding and ease movement. Conformational changes are limited by steric constraints, so these interactions are described as rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand Docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. These interfaces show both high specificity and induced fit, and specificity increases with rigidity. Receptor-ligand docking can involve either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid-ligand/flexible-receptor complex often maximizes the interface area between the molecules. The molecules move relative to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, when the ligand overlaps the receptor at the docking interface, energy penalties are incurred. If the van der Waals repulsion can be reduced, this energy penalty is minimized. Allowing flexibility in the receptor accomplishes this: a flexible receptor can dock a larger ligand than a rigid receptor could accommodate.
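The energy penalty for overlapping atoms described above is usually modelled with a van der Waals term. The sketch below assumes the 12-6 Lennard-Jones form with illustrative argon-like parameters, not force-field values for any receptor or ligand atom type. Many docking scoring functions soften or replace this term. The sketch shows how sharply the repulsion rises once two atoms are pushed closer than their contact distance.

```python
def lennard_jones(r, sigma=3.4, epsilon=0.238):
    """12-6 Lennard-Jones energy (kcal/mol) for two atoms r angstroms apart.

    sigma (A) and epsilon (kcal/mol) are illustrative argon-like values.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)  # r^-12 repulsion, r^-6 attraction


r_min = 2 ** (1 / 6) * 3.4  # separation with the lowest energy, about 3.82 A
for r in (2.5, 3.0, 3.4, r_min, 5.0):
    print(f"r = {r:4.2f} A   E = {lennard_jones(r):8.3f} kcal/mol")
```

At the optimum separation the pair gains only about 0.24 kcal/mol. Squeezing the atoms to 2.5 A costs tens of kcal/mol, which is why letting the receptor move out of the way is energetically so valuable.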

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, however: intrinsic movement allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier and more energetically favorable binding between the two.
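The six rigid-body degrees of freedom (three rotations, three translations) can be made concrete by applying a pose to a ligand's coordinates. The sketch below is generic and not taken from any docking program. The Euler-angle convention (rotate about x, then y, then z) and the function names are assumptions for illustration.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation by rx about x, then ry about y, then rz about z (radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    # Matrix product R = Rz * Ry * Rx
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, pose):
    """Move a rigid ligand by a 6-DOF pose (rx, ry, rz, tx, ty, tz).

    The ligand is rotated about its own centroid and then translated, so
    all interatomic distances (its internal geometry) are unchanged.
    """
    rx, ry, rz, tx, ty, tz = pose
    rot = rotation_matrix(rx, ry, rz)
    shift = (tx, ty, tz)
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    moved = []
    for p in coords:
        d = [p[k] - centroid[k] for k in range(3)]
        moved.append(tuple(
            sum(rot[i][j] * d[j] for j in range(3)) + centroid[i] + shift[i]
            for i in range(3)
        ))
    return moved


ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
print(apply_pose(ligand, (0.0, 0.0, math.pi / 2, 1.0, 2.0, 3.0)))
# approximately [(1.0, 3.0, 3.0), (1.0, 1.0, 3.0)]
```

A rigid-body docking search explores exactly these six numbers per pose. Any receptor or ligand flexibility adds further internal degrees of freedom on top of them.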

Aim of docking

The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Receptor

A molecule on the surface of a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule within or on the surface of a cell to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through many weak non-covalent interactions formed with the binding site of a protein, tight binding depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
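Transition-state theory makes the link between a lowered activation free energy and rate enhancement quantitative. The rate constant is proportional to exp(-ΔG‡/RT), so lowering the barrier by ΔΔG‡ multiplies the rate by exp(ΔΔG‡/RT). The sketch assumes that the pre-exponential factor is the same for the catalysed and uncatalysed reactions, which is an idealization.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def rate_enhancement(ddg_kj_per_mol, temperature=298.15):
    """Speed-up factor k_cat / k_uncat when the enzyme lowers the activation
    free energy by ddg_kj_per_mol (kJ/mol) at the given temperature (K).

    k is taken as proportional to exp(-dG_act / RT); the prefactor cancels.
    """
    return math.exp(ddg_kj_per_mol * 1000.0 / (R * temperature))


for ddg in (5.7, 20.0, 40.0):
    print(f"barrier lowered by {ddg:4.1f} kJ/mol -> about {rate_enhancement(ddg):.3g}x faster")
```

Near room temperature, each ~5.7 kJ/mol of barrier lowering buys roughly one order of magnitude in rate. This is why modest stabilization of the transition state at the active site yields very large catalytic effects.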

Amino acids in protein active sites

It is difficult to generalize about which amino acids are likely to be in a protein active/functional site, as this depends greatly on the type of function. With that in mind, the table below gives preferences for the 20 amino acids to lie within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The table does not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
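The table can be queried programmatically, for example to list the residues most enriched at functional sites. A common way to express such a preference is a log-odds score, log10(observed/expected). The source does not say whether the tabulated values were derived exactly this way, so `log_odds` below is an illustrative assumption.

```python
import math

# Propensities from the table above (positive: more contacts with bound
# non-protein atoms than expected by chance).
PROPENSITY = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070, "Trp": -0.140,
    "Met": 0.025, "Val": -0.060, "Asn": 0.080, "Leu": -0.180, "Phe": -0.120,
    "Gln": 0.050, "Cys": 0.210, "Ile": -0.005, "Ala": 0.025, "Glu": 0.050,
    "Arg": 0.055, "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def log_odds(observed_fraction, expected_fraction):
    """Hypothetical preference score: > 0 enriched, < 0 depleted."""
    return math.log10(observed_fraction / expected_fraction)


def enriched(table):
    """Amino acids with positive propensity, most enriched first."""
    return sorted((aa for aa, v in table.items() if v > 0), key=table.get, reverse=True)


print(enriched(PROPENSITY)[:3])      # ['His', 'Cys', 'Ser']
print(round(log_odds(0.2, 0.1), 3))  # 0.301
```

Histidine and cysteine head the list, consistent with their frequent roles in catalysis and metal binding.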

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate; φ is the torsion angle about the N-Cα bond and ψ the torsion angle about the Cα-C bond, and the plot is drawn between these two angles. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ, with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
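The φ and ψ values on a Ramachandran plot are ordinary signed torsion angles, each computed from four consecutive backbone atoms. Below is a minimal pure-Python sketch using the standard projection formula, with angles in -180..180 degrees. The helper names are hypothetical, and a real analysis would read the atom coordinates from a PDB file.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the two outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """phi: rotation about N-CA, from atoms C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi: rotation about CA-C, from atoms N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
```

Plotting (φ, ψ) for every residue, and shading the regions Ramachandran found to be free of hard-sphere collisions, gives the familiar plot.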

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues the submitted protein contains.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the extension .new. The new file has its atoms labelled in accordance with the IUPAC naming convention.
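PROCHECK reads standard PDB-format coordinates, in which every ATOM/HETATM record uses fixed column positions. To illustrate what such a coordinate file contains, the sketch below reads those records by column. It is a minimal reader, not part of PROCHECK, and it ignores alternate locations, insertion codes and multiple models.

```python
def parse_atoms(pdb_lines):
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append({
            "serial": int(line[6:11]),        # columns 7-11
            "name": line[12:16].strip(),      # columns 13-16, atom name
            "res_name": line[17:20].strip(),  # columns 18-20, residue name
            "chain": line[21],                # column 22, chain identifier
            "res_seq": int(line[22:26]),      # columns 23-26, residue number
            "x": float(line[30:38]),          # columns 31-38
            "y": float(line[38:46]),          # columns 39-46
            "z": float(line[46:54]),          # columns 47-54
        })
    return atoms


example = [
    "HEADER    EXAMPLE",
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00           N",
]
print(parse_atoms(example))
```

Mislabelled atom names in columns 13-16 are the kind of problem the PROCHECK clean-up step corrects when it writes the .new file.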

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good at releasing local constraints for a residue, but it will not pass through high energy barriers, and it stops in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure, because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space, giving a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
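Gradient optimization can be illustrated with steepest descent on a toy system. The sketch assumes a pairwise Lennard-Jones energy with illustrative argon-like parameters, a fixed step size, and a cap on the largest atomic displacement per step. Real minimizers use full force fields and usually switch to conjugate-gradient or quasi-Newton steps.

```python
import math

SIGMA, EPSILON = 3.4, 0.238  # illustrative LJ parameters (A, kcal/mol)


def energy_and_forces(coords):
    """Total pairwise Lennard-Jones energy and the force on each atom."""
    n = len(coords)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (SIGMA / r) ** 6
            energy += 4 * EPSILON * (sr6 * sr6 - sr6)
            f = 24 * EPSILON * (2 * sr6 * sr6 - sr6) / r  # -dE/dr (> 0 is repulsive)
            for k in range(3):
                forces[i][k] += f * d[k] / r
                forces[j][k] -= f * d[k] / r
    return energy, forces


def steepest_descent(coords, step=0.01, max_move=0.1, tol=1e-4, max_iter=5000):
    """Move every atom along its force until the largest force is below tol."""
    coords = [list(p) for p in coords]
    for _ in range(max_iter):
        _, forces = energy_and_forces(coords)
        f_max = max(math.sqrt(sum(c * c for c in f)) for f in forces)
        if f_max < tol:
            break
        scale = min(step, max_move / f_max)  # cap the largest displacement
        for p, f in zip(coords, forces):
            for k in range(3):
                p[k] += scale * f[k]
    energy, _ = energy_and_forces(coords)
    return coords, energy


# Two atoms placed too close together (a "bad contact") relax apart.
start = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
final, e_final = steepest_descent(start)
print(energy_and_forces(start)[0], e_final, math.dist(*final))
```

The bad contact starts with a positive (repulsive) energy. Minimization moves the atoms apart to the energy minimum near 3.82 A, where the forces vanish. The toy model cannot cross barriers, which mirrors the local-minimum limitation noted above.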


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences

Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid                   206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina             197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp  27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I             27     6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:

Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:

Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
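Both extinction coefficients reported by ProtParam can be reproduced from the Trp, Tyr and Cys counts. ProtParam's documentation gives molar extinction coefficients at 280 nm of 5500 (Trp), 1490 (Tyr) and 125 (cystine) M-1 cm-1. The function names below are hypothetical.

```python
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125  # M-1 cm-1 at 280 nm


def extinction_coefficient(n_trp, n_tyr, n_cys, cys_as_cystines=True):
    """Molar extinction coefficient at 280 nm from residue counts.

    With cys_as_cystines=True every pair of Cys is assumed to form one
    disulfide (cystine); otherwise cysteines contribute nothing.
    """
    n_cystine = n_cys // 2 if cys_as_cystines else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def abs_01_percent(ext_coeff, mol_weight):
    """Absorbance of a 1 g/l (0.1%) solution: epsilon / molecular weight."""
    return ext_coeff / mol_weight


# Counts and molecular weight from the ProtParam output above (Trp 4, Tyr 5, Cys 5).
for cystines in (True, False):
    eps = extinction_coefficient(4, 5, 5, cystines)
    print(eps, round(abs_01_percent(eps, 55252.6), 3))
# 29700 0.538
# 29450 0.533
```

The two values differ by only 250 M-1 cm-1 (two possible cystines). For this protein, the Trp and Tyr content dominates absorbance at 280 nm.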

Secondary structure prediction

By SOPMA: result for UNK_158250

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix       (Hh): 235 is 44.93%
3-10 helix        (Gg):   0 is  0.00%
Pi helix          (Ii):   0 is  0.00%
Beta bridge       (Bb):   0 is  0.00%
Extended strand   (Ee):  80 is 15.30%
Beta turn         (Tt):  36 is  6.88%
Bend region       (Ss):   0 is  0.00%
Random coil       (Cc): 172 is 32.89%
Ambiguous states  (?):    0 is  0.00%
Other states:             0 is  0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
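The SOPMA percentages are the count of each predicted state divided by the sequence length (for example, 235/523 = 44.93% helix). A minimal sketch follows, assuming the one-letter state codes used in the output above (h, e, t, c). The function name is hypothetical.

```python
from collections import Counter

STATES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def summarize_prediction(pred):
    """Count each predicted state and express it as a percentage of the length."""
    counts = Counter(pred)
    total = len(pred)
    return {
        name: (counts.get(code, 0), round(100.0 * counts.get(code, 0) / total, 2))
        for code, name in STATES.items()
    }


print(summarize_prediction("hhhhccee"))
```

Run on the full 523-character prediction string, this kind of tally yields the state counts and percentages listed in the summary.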

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
==========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
==========================================================================
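The pairwise scores in this table are commonly read as percent identities from ClustalW's initial pairwise alignments. This report does not state the exact denominator ClustalW uses, so the sketch below is an assumption: it counts identical residues over aligned positions where neither sequence has a gap.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical residues / aligned non-gap positions * 100 for two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a == b)
    return 100.0 * same / len(pairs)


# First 30 aligned residues of HBEAG_HBVA4 and HBEAG_ASHV from the alignment in this report
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
print(round(percent_identity(hbva4, ashv), 1))
```

The value for this short fragment is not expected to equal the full-length score in the table. It only illustrates how such an identity figure is formed.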

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
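The guide tree is written in Newick format: nested parentheses mark clades, commas separate siblings, and a colon precedes each branch length. A minimal recursive-descent parser can list the leaves and their branch lengths. This is a sketch with hypothetical function names; it does not handle quoted labels or comments.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = "".join(text.split())  # drop whitespace and line breaks
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    tree = node()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("a Newick tree must end with ';'")
    return tree


def leaves(tree):
    """List (name, branch_length) for every leaf, left to right."""
    name, length, children = tree
    if not children:
        return [(name, length)]
    return [leaf for child in children for leaf in leaves(child)]


print(leaves(parse_newick("((A:0.1,B:0.2):0.3,C:0.4);")))
```

Applied to the guide tree above, this returns the ten sequences. The human GLCTK leaf has by far the longest branch (0.59519), consistent with its very low pairwise scores against the HBeAg sequences.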

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as the template, showing around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

1. Open the template structure from File (PDB file), then choose Swiss Model > Load raw target sequence.
2. Choose Fit > Fit raw sequence, then Magic Fit, then Iterative Fit.
3. Choose File > Save > Layer (PDB).
4. Choose File > Save > Project (PDB).
5. Choose Swiss Model > Submit modeling request (a new browser window opens, loading the PDB file; give an e-mail ID for receiving the modeled structure).
6. Open the new structure (received by e-mail) and remove the template by selecting the target.
7. Choose File > Save > Layer (PDB).

Open Swiss Model and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under FIT, in order to fit the two sequences.

Save the file as the project.

Select "Submit modeling request" under Swiss Model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Your e-mail address: Gunjan300@gmail.com
Your name: Gunjan
Request title: Gunjan project
Your SWISS-MODEL project file: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52

Fig. Structure of template after modeling

Alignment

TARGET  83  LV GFGKAVLGMA AAAEELLGQH
2b8nA    4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET 155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET 255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET 305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394  ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of amino acid residues in a protein structure; it shows the possible conformations of the φ and ψ angles for a polypeptide, plotting the two angles for each residue in the protein. Because each residue's backbone conformation is described by these two torsion angles, the plot shows at a glance which residues of the desired protein adopt sterically allowed or disallowed conformations.

Software

SAVS: http://nihserver.mbi.ucla.edu/SAVS

Procedure: The target protein structure obtained after homology modeling using DeepView and MODELLER is given as input to SAVS.

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result:

Plot statistics                                        Score   %-tage
Residues in most favoured regions [A,B,L]               990    85.6%
Residues in additional allowed regions [a,b,l,p]        104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]     11     1.0%
Residues in disallowed regions                           51     4.4%
                                                       ----   ------
Number of non-glycine and non-proline residues         1156   100.0%
Number of end-residues (excl. Gly and Pro)                8
Number of glycine residues (shown as triangles)         127
Number of proline residues                               60
                                                       ----
Total number of residues                               1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions.
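The PROCHECK percentages follow directly from the residue counts in the table (for example, 990/1156 = 85.6%). The 90% guideline can be checked the same way. The function name below is hypothetical.

```python
def ramachandran_summary(core, allowed, generous, disallowed, threshold=90.0):
    """Percent of non-Gly/non-Pro residues in each region, and whether the
    most favoured (core) fraction exceeds the quality threshold (%)."""
    total = core + allowed + generous + disallowed
    pct = [round(100.0 * n / total, 1) for n in (core, allowed, generous, disallowed)]
    return pct, 100.0 * core / total > threshold


# Counts from the PROCHECK table above
print(ramachandran_summary(990, 104, 11, 51))  # ([85.6, 9.0, 1.0, 4.4], False)
```

By this guideline, the model (85.6% in the most favoured regions, 4.4% disallowed) falls short of the >90% expected for a good-quality structure.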

Docking result (Hex software)

Fig. Ligand & receptor (2B8N)

Fig. After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)

Warning: Fractional charge (0.35) for non-terminal residue A 52MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25

Warning: Fractional charge (0.35) for non-terminal residue A 375MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI

67

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE:N  Radius = 1.40 Charge = -0.52
  MSE:CA Radius = 1.50 Charge = 0.14
  MSE:C  Radius = 1.40 Charge = 0.53
  MSE:O  Radius = 1.50 Charge = -0.50
  MSE:CB Radius = 1.70 Charge = 0.04
  MSE:CG Radius = 1.70 Charge = 0.09
  MSE:SE Radius = 1.90 Charge = 0.32
  MSE:CE Radius = 1.90 Charge = 0.01
  MSE:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP:N  Radius = 1.40 Charge = -0.52
  ASP:CA Radius = 1.50 Charge = 0.25
  ASP:C  Radius = 1.40 Charge = 0.53
  ASP:O  Radius = 1.50 Charge = -0.50
  ASP:CB Radius = 1.70 Charge = -0.21
  ASP:CG Radius = 1.40 Charge = 0.62
  ASP:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE:N  Radius = 1.40 Charge = -0.52
  MSE:CA Radius = 1.50 Charge = 0.14
  MSE:C  Radius = 1.40 Charge = 0.53
  MSE:O  Radius = 1.50 Charge = -0.50
  MSE:CB Radius = 1.70 Charge = 0.04
  MSE:CG Radius = 1.70 Charge = 0.09
  MSE:SE Radius = 1.90 Charge = 0.32
  MSE:CE Radius = 1.90 Charge = 0.01
  MSE:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS:N  Radius = 1.40 Charge = -0.52
  LYS:CA Radius = 1.50 Charge = 0.23
  LYS:C  Radius = 1.40 Charge = 0.53
  LYS:O  Radius = 1.50 Charge = -0.50
  LYS:CB Radius = 1.70 Charge = 0.04
  LYS:CG Radius = 1.70 Charge = 0.05
  LYS:CD Radius = 1.70 Charge = 0.05
  LYS:CE Radius = 1.70 Charge = 0.22
  LYS:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE:N  Radius = 1.40 Charge = -0.52
  MSE:CA Radius = 1.50 Charge = 0.14
  MSE:C  Radius = 1.40 Charge = 0.53
  MSE:O  Radius = 1.50 Charge = -0.50
  MSE:CB Radius = 1.70 Charge = 0.04
  MSE:CG Radius = 1.70 Charge = 0.09
  MSE:SE Radius = 1.90 Charge = 0.32
  MSE:CE Radius = 1.90 Charge = 0.01
  MSE:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS:N  Radius = 1.40 Charge = -0.52
  LYS:CA Radius = 1.50 Charge = 0.23
  LYS:C  Radius = 1.40 Charge = 0.53
  LYS:O  Radius = 1.50 Charge = -0.50
  LYS:CB Radius = 1.70 Charge = 0.04
  LYS:CG Radius = 1.70 Charge = 0.05
  LYS:CD Radius = 1.70 Charge = 0.05
  LYS:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS:N  Radius = 1.40 Charge = -0.52
  LYS:CA Radius = 1.50 Charge = 0.23
  LYS:C  Radius = 1.40 Charge = 0.53
  LYS:O  Radius = 1.50 Charge = -0.50
  LYS:CB Radius = 1.70 Charge = 0.04
  LYS:CG Radius = 1.70 Charge = 0.05
  LYS:CD Radius = 1.70 Charge = 0.05
  LYS:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS:N  Radius = 1.40 Charge = -0.52
  LYS:CA Radius = 1.50 Charge = 0.23
  LYS:C  Radius = 1.40 Charge = 0.53
  LYS:O  Radius = 1.50 Charge = -0.50
  LYS:CB Radius = 1.70 Charge = 0.04
  LYS:CG Radius = 1.70 Charge = 0.05
  LYS:CD Radius = 1.70 Charge = 0.05
  LYS:CE Radius = 1.70 Charge = 0.22
  LYS:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE:N  Radius = 1.40 Charge = -0.52
  MSE:CA Radius = 1.50 Charge = 0.14
  MSE:C  Radius = 1.40 Charge = 0.53
  MSE:O  Radius = 1.50 Charge = -0.50
  MSE:CB Radius = 1.70 Charge = 0.04
  MSE:CG Radius = 1.70 Charge = 0.09
  MSE:SE Radius = 1.90 Charge = 0.32
  MSE:CE Radius = 1.90 Charge = 0.01
  MSE:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS:N  Radius = 1.40 Charge = -0.52
  LYS:CA Radius = 1.50 Charge = 0.23
  LYS:C  Radius = 1.40 Charge = 0.53
  LYS:O  Radius = 1.50 Charge = -0.50
  LYS:CB Radius = 1.70 Charge = 0.04
  LYS:CG Radius = 1.70 Charge = 0.05
  LYS:CD Radius = 1.70 Charge = 0.05
  LYS:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS:N  Radius = 1.40 Charge = -0.52
  LYS:CA Radius = 1.50 Charge = 0.23
  LYS:C  Radius = 1.40 Charge = 0.53
  LYS:O  Radius = 1.50 Charge = -0.50
  LYS:CB Radius = 1.70 Charge = 0.04
  LYS:CG Radius = 1.70 Charge = 0.05
  LYS:CD Radius = 1.70 Charge = 0.05
  LYS:CE Radius = 1.70 Charge = 0.22
  LYS:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP:N  Radius = 1.40 Charge = -0.52
  ASP:CA Radius = 1.50 Charge = 0.25
  ASP:C  Radius = 1.40 Charge = 0.53
  ASP:O  Radius = 1.50 Charge = -0.50
  ASP:CB Radius = 1.70 Charge = -0.21
  ASP:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE:N  Radius = 1.40 Charge = -0.52
  MSE:CA Radius = 1.50 Charge = 0.14
  MSE:C  Radius = 1.40 Charge = 0.53
  MSE:O  Radius = 1.50 Charge = -0.50
  MSE:CB Radius = 1.70 Charge = 0.04
  MSE:CG Radius = 1.70 Charge = 0.09
  MSE:SE Radius = 1.90 Charge = 0.32
  MSE:CE Radius = 1.90 Charge = 0.01
  MSE:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS:N  Radius = 1.40 Charge = -0.52
  LYS:CA Radius = 1.50 Charge = 0.23
  LYS:C  Radius = 1.40 Charge = 0.53
  LYS:O  Radius = 1.50 Charge = -0.50
  LYS:CB Radius = 1.70 Charge = 0.04
  LYS:CG Radius = 1.70 Charge = 0.05
  LYS:CD Radius = 1.70 Charge = 0.05
  LYS:CE Radius = 1.70 Charge = 0.22
  LYS:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE:N  Radius = 1.40 Charge = -0.52
  MSE:CA Radius = 1.50 Charge = 0.14
  MSE:C  Radius = 1.40 Charge = 0.53
  MSE:O  Radius = 1.50 Charge = -0.50
  MSE:CB Radius = 1.70 Charge = 0.04
  MSE:CG Radius = 1.70 Charge = 0.09
  MSE:SE Radius = 1.90 Charge = 0.32
  MSE:CE Radius = 1.90 Charge = 0.01
  MSE:H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR:N  Radius = 1.40 Charge = -0.52
  THR:CA Radius = 1.50 Charge = 0.27
  THR:C  Radius = 1.40 Charge = 0.53
  THR:O  Radius = 1.50 Charge = -0.50
  THR:CB Radius = 1.50 Charge = 0.21
  THR:H  Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds [71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds [71657, 34961, 15915, 5330, 685, 11]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27 x 812 x 1] x FFT[64 x 24 x 48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal    Eshape    Eforce    Eair      RMS              Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
---------------------------------------------------------------------------
   1    1 000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
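The size of the search reported in the Hex log can be re-derived with simple arithmetic. The sketch below is only a consistency check written for this report, not Hex code. It assumes that the log's Iterate × FFT entry reads as 27 distance steps × 812 receptor rotations × 1 ligand rotation, multiplied by a 64 × 24 × 48 FFT grid. That reading is our assumption; we chose it because it reproduces both the number of 3D FFTs and the number of orientations printed in the log. All variable names are illustrative.

```python
# Consistency check on the Hex 6D docking run logged above (2B8N docked with 2B8N).
# Assumption: the "Iterate x FFT" entry is read as 27 x 812 x 1 iterations over a
# 64 x 24 x 48 FFT grid; this is our reading of the log, not documented Hex behaviour.

R_MIN, R_MAX, R_STEP = 0.00, 19.50, 0.75  # distance scan from the log (angstroms)
RECEPTOR_ROTATIONS = 812                  # "Initial rotational increments (N=16) Receptor 812"
LIGAND_ROTATIONS = 1                      # "Ligand 1"
FFT_GRID = (64, 24, 48)                   # assumed reading of the FFT term
ELAPSED_S = 7 * 60                        # "in 7 min 0 sec"
POSITIVE, NEGATIVE = 104, 114             # formally charged residues counted by Hex

# Distance values R = 0.00, 0.75, ..., 19.50, inclusive of both ends.
n_steps = round((R_MAX - R_MIN) / R_STEP) + 1
distances = [R_MIN + i * R_STEP for i in range(n_steps)]

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
n_fft_calls = len(distances) * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS

# Each 3D FFT evaluates every point of its grid in a single transform.
points_per_fft = FFT_GRID[0] * FFT_GRID[1] * FFT_GRID[2]
total_orientations = n_fft_calls * points_per_fft

rate_per_s = total_orientations / ELAPSED_S
net_charge = POSITIVE - NEGATIVE

print(len(distances), distances[-1], n_fft_calls, total_orientations)
print(round(rate_per_s), net_charge)
```

Run as-is, the script reports 27 distance steps ending at 19.5 Å, 21924 FFT calls and 1616412672 orientations, which agree with the log. It also reports a rate of about 3.85 million orientations per second, close to the logged 3848574 figure (the log uses the unrounded run time), and a net formal charge of -10.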

Conclusion

After analyzing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to look at the matter more closely at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we carried out homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

As our study suggests, the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.

Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a much larger problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in HBV.

The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so we can look forward to a better, disease-free world.

The work presented is only a small part of a big problem, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.

BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed.

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801. [PubMed]

Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395-408. [PubMed]

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402. [PubMed]

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviation

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


3) Horizontal transmission: in children, within families and through close personal contact

This is the major mode of transmission in South Africa, where the majority of individuals become infected between three and nine years of age.

Horizontal transmission also occurs in children's institutions and mental homes.

4) Vertical transmission: perinatal transmission from a carrier mother to her baby

Transplacental (rare)

During delivery

Postnatal: breastfeeding, close contact

(This is the major mode of transmission in South East Asia.)

Diagnosis: Serology

Viral antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody response

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers (see below).

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in chronic carriers as well as in those who clear the infection. Its presence indicates exposure to HBV.

Fig: Hepatitis B virus in serum (panels: acute infection with resolution; chronic carrier)
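The marker interpretations listed in this section amount to a small lookup table. The sketch below only restates what the section says about each serum marker. It is an illustration, not a diagnostic tool. The marker labels (for example "anti-HBc IgM" for core IgM) and the function name are our own hypothetical shorthand.

```python
# Illustrative lookup of the serum-marker interpretations given in the
# Diagnosis (Serology) section. Labels such as "anti-HBc IgM" are our own
# shorthand; this is not a clinical decision tool.

INTERPRETATIONS: dict[str, str] = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection (not found in chronic carriers)",
    "anti-HBe": "viral replication has fallen; low infectivity in a carrier",
    "anti-HBc IgM": "recent infection",
    "anti-HBc IgG": "exposure to HBV (chronic carriers and those who cleared it)",
}


def interpret(markers: set[str]) -> dict[str, str]:
    """Return the text's interpretation for each detected serum marker."""
    if "HBcAg" in markers:
        # The section states that core antigen is not found in blood.
        raise ValueError("HBcAg is not detectable in serum")
    unknown = markers - set(INTERPRETATIONS)
    if unknown:
        raise KeyError(f"no interpretation recorded for: {sorted(unknown)}")
    # Sorted so the output order is deterministic.
    return {marker: INTERPRETATIONS[marker] for marker in sorted(markers)}


if __name__ == "__main__":
    for marker, meaning in interpret({"HBsAg", "anti-HBc IgM"}).items():
        print(f"{marker}: {meaning}")
```

For example, a serum showing HBsAg together with core IgM maps to ongoing replication plus recent infection. This matches the reading of those two markers given in the section.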

Prevention

1) Active immunization

Two types of vaccine are available:

Serum-derived: prepared from HBsAg purified from the serum of HBV carriers

Recombinant HBsAg: made by genetic engineering in yeasts

Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.

Universal immunization of infants was introduced in April 1995. Infants receive 3 doses, at 6, 10 and 14 weeks of age.

The vaccine should be administered to people at high risk of infection with HBV:

1) Health care workers

2) Sexual partners of chronic carriers

3) Infants of HBV carrier mothers

2) Passive antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example needlestick injuries.

What Is Hepatitis B Infection Like?

When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others, who do not develop significant symptoms following exposure, may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

Genome: circular, 3.2 kb in size, partially double-stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions.

Fig: Hepatitis B virus genome

The minus strand is unit length and has a protein covalently attached to its 5' end. The other strand, the plus strand, is variable in length but always less than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. Transcription and translation of these proteins make use of multiple in-frame start codons. The HBV genome also contains regions that regulate transcription, determine the site of polyadenylation and mark a specific transcript for encapsidation into the nucleocapsid.
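The phrase "multiple in-frame start codons" can be made concrete with a toy example. Every ATG that lies in the same reading frame upstream of the first stop codon starts a product, and all such products end at the same stop, so they share their C-terminal part. This is, for instance, how the large, middle and small surface proteins described later in this report are related. The sequence and the function name below are hypothetical and purely illustrative; this is not HBV sequence.

```python
# Toy illustration of nested products from multiple in-frame start codons.
# The sequence is hypothetical, not taken from the HBV genome.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def nested_products(orf: str) -> list[tuple[int, int]]:
    """Return (start, end) positions, in frame 0, for every ATG before the first stop.

    `end` is just past the stop codon, so all products share the same end.
    """
    orf = orf.upper()
    # Split into complete codons of frame 0; a trailing partial codon is ignored.
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    stop_index = next((i for i, codon in enumerate(codons) if codon in STOP_CODONS), None)
    if stop_index is None:
        return []  # no in-frame stop: products are undefined in this toy model
    end = (stop_index + 1) * 3
    return [(i * 3, end) for i, codon in enumerate(codons[:stop_index]) if codon == "ATG"]


toy_orf = "ATGAAAATGCCCATGGGGTAA"  # three in-frame ATGs, one stop (TAA)
for start, end in nested_products(toy_orf):
    n_codons = (end - start) // 3 - 1  # sense codons, excluding the stop
    print(f"start={start:2d} end={end} length={n_codons} codons")
```

Running it lists three products of 6, 4 and 2 codons that all end at position 21. This is the nested pattern produced by in-frame start codons sharing a single stop.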

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA, called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins destined to become HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the DNA is recycled to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).

Fig: HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions by a ratio of 100:1. These two types of subviral particles, the hepatitis B filament and the hepatitis B sphere, are often grouped together as surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein.

The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigen encoded by the HBV genome.

Hepatitis B surface antigen (HBsAg): There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope that may be responsible for triggering an immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears largely oblivious to their presence.

Hepatitis B core antigen (HBcAg): This is the only HBV antigen that cannot be detected directly by a blood test; it can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus, this antigen can be detected by a blood test. If found, it usually indicates complete virus particles in circulation (Strauss 2002).


REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle; a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle; and tubular or filamentous particles that vary in length. Of these, the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or any body secretion [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the perpetuation of viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field has therefore been reviewed, with an emphasis on past accomplishments as well as goals for the future [3].

Antigen

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM (IgM anti-HBc) rises early in infection and indicates recent infection.

4) Core IgG (IgG anti-HBc) rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence indicates exposure to HBV, whether the infection was cleared or persists as a chronic carrier state [4].


Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves identifying possible templates that have a clear sequence relationship to the query, assembling the model, predicting regions of the structure that are likely to have different conformations than the templates (e.g., loops) and, ultimately, refining the structure in an attempt to account for inherent differences between the template and query structures. As mentioned above, homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity (a structural explanation for this observation has been offered by Chung and Subbiah 1996). Advances in the accuracy of sequence alignments using structure-based profile methods, such as those described above, should result in continuing improvements in the quality of homology models [5,6].
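To make the 30% rule concrete, the short Python sketch below computes percent sequence identity from a pairwise alignment. It assumes the two sequences are already aligned (equal length, gaps written as '-') and uses one common definition, identical positions divided by gap-free columns; other tools, including ClustalW, may normalize differently, so the numbers are illustrative only. The function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of a pairwise alignment (hypothetical helper)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    identity = percent_identity("AC-GT", "ACTGA")
    print(f"{identity:.1f}% identity; above the 30% rule: {identity > 30}")
```

For example, percent_identity("AC-GT", "ACTGA") gives 75.0 (3 identical residues in 4 gap-free columns), which would comfortably clear the 30% threshold described above.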

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term guided docking is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, reflecting the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information is an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described [8].


Material and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1 NCBI (National Center for Biotechnology Information)

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.


2 Protein sequence: glycerate kinase (HBeAg-binding protein 4) from human; primary accession number Q8IVS8; EC 2.7.1.31; subcellular location: cytoplasm; catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.

3 FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved tools for biological sequence comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts) and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
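SSEARCH is built on the Smith-Waterman local alignment recurrence. As a rough illustration of that recurrence (not of SSEARCH itself), the sketch below fills the dynamic-programming matrix using a simple match/mismatch score and a linear gap penalty. Real searches use substitution matrices (e.g., BLOSUM) and affine gap penalties, so the scoring values here are assumptions, and only the best local score is returned (traceback is omitted). The function name is hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score with a linear gap penalty (score only, no traceback)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The 0 floor makes the alignment local: a poorly scoring prefix is dropped.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman("XXACGTYY", "ACGT"))
```

Here smith_waterman("XXACGTYY", "ACGT") returns 8: the four matching residues score 2 each, and the flanking X and Y residues are left out because the score is allowed to restart at zero.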

4 BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and to identify library sequences that resemble the query sequence above a certain threshold.
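BLAST gains its speed by first finding short words shared by the query and a database sequence, and only then extending those seeds into longer alignments. The sketch below shows only the seeding idea, using exact k-mer matches. This is a simplification: protein BLAST also accepts similar "neighborhood" words that score above a threshold, and the extension and E-value steps are not shown. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmer_positions(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in seq to the list of positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def find_seeds(query: str, subject: str, k: int = 3) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where an exact k-mer is shared."""
    subject_index = kmer_positions(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        # .get() does not insert missing keys into the defaultdict
        for s in subject_index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


if __name__ == "__main__":
    print(find_seeds("ACGTAC", "TTACGT", k=3))
```

For the toy pair above, the seeds are (0, 2), (1, 3) and (3, 1), i.e. the shared 3-mers ACG, CGT and TAC; a real search would then try to extend each seed in both directions.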

5 Primary and secondary structure analysis

Using ProtParam for primary structure: ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence will be analyzed.

Among other parameters (such as molecular weight, theoretical pI and amino acid and atomic composition), it calculates the following:

extinction coefficient

half-life

instability index

aliphatic index
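Of the listed parameters, the aliphatic index has a simple closed form, as documented for ProtParam: AI = X(Ala) + a·X(Val) + b·(X(Ile) + X(Leu)), where X are mole percents, a = 2.9 and b = 3.9. The sketch below implements that formula from a raw sequence. The function name is hypothetical, and the instability index and half-life, which require large lookup tables, are not covered.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X = mole percent."""
    seq = "".join(sequence.split()).upper()  # ignore white space in the input
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    # a, b: side-chain volumes of Val and of Ile/Leu relative to Ala
    a, b = 2.9, 3.9
    return mole_percent("A") + a * mole_percent("V") + b * (mole_percent("I") + mole_percent("L"))


if __name__ == "__main__":
    print(round(aliphatic_index("AV"), 1))
```

A poly-alanine sequence scores 100, and each Val, Ile or Leu raises the index in proportion to its larger aliphatic side chain, which is why the index is read as a rough indicator of thermostability.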


Using SOPMA for secondary structure analysis

A method called the self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. Its authors reported further improvements obtained by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr or on a web page (http://www.ibcp.fr/predict.html).

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor (obtained in PDB format).

Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).

Verified the model through parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the docked structure of the drug molecule.


6 Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
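The ~2 Å and 4-5 Å figures above describe the deviation between matched Cα atoms of a model and the true structure, which is usually reported as a root-mean-square deviation (RMSD). The sketch below computes RMSD for two lists of already matched and already superimposed coordinates; the optimal superposition step (for example, the Kabsch algorithm) is assumed to have been done beforehand and is not shown. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD between matched, already superimposed coordinate sets (e.g. C-alpha atoms)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    reference = [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]
    print(f"{rmsd(model, reference):.2f} A")
```

In the toy example one of two atoms is displaced by 5 Å, giving an RMSD of sqrt(25/2), about 3.54 Å: a single badly placed region can dominate the value, much as loops built without a template do in real models.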

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, yielding a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
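The E-value rule of thumb above can be sketched as a filter-and-sort over BLAST hits. The cutoff of 1e-5 below is an assumed, commonly used value rather than one prescribed in this report, and the Hit record and select_templates function are hypothetical; the example hits are the PDB matches listed in the BLAST result of this report.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    pdb_chain: str
    score: float
    evalue: float


def select_templates(hits: Iterable[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first."""
    return sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue)


# BLAST hits as listed in the report's results (PDB chain, score, E-value).
HITS = [
    Hit("1QGT-C", 206, 6e-54),
    Hit("2QIJ-C", 197, 2e-51),
    Hit("2G33-C", 192, 6e-50),
    Hit("1TA3-B", 27, 60.0),
    Hit("1AW9-A", 27, 60.0),
]

if __name__ == "__main__":
    for hit in select_templates(HITS):
        print(hit.pdb_chain, hit.score, hit.evalue)
```

With these hits, the three hepatitis B capsid chains pass and the two unrelated matches with E-value 60 are discarded. As the text notes, E-value is only the first filter; function, coverage and secondary-structure agreement would still decide among the survivors.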

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aiming at finding a proper fit between a

ligand and its binding site

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and the interactions are thus said to be rigid.

Fig: Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand and a flexible receptor, or a flexible ligand and a rigid receptor.

Fig: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

In the native structure of a rigid ligand and a flexible receptor, the interface area between the molecules is often maximized. The molecules move relative to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty to the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than a rigid receptor would allow.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: there is intrinsic movement that allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
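The six degrees of freedom mentioned above (three rotational, three translational) are exactly what a rigid-body docking search samples for the ligand. The sketch below, with hypothetical function names, applies one such pose to a list of atom coordinates. It rotates the ligand about its own centroid using x, y and z rotation angles, composed as Rz·Ry·Rx (one of several possible conventions), and then translates it. Scoring the pose against the receptor is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def _matmul(a: Mat3, b: Mat3) -> Mat3:
    """3x3 matrix product a @ b."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def rotation_matrix(rx: float, ry: float, rz: float) -> Mat3:
    """Rotation by rx, ry, rz radians about x, y, z, composed as Rz @ Ry @ Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    r_x = ((1.0, 0.0, 0.0), (0.0, cx, -sx), (0.0, sx, cx))
    r_y = ((cy, 0.0, sy), (0.0, 1.0, 0.0), (-sy, 0.0, cy))
    r_z = ((cz, -sz, 0.0), (sz, cz, 0.0), (0.0, 0.0, 1.0))
    return _matmul(r_z, _matmul(r_y, r_x))


def apply_pose(coords: Sequence[Vec3], pose: Sequence[float]) -> List[Vec3]:
    """Apply a 6-DOF pose (rx, ry, rz, tx, ty, tz) to ligand atoms, rotating about the centroid."""
    rx, ry, rz, tx, ty, tz = pose
    n = len(coords)
    centroid = tuple(sum(p[i] for p in coords) / n for i in range(3))
    rot = rotation_matrix(rx, ry, rz)
    shift = (tx, ty, tz)
    moved = []
    for p in coords:
        d = [p[i] - centroid[i] for i in range(3)]  # position relative to centroid
        r = [sum(rot[i][j] * d[j] for j in range(3)) for i in range(3)]
        moved.append(tuple(centroid[i] + r[i] + shift[i] for i in range(3)))
    return moved


if __name__ == "__main__":
    ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    print(apply_pose(ligand, (0.0, 0.0, math.pi / 2, 0.0, 0.0, 5.0)))
```

Rotating this two-atom "ligand" by 90° about z and shifting it 5 Å along z moves its atoms to approximately (0, 1, 5) and (0, -1, 5) while preserving the 2 Å distance between them, as any rigid-body move must.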

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease hepatitis B, and will also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.

Receptor

A molecule on the surface of the cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule within a cell or on a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through the interaction of many weak non-covalent bonds formed with the binding site of a protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are the preferences of the 20 amino acids to lie within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the corresponding torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
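Each point on a Ramachandran plot is a pair of backbone dihedral angles computed from atomic coordinates: φ from atoms C(i-1), N(i), Cα(i), C(i), and ψ from N(i), Cα(i), C(i), N(i+1). The sketch below computes a dihedral angle in degrees from four points, using the usual sign convention and the range -180° to 180°. It is a generic geometric routine with a hypothetical name, not the code used by SAVS or PROCHECK.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> tuple:
    return tuple(x - y for x, y in zip(a, b))


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Vec, b: Vec) -> tuple:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle (degrees) of p0-p1-p2-p3 about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # A toy backbone fragment: eclipsed (cis) and staggered (90 degree) cases.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Calling dihedral for every residue's φ and ψ atom quadruples and plotting the pairs gives the raw data of the Ramachandran plot; validation tools then compare each point against favoured and disallowed regions.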

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new. The new file will have the atoms labelled in accordance with the IUPAC naming convention.

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good for releasing local constraints on a residue, but it will not pass through high energy barriers and therefore stops in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
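Gradient optimization can be illustrated on the smallest possible case: two atoms interacting through a Lennard-Jones potential, V(r) = 4ε[(σ/r)^12 - (σ/r)^6]. The steepest-descent sketch below moves the interatomic distance against the gradient until the force is essentially zero. The reduced units (ε = σ = 1), step size and tolerance are assumptions, the function names are hypothetical, and real minimizers work on all Cartesian coordinates with a full force field.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy V(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """dV/dr; the force on the pair is its negative."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def steepest_descent(r0: float, step: float = 0.01, tol: float = 1e-8, max_iter: int = 10000) -> float:
    """Move r downhill along -dV/dr until the gradient (force) is below tol."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


if __name__ == "__main__":
    start = 1.0  # inside the repulsive wall: a "bad contact"
    r_min = steepest_descent(start)
    print(f"E(start) = {lj_energy(start):.3f}, r_min = {r_min:.4f}, E(r_min) = {lj_energy(r_min):.4f}")
```

Starting from r = 1.0σ, the routine converges to r ≈ 1.1225σ (that is, 2^(1/6)σ), where V = -ε. With only one minimum nothing can go wrong here; in a real protein the same downhill rule simply stops in whichever local minimum is nearest, as described above.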


Result and discussion

1 Swiss-Prot entry: glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C (HBcAg), Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     60
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.
Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines
Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
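The extinction coefficients reported above can be reproduced from the Trp, Tyr and Cys counts, using the per-residue values given in the ProtParam documentation (Trp 5500, Tyr 1490 and cystine 125 M-1 cm-1 at 280 nm). The sketch below is a re-derivation under that assumption, not ProtParam's source code, and its function names are hypothetical; Abs 0.1% is the extinction coefficient divided by the molecular weight.

```python
# Per-residue molar extinction coefficients at 280 nm (M-1 cm-1),
# as listed in the ProtParam documentation (assumed here).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125  # per disulfide-bonded Cys pair, not per Cys


def extinction_coefficients(n_trp: int, n_tyr: int, n_cys: int) -> tuple:
    """Return (all Cys paired as cystines, no cystines) extinction coefficients."""
    no_cystine = n_trp * EXT_TRP + n_tyr * EXT_TYR
    all_cystine = no_cystine + (n_cys // 2) * EXT_CYSTINE  # an odd Cys cannot pair
    return all_cystine, no_cystine


def abs_01_percent(ext_coefficient: float, molecular_weight: float) -> float:
    """Absorbance of a 1 g/l (0.1%) solution: extinction coefficient / molecular weight."""
    return ext_coefficient / molecular_weight


if __name__ == "__main__":
    # Counts and molecular weight from the ProtParam result for Q8IVS8
    for ext in extinction_coefficients(n_trp=4, n_tyr=5, n_cys=5):
        print(ext, round(abs_01_percent(ext, 55252.6), 3))
```

With Trp 4, Tyr 5, Cys 5 and a molecular weight of 55252.6, this gives 29700 and 29450, with Abs 0.1% values of 0.538 and 0.533, matching the table. The five cysteines can form at most two cystines, which accounts for the 250 difference between the two coefficients.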

Secondary structure prediction

By SOPMA result for UNK_158250

View SOPMA in

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix       (Hh):  235 is 44.93%
3₁₀ helix         (Gg):    0 is  0.00%
Pi helix          (Ii):    0 is  0.00%
Beta bridge       (Bb):    0 is  0.00%
Extended strand   (Ee):   80 is 15.30%
Beta turn         (Tt):   36 is  6.88%
Bend region       (Ss):    0 is  0.00%
Random coil       (Cc):  172 is 32.89%
Ambiguous states  (?):     0 is  0.00%
Other states:              0 is  0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
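The SOPMA summary counts how often each state letter appears in the prediction string and divides by the sequence length. The Python sketch below recomputes such a summary from a state string. It does not implement SOPMA's prediction itself, and the names are illustrative. The per-line state strings above are easy to mistype when copied, so the example builds a string with the same totals instead.

```python
from collections import Counter

# State letters that occur in this 4-state SOPMA prediction
SOPMA_STATES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_summary(prediction):
    """Return {state name: (residue count, percentage)} for a state string."""
    prediction = prediction.lower()
    counts = Counter(prediction)
    total = len(prediction)
    return {
        name: (counts[letter], round(100.0 * counts[letter] / total, 2))
        for letter, name in SOPMA_STATES.items()
    }


# A string with the same totals as the summary: 235 h, 80 e, 36 t, 172 c
example = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
for name, (count, percent) in state_summary(example).items():
    print(f"{name}: {count} is {percent:.2f}%")
```

The printed percentages are 44.93, 15.30, 6.88 and 32.89, which match the SOPMA table.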

Multiple sequence alignment

ClustalW2 Results

1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
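Each row of the alignment is the original sequence with gap characters ("-") inserted. The number after a row is the running count of residues up to that point. The Python sketch below defines two simple helpers: one gives percent identity over columns where neither sequence has a gap, and the other counts residues in a row. This definition of identity is an assumption made for illustration. ClustalW2 may compute its pairwise Score differently, so results from this helper need not match the Scores Table.

```python
def percent_identity(row_a, row_b):
    """Percent identity over alignment columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def residue_count(row):
    """Number of residues (non-gap characters) in one aligned row.

    Summing this over successive blocks gives the running totals that
    ClustalW prints at the end of each row.
    """
    return sum(1 for ch in row if ch != "-")


# Second alignment block, HBVA4 and HBVA5 rows (they differ only at V/F)
hbva4 = "----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
hbva5 = "----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"
print(residue_count(hbva4), round(percent_identity(hbva4, hbva5), 2))
```

For these two rows, `residue_count` returns 30, which matches the "30" printed at the end of the row. The rows share 29 of their 30 gap-free columns.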

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
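The guide tree uses Newick format, which ClustalW2 writes to the .dnd file listed above. Nested parentheses group sequences, and the number after each colon is a branch length. The Python sketch below is a minimal recursive-descent parser, with the tree written out including its colons, commas and decimal points. It handles only the plain name:length form used here, not quoted labels or comments, and the function names are illustrative.

```python
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = "".join(text.split())  # drop whitespace and line breaks
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        start = pos
        while text[pos] not in "(),:;":
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in "(),;":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    tree = node()
    if text[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return tree


def leaves(tree):
    """Return [(leaf name, branch length), ...] from left to right."""
    name, length, children = tree
    if not children:
        return [(name, length)]
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


for name, length in leaves(parse_newick(GUIDE_TREE)):
    print(f"{name:20s} {length:.5f}")
```

The root has three children, which is typical of ClustalW's unrooted guide trees. The long branch to Q8IVS8|GLCTK_HUMAN (0.59519) shows how far the human glycerate kinase sequence sits from the HBeAg sequences.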

Phylogram

Tertiary structure prediction

PDB 1QGT-C was selected as the template because it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV:

1. Open the template structure (PDB file) from File, then choose Swiss model > load the raw target sequence.

2. Choose Fit > fit raw sequence, then Magic Fit, then Iterative Fit.

3. Choose File > Save > Layer (PDB).

4. Choose File > Save > Project (PDB).

5. Choose Swiss model > Submit modeling request. A new browser window opens with the PDB file loaded. Enter an e-mail ID to receive the modeled structure.

6. Open the new structure received by e-mail, and remove the template by selecting the target.

7. Choose File > Save > Layer (PDB).


Open Swiss model and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences.

Save the file as a project.

Select "Submit modeling request" under Swiss model to submit it for modeling.

Homologous modeling

Optimise Mode request submission form:

Your e-mail address: Gunjan300@gmail.com
Your name: Gunjan
Request title: Gunjan project (added to the results header)
Your SWISS-MODEL project file: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044, Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52

Fig: Structure of the template after modeling

Alignment

TARGET   83                                LV GFGKAVLGMA AAAEELLGQH
2b8nA     4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET  155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET  255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET  305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394  ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of protein structure and sequence. Stereochemical validation of model structures is an important part of comparative molecular modeling. The Ramachandran plot shows the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and it marks which combinations of φ and ψ are possible for a polypeptide. Backbone bond lengths and bond angles are nearly fixed, so these two torsion angles per residue largely determine the backbone conformation. Residues that fall in disallowed regions of the plot therefore point to strained or poorly modeled parts of the structure.

Software

SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained from homology modeling with Deep View and modeler is given as input to SAVS.
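The φ and ψ angles on a Ramachandran plot are ordinary torsion angles calculated from backbone atom coordinates. For residue i, φ uses C(i−1), N(i), Cα(i) and C(i), and ψ uses N(i), Cα(i), C(i) and N(i+1). The Python sketch below computes the torsion angle with the standard atan2 formula and the IUPAC sign convention. It leaves out reading coordinates from a PDB file, and the toy coordinates in the example are made up for illustration. This shows only the angle calculation; PROCHECK's region classification is not reproduced.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) defined by four atom positions.

    phi = dihedral(C_prev, N, CA, C); psi = dihedral(N, CA, C, N_next).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the end bonds are perpendicular when viewed down p1 -> p2
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))
```

This example gives -90.0. Moving the last point to (-1, 1, 0) gives the trans value of 180.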

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result

Plot statistics                                        No. of residues       %
Residues in most favoured regions [A,B,L]                          990   85.6%
Residues in additional allowed regions [a,b,l,p]                   104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]                11    1.0%
Residues in disallowed regions                                      51    4.4%
                                                                  ----  ------
Number of non-glycine and non-proline residues                    1156  100.0%
Number of end-residues (excl. Gly and Pro)                           8
Number of glycine residues (shown as triangles)                    127
Number of proline residues                                          60
                                                                  ----
Total number of residues                                          1351

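Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions. This model has 85.6% in the most favoured regions, which is below that benchmark.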
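Each PROCHECK percentage is a region count divided by the 1156 non-glycine, non-proline residues. The Python sketch below does that arithmetic and checks the 90% most-favoured benchmark quoted above. The function name and the `benchmark` parameter are illustrative.

```python
def ramachandran_summary(core, allowed, generous, disallowed, benchmark=90.0):
    """Region percentages for non-Gly, non-Pro residues, plus whether the
    most-favoured share reaches the quality benchmark."""
    total = core + allowed + generous + disallowed
    percent = {
        "most favoured": round(100.0 * core / total, 1),
        "additional allowed": round(100.0 * allowed / total, 1),
        "generously allowed": round(100.0 * generous / total, 1),
        "disallowed": round(100.0 * disallowed / total, 1),
    }
    meets_benchmark = 100.0 * core / total >= benchmark
    return total, percent, meets_benchmark


total, percent, meets = ramachandran_summary(990, 104, 11, 51)
print(total, percent, meets)
```

For this model it returns 1156 residues and percentages of 85.6, 9.0, 1.0 and 4.4. The 90% check returns False.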

Docking result by Hex software

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)

Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ------  ------  ------  ----  ---  -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 000 seconds

Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  ----
   1     1  000000      00      00      00    00      00      00   -1  -100
---------------------------------------------------------------------------
   1     1  000000      00      00      00    00      00      00   -1  -100

Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene-sequencing projects on HBV, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution, and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

In order to determine the unknown structure of a protein present in the different species, we performed homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

As we studied, the glycerate kinase protein (HBeAg-binding protein 4) is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone towards future discoveries. The present work is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in the search for a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in HBV.

The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. Still, we hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B, Sheinerman FB and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796-14801. [PubMed]

Aloy P, Querol E, Aviles FX and Sternberg MJ. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395-408. [PubMed]

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402. [PubMed]

Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


well as those who clear the infection. Its presence indicates exposure to HBV, including in the chronic carrier.

Fig. Hepatitis B virus in serum

Prevention

1) Active immunization

Two types of vaccine are available:

Serum-derived: prepared from HBsAg purified from the serum of HBV carriers.

Recombinant HBsAg: made by genetic engineering in yeasts.

Both vaccines are equally safe and effective. The administration of three doses induces protective levels of antibodies in 95% of vaccine recipients.

Universal immunization of infants was introduced in April 1995. Infants receive 3 doses, at 6, 10 and 14 weeks of age.

The vaccine should be administered to people at high risk of infection with HBV:

1) Health care workers

2) Sexual partners of chronic carriers

3) Infants of HBV carrier mothers


2) Passive antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example needlestick injuries.

What is hepatitis B infection like?

When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others, who do not develop significant symptoms following exposure, may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

The genome is circular, double-stranded and 3.2 kb in size. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5' end. The other strand, the plus strand, is variable in length but always less than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. The transcription and translation of these proteins is achieved through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation and mark a specific transcript for encapsidation into the nucleocapsid.

Fig. Hepatitis B virus genome
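To illustrate how in-frame start codons let a single ORF encode several proteins that share one C-terminus (as with the large, middle and small surface proteins), here is a minimal Python sketch. It scans the three forward reading frames of a DNA string and groups every ATG with the stop codon it runs into. It assumes the standard stop codons and ignores the reverse strand and the circular wrap-around of the real HBV genome. The function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def nested_orfs(dna, min_codons=2):
    """Return (stop_end, start_positions) for each stop codon in the three
    forward frames, listing every in-frame ATG upstream of it (0-based).

    Starts that share one stop encode proteins with a common C-terminus.
    """
    dna = dna.upper()
    results = []
    for frame in range(3):
        starts = []  # ATGs seen since the last stop codon in this frame
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if codon == "ATG":
                starts.append(i)
            elif codon in STOP_CODONS:
                # keep starts that leave at least min_codons before the stop
                kept = [s for s in starts if (i - s) // 3 >= min_codons]
                if kept:
                    results.append((i + 3, kept))
                starts = []
    return results


toy = "ATGAAAATGCCCATGGGGTAA"  # hypothetical sequence, not HBV
print(nested_orfs(toy))  # [(21, [0, 6, 12])]
```

The three starts in the toy sequence all end at the same stop codon, so their products would differ only at the N-terminus.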

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell surface receptor that has yet to be identified (Garces, HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins that are destined to become HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a specific site at the 3' end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the DNA is recycled to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).

Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found within the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a high quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins arranged with icosahedral symmetry. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up only a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions by a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group named surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same antigenic sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces, HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigen encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg): There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope, which may be responsible for triggering the immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears largely oblivious to their presence.

Hepatitis B core antigen (HBcAg): This is the only HBV antigen that cannot be detected directly by a blood test; it can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus, this antigen can be detected by a blood test; if found, it is usually indicative of complete virus particles in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains 3 distinct antigen particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle, and tubular or filamentous particles that vary in length. The Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or other body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed, with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence indicates exposure to HBV, including in the chronic carrier [4].

Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g., loops), and ultimately the refinement of the structure in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
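Since the 30% threshold is stated in terms of sequence identity, a minimal Python sketch of how identity can be computed from a pairwise alignment may help. It assumes one common convention, which counts identical residues over the aligned, gap-free columns. Other tools divide by alignment length or by the shorter sequence length, so their numbers can differ. The function names and sequences are hypothetical.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over the aligned (gap-free) columns of a pairwise
    alignment. Other conventions divide by alignment or sequence length."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [(q, t) for q, t in zip(aligned_query, aligned_template)
               if q != "-" and t != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for q, t in columns if q == t)
    return 100.0 * identical / len(columns)


def above_30_percent_rule(aligned_query, aligned_template, threshold=30.0):
    """Rough check against the 30% sequence-identity rule of thumb."""
    return percent_identity(aligned_query, aligned_template) > threshold


query = "ACDEFGHIKL"     # hypothetical aligned query
template = "ACDQFG-IKM"  # hypothetical aligned template
print(round(percent_identity(query, template), 1))  # 77.8
print(above_30_percent_rule(query, template))       # True
```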

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed over recent years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term "guided docking" is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].
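The difference between indirect and direct guidance can be shown with a deliberately tiny Python sketch. Everything in it is hypothetical: a "pose" is just the 2-D position of one ligand H-bond donor, and the feature point, the scoring function and the grid are made up. In the indirect case, chemical information only adds a term to the score of poses that were sampled freely. In the direct case, it decides where poses are generated in the first place. The protein-based, mapping-based and ligand-based subclasses are not modelled.

```python
import math

# Hypothetical toy setup: a pose is the (x, y) position of one ligand donor.
FEATURE = (1.0, 2.0)  # hypothetical acceptor site inside the cavity


def base_score(pose):
    """Stand-in for a docking scoring function (lower is better)."""
    x, y = pose
    return (x - 0.5) ** 2 + (y - 2.5) ** 2


def indirect_guided(poses, weight=1.0):
    """Indirect: chemical information only adds a distance term to the
    score; the sampled poses themselves are unchanged."""
    return min(poses, key=lambda p: base_score(p) + weight * math.dist(p, FEATURE))


def direct_guided(step=0.5, n=3):
    """Direct: chemical information decides where poses are generated,
    here on a small grid centred on the feature point."""
    poses = [(FEATURE[0] + i * step, FEATURE[1] + j * step)
             for i in range(-n, n + 1) for j in range(-n, n + 1)]
    return min(poses, key=base_score)


free_sampling = [(0.5, 2.5), (1.0, 2.0), (5.0, 5.0)]
print(indirect_guided(free_sampling))  # (1.0, 2.0)
print(direct_guided())                 # (0.5, 2.5)
```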

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described [8].
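The two basic ingredients of docking, sampling poses and scoring them, can be sketched in a few lines of Python. The sketch assumes rigid bodies, samples translations only, and scores each pose with a 12-6 Lennard-Jones pair term. The epsilon and sigma values are generic illustrative numbers, not parameters of HEX or any particular force field. Real programs add rotations, electrostatics, desolvation and much faster search (HEX, for example, uses FFT-based correlations).

```python
import math


def lj_energy(r, epsilon=0.2, sigma=3.5):
    """12-6 Lennard-Jones pair energy. epsilon and sigma are generic
    illustrative values, not taken from a particular force field."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def score_pose(receptor, ligand, cutoff=8.0):
    """Sum pairwise energies between receptor and ligand atoms within cutoff."""
    total = 0.0
    for a in receptor:
        for b in ligand:
            r = math.dist(a, b)
            if 0.0 < r <= cutoff:
                total += lj_energy(r)
    return total


def translate(atoms, shift):
    """Rigidly move every atom by the same shift vector."""
    return [tuple(c + s for c, s in zip(atom, shift)) for atom in atoms]


def rank_translations(receptor, ligand, shifts):
    """Rigid-body search over candidate translations, best (lowest) first."""
    return sorted(shifts, key=lambda s: score_pose(receptor, translate(ligand, s)))


receptor = [(0.0, 0.0, 0.0)]  # toy one-atom "receptor"
ligand = [(0.0, 0.0, 0.0)]    # toy one-atom "ligand"
r_min = 2 ** (1 / 6) * 3.5    # position of the LJ minimum for sigma = 3.5
shifts = [(1.0, 0.0, 0.0), (r_min, 0.0, 0.0), (7.0, 0.0, 0.0)]
print(rank_translations(receptor, ligand, shifts)[0])  # the r_min shift ranks first
```

The steric clash at 1 Å ranks last, which mirrors how scoring functions penalise overlapping atoms.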


Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

A curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC 2.7.1.31, from human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
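For readers unfamiliar with the Smith-Waterman algorithm behind SSEARCH, here is a minimal Python sketch of local alignment by dynamic programming, with traceback from the best-scoring cell. It assumes a simple match/mismatch scheme and a linear gap penalty. SSEARCH itself uses substitution matrices and affine gap penalties, so its scores will differ. The default parameter values are arbitrary.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal local alignment score and one best alignment (linear gaps)."""
    def s(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # (mis)match
                          H[i - 1][j] + gap,                        # gap in b
                          H[i][j - 1] + gap)                        # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("GATTACA", "TTAC"))  # (8, 'TTAC', 'TTAC')
```

Unlike global alignment, scores are floored at zero, so the alignment can start and end anywhere. This is what makes the method suitable for finding conserved regions inside longer sequences.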

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
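BLAST is fast because it only extends short "seed" matches instead of aligning every pair of residues. The Python sketch below shows that seed-and-extend idea in its simplest form. It uses exact k-mer seeds, match/mismatch scores and an ungapped X-drop extension. Real protein BLAST uses neighbourhood words scored with a substitution matrix, a two-hit trigger and gapped extension. All names, parameters and sequences here are hypothetical.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map every k-mer of seq to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_hits(query, subject, k=3):
    """Exact k-mer matches as (query_pos, subject_pos): the seeding step."""
    index = kmer_index(subject, k)
    return [(i, j)
            for i in range(len(query) - k + 1)
            for j in index.get(query[i:i + k], [])]


def _best_extension(pairs, match, mismatch, x_drop):
    """Best running score along residue pairs, stopping once the score
    falls more than x_drop below the best seen so far."""
    best = run = 0
    for a, b in pairs:
        run += match if a == b else mismatch
        if run > best:
            best = run
        elif best - run > x_drop:
            break
    return best


def extend_seed(query, subject, i, j, k, match=1, mismatch=-1, x_drop=2):
    """Ungapped extension of a seed to the right and to the left."""
    right = zip(query[i + k:], subject[j + k:])
    left = zip(reversed(query[:i]), reversed(subject[:j]))
    return (k * match
            + _best_extension(right, match, mismatch, x_drop)
            + _best_extension(left, match, mismatch, x_drop))


query, subject = "MKTAYIAK", "GGMKTAYIAKGG"  # hypothetical sequences
hits = seed_hits(query, subject)
print(hits[0], extend_seed(query, subject, *hits[0], k=3))  # (0, 2) 8
```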

5. Primary & secondary structure analysis

Using ProtParam for primary structure: ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence will be analyzed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
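Two of these parameters follow simple published formulas that ProtParam documents. The extinction coefficient at 280 nm is computed from the counts of Trp, Tyr and cystines, using 5500, 1490 and 125 M^-1 cm^-1 respectively. The aliphatic index (Ikai, 1980) is X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is a mole percent. The Python sketch below re-implements only these two formulas; it is not ProtParam's own code. The half-life and instability index are left out because they need lookup tables, and the peptide is a made-up example.

```python
def extinction_coefficients(seq):
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1) in water.

    Returns (all Cys paired as cystines, all Cys reduced).
    """
    seq = seq.upper()
    trp_tyr = 5500 * seq.count("W") + 1490 * seq.count("Y")
    cystines = seq.count("C") // 2  # each cystine needs two cysteines
    return trp_tyr + 125 * cystines, trp_tyr


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    seq = seq.upper()

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


peptide = "WYCCAVIL"  # hypothetical toy sequence
print(extinction_coefficients(peptide))    # (7115, 6990)
print(round(aliphatic_index(peptide), 2))  # 146.25
```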


Using SOPMA for secondary structure analysis

Recently, a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in the prediction of the secondary structure of proteins. In this paper, we report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
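The 69.5% figure is a per-residue, three-state accuracy (often called Q3). The Python sketch below shows how such a number is computed once a predicted and an observed secondary-structure string are available; it does not reproduce SOPMA itself. It assumes one common reduction of DSSP's eight states to three: H, G and I become helix; E and B become strand; everything else becomes coil. Other papers reduce the states slightly differently. The example strings are hypothetical.

```python
# One common (not universal) reduction of DSSP's eight states to three:
# H, G, I -> helix (H); E, B -> strand (E); everything else -> coil (C).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp_string):
    """Convert a DSSP-style assignment to the H/E/C alphabet."""
    return "".join(EIGHT_TO_THREE.get(state, "C") for state in dssp_string)


def q3(predicted, observed):
    """Percentage of residues whose three-state class is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty strings")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


observed = to_three_state("HHHHEEBTS")  # hypothetical DSSP assignment
predicted = "HHHCEEECC"                  # hypothetical prediction
print(observed)                          # HHHHEEECC
print(round(q3(predicted, observed), 1)) # 88.9
```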

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

Retrieved the FASTA sequence of the HBeAg-binding protein (glycerate kinase, Q8IVS8) from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and obtained the modeled receptor in PDB format.

Validated the modeled receptor using the Structure Analysis Validation Server (SAVS).

Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the docked structure of the drug molecule.
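A Ramachandran plot is a scatter of each residue's backbone torsion angles, phi (C(i-1)-N-CA-C) and psi (N-CA-C-N(i+1)). Any such check therefore starts by computing these dihedrals from the model's coordinates. The pure-Python sketch below does only that step, using the IUPAC sign convention. It assumes atom coordinates are already available as (x, y, z) tuples. The favoured and allowed regions used by SAVS/PROCHECK are not reproduced, and the helper names are hypothetical.

```python
import math


def _sub(u, v):
    return [a - b for a, b in zip(u, v)]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180..180) defined by four atoms."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]  # unit vector along the central bond
    # components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone phi and psi (degrees) for one residue."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))  # -90.0
```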


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and the template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4 to 5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
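The "~2 Å agreement between matched Cα atoms" above is usually expressed as a root-mean-square deviation (RMSD). The following is a minimal sketch. It assumes the model and template Cα coordinates are already paired residue by residue and already superimposed. A real comparison would first find the optimal superposition (for example with the Kabsch algorithm), which this sketch deliberately omits. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Coord], template: Sequence[Coord]) -> float:
    """RMSD (same units as the input, e.g. Å) between paired, pre-superimposed Cα atoms."""
    if len(model) != len(template) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, template):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


# Two Cα atoms, each displaced by 2 Å along z, give an RMSD of 2 Å.
print(ca_rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
              [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]))
```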

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target; thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial (every two years) large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
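The simplest template search described above rests on pairwise sequence alignment. The following is a minimal sketch of a global (Needleman-Wunsch) alignment score. It assumes a simple match/mismatch scheme and a linear gap penalty, and it returns only the score, without a traceback. FASTA and BLAST are heuristic local-alignment searches that use substitution matrices such as BLOSUM62, so this only illustrates the underlying dynamic-programming idea, not either tool. The function name and default scores are hypothetical.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of a and b (linear gap penalty, score only)."""
    m = len(b)
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag,               # align a[i-1] with b[j-1]
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
        prev = curr
    return prev[m]


print(needleman_wunsch_score("ACGT", "AGT"))
```

Only two rows of the dynamic-programming matrix are kept, which is enough when just the score is needed.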

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
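Coverage, one of the selection criteria above, is simply the fraction of query residues that the template-based model spans. The following is a minimal sketch. It assumes a single contiguous, 1-based, inclusive residue range. The function name is hypothetical. The usage line takes its numbers from this report's SWISS-MODEL "Model information" (modelled residue range 83 to 514 of the 523-residue query).

```python
def query_coverage(start: int, end: int, query_length: int) -> float:
    """Fraction of the query covered by a 1-based, inclusive residue range."""
    if not (1 <= start <= end <= query_length):
        raise ValueError("expected 1 <= start <= end <= query_length")
    return (end - start + 1) / query_length


print(f"{query_coverage(83, 514, 523):.3f}")  # 432 of 523 residues
```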

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aiming at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and such interactions are therefore said to be rigid.

Fig: Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor, or a flexible ligand with a rigid receptor.

Fig: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid ligand with a flexible receptor often maximizes the interface area between the molecules. The two molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals forces can be decreased, the energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor: a flexible receptor allows docking of a larger ligand than a rigid receptor would.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement that allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
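The six degrees of freedom mentioned above (three rotations, three translations) are what a rigid-body docking search varies for each trial pose. The following is a minimal sketch of applying one such pose to ligand coordinates, rotating about the ligand's centroid and then translating.

Several choices are assumptions made only for illustration:
- The Euler-angle convention R = Rz(γ)·Ry(β)·Rx(α).
- Rotation about the centroid.
- The function names, which are hypothetical.

This is not how HEX represents or searches poses internally.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotation_matrix(alpha: float, beta: float, gamma: float) -> List[List[float]]:
    """R = Rz(gamma) @ Ry(beta) @ Rx(alpha); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]


def transform_pose(coords: Sequence[Vec3], angles: Vec3, translation: Vec3) -> List[Vec3]:
    """Rotate a rigid ligand about its centroid, then translate it (6 parameters in total)."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    r = rotation_matrix(*angles)
    tx, ty, tz = translation
    out = []
    for x, y, z in coords:
        dx, dy, dz = x - cx, y - cy, z - cz
        rx = r[0][0] * dx + r[0][1] * dy + r[0][2] * dz
        ry = r[1][0] * dx + r[1][1] * dy + r[1][2] * dz
        rz = r[2][0] * dx + r[2][1] * dy + r[2][2] * dz
        out.append((rx + cx + tx, ry + cy + ty, rz + cz + tz))
    return out


# Rotate a two-atom "ligand" by 90 degrees about z, then shift it 5 Å along z.
print(transform_pose([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], (0.0, 0.0, math.pi / 2), (0.0, 0.0, 5.0)))
```

Because the transformation is rigid, interatomic distances within the ligand are unchanged; only its position and orientation relative to the receptor change.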

Aim of docking

The aim of docking is to find new drug targets, which can open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid in the drug discovery process.

Receptor

A molecular structure on the surface of a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. More generally, it is a molecule within a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through the interaction of many weak noncovalent bonds formed with the binding site of a protein, tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein or enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
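In transition-state theory, the rate enhancement described above can be related to how much the enzyme lowers the activation free energy. If the pre-exponential factor is assumed to be unchanged, k_cat / k_uncat = exp(ΔΔG‡ / RT). The following is a minimal sketch of that relationship under this assumption. The function name is hypothetical.

```python
import math

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def rate_enhancement(delta_delta_g_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """k_cat / k_uncat when the activation free energy is lowered by ddG (kJ/mol)."""
    return math.exp(delta_delta_g_kj_per_mol * 1000.0 / (R_GAS * temperature_k))


# Lowering the barrier by 10 kJ/mol at 25 °C speeds the reaction up roughly 56-fold.
print(round(rate_enhancement(10.0), 1))
```

Because the dependence is exponential, each additional ~5.7 kJ/mol of barrier lowering at room temperature multiplies the rate by about ten.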

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active or functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The list below does not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the corresponding torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
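Each point on a Ramachandran plot is a pair of backbone torsion angles computed from atomic coordinates: φ from atoms C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). The following is a minimal sketch of the standard four-atom dihedral calculation. It returns degrees in (-180, 180], with cis = 0° and trans = 180°. The function names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle (degrees) about the p1-p2 bond defined by four atoms."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# phi = dihedral(C_prev, N, CA, C); psi = dihedral(N, CA, C, N_next)
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)))
```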

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the file extension .new. The new file will have the atoms labelled in accordance with the IUPAC naming convention.
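PROCHECK reads its coordinates from a standard PDB-format file, in which each ATOM/HETATM record stores its fields in fixed columns. The following is a minimal sketch of reading those records, using the column positions of the PDB format (atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54). It is not part of PROCHECK. It ignores alternate locations, insertion codes, occupancies and B-factors, and the names are hypothetical.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Extract ATOM/HETATM records from PDB-format text using fixed columns."""
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21].strip(),        # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            ))
    return atoms


record = "ATOM      1  N   MET A   1      11.104   6.134  -6.504"
print(parse_atoms(record + "\nEND\n"))
```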

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good for releasing local constraints on a residue, but it will not pass through high energy barriers and will stop in a local minimum. The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too close together in space with a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
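The "two atoms too close together" case and gradient optimization described above can be illustrated with a one-dimensional toy. The following is a minimal sketch of steepest descent on a single Lennard-Jones pair in reduced units (ε = σ = 1). Starting from a short, repulsive distance, the pair relaxes to the minimum at r = 2^(1/6) σ with energy -ε. The potential, step size and function names are assumptions chosen for illustration. Real minimizers move all atoms under a full force field.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy E(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """dE/dr of the Lennard-Jones pair energy (the force is -dE/dr)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def steepest_descent(r0: float, step: float = 0.01, tol: float = 1e-10,
                     max_iter: int = 10000) -> float:
    """Move downhill along -dE/dr until the force is (almost) zero."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g
    return r


r_min = steepest_descent(1.0)  # start from a bad, strongly repulsive contact
print(r_min, lj_energy(r_min))
```

Like the minimization described above, this only finds the nearest minimum; it cannot cross energy barriers.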


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910

From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences (query sequence included)

Db   AC       Description                                                          Score  E-value
pdb  1QGT-C   Chain C, (Hbcag) Human Hepatitis B Viral Capsid                      206    6e-54
pdb  2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina                197    2e-51
pdb  2G33-C   CAPSD_HBVD1, Chain C, Human T4 Capsid Strain Ad                      192    6e-50
pdb  1TA3-B   XIP1_WHEAT, Chain B, Crystal Structure Of Xylanase (Gh10) In Comp    27     6.0
pdb  1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I                27     6.0
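Following the E-value criterion for template selection, the hits above can be filtered and ranked. The following is a minimal sketch. It assumes an E-value cutoff of 1e-5, which is an illustrative choice rather than a value from this report, and it reads the last two hits as E = 6.0. The class and function names are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    pdb_id: str
    score: int
    evalue: float


def select_templates(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best (lowest E-value) first."""
    return sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue)


hits = [
    Hit("1QGT-C", 206, 6e-54),
    Hit("2QIJ-C", 197, 2e-51),
    Hit("2G33-C", 192, 6e-50),
    Hit("1TA3-B", 27, 6.0),
    Hit("1AW9-A", 27, 6.0),
]
for h in select_templates(hits):
    print(h.pdb_id, h.score, h.evalue)
```

With this cutoff, the two unrelated hits (xylanase and glutathione S-transferase, E = 6.0) are discarded, and the three hepatitis B capsid structures remain as candidate templates.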

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).


Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
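The composition table and the charged-residue totals are straightforward counts over the sequence. For example, Asp 21 + Glu 28 = 49, and Ala 74/523 ≈ 14.1%. The following is a minimal sketch of those counts. It assumes a plain one-letter sequence and mirrors the quantities reported above. It is not ProtParam's code, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def composition(sequence: str) -> Dict[str, Tuple[int, float]]:
    """{residue: (count, percent)} for each residue type present in the sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    n = len(seq)
    return {aa: (counts[aa], 100.0 * counts[aa] / n) for aa in sorted(counts)}


def charged_residues(sequence: str) -> Tuple[int, int]:
    """(negatively charged Asp + Glu, positively charged Arg + Lys)."""
    counts = Counter(sequence.upper())
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


print(composition("MAAALQVLPR"))
print(charged_residues("MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS"))
```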

Atomic composition

Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
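The two extinction coefficients above can be reproduced from the Trp, Tyr and Cys counts in the composition table, using the per-residue values documented for ProtParam (Pace et al., 1995): 5500 per Trp, 1490 per Tyr and 125 per cystine (a pair of Cys).
- With 4 Trp and 5 Tyr: 4 × 5500 + 5 × 1490 = 29450.
- Counting the 5 Cys as 2 cystines adds 250, giving 29700.
- Dividing by the molecular weight (55252.6) gives the Abs 0.1% values.

The following is a minimal sketch under those assumptions. The function names are hypothetical.

```python
from typing import Tuple


def extinction_coefficients(n_trp: int, n_tyr: int, n_cys: int) -> Tuple[int, int]:
    """(all Cys paired as cystines, no cystines), in M^-1 cm^-1 at 280 nm."""
    no_cystine = 5500 * n_trp + 1490 * n_tyr
    all_cystine = no_cystine + 125 * (n_cys // 2)  # n_cys // 2 = number of Cys pairs
    return all_cystine, no_cystine


def abs_01_percent(epsilon: float, molecular_weight: float) -> float:
    """Absorbance of a 1 g/l (0.1%) solution: extinction coefficient / molecular weight."""
    return epsilon / molecular_weight


eps_all, eps_none = extinction_coefficients(n_trp=4, n_tyr=5, n_cys=5)
print(eps_all, round(abs_01_percent(eps_all, 55252.6), 3))
print(eps_none, round(abs_01_percent(eps_none, 55252.6), 3))
```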

Secondary structure prediction

By SOPMA: result for UNK_158250


MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%


Parameters: window width 17, similarity threshold 8, number of states 4
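The percentages in the SOPMA summary are counts of each state letter divided by the sequence length. For example, 235 helical residues out of 523 gives 44.93%. The following is a minimal sketch of that tally. It assumes a prediction string in SOPMA's four-state alphabet (h, e, t, c), matching the "number of states 4" parameter above. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

SOPMA_STATES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_percentages(prediction: str) -> Dict[str, Tuple[int, float]]:
    """{state name: (residue count, percent of sequence rounded to 2 decimals)}."""
    counts = Counter(prediction.lower())
    n = len(prediction)
    return {name: (counts[code], round(100.0 * counts[code] / n, 2))
            for code, name in SOPMA_STATES.items()}


print(state_percentages("hhhhhhhhhhhccccccceeetcc"))
```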

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65

(45)

2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98

49

6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97

7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
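The ClustalW2 guide tree is written in Newick format, with leaf labels followed by `:branch length`. The following is a minimal sketch of listing each sequence's terminal branch length. It assumes the tree has unlabelled internal nodes, as here, so a leaf is any label preceded by `(` or `,`. The embedded tree string restores the `:` and `,` separators and the decimal points that were lost in extraction, assuming branch lengths printed to 5 decimals. The function name is hypothetical; a full Newick parser (for example in Biopython's `Bio.Phylo`) would be needed for labelled internal nodes or quoted names.

```python
import re
from typing import Dict

# Leaf = label preceded by "(" or "," and followed by ":<branch length>".
LEAF_PATTERN = re.compile(r"[(,]([^(),:;]+):([0-9.eE+-]+)")


def leaf_branch_lengths(newick: str) -> Dict[str, float]:
    """Map each leaf label in a Newick string to its terminal branch length."""
    return {name.strip(): float(length) for name, length in LEAF_PATTERN.findall(newick)}


GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

lengths = leaf_branch_lengths(GUIDE_TREE)
for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {length:.5f}")
```

The longest terminal branch belongs to Q8IVS8|GLCTK_HUMAN, consistent with its low pairwise scores (2 to 3) against all the HBeAg sequences in the scores table.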

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

Open the template structure (PDB file) from File, then choose SwissModel > load the raw target sequence.

Choose Fit > Fit raw sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose SwissModel > Submit Modeling Request (a new browser window opens, loading the PDB file; give the email ID for receiving the modeled structure).

Open the new structure (received by email) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).


Open SwissModel and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, in order to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields


Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52
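The sequence identity reported above is the proportion of identical residues in the target-template alignment. The following is a minimal sketch of one common definition: identical positions divided by aligned columns where neither sequence has a gap. SWISS-MODEL and ClustalW may use different denominators, so values from this sketch need not match theirs exactly. The function name is hypothetical. The usage example takes a ten-residue stretch of the target and 2b8nA from the SWISS-MODEL alignment in this report.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned (gap-free) columns of two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != gap and y != gap]
    if not pairs:
        raise ValueError("no aligned (gap-free) columns")
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


# TARGET 175-184 vs 2b8nA 121-130 from the alignment: 6 of 10 positions identical.
print(percent_identity("ISGGGSALLP", "lsgggsslfe"))
```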



Fig: Structure of the template after modeling

Alignment

TARGET  83                               LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
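SWISS-MODEL reports a sequence identity of 34% for this target-template pair. As a rough illustration of how such a figure is derived from an alignment, the Python sketch below counts identical residues over the gap-free columns of two aligned rows. Skipping gap columns and comparing case-insensitively is an assumed convention (tools differ in the denominator they use), so this is not expected to reproduce SWISS-MODEL's own number; here it is applied only to the second alignment block.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two aligned rows of equal length.

    Assumed convention: columns where either row has a gap ('-') are
    skipped, and residues are compared case-insensitively.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    aligned = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Second block of the alignment above, with the block spaces removed.
target = "LVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRA"
template = "xakaayevlgkkirkgvvvtkyghsegpiddfeiyeagh-pvpdentikt"
print(f"{percent_identity(target, template):.1f}%")  # prints 12.2%
```

On this single 50-column block the sketch finds 6 identical residues over 49 gap-free columns, which only shows that identity varies along the alignment; the 34% above is SWISS-MODEL's figure for the whole alignment.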

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ for the amino acid residues in a protein structure, and it shows which combinations of φ and ψ are possible for a polypeptide. φ is the torsion angle about the N-Cα bond and ψ the torsion angle about the Cα-C bond, so each residue appears as one point on the plot; residues that fall outside the allowed regions indicate strained or possibly incorrect parts of the model.

Software

SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure: the target protein structure obtained after homology modeling with DeepView and Modeller is given as input to SAVS.

SAVES results for proj_gunjan.pdb
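The φ and ψ values that SAVS plots are ordinary torsion (dihedral) angles: φ is defined by the backbone atoms C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The sketch below, in plain Python with no structure library, computes a signed torsion angle from four 3-D points using the atan2 formulation, which yields the -180 to 180 degree range used on a Ramachandran plot. The helper names `dihedral` and `phi_psi` are hypothetical and are not part of SAVS or PROCHECK.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal of the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec3, n: Vec3, ca: Vec3, c: Vec3,
            n_next: Vec3) -> Tuple[float, float]:
    """Backbone phi and psi of one residue from five backbone atoms."""
    phi = dihedral(c_prev, n, ca, c)  # C(i-1)-N-CA-C
    psi = dihedral(n, ca, c, n_next)  # N-CA-C-N(i+1)
    return phi, psi


# Toy geometry (not real protein coordinates): a -90 degree torsion.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))  # prints -90.0
```

In practice the coordinates would be read from the ATOM records of the modelled PDB file; validation servers then place each (φ, ψ) pair into the regions shown on the plot.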

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics                                         No.    %-age
Residues in most favoured regions [A,B,L]               990    85.6%
Residues in additional allowed regions [a,b,l,p]        104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]     11     1.0%
Residues in disallowed regions                           51     4.4%
                                                       ----   ------
Number of non-glycine and non-proline residues         1156   100.0%
Number of end-residues (excl. Gly and Pro)                8
Number of glycine residues (shown as triangles)         127 (59)
Number of proline residues                               60
                                                       ----
Total number of residues                               1351

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% of residues in the most favoured regions.
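The percentages in the PROCHECK table are taken over the 1156 non-glycine, non-proline residues, because glycine and proline have their own Ramachandran distributions. The short Python check below (variable names are illustrative) recomputes the percentages from the counts, compares the most favoured fraction with the 90% guideline quoted above, and confirms that the residue categories add up to the total of 1351.

```python
def region_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of residues per Ramachandran region; the denominator is
    the sum of the counts (PROCHECK-style: non-Gly, non-Pro residues)."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


# Counts from the PROCHECK summary above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
percentages = region_percentages(counts)
for region, pct in percentages.items():
    print(f"{region:<20} {pct:5.1f}%")

# PROCHECK guideline: a good model has > 90% in the most favoured regions.
print("meets 90% guideline:", percentages["most favoured"] > 90.0)

# Bookkeeping check: non-Gly/Pro + end residues + Gly + Pro = total.
assert sum(counts.values()) + 8 + 127 + 60 == 1351
```

For these counts the check reports 85.6% in the most favoured regions, so the model falls below the 90% guideline.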

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)

Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.


Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N. Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 2635096 Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N. Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27 x 812 x 1] x FFT[64 x 24 x 48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 kJ/mol (Eshape=8621255, Eforce=000)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00  R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 kJ/mol (Eshape=14350136, Eforce=000)
R = 0.00  R = 0.40  R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair              RMS  Bumps
 ----  --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape  Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100
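The size of the search in the Hex log can be checked with simple arithmetic. The Iterate and FFT factors of the "Total 6D space" line are read here as 27 × 812 × 1 and 64 × 24 × 48; this reading is an interpretation of the extracted text, chosen because it matches the 27 distance steps from 0.00 to 19.50 Å in 0.75 Å steps, the 812 initial receptor rotations, the single ligand orientation, and the line "Done 21924 3D FFTs for 1616412672 orientations". The Python sketch below uses hypothetical variable names.

```python
def distance_steps(start: float, stop: float, step: float) -> list[float]:
    """Evenly spaced intermolecular distances from start to stop inclusive."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(n)]


# Values taken from the Hex log; the factorisation of the Iterate/FFT
# fields is an interpretation of the extracted text, not Hex documentation.
r_values = distance_steps(0.00, 19.50, 0.75)  # R = 0.00 ... 19.50
receptor_rotations = 812  # "Initial rotational increments ... Receptor 812"
ligand_rotations = 1      # "... Ligand 1"
fft_grid = 64 * 24 * 48   # orientations covered by each 3D FFT (assumed reading)

n_ffts = len(r_values) * receptor_rotations * ligand_rotations
total = n_ffts * fft_grid
print(len(r_values), n_ffts, fft_grid, total)  # prints: 27 21924 73728 1616412672
```

Both the number of 3D FFTs (21924) and the number of orientations (1616412672) agree with the figures reported in the log, which supports this reading of the garbled fields.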

Conclusion


After analyzing the protein sequences of Hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We have noticed that the same genes are present in all strains; this suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution, and that the genes responsible for pathogenesis can also be identified.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structures of proteins present in the different species, we used homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

As studied above, the HBeAg (glycerate kinase) protein coded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, on the basis of which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries. The present work may be a small finding within a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
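Distance-based tree-building methods, such as UPGMA or neighbour joining, start from a matrix of pairwise distances between aligned sequences. The simplest such measure is the p-distance, the fraction of compared sites at which two sequences differ. The Python sketch below computes it while skipping gap columns (one common convention, assumed here); the three short sequences and their names are hypothetical toy data, not HBV strains.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gap column: not compared
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of named sequences."""
    return {(n1, n2): p_distance(seqs[n1], seqs[n2])
            for n1, n2 in combinations(seqs, 2)}


# Hypothetical toy alignment (not real HBV sequences).
aligned = {
    "strain1": "ACGTACGTAC",
    "strain2": "ACGTACGAAC",
    "strain3": "TCGAACG-AC",
}
for pair, d in distance_matrix(aligned).items():
    print(pair, round(d, 3))
```

A matrix like this would then be passed to a tree-building algorithm; for more divergent sequences, model-corrected distances (for example Jukes-Cantor) are usually preferred over raw p-distances.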

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and they will also open new avenues for finding other possible drug targets in influenza A virus.

The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens, and possible therapeutic sites can thus be identified in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.



Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


2) Passive Antibody

Hepatitis B immune globulin should be administered to non-immune individuals following a single-episode exposure to HBV-infected blood, for example needlestick injuries.

What Is Hepatitis B Infection Like?

When most individuals become infected with the hepatitis B virus, they are not aware of the infection for several weeks, until they develop symptoms of acute hepatitis such as nausea, fatigue and jaundice (yellowing of the eyes). The acute hepatitis phase may last for several weeks and occasionally leads to hospitalization, but acute hepatitis B resolves completely in 95% of those infected.

Others, who do not develop significant symptoms following exposure, may not be aware of the infection. These individuals may also overcome the infection completely and develop immunity, but they frequently become chronic carriers.

The outcome of hepatitis B infection depends to a great extent on the status of the person's immune system at the time of exposure. Most chronic carriers, or those with chronic hepatitis B, are not aware of their ongoing infection, although some have persistent fatigue.

Molecular virology

The genome is circular, 3.2 kb in size and partially double-stranded. It has a compact organization, with four overlapping reading frames running in one direction and no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5' end. The other strand, the plus strand, is variable in length but always shorter than unit length, and has an RNA oligonucleotide at its 5' end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. These proteins are produced through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation and mark a specific transcript for encapsidation into the nucleocapsid.

Fig: Hepatitis B virus genome
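The idea of overlapping reading frames on a circular genome can be illustrated with a simple scan. The Python sketch below looks for ATG...stop open reading frames in the three forward frames only (the HBV ORFs all run in one direction), treats the sequence as circular so that an ORF may run across the end of the linear text, and reports every ATG, so in-frame start codons that share one stop codon appear as nested ORFs, as with the multiple in-frame starts described above. The function names and toy sequences are hypothetical, and real HBV genome annotation relies on reference annotations and further criteria rather than this bare scan.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_at(genome: str, pos: int) -> str:
    """Return the codon starting at pos, wrapping around a circular genome."""
    n = len(genome)
    return "".join(genome[(pos + k) % n] for k in range(3))


def circular_orfs(genome: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """List forward-strand ORFs (ATG ... stop) on a circular genome.

    Every ATG is reported, so in-frame starts sharing one stop codon give
    nested ORFs. Each result is (start, end, frame): start is 0-based,
    end = start + length in nt including the stop codon (it may exceed
    len(genome) when the ORF crosses the origin), frame = start % 3.
    min_codons counts codons including the stop codon.
    """
    g = genome.upper()
    n = len(g)
    orfs = []
    for start in range(n):
        if codon_at(g, start) != "ATG":
            continue
        # Walk codon by codon for at most one lap around the genome.
        for offset in range(0, n - 2, 3):
            if codon_at(g, start + offset) in STOP_CODONS:
                length = offset + 3  # include the stop codon
                if length // 3 >= min_codons:
                    orfs.append((start, start + length, start % 3))
                break
    return orfs


# Toy circular sequence whose ORF starts near the end and wraps to the start.
print(circular_orfs("GAATAACCCATG"))
```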

Life cycle

In order to reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other types of cells in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell-surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.

The (-) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA, called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, where the proteins destined to become the HBV surface antigens of the viral envelope are assembled.

The pregenome RNA is translated to produce the polymerase protein P, which then binds to a specific site at the 3′ end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as capsid formation, the RNA-P protein complex is packaged and reverse transcription begins.

At early times after infection, the DNA is recycled to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).

Fig: HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found in the body of an infected patient. The virion has a diameter of 42 nm, and its outer envelope contains a large quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins arranged with icosahedral symmetry. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up only a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions by a ratio of about 100:1. These two kinds of subviral particle, the hepatitis B filament and the hepatitis B sphere, are often grouped together as surface antigen particles. The sphere contains both the middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Because the non-infectious particles present the same antigenic sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces, HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigen encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg) - There are three different hepatitis B surface proteins: the small hepatitis B surface antigen (HBsAg or SHBsAg), the middle hepatitis B surface antigen (MHBsAg) and the large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope that may be responsible for triggering the immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears largely oblivious to their presence.

Hepatitis B core antigen (HBcAg) - The only HBV antigen that cannot be detected directly by blood test; it can only be detected by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg) - The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus, this antigen can be detected by blood test. If found, it usually indicates that complete virus particles are in circulation (Strauss 2002).


REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle; a 42 nm particle containing DNA and DNA polymerase, called the Dane particle; and tubular or filamentous particles that vary in length. Of these, the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or other body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field has therefore been reviewed, with an emphasis on past accomplishments as well as goals for the future [3].

Antigen

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg) is a secreted protein shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg) is the core protein; it is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM (IgM anti-HBc) rises early in infection and indicates recent infection.

4) Core IgG (IgG anti-HBc) rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence indicates exposure to HBV [4].
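The marker lists above amount to a lookup from each positive serum marker to its meaning, plus one combined pattern (HBsAg and IgG anti-HBc present, anti-HBs absent) that the text associates with chronic carriers. The sketch below encodes only what the lists state; the marker spellings and function name are hypothetical, and it is a simplified teaching aid, not a clinical decision rule.

```python
# Textbook meaning of each marker, paraphrased from the lists above.
# HBcAg is deliberately absent: the text notes it is not found in blood.
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "falling viral replication (low infectivity in a carrier)",
    "IgM anti-HBc": "recent infection",
    "IgG anti-HBc": "exposure to HBV",
}


def interpret_panel(results):
    """Map each positive marker in `results` (marker name -> bool) to its
    meaning. A simplified sketch, not a clinical decision rule."""
    unknown = set(results) - set(MARKER_MEANING)
    if unknown:
        raise ValueError(f"not a serum marker in this scheme: {sorted(unknown)}")
    lines = [f"{m}: {MARKER_MEANING[m]}" for m in MARKER_MEANING if results.get(m)]
    # Anti-HBs is absent in chronic carriers, while IgG anti-HBc persists.
    if (results.get("HBsAg") and results.get("IgG anti-HBc")
            and not results.get("anti-HBs") and not results.get("IgM anti-HBc")):
        lines.append("pattern: consistent with chronic carriage")
    return lines


for line in interpret_panel({"HBsAg": True, "IgG anti-HBc": True, "anti-HBs": False}):
    print(line)
```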


Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g., loops) and, ultimately, the refinement of the structure in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
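Applying the 30% rule requires a sequence identity figure for a query-template alignment. Definitions differ in their denominator; the sketch below assumes one common choice, counting identical residues over columns where neither sequence has a gap ("-"). The aligned strings are toy examples, and the threshold constant is a hypothetical name for the rule of thumb in the text.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_query, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


TWILIGHT_THRESHOLD = 30.0  # the "30% rule" of thumb discussed above

pid = percent_identity("MKV-LLA", "MRVQLLG")
print(f"{pid:.1f}% identity, above 30%: {pid > TWILIGHT_THRESHOLD}")
```

Here 4 of the 6 gap-free columns are identical, giving 66.7%. Dividing by the full alignment length or by the shorter sequence instead would give a different number for the same alignment, so the convention should be stated whenever an identity value is reported.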

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed over the last years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term guided docking was introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect this focus on chemical information, a classification scheme for guided docking approaches was proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, affecting scoring but not the orientation of the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, reflecting the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches was discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information is an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].

A further review gives an introduction to ligand-receptor docking, illustrates the basic underlying concepts and provides an overview of different approaches and algorithms. Although the application of docking and scoring has led to some remarkable successes, major challenges remain; these are outlined as well, together with approaches to address some of them and the latest developments in the area. Some aspects of the assessment of docking program performance are discussed, and a number of successful applications of structure-based virtual screening are described [8].


Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Here, bioinformatics is limited to the sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

2. Protein sequence

Glycerate kinase (HBeAg-binding protein 4); primary accession number: Q8IVS8; EC 2.7.1.31; organism: human; subcellular location: cytoplasm; catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved tools for biological sequence comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "fast-aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts) and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
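The optimal local alignment that SSEARCH computes can be illustrated with a small Smith-Waterman implementation. This is a minimal sketch, not SSEARCH itself: it assumes a simple match/mismatch score and a linear gap penalty (the values are arbitrary choices for illustration), whereas SSEARCH uses substitution matrices, affine gap penalties and statistical estimates.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment
    under a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scores are floored at zero
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTAA", "GGACGTCC"))  # (8, 'ACGT', 'ACGT')
```

The zero floor in each cell is what makes the alignment local: poorly matching flanks are simply left out instead of dragging the score down.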

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and identify library sequences that resemble the query sequence above a certain threshold.
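BLAST gains its speed by first looking for short words shared by the query and the database, and only then extending those seeds into alignments. The sketch below shows just the exact-word seeding step under simplifying assumptions: the word length `w` is arbitrary, and real protein BLAST also accepts high-scoring "neighborhood" words and applies extension and scoring thresholds that are not modeled here. Function names are hypothetical.

```python
from collections import defaultdict


def build_word_index(db_seq, w):
    """Map every length-w word in the database sequence to its start positions."""
    index = defaultdict(list)
    for i in range(len(db_seq) - w + 1):
        index[db_seq[i:i + w]].append(i)
    return index


def find_seeds(query, db_seq, w=3):
    """Return (query_pos, db_pos) pairs where query and database share a word."""
    index = build_word_index(db_seq, w)
    seeds = []
    for q in range(len(query) - w + 1):
        for d in index.get(query[q:q + w], []):
            seeds.append((q, d))
    return seeds


print(find_seeds("ACGTA", "TTACGTT"))  # [(0, 2), (1, 3)]
```

Both seeds lie on the same diagonal (db_pos - query_pos = 2), which is the kind of hit an extension step would then grow into a longer local alignment.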

5. Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be shown an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
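Two of these parameters follow from simple formulas applied to the residue counts. The sketch below assumes the definitions given in the ProtParam documentation: the aliphatic index A + 2.9·V + 3.9·(I + L) computed from mole percentages, and the 280 nm extinction coefficient summed from Trp (5500), Tyr (1490) and cystine (125) contributions, reported with and without all Cys forming cystines. The toy sequence is hypothetical. The instability index and half-life depend on lookup tables (dipeptide weights and N-terminal rules) and are omitted here.

```python
def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    seq = seq.upper()
    mole_pct = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return mole_pct["A"] + 2.9 * mole_pct["V"] + 3.9 * (mole_pct["I"] + mole_pct["L"])


def extinction_coefficient(seq, all_cys_form_cystines=True):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1, in water) from
    Trp (5500), Tyr (1490) and cystine (125) contributions."""
    seq = seq.upper()
    cystines = seq.count("C") // 2 if all_cys_form_cystines else 0
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


toy = "MAVILWYCCK"  # hypothetical toy sequence, not the modeled receptor
print(round(aliphatic_index(toy), 1), extinction_coefficient(toy))  # 117.0 7115
```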


Using SOPMA for secondary structure analysis

A method called the self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. Its authors then reported improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
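The 69.5% figure is a three-state accuracy (often called Q3): the fraction of residues whose predicted state matches the observed one. A minimal sketch, assuming both predicted and observed structures are already reduced to the letters H (helix), E (strand) and C (coil); SOPMA's own output may use additional states that would first need to be mapped onto these three.

```python
def q3_accuracy(predicted, observed):
    """Q3: percentage of residues whose three-state label (H = helix,
    E = strand, C = coil) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3_accuracy("HHHHEECCCC", "HHHEEECCCC"))  # 90.0
```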

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor.

Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).

Verified the model through different parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the structure of the drug molecule.
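The second step starts from a FASTA-format record. FASTA text is a header line beginning with ">" followed by one or more sequence lines, which programs usually join into a single string. The sketch below is a minimal parser under that assumption; the record shown is a hypothetical placeholder, not the sequence used in this work.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>')
    to its concatenated sequence."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>toy_record hypothetical example
MKVLLAGG
HHSTY
"""
for header, seq in parse_fasta(example).items():
    print(header, len(seq))  # toy_record hypothetical example 13
```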


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has about 2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
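The "2 Å agreement between matched Cα atoms" quoted above is usually measured as a root-mean-square deviation (RMSD) after the model and reference are optimally superposed. The sketch below assumes NumPy is available and uses the Kabsch algorithm (centre both coordinate sets, then find the best rotation from a singular value decomposition). The coordinates are toy values chosen for illustration.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix (Kabsch algorithm).
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0  # forbid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


template_ca = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 3.8]]
# The same toy points rotated 90 degrees about z and shifted: RMSD should be ~0.
model_ca = [[-y + 5.0, x - 2.0, z + 1.0] for x, y, z in template_ca]
print(round(kabsch_rmsd(template_ca, model_ca), 6))
```

Superposing before measuring matters because a model placed in a different coordinate frame would otherwise show a large deviation even if its shape were identical to the template.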

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, giving a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
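The selection logic described above (reject hits with a poor E-value, then weigh coverage and similarity) can be sketched as a filter followed by a ranking. The E-value cutoff, the ordering (coverage first, identity as tie-breaker) and the template IDs below are all hypothetical choices for illustration; in practice the weighting is a judgment call that also considers function and secondary structure.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    pdb_id: str        # template identifier (placeholders below)
    evalue: float      # BLAST E-value of the hit
    identity: float    # percent identity over the aligned region
    coverage: float    # fraction of the query covered by the alignment, 0-1


def choose_template(hits, max_evalue=1e-5):
    """Drop hits whose E-value exceeds max_evalue, then prefer the widest
    query coverage and, among equal coverage, the highest identity."""
    usable = [h for h in hits if h.evalue <= max_evalue]
    if not usable:
        return None  # better to try fold recognition than force a poor template
    return max(usable, key=lambda h: (h.coverage, h.identity))


hits = [
    TemplateHit("TPL1", 1e-30, 45.0, 0.80),
    TemplateHit("TPL2", 1e-12, 38.0, 0.95),
    TemplateHit("TPL3", 0.5, 90.0, 1.00),  # poor E-value: rejected
]
print(choose_template(hits).pdb_id)  # TPL2
```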

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interfaces between the two molecules tend to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interactions are said to be rigid.

Fig: Protein-protein docking

Protein receptor-ligand docking

Protein receptor-ligand motifs fit together tightly and are often referred to as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand and a flexible receptor, or a flexible ligand and a rigid receptor.

Fig: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

In the native structure of a rigid-ligand, flexible-receptor complex, the interface area between the molecules is often maximized. The molecules move with respect to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger-than-usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be reduced, this energy penalty will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than would be possible with a rigid receptor.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement that allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
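The six rigid-body degrees of freedom mentioned above (three rotations, three translations) are exactly what a rigid docking search varies when it places a ligand. The sketch below applies one such pose to toy ligand coordinates; the Z-Y-X Euler-angle convention, the choice to rotate about the ligand centroid and the function names are assumptions for illustration, and no scoring is included.

```python
import math


def rotation_matrix(alpha, beta, gamma):
    """Rotation matrix Rz(gamma) @ Ry(beta) @ Rx(alpha); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]


def place_ligand(coords, pose):
    """Apply a 6-DOF pose (alpha, beta, gamma, tx, ty, tz) to rigid ligand
    coordinates: rotate about the ligand centroid, then translate."""
    alpha, beta, gamma, tx, ty, tz = pose
    rot = rotation_matrix(alpha, beta, gamma)
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    shift = (tx, ty, tz)
    placed = []
    for p in coords:
        v = [p[k] - centroid[k] for k in range(3)]
        rotated = [sum(rot[r][k] * v[k] for k in range(3)) for r in range(3)]
        placed.append(tuple(rotated[r] + centroid[r] + shift[r] for r in range(3)))
    return placed


ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]  # toy coordinates
pose = (0.3, -1.1, 2.0, 4.0, 0.0, -1.0)
moved = place_ligand(ligand, pose)
# A rigid-body move keeps every internal distance unchanged.
print(round(math.dist(ligand[0], ligand[2]), 6), round(math.dist(moved[0], moved[2]), 6))
```

Because the ligand is treated as rigid, only these six numbers change between trial poses; a flexible-ligand search would add torsion angles on top of them.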

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease hepatitis B, and will also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process.

Receptor

A receptor is a structure on the surface of the cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule on or within a cell to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through the interaction of many weak noncovalent bonds formed with the binding site of a protein, tight binding of a ligand depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize about which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are the preferences of the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His  +0.360    Tyr  -0.040    Asp  +0.045    Gly  -0.070
Trp  -0.140    Met  +0.025    Val  -0.060    Asn  +0.080
Leu  -0.180    Phe  -0.120    Gln  +0.050    Cys  +0.210
Ile  -0.005    Ala  +0.025    Glu  +0.050    Arg  +0.055
Pro  -0.200    Lys  +0.100    Thr  +0.100    Ser  +0.130
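One hypothetical way to use these values is as a crude prior when comparing candidate binding pockets: average the propensities of the residues lining each pocket and rank the pockets. The sketch below does only that; the pocket names and residue lists are invented for illustration. As the paragraph notes, the values ignore protein-protein and protein-peptide contacts, so such a score can at most supplement docking results.

```python
# Propensity values copied from the table above (positive = more contacts
# with bound non-protein atoms than expected by chance).
ACTIVE_SITE_PROPENSITY = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def mean_propensity(residues):
    """Average propensity of a list of three-letter residue codes."""
    return sum(ACTIVE_SITE_PROPENSITY[r] for r in residues) / len(residues)


def rank_pockets(pockets):
    """Sort pocket names from highest to lowest mean propensity."""
    return sorted(pockets, key=lambda name: mean_propensity(pockets[name]), reverse=True)


candidate_pockets = {  # hypothetical pocket-lining residues
    "pocket_A": ["His", "Ser", "Asp", "Cys"],
    "pocket_B": ["Leu", "Val", "Phe", "Pro"],
}
print(rank_pockets(candidate_pockets))  # ['pocket_A', 'pocket_B']
```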

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between these two torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
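Each point on a Ramachandran plot comes from two torsion angles computed from backbone coordinates: φ from the atoms C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). The sketch below computes a torsion angle from four points with a standard atan2 formula, using plain Python; the coordinates in the example are toy values, not atoms from the modeled receptor.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (range -180 to 180) about the bond p1-p2."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy points around a bond along z; for phi pass (C_prev, N, CA, C),
# for psi pass (N, CA, C, N_next).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```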

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of "ideal" bond lengths and bond angles derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is "cleaned up" by the first of its programs. The clean-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new; the new file has its atoms labelled in accordance with the IUPAC naming convention.
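PROCHECK reads standard PDB coordinate records. A minimal sketch of reading the ATOM and HETATM lines of such a file, assuming the fixed-column layout of the wwPDB format (atom name in columns 13-16, residue name 18-20, chain identifier 22, residue number 23-26, x, y and z in 31-38, 39-46 and 47-54). `parse_atoms` is a hypothetical helper, not part of PROCHECK, and it does not perform PROCHECK's atom relabelling.

```python
from typing import Iterable, Iterator, Tuple

Atom = Tuple[str, int, str, str, Tuple[float, float, float]]


def parse_atoms(lines: Iterable[str]) -> Iterator[Atom]:
    """Yield (chain, residue number, residue name, atom name, (x, y, z)).

    Column slices follow the fixed-width PDB layout (Python indices are
    0-based, so columns 13-16 become [12:16], and so on).
    """
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, TER, END and other records
        atom_name = line[12:16].strip()
        res_name = line[17:20].strip()
        chain = line[21]
        res_seq = int(line[22:26])
        x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))
        yield chain, res_seq, res_name, atom_name, (x, y, z)


if __name__ == "__main__":
    sample = "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00           N"
    print(list(parse_atoms(["HEADER    EXAMPLE", sample])))
```

Reading by column position rather than splitting on whitespace matters because adjacent PDB fields (for example large negative coordinates) can run together without a separating space.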

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory; these have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints within a residue, but it will not pass through high energy barriers, so it stops in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but on its own it may not be a useful measure, because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space, giving a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
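A minimal one-dimensional sketch of gradient-based minimization, assuming plain steepest descent with a fixed step size (real modelling packages use full force fields and more refined line searches; the helper names and toy potentials here are illustrative). The Lennard-Jones pair starts inside its repulsive wall, like the "two atoms too near each other" case above, and relaxes to its minimum at 2^(1/6)σ. The double well shows that the method stops in whichever minimum lies downhill of the starting point, not necessarily the lowest one.

```python
from typing import Callable


def steepest_descent(grad: Callable[[float], float], x0: float, step: float = 1e-3,
                     tol: float = 1e-8, max_iter: int = 100_000) -> float:
    """Follow the negative gradient (the force) until it is almost zero."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:  # net force ~ 0: a (possibly only local) minimum
            break
        x -= step * g
    return x


def lj_grad(r: float, sigma: float = 1.0, eps: float = 1.0) -> float:
    """dE/dr of a Lennard-Jones pair, E(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    return 4 * eps * (-12 * sigma**12 / r**13 + 6 * sigma**6 / r**7)


def double_well_grad(x: float) -> float:
    """Gradient of f(x) = (x**2 - 1)**2 + 0.3*x, which has two minima."""
    return 4 * x * (x * x - 1) + 0.3


if __name__ == "__main__":
    # Two atoms start too close (r = sigma), deep in the repulsive wall.
    print(steepest_descent(lj_grad, 1.0))  # relaxes to about 2**(1/6) = 1.1225
    # The same method ends in different minima depending on the start.
    print(steepest_descent(double_well_grad, 1.5, step=0.01))   # local minimum near 0.96
    print(steepest_descent(double_well_grad, -1.5, step=0.01))  # lower minimum near -1.04
```

In a protein the variable is the full set of atomic coordinates and the gradient comes from the force field, but the update rule is the same idea: move each atom a small distance along the force acting on it.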


Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences:

Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                      192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp     27      6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I                27      6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)
DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74   14.1%
Arg (R)  33    6.3%
Asn (N)  11    2.1%
Asp (D)  21    4.0%
Cys (C)   5    1.0%
Gln (Q)  32    6.1%
Glu (E)  28    5.4%
Gly (G)  51    9.8%
His (H)  16    3.1%
Ile (I)  15    2.9%
Leu (L)  81   15.5%
Lys (K)  10    1.9%
Met (M)  12    2.3%
Phe (F)  10    1.9%
Pro (P)  28    5.4%
Ser (S)  27    5.2%
Thr (T)  22    4.2%
Trp (W)   4    0.8%
Tyr (Y)   5    1.0%
Val (V)  38    7.3%
Pyl (O)   0    0.0%
Sec (U)   0    0.0%
(B)       0    0.0%
(Z)       0    0.0%
(X)       0    0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon      C   2435
Hydrogen    H   3967
Nitrogen    N    711
Oxygen      O    719
Sulfur      S     17

Formula: C2435H3967N711O719S17
Total number of atoms: 7849

Extinction coefficients:
Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
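ProtParam's documentation describes the 280 nm extinction coefficient as a weighted count of Trp, Tyr and cystine residues (5500, 1490 and 125 M-1 cm-1 respectively, after Pace et al.), and Abs 0.1% as that coefficient divided by the molecular weight. A minimal sketch of that arithmetic, with hypothetical function names; with this protein's counts (Trp 4, Tyr 5, Cys 5, so at most 2 cystines) and a molecular weight of 55252.6 it reproduces both reported coefficients and absorbances.

```python
# Molar absorptivities at 280 nm (M^-1 cm^-1) for Trp, Tyr and cystine,
# as given in the ProtParam documentation (after Pace et al.).
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int,
                           all_cystines: bool = True) -> int:
    """Molar extinction coefficient at 280 nm in water.

    all_cystines=True counts every pair of Cys as one cystine (n_cys // 2);
    all_cystines=False assumes no disulfides, so Cys contributes nothing.
    """
    n_cystine = n_cys // 2 if all_cystines else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def abs_01_percent(ext: float, mol_weight: float) -> float:
    """Absorbance of a 0.1% (1 g/l) solution: extinction coefficient / MW."""
    return ext / mol_weight


if __name__ == "__main__":
    # Counts from the composition table for GLCTK_HUMAN: Trp 4, Tyr 5, Cys 5.
    mw = 55252.6
    for cystines in (True, False):
        ext = extinction_coefficient(4, 5, 5, all_cystines=cystines)
        print(ext, round(abs_01_percent(ext, mw), 3))
```

With an odd number of Cys (5 here) one residue is necessarily left unpaired, which is why the "all cystines" case adds only 2 × 125 = 250 to the coefficient.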

Secondary structure prediction

By SOPMA

Result for UNK_158250

         10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
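The summary percentages are simply each state's count divided by the sequence length. A minimal sketch that tallies a SOPMA-style per-residue state string (one lower-case letter per residue); the function name is hypothetical, and it only reproduces the summary arithmetic, not SOPMA's prediction method itself.

```python
from collections import Counter

# The four one-letter states SOPMA reported in this run (number of states: 4).
STATES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_summary(prediction: str) -> dict:
    """Map each state name to (count, percentage of all residues, 2 decimals)."""
    counts = Counter(prediction.lower())
    n = len(prediction)
    return {name: (counts[code], round(100 * counts[code] / n, 2))
            for code, name in STATES.items()}


if __name__ == "__main__":
    # Hypothetical 10-residue prediction string, just to show the arithmetic.
    print(state_summary("hhhheeetcc"))
```

Applied to the 523 states of this protein (235 h, 80 e, 36 t, 172 c), the same calculation gives 44.93%, 15.30%, 6.88% and 32.89%, matching the SOPMA summary.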

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

Sequences (length in aa):
 1  Q8IVS8|GLCTK_HUMAN   523
 2  Q64896|HBEAG_ASHV    217
 3  P03154|HBEAG_DHBV1   305
 4  P0C6J9|HBEAG_DHBV3   305
 5  P03153|HBEAG_GSHV    217
 6  P0C692|HBEAG_HBVA2   214
 7  P0C625|HBEAG_HBVA3   214
 8  P17099|HBEAG_HBVA4   214
 9  Q81105|HBEAG_HBVA5   214
10  Q91C37|HBEAG_HBVA6   214

Pairwise scores (row = SeqB, column = SeqA):
         1    2    3    4    5    6    7    8    9
 2       3
 3       3   21
 4       3   21   97
 5       3   91   24   25
 6       3   66   26   26   70
 7       3   65   27   27   69   98
 8       3   65   25   25   69   98   98
 9       3   65   26   26   69   98   97   97
10       2   65   26   26   69   98   98   98   97
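ClustalW reports each pairwise score as a percentage identity. A minimal sketch of one way to compute such a figure from two rows of the same alignment, assuming identical, gap-free columns are counted and divided by the length of the shorter ungapped sequence (this is how ClustalW's pairwise step is generally understood to normalise; other tools divide by alignment length instead, so the denominator is an assumption). The function name is hypothetical.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity of two rows taken from the same alignment.

    Identical, gap-free columns are counted and divided by the length of
    the shorter ungapped sequence (assumed normalisation).
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows from one alignment must have equal length")
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    shorter = min(len(row_a.replace(gap, "")), len(row_b.replace(gap, "")))
    return 100.0 * matches / shorter if shorter else 0.0


if __name__ == "__main__":
    # First residues of HBVA4 and ASHV, copied from the multiple alignment in this report.
    print(round(percent_identity("MQLFHLCLIISCT-CPTVQ", "MYLFHLCLVFACVSCPTVQ"), 1))
```

The excerpt covers only 19 columns, so its value is not comparable with the full-length score in the table; it simply shows how gaps and mismatches enter the calculation.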

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
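ClustalW writes its guide tree (the .dnd file) in Newick format: parentheses group clusters, commas separate siblings, and the number after each colon is a branch length. A minimal recursive-descent sketch, with hypothetical helper names `parse_newick` and `leaf_depths`, that reads such a string and sums the branch lengths from the root to each leaf. It handles labels such as Q8IVS8|GLCTK_HUMAN because it only splits on Newick punctuation, but it does not support quoted labels or comments, which full Newick allows.

```python
from typing import List, Tuple

# A node is (children, name, branch_length); leaves have no children.
Node = Tuple[list, str, float]


def parse_newick(text: str) -> Node:
    """Parse a plain Newick string such as '((A:0.1,B:0.2):0.3,C:0.4);'."""
    s = "".join(text.split())  # drop whitespace and line breaks
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def node() -> Node:
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        name = read_until(",():;")
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            length = float(read_until(",();"))
        return children, name, length

    return node()


def leaf_depths(tree: Node, above: float = 0.0) -> List[Tuple[str, float]]:
    """Sum of branch lengths from the root down to every leaf."""
    children, name, length = tree
    depth = above + length
    if not children:
        return [(name, depth)]
    result = []
    for child in children:
        result.extend(leaf_depths(child, depth))
    return result


if __name__ == "__main__":
    for leaf, depth in leaf_depths(parse_newick("((A:0.1,B:0.2):0.3,C:0.4);")):
        print(leaf, round(depth, 3))
```

Applied to the guide tree above, the long root-to-leaf path of Q8IVS8|GLCTK_HUMAN (dominated by its own 0.59519 branch) reflects how distant the human glycerate kinase sequence is from the viral HBeAg sequences.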

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV:

1. Open the template structure (PDB file) from the File menu, then choose SwissModel > Load Raw Target Sequence.
2. Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.
3. Choose File > Save > Layer (PDB).
4. Choose File > Save > Project (PDB).
5. Choose SwissModel > Submit Modeling Request. A new browser window opens with the PDB file loaded; enter the e-mail ID at which the modelled structure is to be received.
6. Open the new structure (received by e-mail) and remove the template by selecting the target.
7. Choose File > Save > Layer (PDB).

Open SwissModel and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode: Request submission form

Please fill in these fields:
Your e-mail address: Gunjan300@gmail.com (MUST be correct)
Your name: Gunjan
Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52


Fig: Structure of the template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of the amino acid residues in a protein structure; it shows the possible conformations of the φ and ψ angles for a polypeptide. Because the plot displays the phi and psi backbone conformational angles for each residue in a protein, residues whose angle pairs fall outside the allowed regions point to strained or possibly incorrect backbone geometry in the model.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure: The target protein structure obtained after homology modeling using Deep View and Modeler is given as input to SAVS.

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result

Plot statistics (Score, %age):
Residues in most favoured regions [A,B,L]: 990 (85.6%)
Residues in additional allowed regions [a,b,l,p]: 104 (9.0%)
Residues in generously allowed regions [~a,~b,~l,~p]: 11 (1.0%)
Residues in disallowed regions: 51 (4.4%)
Number of non-glycine and non-proline residues: 1156 (100.0%)
Number of end-residues (excl. Gly and Pro): 8
Number of glycine residues (shown as triangles): 127
Number of proline residues: 60
Total number of residues: 1351

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
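The region percentages are computed over non-glycine, non-proline residues only (glycine and proline have their own, differently shaped allowed regions), and the quality criterion compares the most-favoured fraction with 90%. A minimal sketch of that arithmetic with a hypothetical function name; fed the counts from this run (990, 104, 11, 51) it gives 85.6% in the most favoured regions, below the 90% expected of a good quality model.

```python
def ramachandran_summary(core: int, allowed: int, generous: int, disallowed: int,
                         target_pct: float = 90.0):
    """Percentages for the four region classes of non-Gly, non-Pro residues.

    Returns (total, percentages rounded to 0.1, whether the most favoured
    fraction exceeds target_pct).
    """
    counts = {
        "most favoured": core,
        "additional allowed": allowed,
        "generously allowed": generous,
        "disallowed": disallowed,
    }
    total = sum(counts.values())
    pct = {region: round(100 * n / total, 1) for region, n in counts.items()}
    return total, pct, 100 * core / total > target_pct


if __name__ == "__main__":
    total, pct, good = ramachandran_summary(990, 104, 11, 51)
    print(total, pct, good)
```

The comparison uses the unrounded fraction, so a model sitting just below 90% is not pushed over the threshold by rounding.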

Docking result by Hex software

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS

Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25

Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50

  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52

  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05

69

LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025

70

ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds

71

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)
R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600 R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275 R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ------  ------  ------  ----  ---  -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds

Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp  RMS
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  ----
   1     1  000000      00      00      00    00      00      00   -1  -100
---------------------------------------------------------------------------
   1     1  000000      00      00      00    00      00      00   -1  -100
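The log above reports a count of positively and negatively charged residues before docking. As a rough cross-check, a similar count can be taken directly from a one-letter sequence. The sketch below is a minimal illustration under stated assumptions: Lys and Arg count as +1, Asp and Glu as -1, and His, chain termini and modified residues such as MSE are ignored. Hex's own counting rules are not described in this report, so its totals need not match this sketch.

```python
POSITIVE = set("KR")   # assumption: Lys, Arg counted as +1
NEGATIVE = set("DE")   # assumption: Asp, Glu counted as -1


def formal_charge_counts(sequence: str) -> tuple[int, int, int]:
    """Return (positive, negative, net) charged-residue counts for a one-letter sequence."""
    seq = sequence.upper()
    positive = sum(aa in POSITIVE for aa in seq)
    negative = sum(aa in NEGATIVE for aa in seq)
    return positive, negative, positive - negative


if __name__ == "__main__":
    # First 30 residues of chain A of 2B8N as printed in the log
    print(formal_charge_counts("PESLKKLAIEIVKKSIEAVFPDRAVKETLP"))  # -> (6, 5, 1)
```

Running it on the full chain A and chain B sequences gives a sequence-only estimate; differences from the Hex totals would come from how the program treats His, termini and incomplete residues.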


Conclusion


After analysing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be worthwhile to look at the matter more closely by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing projects on HBV, we hope it will be possible in the near future to draw conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we used homology modelling; as a further step, we present a theoretical model built with the available online modelling tools.

Our study indicates that the HBeAg-binding protein (glycerate kinase) is one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for developing drugs.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a large problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data as well as interpreting the results as new biological information.
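Phylogenetic relationships of this kind are often inferred from pairwise distances between aligned sequences. The sketch below shows one simple way to do this, assuming the sequences are already aligned to equal length: it computes the p-distance (proportion of differing sites, skipping gap columns) and clusters the sequences with UPGMA. UPGMA is used here only as an illustrative method; the report does not state which tree-building method the authors used, and the example names and sequences are invented.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences; gap columns are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in sites) / len(sites)


def upgma(names: list[str], seqs: list[str]) -> str:
    """Cluster aligned sequences with UPGMA and return the topology as a Newick string."""
    # Each cluster id maps to (Newick label, number of leaves it contains).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): p_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        closest = min(dist, key=dist.get)  # pair of clusters with the smallest distance
        i, j = sorted(closest)
        (label_i, n_i), (label_j, n_j) = clusters.pop(i), clusters.pop(j)
        del dist[closest]
        # Distance from the merged cluster is the size-weighted mean of the old distances.
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (d_ik * n_i + d_jk * n_j) / (n_i + n_j)
        clusters[next_id] = (f"({label_i},{label_j})", n_i + n_j)
        next_id += 1
    (label, _), = clusters.values()
    return label + ";"


if __name__ == "__main__":
    print(upgma(["A", "B", "C"], ["AAAA", "AAAT", "TTTT"]))  # -> (C,(A,B));
```

UPGMA assumes a constant rate of evolution along all branches; methods such as neighbour joining or maximum likelihood relax that assumption and are usually preferred for real viral data.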

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in the search for a treatment for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] plumbed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B. and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.

Aloy, P., Querol, E., Aviles, F.X. and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Fig Hepatitis B virus genome

The HBV genome is organized as four overlapping reading frames running in one direction, with no noncoding regions. The minus strand is unit length and has a protein covalently attached to its 5′ end. The other strand, the plus strand, is variable in length but shorter than unit length, and has an RNA oligonucleotide at its 5′ end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. These proteins are produced through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation, and mark a specific transcript for encapsidation into the nucleocapsid.
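How several proteins can come from one ORF through multiple in-frame start codons can be illustrated with a small scan of a DNA string. The sketch below is a minimal illustration under stated assumptions: it uses the standard stop codons, scans only the three forward frames (the text states that all HBV ORFs run in one direction), and treats the input as linear even though the HBV genome is circular, so a real scan would also have to wrap around the origin. The toy sequence is invented and is not an HBV sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def forward_orfs(dna: str, min_codons: int = 1) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs in the three forward frames.

    start is 0-based; end is exclusive and includes the stop codon.
    Only the first ATG before each stop opens an ORF; later in-frame ATGs are nested starts.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos + 3))
                start = None
    return orfs


def in_frame_starts(dna: str, start: int, end: int) -> list[int]:
    """Positions of every ATG in the same frame inside one ORF (the stop codon is excluded)."""
    dna = dna.upper()
    return [p for p in range(start, end - 3, 3) if dna[p:p + 3] == "ATG"]


if __name__ == "__main__":
    toy = "ATGAAAATGCCCTAA"  # invented example, not an HBV sequence
    for frame, s, e in forward_orfs(toy):
        print(frame, s, e, in_frame_starts(toy, s, e))  # -> 0 0 15 [0, 6]
```

Each in-frame ATG returned by `in_frame_starts` would give a protein sharing the same C-terminus but with a shorter N-terminus, which is the principle the text describes for the nested HBV proteins.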

Life cycle

To reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other cell types in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell-surface receptor that has yet to be identified (Garces HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.

The (−) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins destined to become the HBV surface antigens in the viral envelope are assembled.

The pregenome RNA is translated to produce a polymerase protein, P, which then binds to a specific site at the 3′ end of its own transcript, where viral DNA synthesis eventually occurs. At the same time as the capsid forms, the RNA-P protein complex is packaged and reverse transcription begins.


Early after infection, the DNA is recycled to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al. 765).


Fig HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found in the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a large quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions actually make up a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions by a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group named surface antigen particles. The sphere contains both the middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same antigenic sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg): There are three different types of hepatitis B surface antigens: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope, which may be responsible for triggering an immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears largely oblivious to their presence.


Hepatitis B core antigen (HBcAg): The only HBV antigen that cannot be detected directly by a blood test; it can only be isolated by analysing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus particle, this antigen can be detected by a blood test; if found, it usually indicates that complete virus particles are in circulation (Strauss 2002).


REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle; a 42 nm particle (containing DNA and DNA polymerase), called the Dane particle; and tubular or filamentous particles that vary in length. Of these, the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or other body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

Antigen

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg): the core protein is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence indicates exposure to HBV, whether the infection was cleared or persists as a chronic carrier state [4].
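The antigen and antibody lists above amount to a lookup from a positive serum marker to its meaning. The sketch below encodes only the interpretations stated in this text, as a teaching illustration rather than a diagnostic algorithm; the marker labels used as dictionary keys are hypothetical names chosen for the example. HBcAg is deliberately absent because the text states that it is not found in blood, so asking about it raises an error.

```python
# Interpretations taken from the antigen/antibody lists above; keys are hypothetical labels.
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection (not found in chronic carriers)",
    "anti-HBe": "viral replication has fallen; low infectivity in a carrier",
    "IgM anti-HBc": "recent infection",
    "IgG anti-HBc": "exposure to HBV (cleared or chronic)",
}


def interpret_markers(positive_markers: list[str]) -> list[str]:
    """Return the text's interpretation for each serum marker reported as positive."""
    unknown = set(positive_markers).difference(MARKER_MEANING)
    if unknown:
        raise ValueError(f"no serum interpretation listed for: {sorted(unknown)}")
    return [MARKER_MEANING[m] for m in positive_markers]


if __name__ == "__main__":
    for note in interpret_markers(["HBsAg", "IgM anti-HBc"]):
        print(note)
```

A real serological panel is read as a combination of markers, not one marker at a time; this sketch only makes the individual rules in the text explicit.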


Homology or comparative modeling involves predicting the structure of a query sequence from the structures of one or more structural templates. The procedure involves identifying possible templates that have a clear sequence relationship to the query, assembling the model, predicting regions of the structure that are likely to have conformations different from the templates (e.g. loops), and finally refining the structure in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5,6].
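The 30% rule depends on how sequence identity is measured, and definitions vary (some tools divide by the shorter sequence length, others by the alignment length). The sketch below assumes one common choice, identity over alignment columns where neither sequence has a gap, and then applies the rule-of-thumb threshold from the text. Function names and example sequences are illustrative assumptions.

```python
def percent_identity(query_aln: str, template_aln: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    columns = [(q, t) for q, t in zip(query_aln.upper(), template_aln.upper())
               if q != "-" and t != "-"]
    if not columns:
        return 0.0
    identical = sum(q == t for q, t in columns)
    return 100.0 * identical / len(columns)


def passes_30_percent_rule(query_aln: str, template_aln: str, threshold: float = 30.0) -> bool:
    """Rule-of-thumb check: is the template likely close enough for homology modeling?"""
    return percent_identity(query_aln, template_aln) > threshold


if __name__ == "__main__":
    print(percent_identity("ACDE-G", "ACDF-G"))  # -> 80.0
```

Because the denominator choice can move the identity by several percentage points near the threshold, it is worth stating which definition was used when reporting template identity.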

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term "guided docking" is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described [8].
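Scoring in docking usually penalises poses in which receptor and ligand atoms overlap; docking summaries such as the Hex output earlier in this report list steric clashes ("Bumps") for each solution. The sketch below shows the simplest form of such a check, counting receptor-ligand atom pairs closer than a cutoff. The 3.0 Å cutoff is an assumption for illustration, Hex's own definition of a bump is not given in this report, and real programs use grids or neighbour lists instead of this brute-force double loop.

```python
def count_clashes(receptor, ligand, cutoff: float = 3.0) -> int:
    """Count receptor-ligand atom pairs closer than `cutoff` (in angstroms).

    receptor, ligand: iterables of (x, y, z) coordinates. Brute force, O(N*M).
    """
    cutoff_sq = cutoff * cutoff  # compare squared distances to avoid square roots
    ligand = list(ligand)
    clashes = 0
    for rx, ry, rz in receptor:
        for lx, ly, lz in ligand:
            if (rx - lx) ** 2 + (ry - ly) ** 2 + (rz - lz) ** 2 < cutoff_sq:
                clashes += 1
    return clashes


if __name__ == "__main__":
    receptor = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    ligand = [(1.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(count_clashes(receptor, ligand))  # -> 1
```

A full scoring function would combine such a steric term with shape complementarity and electrostatics, as in the Eshape and Eforce columns of the Hex output.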

Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to the sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

A curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.


2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)

Primary accession number: Q8IVS8
EC 2.7.1.31, from human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
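The optimal local alignment that SSEARCH computes is the Smith-Waterman dynamic programme. The sketch below is a minimal Smith-Waterman implementation for illustration only; it assumes simple match/mismatch scores and a linear gap penalty, whereas SSEARCH uses substitution matrices and affine gap penalties, so its scores will differ.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGGG"))  # -> (6, 'ACG', 'ACG')
```

The clamp at zero is what makes the alignment local: a poorly matching prefix is discarded instead of dragging the score down, which is why only the shared "ACG" is reported.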

4 BLAST

In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.

5 Primary & secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty) the complete sequence will be analyzed.

It calculates the following parameters, among others:

extinction coefficient

half-life

instability index

aliphatic index
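Two of the parameters above follow simple formulas given in the ProtParam documentation: the extinction coefficient at 280 nm is n(Trp)·5500 + n(Tyr)·1490 + n(cystine)·125 M⁻¹ cm⁻¹, and the aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The sketch below reimplements only those two formulas; the function names are illustrative and are not part of any ProtParam interface. The half-life and instability index depend on lookup tables (N-end rule, dipeptide instability weights) and are left out.

```python
# Molar extinction coefficients at 280 nm (M^-1 cm^-1) used by the ProtParam formula
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cystines=True):
    # A cystine is a pair of Cys residues; an odd leftover Cys contributes nothing
    n_cystine = n_cys // 2 if all_cystines else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def absorbance_01_percent(ext_coeff, mol_weight):
    # Absorbance of a 0.1% (1 g/l) solution = molar coefficient / molecular weight
    return ext_coeff / mol_weight


def aliphatic_index(counts, length):
    # Mole percent (100 * fraction) of Ala, Val, Ile and Leu
    x = {aa: 100.0 * counts.get(aa, 0) / length for aa in "AVIL"}
    return x["A"] + 2.9 * x["V"] + 3.9 * (x["I"] + x["L"])


# Composition of GLCTK_HUMAN (Q8IVS8) as reported by ProtParam in the results
print(extinction_coefficient(4, 5, 5, all_cystines=True))   # 29700
print(extinction_coefficient(4, 5, 5, all_cystines=False))  # 29450
print(round(absorbance_01_percent(29700, 55252.6), 3))       # 0.538
print(round(aliphatic_index({"A": 74, "V": 38, "I": 15, "L": 81}, 523), 1))  # 106.8
```

With the glycerate kinase counts (4 Trp, 5 Tyr, 5 Cys) the first three calls reproduce the extinction coefficients and the 0.1% absorbance that ProtParam reports in the results.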


Using SOPMA for secondary structure analysis

A method called the self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. Its authors then reported improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).

PROTOCOL FOLLOWED

Identified the receptor (target protein) for the HBV strain from literature references, journals available online and the PubMed literature.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Ran a BLAST similarity search to retrieve the PDB ID of the template structure (PDB ID 2B8N).

Loaded the target sequence into SWISS-MODEL as a raw sequence, together with the template in PDB format, and modeled the receptor.

Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).

Verified the model through different parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the docked structure of the drug molecule.

Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
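Because model accuracy is tied to target-template sequence identity, it helps to be explicit about how identity is computed. A minimal sketch, assuming the two sequences are already aligned (equal length, '-' for gaps) and that identity is counted over columns where neither sequence has a gap; other conventions divide by the full alignment length or by the length of the shorter sequence and give different percentages.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Two HBeAg N-terminal segments from the ClustalW alignment in the results
print(percent_identity("MQLFHLCLIISCT-CPTVQASKLCLGWLWG",
                       "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"))
```

In the example, the shared gap column is skipped, leaving 29 compared columns of which 28 are identical (about 96.6%).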

Figure: First, the known template 3D structures are aligned with the target sequence to be modeled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target; thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.
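The second step in the figure, transferring Cα-Cα distances from the template to the target, can be sketched as follows. The input layout is assumed: the alignment is a dict mapping target residue index to template residue index, the template Cα coordinates are a list of (x, y, z) tuples, and only pairs closer than a hypothetical 8 Å cutoff are kept. Restraint-based programs such as MODELLER express restraints as probability density functions rather than the fixed distances used here.

```python
import math
from itertools import combinations


def transfer_ca_restraints(mapping, template_ca, cutoff=8.0):
    """Copy Calpha-Calpha distances from an aligned template onto target residue pairs.

    mapping:     {target_residue_index: template_residue_index} from the
                 target-template alignment (unaligned target residues are absent)
    template_ca: list of (x, y, z) Calpha coordinates of the template (angstroms)
    cutoff:      keep only pairs at most this far apart (hypothetical value)
    Returns {(target_i, target_j): distance} to be used as restraints on the target.
    """
    restraints = {}
    for ti, tj in combinations(sorted(mapping), 2):
        d = math.dist(template_ca[mapping[ti]], template_ca[mapping[tj]])
        if d <= cutoff:
            restraints[(ti, tj)] = d
    return restraints


# Toy template with four Calpha atoms; target residue 2 aligns to template residue 3
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (100.0, 0.0, 0.0), (7.6, 0.0, 0.0)]
alignment = {0: 0, 1: 1, 2: 3, 3: 2}
print(transfer_ca_restraints(alignment, template))
```

The third step of the figure would then search for target coordinates that satisfy these restraints as well as possible.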


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
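The E-value-first rule for template selection can be sketched as a filter over search hits. Everything here is an assumption for illustration: the `Hit` record, the 1e-5 cutoff (a common choice, not a fixed standard) and the tie-break on query coverage, which the text lists among the factors for choosing between candidates. Parsing real BLAST output is left out.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    template_id: str   # e.g. a PDB chain identifier
    e_value: float     # expectation value reported by the search
    coverage: float    # fraction of the query covered by the alignment (0-1)


def rank_templates(hits, max_e_value=1e-5):
    """Drop hits above the E-value cutoff; order by lowest E, then highest coverage."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: (h.e_value, -h.coverage))


# Hypothetical hits: B is rejected, and C beats A on coverage at equal E-value
hits = [Hit("A", 1e-50, 0.90), Hit("B", 5.0, 1.0), Hit("C", 1e-50, 0.95), Hit("D", 1e-10, 0.5)]
for h in rank_templates(hits):
    print(h.template_id, h.e_value, h.coverage)
```

Applied to the BLAST result reported later (E-values of 6e-54, 2e-51 and 6e-50 for the three HBV capsid chains), a 1e-5 cutoff keeps those three and drops the two unrelated hits with a score of 27.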

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interactions are said to be rigid.

Fig. Protein-protein docking

Protein receptor-ligand docking

Protein receptor-ligand motifs fit together tightly and are often referred to as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand and a flexible receptor, or a flexible ligand and a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of the rigid ligand-flexible receptor complex often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface. This allows for binding of a receptor with a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow for docking of a larger ligand than would be allowed for with a rigid receptor.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller in size than the receptor interface. No docking is completely rigid, though; there is intrinsic movement, which allows for small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing for easier, more energetically favorable binding between the two.
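The six rigid-body degrees of freedom (three rotations, three translations) are what a docking search varies when it places a ligand against a receptor. A minimal sketch of applying one such pose, assuming the ligand is a list of (x, y, z) atom coordinates and using one rotation convention (about x, then y, then z, angles in radians). This convention is an assumption for illustration; docking programs, including HEX, use their own parameterizations, and flexible docking adds internal torsions on top of these six.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (radians), i.e. R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform_pose(coords, rx, ry, rz, tx, ty, tz):
    """Apply a rigid-body pose (3 rotations + 3 translations) to ligand coordinates.

    The ligand is rotated about its own centroid and then translated, so bond
    lengths and angles inside the ligand are unchanged.
    """
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    R = rotation_matrix(rx, ry, rz)
    moved = []
    for p in coords:
        v = [p[k] - centroid[k] for k in range(3)]
        r = [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]
        moved.append(tuple(r[i] + centroid[i] + t for i, t in enumerate((tx, ty, tz))))
    return moved


# Rotate a two-atom "ligand" by 90 degrees about z and shift it 5 units along z
print(transform_pose([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], 0.0, 0.0, math.pi / 2, 0.0, 0.0, 5.0))
```

A docking search would score many such poses against the receptor and keep the most favorable ones.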

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking study will be useful in finding a cure for the infectious disease hepatitis B, and will also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process.

Receptor

A molecule on the surface of the cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule within a cell or on the cell surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

The molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through the interaction of many weak noncovalent bonds formed with the binding site of a protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

Amino acid | Preference
His | 0.360
Tyr | -0.040
Asp | 0.045
Gly | -0.070
Trp | -0.140
Met | 0.025
Val | -0.060
Asn | 0.080
Leu | -0.180
Phe | -0.120
Gln | 0.050
Cys | 0.210
Ile | -0.005
Ala | 0.025
Glu | 0.050
Arg | 0.055
Pro | -0.200
Lys | 0.100
Thr | 0.100
Ser | 0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate; the plot is drawn between these torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
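The φ and ψ angles on a Ramachandran plot are dihedral angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). A minimal sketch of the dihedral calculation from Cartesian coordinates, assuming atoms are given as (x, y, z) tuples in any consistent unit; it returns degrees in the range -180 to 180 with the usual (IUPAC) sign convention.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / n for c in b1)               # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy geometry: central bond along z, outer atoms 90 degrees apart
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Computing (φ, ψ) for every residue of the model and plotting the pairs gives the Ramachandran plot that SAVS/PROCHECK use to flag residues in disallowed regions.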

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new; the new file will have the atoms labelled in accordance with the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but with different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good at releasing local constraints for a residue, but it will not pass through high energy barriers and stops in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
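A minimal sketch of gradient-based energy minimization, assuming a single pair of atoms interacting through a 12-6 Lennard-Jones (van der Waals) potential with hypothetical parameters epsilon = 1 and sigma = 1 in reduced units; real force fields sum many bonded and non-bonded terms over all atoms. Each steepest-descent step moves the separation r along the force F = -dE/dr, so a start with the atoms too close together (strong repulsion) relaxes to the nearby minimum at r = 2^(1/6)·sigma, where E = -epsilon.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy of one atom pair at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr of the Lennard-Jones energy; the force on the pair is -dE/dr."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def steepest_descent(r0, step=0.01, tol=1e-6, max_iter=10000):
    """Move r downhill until the gradient (net force) is essentially zero."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:      # negligible net force: a (local) minimum
            break
        r -= step * g         # step along the force F = -dE/dr
    return r


r_min = steepest_descent(1.0)  # start inside the repulsive wall
print(r_min, lj_energy(r_min))
```

As the text notes, this only finds the nearest minimum: steepest descent never climbs over an energy barrier to reach a better one.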


Results and discussion

1 Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences:

Db | AC | Description | Score | E-value
pdb | 1QGT-C | Chain C, (Hbcag) Human Hepatitis B Viral Capsid | 206 | 6e-54
pdb | 2QIJ-C | Chain C, Hepatitis B Capsid Protein With An N-Termina | 197 | 2e-51
pdb | 2G33-C | CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad | 192 | 6e-50
pdb | 1TA3-B | XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp | 27 | 60
pdb | 1AW9-A | Chain A, Structure Of Glutathione S-Transferase Iii I | 27 | 60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A) 74 14.1%
Arg (R) 33 6.3%
Asn (N) 11 2.1%
Asp (D) 21 4.0%
Cys (C) 5 1.0%
Gln (Q) 32 6.1%
Glu (E) 28 5.4%
Gly (G) 51 9.8%
His (H) 16 3.1%
Ile (I) 15 2.9%
Leu (L) 81 15.5%
Lys (K) 10 1.9%
Met (M) 12 2.3%
Phe (F) 10 1.9%
Pro (P) 28 5.4%
Ser (S) 27 5.2%
Thr (T) 22 4.2%
Trp (W) 4 0.8%
Tyr (Y) 5 1.0%
Val (V) 38 7.3%
Pyl (O) 0 0.0%
Sec (U) 0 0.0%
(B) 0 0.0%
(Z) 0 0.0%
(X) 0 0.0%

Total number of negatively charged residues (Asp + Glu): 49

Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon C 2435
Hydrogen H 3967
Nitrogen N 711
Oxygen O 719
Sulfur S 17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M-1 cm-1 at 280 nm measured in water

Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines

Secondary structure prediction

By SOPMA: result for UNK_158250

1-70
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

71-140
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

141-210
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

211-280
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

281-350
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

351-420
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

421-490
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

491-523
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
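The SOPMA summary is a tally of the per-residue prediction string: each percentage is the count of a state divided by the sequence length (for example, 235/523 = 44.93% alpha helix). A minimal sketch of that tally, assuming the prediction is available as one string of the lowercase state letters h, e, t and c used in the output; the dictionary of state names is written for this example, not taken from SOPMA.

```python
from collections import Counter

# Names for the state letters that appear in the SOPMA prediction string
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_composition(prediction):
    """Return {state name: (count, percent of sequence length)}."""
    counts = Counter(prediction)
    n = len(prediction)
    return {STATE_NAMES.get(s, s): (k, round(100.0 * k / n, 2)) for s, k in counts.items()}


# First 24 predicted states of GLCTK_HUMAN from the output
print(state_composition("hhhhhhhhhhhccccccceeetcc"))
```

Passing the concatenated 523-character prediction would give the same form of summary as the SOPMA output.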

Multiple sequence alignment

ClustalW2 Results

1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input

Scores Table

Sequences (length in aa):
1 Q8IVS8|GLCTK_HUMAN (523)
2 Q64896|HBEAG_ASHV (217)
3 P03154|HBEAG_DHBV1 (305)
4 P0C6J9|HBEAG_DHBV3 (305)
5 P03153|HBEAG_GSHV (217)
6 P0C692|HBEAG_HBVA2 (214)
7 P0C625|HBEAG_HBVA3 (214)
8 P17099|HBEAG_HBVA4 (214)
9 Q81105|HBEAG_HBVA5 (214)
10 Q91C37|HBEAG_HBVA6 (214)

Pairwise scores (row = SeqB, column = SeqA):
Seq | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
2 | 3
3 | 3 | 21
4 | 3 | 21 | 97
5 | 3 | 91 | 24 | 25
6 | 3 | 66 | 26 | 26 | 70
7 | 3 | 65 | 27 | 27 | 69 | 98
8 | 3 | 65 | 25 | 25 | 69 | 98 | 98
9 | 3 | 65 | 26 | 26 | 69 | 98 | 97 | 97
10 | 2 | 65 | 26 | 26 | 69 | 98 | 98 | 98 | 97

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420


P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
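One way to read the alignment above is to compute the percent identity between two aligned rows, column by column. The sketch below is a minimal illustration under one assumption: identity is counted only over columns where neither sequence has a gap. Clustal and other tools use slightly different denominators. The function name `percent_identity` is hypothetical. The two rows are copied from the first alignment block (P17099 and Q91C37).

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are left out of the denominator
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Rows copied from the first alignment block (P17099 and Q91C37)
HBVA4 = "-------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP"
HBVA6 = "-------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP"

if __name__ == "__main__":
    print(f"{percent_identity(HBVA4, HBVA6):.1f}% identical")
```

Across these 47 gap-free columns, the two rows differ at only two positions (R/H and Y/C). That gives about 95.7% identity.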

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
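The guide tree is in Newick notation. Reading it means restoring the decimal points that extraction removed, for example reading 059519 as 0.59519 to match ClustalW's five-decimal branch lengths; that reading is an assumption. The sketch below parses the tree with a small hand-written recursive-descent parser. This is not ClustalW or Biopython code, and the function names are hypothetical. It then sums branch lengths along the path between two leaves, which gives their patristic distance.

```python
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse Newick into nested dicts with keys "name", "length", "children"."""
    text = "".join(text.split())  # drop whitespace and line breaks
    if not text.endswith(";"):
        text += ";"
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        name = read_until(":,);")
        length = 0.0
        if text[pos] == ":":
            pos += 1  # consume ":"
            length = float(read_until(",);"))
        return {"name": name, "length": length, "children": children}

    return parse_node()


def leaf_paths(node, trail=(), dist=0.0, out=None):
    """Map each leaf name to its root-to-leaf trail of (node id, cumulative length)."""
    out = {} if out is None else out
    dist += node["length"]
    trail = trail + ((id(node), dist),)
    if not node["children"]:
        out[node["name"]] = trail
    for child in node["children"]:
        leaf_paths(child, trail, dist, out)
    return out


def patristic_distance(tree, leaf_a, leaf_b):
    """Sum of branch lengths on the path between two leaves."""
    paths = leaf_paths(tree)
    path_a, path_b = paths[leaf_a], paths[leaf_b]
    lca_dist = 0.0
    for (id_a, d_a), (id_b, _) in zip(path_a, path_b):
        if id_a != id_b:
            break
        lca_dist = d_a  # deepest shared ancestor seen so far
    return path_a[-1][1] + path_b[-1][1] - 2 * lca_dist


if __name__ == "__main__":
    tree = parse_newick(GUIDE_TREE)
    print(patristic_distance(tree, "P03154|HBEAG_DHBV1", "P0C6J9|HBEAG_DHBV3"))
```

With this reading, the two duck HBV sequences are 0.01176 + 0.01119 = 0.02295 apart. The human glycerate kinase leaf (Q8IVS8) is far from every HBeAg leaf.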

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

Open the template structure (PDB file) from File, then choose Swiss model > Load the raw target sequence.

Choose Fit > Fit raw sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss model > Submit Modeling Request. A new browser window opens and loads the PDB file; enter the e-mail ID at which the modeled structure should be received.

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).


Open the Swiss model menu and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit (under Fit) to fit the two sequences.


Save the file as a project.

Select "Submit Modeling Request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields:

Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information:

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52


Click on the model bars.

Fig: Structure of the template after modeling

Alignment

TARGET  83 LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Verification Server (SAVES) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and shows which combinations of φ and ψ are possible for a polypeptide. φ is the rotation about the N-Cα bond and ψ the rotation about the Cα-C bond, and both are restricted by steric clashes between atoms. Residues that fall outside the favoured and allowed regions therefore point to possible errors in the model.

Software

SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure

The target protein structure obtained after homology modeling using DeepView and Modeler was given as input to SAVS.
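The φ and ψ values on a Ramachandran plot are ordinary dihedral (torsion) angles. φ is defined by the backbone atoms C(i-1), N, Cα, C, and ψ by N, Cα, C, N(i+1). Below is a minimal sketch of the standard atan2 formula for the dihedral angle of four 3D points. The function name `dihedral` is hypothetical, and the test points are synthetic geometry, not coordinates from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees, -180..180) about the central bond p1-p2."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# For a residue i with backbone coordinates available:
#   phi = dihedral(C_prev, N, CA, C)
#   psi = dihedral(N, CA, C, N_next)
```

A planar trans arrangement gives ±180°, a planar cis arrangement gives 0°, and the sign follows the usual convention in which clockwise rotation is positive.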

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT


Result

Plot statistics                                        Residues      %
Residues in most favoured regions [A,B,L]                   990   85.6%
Residues in additional allowed regions [a,b,l,p]            104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]         11    1.0%
Residues in disallowed regions                               51    4.4%
                                                           ----  ------
Number of non-glycine and non-proline residues             1156  100.0%

Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)        127 (59)
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351


Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions.
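The percentages in the PROCHECK plot statistics come from dividing each region count by the 1156 non-glycine, non-proline residues. The sketch below redoes that arithmetic and compares the most-favoured share with the 90% guideline quoted above. The dictionary keys and the name `region_percentages` are illustrative, not PROCHECK output fields.

```python
# Residue counts from the PROCHECK plot statistics (non-Gly, non-Pro residues only)
REGION_COUNTS = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}
GOOD_MODEL_THRESHOLD = 90.0  # % in most favoured regions, per the guideline above


def region_percentages(counts):
    """Express each Ramachandran region count as a percentage of the total."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


if __name__ == "__main__":
    pct = region_percentages(REGION_COUNTS)
    for region, value in pct.items():
        print(f"{region:35s} {value:5.1f}%")
    favoured = pct["most favoured [A,B,L]"]
    print("meets 90% guideline:", favoured > GOOD_MODEL_THRESHOLD)
```

With these counts, the most-favoured share is 85.6%, which is below the 90% expected of a good quality model. About 4.4% of the residues lie in disallowed regions.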

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS


Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025


Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050


MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050


LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI


VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052


MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005


LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025


ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds


gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A


Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds

Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)


R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1

3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040


R = 075Estart = 14350136 -gt rank 5

Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
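Hex ranks trial orientations with a shape-complementarity energy (Eshape in the log), using spherical polar Fourier correlations over rotations and a scan of intermolecular distance R. In this run, R ran from 0.00 to 19.50 in steps of 0.75. The sketch below does not implement Hex's algorithm. It is a brute-force, 2D, translation-only version of the older grid-correlation scoring of Katchalski-Katzir and co-workers, which they evaluate with FFTs:

* receptor surface cells score +1;
* buried receptor cells carry a large penalty;
* a placement scores the sum over the ligand's cells.

The weights (+1, -15), the toy shapes and all function names are hypothetical.

```python
from itertools import product

# Hypothetical weights in the spirit of Katchalski-Katzir grid scoring:
# ligand cells overlapping the receptor's surface layer count as contact (+1),
# ligand cells overlapping the receptor's buried interior count as a clash.
SURFACE_SCORE = 1
INTERIOR_PENALTY = -15


def receptor_scores(cells):
    """Map each receptor cell to +1 (surface layer) or a penalty (buried)."""
    occupied = set(cells)
    scores = {}
    for x, y in occupied:
        neighbours = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
        buried = all(n in occupied for n in neighbours)
        scores[(x, y)] = INTERIOR_PENALTY if buried else SURFACE_SCORE
    return scores


def correlation(scores, ligand_cells, shift):
    """Shape-complementarity score of the ligand translated by `shift`."""
    dx, dy = shift
    return sum(scores.get((x + dx, y + dy), 0) for x, y in ligand_cells)


def translational_scan(receptor_cells, ligand_cells, max_shift):
    """Try every integer translation in [-max_shift, max_shift]^2 and keep the best."""
    scores = receptor_scores(receptor_cells)
    best_shift, best_score = None, float("-inf")
    for shift in product(range(-max_shift, max_shift + 1), repeat=2):
        score = correlation(scores, ligand_cells, shift)
        if score > best_score:  # strict ">" keeps the first best shift found
            best_shift, best_score = shift, score
    return best_shift, best_score


if __name__ == "__main__":
    receptor = [(x, y) for x in range(3) for y in range(3)]  # 3x3 block
    ligand = [(0, 0), (1, 0), (2, 0)]                        # 1x3 bar
    print(translational_scan(receptor, ligand, max_shift=3))
```

Here the bar scores best when it lies along an edge of the block. Pushing it through the middle costs -13, because its centre cell hits the buried receptor cell. Real docking programs add rotations, a third dimension and FFT acceleration on top of this idea.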


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible to draw conclusive inferences about the true picture of its evolution in the near future, and that the genes responsible for pathogenesis can also be identified.

A complete inference can be drawn only from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we performed homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

As our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity, we tried to dock this protein with an appropriate ligand to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus.

The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. Nevertheless, we hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


to the 5′ end. The other strand, the plus strand, is variable in length but shorter than unit length, and has an RNA oligonucleotide at its 5′ end. Thus neither DNA strand is closed, and circularity is maintained by cohesive ends (Strauss 2002). The four overlapping open reading frames (ORFs) in the genome are responsible for the transcription and expression of seven different hepatitis B proteins. These proteins are produced through the use of multiple in-frame start codons. The HBV genome also contains elements that regulate transcription, determine the site of polyadenylation, and mark a specific transcript for encapsidation into the nucleocapsid.
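The idea of producing several proteins from one ORF through multiple in-frame start codons can be sketched in Python. The function below is a hypothetical, illustrative helper: it scans one reading frame of a linear DNA string, collects every ATG, and pairs each with the next in-frame stop codon, so that all starts upstream of the same stop give nested proteins sharing one C-terminus. The toy sequence is invented, and the sketch ignores the circularity of the real HBV genome and any non-ATG starts.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def nested_starts(seq: str, frame: int = 0) -> list[tuple[int, int]]:
    """Pair every in-frame ATG with the next in-frame stop codon.

    seq: linear DNA string; frame: 0, 1 or 2.
    Returns (start, end) pairs, end being the index just past the stop codon.
    Starts that share a stop give nested proteins with one C-terminus.
    """
    seq = seq.upper()
    pending: list[int] = []  # ATG positions still waiting for a stop codon
    results: list[tuple[int, int]] = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "ATG":
            pending.append(i)
        elif codon in STOP_CODONS:
            # all pending starts terminate at this same stop codon
            results.extend((start, i + 3) for start in pending)
            pending = []
    return results


print(nested_starts("ATGAAAATGCCCATGTAA"))  # toy sequence, not HBV
```

For the toy sequence the result is `[(0, 18), (6, 18), (12, 18)]`: three start codons, one shared stop, hence three nested products of decreasing length.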

Life cycle

To reproduce, the hepatitis B virus must first attach to a cell that is capable of supporting its replication. Although hepatocytes are known to be the most effective cell type for replicating HBV, other cell types in the human body have been found to support replication to a lesser degree.

The initial steps following HBV entry are not clearly defined, although it is known that the virion initially attaches to a susceptible hepatocyte through recognition of a cell-surface receptor that has yet to be identified (Garces, HBVP). The DNA then enters the nucleus, where it is known to form a covalently closed circular form called cccDNA.

The (−) strand of cccDNA is the template for transcription by RNA polymerase II of a longer-than-genome-length RNA, called the pregenome, and of shorter subgenomic transcripts, all of which serve as mRNAs. The shorter viral mRNAs are translated by ribosomes attached to the cell's endoplasmic reticulum, and the proteins destined to become the HBV surface antigens of the viral envelope are assembled there.

The pregenome RNA is translated to produce the polymerase protein (P), which then binds to a specific site at the 3′ end of its own transcript, where viral DNA synthesis eventually occurs. Concurrently with capsid formation, the RNA–P protein complex is packaged and reverse transcription begins.

Early in infection, the newly synthesized viral DNA is recycled to the nucleus, where the process is repeated, resulting in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al. 765).

Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found in the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a large quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins in an icosahedral arrangement. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions make up only a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions in a ratio of 100:1. These two kinds of subviral particle, the hepatitis B filament and the hepatitis B sphere, are often referred to collectively as surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. Because they lack the hepatitis B core, polymerase, and genome, these particles are non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same antigenic sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces, HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV genome.

Hepatitis B surface antigen (HBsAg): There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg), and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope, which may be responsible for triggering an immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears largely oblivious to their presence.

Hepatitis B core antigen (HBcAg): The only HBV antigen that cannot be detected directly by a blood test; this antigen can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus, this antigen can be detected by a blood test. If found, it is usually indicative of complete virus particles in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded circular DNA virus of complex structure. HBV is classified in the genus Orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen-bearing particles: a spherical 22 nm particle; a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle; and tubular or filamentous particles that vary in length. Of these, only the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted through blood transfusion, contaminated equipment, drug users' unsterile needles, or body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid, and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus–host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field has therefore been reviewed, with an emphasis on past accomplishments as well as goals for the future [3].

Antigen

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.
2) e antigen (HBeAg) is a secreted protein shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.
3) Core antigen (HBcAg): the core protein is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.
2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.
3) Core IgM rises early in infection and indicates recent infection.
4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence indicates exposure to HBV, whether or not the person has become a chronic carrier [4].

Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g. loops), and ultimately the refinement of the structure in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5,6].
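Because the 30% rule is stated in terms of sequence identity, it helps to see how identity is read off an alignment. The sketch below assumes two gapped sequences of equal length from a pairwise alignment, counts identical residues over the columns where neither sequence has a gap, and compares the result with the 30% rule of thumb. Other denominators (alignment length, length of the shorter sequence) are also in use, so figures from different tools may differ; the function names are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for q, t in pairs if q == t)
    return 100.0 * identical / len(pairs)


def meets_identity_threshold(aligned_query: str, aligned_template: str,
                             threshold: float = 30.0) -> bool:
    """Rule-of-thumb check against the ~30% identity level."""
    return percent_identity(aligned_query, aligned_template) > threshold


print(percent_identity("MKT-AYIAK", "MKTLAHIGK"))  # invented alignment
```

In the invented alignment, 6 of the 8 gap-free columns are identical, giving 75.0%.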

With the number of protein–ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed over recent years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands, and from protein–ligand complexes. In this respect, the term guided docking is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based, and ligand-based approaches to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions, and virtual screening enrichments obtained from protein–ligand docking [7].

One review gives an introduction to ligand–receptor docking and illustrates the basic underlying concepts, providing an overview of different approaches and algorithms. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which that review also outlines, together with approaches to address some of these challenges and the latest developments in the area. It also discusses some aspects of the assessment of docking program performance and describes a number of successful applications of structure-based virtual screening [8].

Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins.

Bioinformatics is limited to sequence, structural, and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis, and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.

2. Protein sequence: glycerate kinase (HBeAg-binding protein 4) from human. Primary accession number: Q8IVS8. EC 2.7.1.31. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
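The core of SSEARCH is the Smith-Waterman local alignment recurrence: each cell takes the best of a diagonal step (match or mismatch), a gap in either sequence, or zero, so that poor regions are dropped rather than penalized. The sketch below computes only the best local alignment score, with a simple match/mismatch scheme and a linear gap penalty. These scoring values are assumptions for illustration; SSEARCH itself uses substitution matrices, affine gap penalties and statistical estimates of significance.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)  # 0 restarts the local alignment
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("XXACGTYY", "ACGT"))
```

The embedded ACGT is found as four matches (score 8), while the flanking X and Y residues are ignored because the local score never needs to include them.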

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and identify library sequences that resemble the query sequence above a certain threshold.
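BLAST gains its speed by first looking for short word matches ("seeds") between the query and each database sequence, and only then extending them into alignments. The sketch below shows a simplified seeding step using exact k-letter words only; real protein BLAST also accepts scored "neighbourhood" words above a threshold and performs gapped extension, neither of which is shown. The function name and the default word length are assumptions for illustration.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly.

    Simplified BLAST-style seeding: index every k-mer of the query,
    then scan the subject for the same words.
    """
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


print(word_hits("MKVLA", "AMKVL"))
```

Here both hits, `(0, 1)` and `(1, 2)`, lie on the same diagonal (subject position minus query position is 1); an extension step would examine exactly such diagonals.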

5. Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence will be analyzed.

It calculates the following parameters, among others:

extinction coefficient

half-life

instability index

aliphatic index
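Two of the listed parameters reduce to short formulas. The sketch below follows the definitions as we understand them from the ProtParam documentation: the aliphatic index of Ikai (1980), computed from the mole percents of Ala, Val, Ile and Leu, and the molar extinction coefficient at 280 nm built from Trp (5500), Tyr (1490) and cystine (125) contributions in M⁻¹ cm⁻¹. The function names are hypothetical, and results should be compared with the ProtParam output before being relied on. Half-life and the instability index need lookup tables and are omitted.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index (Ikai 1980): X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X are mole percents of each residue in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    mole_pct = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return (mole_pct["A"] + 2.9 * mole_pct["V"]
            + 3.9 * (mole_pct["I"] + mole_pct["L"]))


def extinction_coefficients(seq: str) -> tuple[int, int]:
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1).

    Returns (all Cys paired as cystines, all Cys reduced)."""
    seq = seq.upper()
    n_trp, n_tyr, n_cys = seq.count("W"), seq.count("Y"), seq.count("C")
    base = 5500 * n_trp + 1490 * n_tyr
    return base + 125 * (n_cys // 2), base


print(extinction_coefficients("WYCC"))
```

For the toy string "WYCC" (one Trp, one Tyr, one possible cystine) this gives `(7115, 6990)`, mirroring ProtParam's practice of reporting both the oxidized and the reduced case.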

Using SOPMA for secondary structure analysis

A method called the self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. Its authors subsequently reported improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet, and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
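The 69.5% figure is a per-residue three-state accuracy, usually called Q3. As a minimal sketch, assuming both the prediction and the reference assignment have already been reduced to the letters H (helix), E (strand/sheet) and C (coil), Q3 is simply the percentage of positions where the two strings agree. The function name and the example strings are made up for illustration.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E or C) is correct."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3_accuracy("HHEC", "HHCC"))
```

Three of the four positions agree, so the sketch reports 75.0.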

PROTOCOL FOLLOWED

Obtained the receptor (target protein) from literature references, journals available online, and PubMed literature on the HBV strain.
Retrieved the FASTA sequence of the target protein, HBeAg-binding protein 4 (Q8IVS8), from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.
Submitted the target sequence to SWISS-MODEL as a raw sequence and modeled the receptor, obtaining the model in PDB format.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the docked structure of the drug molecule.
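The retrieval step in this protocol returns the target sequence as a FASTA record (a ">" header line followed by one or more sequence lines). Before the sequence is submitted to BLAST or SWISS-MODEL, the record can be parsed and checked locally. The parser below is a hypothetical helper, not part of any tool named in the protocol, and the record in the example is a placeholder rather than the real Q8IVS8 sequence.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; header is the text after '>'."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close previous record
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = ">example_target placeholder\nMKV\nLA\n"
print(parse_fasta(example))
```

Joining the wrapped sequence lines into one string makes it easy to check the length, or to pass the sequence to the calculations described earlier.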

Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4–5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein–protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
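The "~2 Å agreement between the matched Cα atoms" quoted in this section is usually expressed as a root-mean-square deviation (RMSD) over matched atoms, RMSD = sqrt(Σ d² / N). The sketch below assumes the model and reference Cα coordinates are already paired residue by residue and optimally superposed (for example by a structure-alignment program); it does not perform the superposition itself. The function name is hypothetical.

```python
import math

Coord = tuple[float, float, float]


def ca_rmsd(model: list[Coord], reference: list[Coord]) -> float:
    """RMSD (in Å) between matched, already-superposed Cα coordinates."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty, equally long coordinate lists")
    squared = sum((mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
                  for (mx, my, mz), (rx, ry, rz) in zip(model, reference))
    return math.sqrt(squared / len(model))


# two toy Cα positions, each displaced by 2 Å along x
print(ca_rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
              [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]))
```

Because every atom in the toy example is displaced by the same 2 Å, the RMSD is exactly 2.0.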

Figure. First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα–Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target; thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.

Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
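The selection criteria described here (discard hits with a poor E-value, then weigh coverage and similarity) can be expressed as a small ranking function. This is an illustrative sketch only: the `TemplateHit` record, the E-value cutoff of 1e-5 and the choice to rank by coverage before identity are all assumptions, not a standard recipe, and the hit values in the example are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemplateHit:
    template_id: str   # hypothetical template identifier
    e_value: float     # BLAST expectation value
    identity: float    # percent identity over the aligned region
    coverage: float    # fraction of the query covered (0-1)


def choose_template(hits: list[TemplateHit],
                    max_e_value: float = 1e-5) -> Optional[TemplateHit]:
    """Drop hits with a poor E-value, then prefer coverage, then identity.

    Returns None when no hit passes the cutoff, rather than falling back
    to an unreliable template.
    """
    usable = [h for h in hits if h.e_value <= max_e_value]
    if not usable:
        return None
    return max(usable, key=lambda h: (h.coverage, h.identity))


hits = [TemplateHit("tmplA", 1e-30, 45.0, 0.60),
        TemplateHit("tmplB", 1e-20, 35.0, 0.95),
        TemplateHit("tmplC", 0.5, 80.0, 1.00)]
print(choose_template(hits).template_id)
```

In the invented example, "tmplC" is rejected despite its high identity because of its poor E-value, and "tmplB" wins over "tmplA" on coverage. In practice, the final choice would still be checked against the plausibility of the resulting model.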

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:
1) Protein–protein docking
2) Protein receptor–ligand docking

Protein–protein docking interactions

Protein–protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein–ligand interactions. Protein–protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interactions are said to be rigid.

Fig. Protein–protein docking

Protein receptor–ligand docking

Protein receptor–ligand motifs fit together tightly and are often referred to as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor–ligand docking can involve either a rigid ligand with a flexible receptor, or a flexible ligand with a rigid receptor.

Fig. Protein ligand–receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid ligand–flexible receptor complex often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If these repulsive van der Waals interactions can be decreased, energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow for docking of a larger ligand than would be allowed with a rigid receptor.
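The energy penalty for overlapping atoms is commonly modelled with a Lennard-Jones 12-6 term, E = 4ε[(σ/r)¹² − (σ/r)⁶], which is zero at r = σ, reaches its minimum of −ε at r = 2^(1/6)σ, and becomes strongly positive once two atoms approach closer than σ. The sketch below sums this term over all ligand–receptor atom pairs. It assumes a single ε and σ for every pair (the default values are illustrative, not taken from any force field) and is not the scoring function of any particular docking program.

```python
import math


def lj_energy(ligand, receptor, epsilon=0.2, sigma=3.5):
    """Sum of 12-6 Lennard-Jones terms over all ligand-receptor atom pairs.

    ligand, receptor: sequences of (x, y, z) coordinates in Å.
    One epsilon (kcal/mol) and sigma (Å) are used for every pair, a
    simplification; force fields use per-atom-type parameters.
    """
    total = 0.0
    for a in ligand:
        for b in receptor:
            r = math.dist(a, b)
            if r == 0.0:
                raise ValueError("coincident atoms")
            sr6 = (sigma / r) ** 6
            total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total


print(lj_energy([(0.0, 0.0, 0.0)], [(2.5, 0.0, 0.0)]))  # overlap: large penalty
print(lj_energy([(0.0, 0.0, 0.0)], [(3.9, 0.0, 0.0)]))  # near contact: slightly negative
```

The steep r⁻¹² term is what makes an oversized ligand costly against a rigid receptor; letting the receptor relax increases r for the clashing pairs and removes most of that penalty.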

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement, which allows for small conformational adaptation for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing for easier, more energetically favorable binding between the two.
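The six rigid-body degrees of freedom (three rotations, three translations) are what a docking search samples for each pose of a rigid partner. The sketch below applies one such move to a list of atomic coordinates: it rotates about the x, y and z axes (in that order) around the centroid and then translates. The function name and the x-y-z rotation convention are assumptions for illustration; docking programs such as Hex use their own parameterizations of these six degrees of freedom.

```python
import math


def rigid_transform(coords, rx, ry, rz, tx, ty, tz):
    """Apply a rigid-body move with six degrees of freedom.

    Rotations rx, ry, rz (radians) are applied about the x, then y, then z
    axis around the coordinate centroid; then (tx, ty, tz) translates.
    Interatomic distances are preserved.
    """
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    cos_x, sin_x = math.cos(rx), math.sin(rx)
    cos_y, sin_y = math.cos(ry), math.sin(ry)
    cos_z, sin_z = math.cos(rz), math.sin(rz)
    moved = []
    for x, y, z in coords:
        x, y, z = x - cx, y - cy, z - cz                       # centre on centroid
        y, z = y * cos_x - z * sin_x, y * sin_x + z * cos_x    # rotate about x
        x, z = x * cos_y + z * sin_y, -x * sin_y + z * cos_y   # rotate about y
        x, y = x * cos_z - y * sin_z, x * sin_z + y * cos_z    # rotate about z
        moved.append((x + cx + tx, y + cy + ty, z + cz + tz))
    return moved


pose = rigid_transform([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], 0, 0, math.pi / 2, 0, 0, 5)
print(pose)
```

A docking search would generate many such poses and score each one, for example with an interaction energy; keeping the transform rigid is what limits the search to these six parameters.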

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in the search for a cure for hepatitis B, and will also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.

Receptor

A structure on the surface of a cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule within a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

The molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through many weak noncovalent bonds formed with the binding site of a protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein–protein or protein–peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

Residue   Preference
His        0.360
Cys        0.210
Ser        0.130
Lys        0.100
Thr        0.100
Asn        0.080
Arg        0.055
Gln        0.050
Glu        0.050
Asp        0.045
Ala        0.025
Met        0.025
Ile       -0.005
Tyr       -0.040
Val       -0.060
Gly       -0.070
Phe       -0.120
Trp       -0.140
Leu       -0.180
Pro       -0.200
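As a hypothetical illustration of how these values might be used, they can be averaged over the residues lining a candidate pocket to give a rough indication of how "binding-site-like" its composition is. This is not a validated site-prediction method, the function name is invented, and the residue lists in the examples are made up.

```python
# Contact-preference values from the table above (positive = more contacts
# with bound non-protein atoms than expected by chance).
PREFERENCE = {
    "HIS": 0.360, "CYS": 0.210, "SER": 0.130, "LYS": 0.100, "THR": 0.100,
    "ASN": 0.080, "ARG": 0.055, "GLN": 0.050, "GLU": 0.050, "ASP": 0.045,
    "ALA": 0.025, "MET": 0.025, "ILE": -0.005, "TYR": -0.040, "VAL": -0.060,
    "GLY": -0.070, "PHE": -0.120, "TRP": -0.140, "LEU": -0.180, "PRO": -0.200,
}


def mean_site_preference(residues: list[str]) -> float:
    """Average table value over a list of three-letter residue names."""
    if not residues:
        raise ValueError("no residues given")
    values = [PREFERENCE[r.upper()] for r in residues]
    return sum(values) / len(values)


print(mean_site_preference(["His", "Cys", "Ser"]))  # invented pocket
print(mean_site_preference(["Leu", "Pro", "Val"]))  # invented pocket
```

Such an average ignores geometry entirely; as the text notes, the values also do not apply to protein–protein or protein–peptide interfaces.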

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the corresponding torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
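As a concrete illustration of the angles being plotted, the sketch below computes a signed torsion angle from four atom positions using the standard atan2 formulation; φ uses the atoms C(i-1), N, Cα, C and ψ uses N, Cα, C, N(i+1). The function names are hypothetical, and in practice the coordinates would come from a PDB file; this is a sketch, not the code used by any particular validation server.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180..180) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane p0-p1-p2
    n2 = _cross(b2, b3)  # normal of the plane p1-p2-p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone angles of one residue: phi = C(i-1)-N-CA-C, psi = N-CA-C-N(i+1)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Each residue with both neighbours present contributes one (φ, ψ) point to the plot; the first and last residues of a chain lack one of the two angles.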

SAVS (Structure Analysis and Validation Server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The clean-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new. The new file has the atoms labelled in accordance with the IUPAC naming convention.
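Because the input is a plain PDB coordinate file, its atom records can be read using fixed column positions. The sketch below assumes the standard PDB fixed-column layout for ATOM/HETATM records (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54). The `Atom` class and `parse_atoms` function are hypothetical helpers; they do not reproduce PROCHECK's own clean-up or IUPAC renaming.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Atom:
    record: str    # "ATOM" or "HETATM"
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "LYS"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> Iterator[Atom]:
    """Yield atoms from ATOM/HETATM records using PDB fixed columns."""
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END and other records
        yield Atom(
            record=record,
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        )
```

For example, opening a model file (the name `model.pdb` is only a placeholder) and keeping atoms whose `name` is `"CA"` gives the α-carbon trace of the structure.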

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints around a residue, but it cannot pass through high energy barriers and therefore stops in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms that are too near each other in space and have a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
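The gradient-optimization idea can be illustrated with a deliberately tiny example: steepest descent on atoms interacting only through a Lennard-Jones potential. This is a sketch under stated assumptions (reduced units with ε = σ = 1, a fixed step size, no bonded or electrostatic terms), not the force field or minimizer of any particular modeling package. For two atoms the minimum lies at r = 2^(1/6)σ with energy -ε, which makes the result easy to check.

```python
import math


def lj_energy_and_forces(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy and per-atom forces (reduced units)."""
    n = len(coords)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # (-dE/dr) / r: positive when the pair repels, negative when it attracts
            f_over_r = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / (r * r)
            for k in range(3):
                forces[i][k] += f_over_r * d[k]
                forces[j][k] -= f_over_r * d[k]
    return energy, forces


def steepest_descent(coords, step=0.005, max_steps=10000, f_tol=1e-6):
    """Move every atom along its force until the largest force component is below f_tol."""
    coords = [list(c) for c in coords]
    energy, forces = lj_energy_and_forces(coords)
    for _ in range(max_steps):
        if max(abs(f) for atom in forces for f in atom) < f_tol:
            break
        for atom, force in zip(coords, forces):
            for k in range(3):
                atom[k] += step * force[k]
        energy, forces = lj_energy_and_forces(coords)
    return coords, energy
```

Starting two atoms 1.3σ apart, the loop pulls them together until the net force vanishes. With many atoms the same loop only finds the nearest local minimum, which is exactly the limitation described above.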


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences

Db | AC | Description | Score | E-value
pdb | 1QGT-C | Chain C (Hbcag), Human Hepatitis B Viral Capsid | 206 | 6e-54
pdb | 2QIJ-C | Chain C, Hepatitis B Capsid Protein With An N-Termina... | 197 | 2e-51
pdb | 2G33-C | CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad... | 192 | 6e-50
pdb | 1TA3-B | XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp... | 27 | 6.0
pdb | 1AW9-A | Chain A, Structure Of Glutathione S-Transferase Iii I... | 27 | 6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17
Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
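The composition, charge counts and extinction coefficients reported above can be reproduced from a sequence with a few lines of code. The sketch assumes the residue coefficients that ProtParam documents for its calculation (Pace et al.: Trp 5500, Tyr 1490, cystine 125 M-1 cm-1) and takes the molecular weight as an input rather than recomputing it; the function names are hypothetical.

```python
from collections import Counter

# Molar extinction coefficients at 280 nm (M-1 cm-1), values documented by ProtParam.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125  # per disulfide-bonded pair of Cys


def composition(seq):
    """Return {residue: (count, percent)} for a one-letter-code sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: (n, 100.0 * n / len(seq)) for aa, n in sorted(counts.items())}


def charged_counts(seq):
    """(negatively charged Asp + Glu, positively charged Arg + Lys)."""
    c = Counter(seq.upper())
    return c["D"] + c["E"], c["R"] + c["K"]


def extinction_coefficients(seq):
    """(all Cys as half-cystines, no cystines), in M-1 cm-1 at 280 nm."""
    c = Counter(seq.upper())
    base = EXT_TRP * c["W"] + EXT_TYR * c["Y"]
    return base + EXT_CYSTINE * (c["C"] // 2), base


def abs_0_1_percent(ext_coeff, mol_weight):
    """Absorbance of a 1 g/l (0.1%) solution: extinction coefficient / MW."""
    return ext_coeff / mol_weight


if __name__ == "__main__":
    # Only the W/Y/C counts matter for the coefficients (GLCTK_HUMAN: W=4, Y=5, C=5).
    toy = "W" * 4 + "Y" * 5 + "C" * 5
    with_cys, without_cys = extinction_coefficients(toy)
    print(with_cys, without_cys)                         # 29700 29450
    print(round(abs_0_1_percent(with_cys, 55252.6), 3))  # 0.538
```

With the molecular weight of 55252.6 from the ProtParam output, the two absorbance values come out as 0.538 and 0.533, matching the table.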

Secondary structure prediction

By SOPMA (result for UNK_158250)


MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: window width 17; similarity threshold 8; number of states 4
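The SOPMA percentages follow directly from counting the letters of the predicted-state string (for example, 235 helix positions out of 523 gives 44.93%). A minimal sketch, assuming the predicted-state letters are concatenated into one string (whitespace and line breaks are ignored); `summarize_states` is a hypothetical helper, and the state names are taken from the SOPMA summary.

```python
from collections import Counter

# SOPMA state letters and the names used in its summary table.
STATE_NAMES = {
    "h": "Alpha helix", "g": "3-10 helix", "i": "Pi helix",
    "b": "Beta bridge", "e": "Extended strand", "t": "Beta turn",
    "s": "Bend region", "c": "Random coil",
}


def summarize_states(states):
    """Return {state name: (count, percent rounded to 2 decimals)}."""
    states = "".join(states.split()).lower()  # drop line breaks between blocks
    if not states:
        raise ValueError("empty state string")
    totals = {}
    for letter, n in Counter(states).items():
        name = STATE_NAMES.get(letter, "Other states")
        totals[name] = totals.get(name, 0) + n
    return {name: (n, round(100.0 * n / len(states), 2)) for name, n in totals.items()}
```

Feeding it the full 523-letter prediction should give the counts in the summary table; since SOPMA predicts one state per residue, the counts always add up to the sequence length.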

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name                Len(aa)  SeqB Name                Len(aa)  Score
1    Q8IVS8|GLCTK_HUMAN  523      2    Q64896|HBEAG_ASHV   217      3
1    Q8IVS8|GLCTK_HUMAN  523      3    P03154|HBEAG_DHBV1  305      3
1    Q8IVS8|GLCTK_HUMAN  523      4    P0C6J9|HBEAG_DHBV3  305      3
1    Q8IVS8|GLCTK_HUMAN  523      5    P03153|HBEAG_GSHV   217      3
1    Q8IVS8|GLCTK_HUMAN  523      6    P0C692|HBEAG_HBVA2  214      3
1    Q8IVS8|GLCTK_HUMAN  523      7    P0C625|HBEAG_HBVA3  214      3
1    Q8IVS8|GLCTK_HUMAN  523      8    P17099|HBEAG_HBVA4  214      3
1    Q8IVS8|GLCTK_HUMAN  523      9    Q81105|HBEAG_HBVA5  214      3
1    Q8IVS8|GLCTK_HUMAN  523      10   Q91C37|HBEAG_HBVA6  214      2
2    Q64896|HBEAG_ASHV   217      3    P03154|HBEAG_DHBV1  305      21
2    Q64896|HBEAG_ASHV   217      4    P0C6J9|HBEAG_DHBV3  305      21
2    Q64896|HBEAG_ASHV   217      5    P03153|HBEAG_GSHV   217      91
2    Q64896|HBEAG_ASHV   217      6    P0C692|HBEAG_HBVA2  214      66
2    Q64896|HBEAG_ASHV   217      7    P0C625|HBEAG_HBVA3  214      65
2    Q64896|HBEAG_ASHV   217      8    P17099|HBEAG_HBVA4  214      65
2    Q64896|HBEAG_ASHV   217      9    Q81105|HBEAG_HBVA5  214      65
2    Q64896|HBEAG_ASHV   217      10   Q91C37|HBEAG_HBVA6  214      65
3    P03154|HBEAG_DHBV1  305      4    P0C6J9|HBEAG_DHBV3  305      97
3    P03154|HBEAG_DHBV1  305      5    P03153|HBEAG_GSHV   217      24
3    P03154|HBEAG_DHBV1  305      6    P0C692|HBEAG_HBVA2  214      26
3    P03154|HBEAG_DHBV1  305      7    P0C625|HBEAG_HBVA3  214      27
3    P03154|HBEAG_DHBV1  305      8    P17099|HBEAG_HBVA4  214      25
3    P03154|HBEAG_DHBV1  305      9    Q81105|HBEAG_HBVA5  214      26
3    P03154|HBEAG_DHBV1  305      10   Q91C37|HBEAG_HBVA6  214      26
4    P0C6J9|HBEAG_DHBV3  305      5    P03153|HBEAG_GSHV   217      25
4    P0C6J9|HBEAG_DHBV3  305      6    P0C692|HBEAG_HBVA2  214      26
4    P0C6J9|HBEAG_DHBV3  305      7    P0C625|HBEAG_HBVA3  214      27
4    P0C6J9|HBEAG_DHBV3  305      8    P17099|HBEAG_HBVA4  214      25
4    P0C6J9|HBEAG_DHBV3  305      9    Q81105|HBEAG_HBVA5  214      26
4    P0C6J9|HBEAG_DHBV3  305      10   Q91C37|HBEAG_HBVA6  214      26
5    P03153|HBEAG_GSHV   217      6    P0C692|HBEAG_HBVA2  214      70
5    P03153|HBEAG_GSHV   217      7    P0C625|HBEAG_HBVA3  214      69
5    P03153|HBEAG_GSHV   217      8    P17099|HBEAG_HBVA4  214      69
5    P03153|HBEAG_GSHV   217      9    Q81105|HBEAG_HBVA5  214      69
5    P03153|HBEAG_GSHV   217      10   Q91C37|HBEAG_HBVA6  214      69
6    P0C692|HBEAG_HBVA2  214      7    P0C625|HBEAG_HBVA3  214      98
6    P0C692|HBEAG_HBVA2  214      8    P17099|HBEAG_HBVA4  214      98
6    P0C692|HBEAG_HBVA2  214      9    Q81105|HBEAG_HBVA5  214      98
6    P0C692|HBEAG_HBVA2  214      10   Q91C37|HBEAG_HBVA6  214      98
7    P0C625|HBEAG_HBVA3  214      8    P17099|HBEAG_HBVA4  214      98
7    P0C625|HBEAG_HBVA3  214      9    Q81105|HBEAG_HBVA5  214      97
7    P0C625|HBEAG_HBVA3  214      10   Q91C37|HBEAG_HBVA6  214      98
8    P17099|HBEAG_HBVA4  214      9    Q81105|HBEAG_HBVA5  214      97
8    P17099|HBEAG_HBVA4  214      10   Q91C37|HBEAG_HBVA6  214      98
9    Q81105|HBEAG_HBVA5  214      10   Q91C37|HBEAG_HBVA6  214      97
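The ClustalW pairwise "Score" is generally read as the percent identity of each pairwise alignment, which is why the human glycerate kinase scores only 2-3 against every HBeAg sequence while the HBV strains score 97-98 against each other. The sketch below computes that quantity from two aligned rows, under the assumption that positions where either sequence has a gap are excluded from the denominator; ClustalW's exact internal convention may differ slightly, and `percent_identity` is a hypothetical helper.

```python
def percent_identity(a, b, gap="-"):
    """Identical positions / gap-free aligned positions, as a percentage."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip any column that contains a gap in either row
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

Applied to two rows taken from a multiple alignment, the result can differ from ClustalW's score, because the pairwise alignment ClustalW scores is computed separately from the final multiple alignment.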

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
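The guide tree is written in Newick format, so it can be parsed to inspect branch lengths, for instance to see which sequence sits on the longest terminal branch. The recursive-descent parser below is a minimal sketch that handles only names, branch lengths and nesting (no quoted labels or comments), and it expects standard Newick punctuation (e.g. `Q8IVS8|GLCTK_HUMAN:0.59519`). `parse_newick` and `leaves` are hypothetical helpers.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    s = text.strip()
    if s.endswith(";"):
        s = s[:-1]
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(s) and s[pos] == "(":
            pos += 1
            children.append(parse_node())
            while pos < len(s) and s[pos] == ",":
                pos += 1
                children.append(parse_node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        name = read_until(",():").strip()
        length = None
        if pos < len(s) and s[pos] == ":":
            pos += 1
            length = float(read_until(",()"))
        return name, length, children

    tree = parse_node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaves(node):
    """List (name, terminal branch length) for every leaf under node."""
    name, length, children = node
    if not children:
        return [(name, length)]
    result = []
    for child in children:
        result.extend(leaves(child))
    return result
```

Applied to the guide tree in standard Newick form, `leaves` returns the ten input sequences, and `Q8IVS8|GLCTK_HUMAN` has by far the longest terminal branch (0.59519). This fits its very low pairwise scores against all the HBeAg sequences.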

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as the template, as it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

1. Open the template structure from File (PDB file), then choose SwissModel > Load Raw Target Sequence.
2. Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.
3. Choose File > Save > Layer (PDB).
4. Choose File > Save > Project (PDB).
5. Choose SwissModel > Submit Modeling Request. (A new browser window opens, loading the PDB file; give the e-mail ID at which the modeled structure is to be received.)
6. Open the new structure (received by e-mail) and remove the template, keeping only the target.
7. Choose File > Save > Layer (PDB).


Open the SwissModel menu and select the Load Raw Target Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under the Fit menu, to fit the two sequences.


Save the file as a project.

Select "Submit Modeling Request" under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

The following fields were filled in:

Your e-mail address: Gunjan300@gmail.com (must be correct)
Your name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044; Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52


Fig: Structure of the template after modeling

Alignment

TARGET   83                               LV GFGKAVLGMA AAAEELLGQH
2b8nA     4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET  155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET  255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET  305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394 ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server (SAVS) greatly simplifies the computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) for each amino acid residue in a protein structure, and it shows which combinations of φ and ψ are possible for a polypeptide. Because φ and ψ are the rotations about the N-Cα and Cα-C bonds of each residue, the plot describes how the backbone is oriented between successive α-carbon atoms.

Software

SAVS: http://nihserver.mbi.ucla.edu/SAVS/

Procedure: The target protein structure obtained after homology modeling using Deep View and Modeler is given as input to SAVS.

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT


Result

Plot statistics                                          No.      %
Residues in most favoured regions [A,B,L]                990   85.6
Residues in additional allowed regions [a,b,l,p]         104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]      11    1.0
Residues in disallowed regions                            51    4.4
                                                        ----  -----
Number of non-glycine and non-proline residues          1156  100.0
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127
Number of proline residues                                60
                                                        ----
Total number of residues                                1351

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions.
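The acceptance rule quoted above (over 90% of non-glycine, non-proline residues in the most favoured regions) is simple arithmetic on the plot statistics. The sketch below recomputes the percentages from the four region counts. `ramachandran_summary` is a hypothetical helper, and the 90% threshold is the PROCHECK guideline stated in the text.

```python
def ramachandran_summary(core, allowed, generous, disallowed, threshold=90.0):
    """Region percentages for non-Gly, non-Pro residues and the >threshold% check."""
    total = core + allowed + generous + disallowed
    if total == 0:
        raise ValueError("no residues counted")
    percent = {
        "most favoured": 100.0 * core / total,
        "additional allowed": 100.0 * allowed / total,
        "generously allowed": 100.0 * generous / total,
        "disallowed": 100.0 * disallowed / total,
    }
    return total, percent, percent["most favoured"] > threshold


if __name__ == "__main__":
    total, percent, good = ramachandran_summary(990, 104, 11, 51)
    print(total, {k: round(v, 1) for k, v in percent.items()}, good)
    # 1156 {'most favoured': 85.6, 'additional allowed': 9.0, 'generously allowed': 1.0, 'disallowed': 4.4} False
```

For this model, 85.6% of residues fall in the most favoured regions, below the 90% guideline, and 4.4% lie in disallowed regions. By the PROCHECK criterion, the model therefore still has stereochemical problems that would normally be examined (for example, by energy minimization or loop refinement) before further use.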

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models).
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP: ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP: ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR: THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404THR
THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
   1    1 000000 00 00 00 00 00 00 -1 -100

Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be worthwhile to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing projects on HBV, we hope that in the near future it will be possible to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis. A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we used homology modelling. Going one step further, we present a theoretical model built with available online modelling tools.

Our study indicates that HBeAg-binding protein 4 (glycerate kinase) is one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand to inhibit its activity, which could serve as a basis for developing drugs.


Future prospects

The work presented in this report may be only a stepping stone toward such discoveries. It is a small finding within a much larger problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.
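As a rough illustration of how aligned sequences can be turned into a tree, the Python sketch below computes pairwise p-distances and clusters them with UPGMA. It assumes the sequences are already aligned to equal length. The choice of p-distance and UPGMA, the function names and the toy sequences are all illustrative assumptions; the report does not say which method its own analysis used.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in sites) / len(sites)


def upgma(seqs: dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA; return a Newick topology (no branch lengths)."""
    sizes = {name: 1 for name in seqs}  # cluster label -> number of sequences it holds
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(sizes) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        # Size-weighted average distance from the new cluster to each remaining cluster.
        for c in sizes:
            dist[frozenset((merged, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        # Drop every distance that still refers to the two clusters just merged.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        sizes[merged] = na + nb
    return next(iter(sizes)) + ";"


toy = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "TTTTAAAAAA"}
print(upgma(toy))  # ((A,B),C);
```

UPGMA assumes a constant evolutionary rate. Packages such as PHYLIP also offer neighbour-joining and likelihood methods that drop this assumption.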

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in the search for a cure for the infectious disease hepatitis B, and they may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and we can therefore look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and will prove fruitful to anyone working in a similar area.

BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed.

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.

Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


At early times after infection, the DNA is recycled to the nucleus, where the process is repeated. This results in the accumulation of 10 to 30 molecules of cccDNA and an increase in viral mRNA concentrations (Flint et al., 765).

Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found in the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a large quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins arranged with icosahedral symmetry. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions make up only a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions by a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to collectively as surface antigen particles. The sphere contains both middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. Because they lack the hepatitis B core, polymerase and genome, these particles are non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same antigenic sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces, HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg): There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic and contains four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in large quantities. It also contains a highly antigenic epitope that may be responsible for triggering the immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears largely oblivious to their presence.

Hepatitis B core antigen (HBcAg): This is the only HBV antigen that cannot be detected directly by a blood test; it can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. It is thought to be located in the core structure of the virus and can be detected by a blood test. If found, it usually indicates that complete virus particles are in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded, circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigenic particles: a spherical 22 nm particle, a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle, and tubular or filamentous particles that vary in length. Of these, the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. The humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, while the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. Neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, but the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response. So may the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response are still at an early stage. These cover prolonged transient infections with high-titer viremia and lifelong infections with ongoing inflammation of the liver. The role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed, with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence indicates exposure to HBV or a chronic carrier state [4].

Homology or comparative modeling predicts the structure of a query sequence from the structures of one or more structural templates. The procedure has four parts:

1) identifying possible templates that have a clear sequence relationship to the query;

2) assembling the model;

3) predicting regions of the structure that are likely to have different conformations from the templates (e.g., loops);

4) refining the structure to account for inherent differences between the template and query structures.

Homology modeling figures heavily as a rationale for structural genomics initiatives. These rest on the stated assumption that accurate models can be built for query sequences that share more than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity (a structural explanation for this observation has been offered by Chung and Subbiah 1996). Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
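The 30% rule is stated in terms of percent sequence identity between the aligned query and template. Identity can be defined in several ways, for example by dividing by the full alignment length or by the length of the shorter sequence. The Python sketch below assumes identity over ungapped aligned columns only. The function names and the toy alignment are illustrative.

```python
def percent_identity(query_aln: str, template_aln: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap ('-')."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    columns = [(q, t) for q, t in zip(query_aln.upper(), template_aln.upper())
               if q != "-" and t != "-"]
    if not columns:
        return 0.0
    identical = sum(q == t for q, t in columns)
    return 100.0 * identical / len(columns)


def passes_30_percent_rule(query_aln: str, template_aln: str,
                           threshold: float = 30.0) -> bool:
    """Rule of thumb: above ~30% identity, alignments (and hence models) are usually reliable."""
    return percent_identity(query_aln, template_aln) > threshold


# Toy alignment: 4 of the 5 ungapped columns are identical.
print(percent_identity("ACDE-G", "ACDQHG"))        # 80.0
print(passes_30_percent_rule("ACDE-G", "ACDQHG"))  # True
```

Because the threshold is a rule of thumb, a template just below 30% identity is not automatically unusable. Alignment quality near the boundary is worth checking by eye.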

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years. Their aim is to exploit as fully as possible all the structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term "guided docking" is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect this focus on chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, affecting scoring but not the orientation of the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, reflecting the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. Current scoring functions have limitations. In view of these, making optimal use of chemical information was generally found to be an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, some major challenges remain, and these are outlined here as well. Approaches to some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described [8].

Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1 NCBI-

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information. All of this work serves a better understanding of molecular processes affecting human health and disease.

Swiss-Prot-

A curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

2 Protein sequence of glycerate kinase (HBeAg-binding protein 4)

Primary accession number: Q8IVS8. EC 2.7.1.31. Organism: human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.

3 FASTA

FASTA is a DNA and protein sequence alignment software package. It was first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches. It also provided a more sophisticated shuffling program for evaluating statistical significance. Several programs in this package allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "fast-aye" and stands for "FAST-All", because it works with any alphabet. It is an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts) and ordered or unordered peptide searches. Recent versions include special translated search algorithms for comparing nucleotide to protein sequence data. These correctly handle frameshift errors, which six-frame-translated searches do not handle very well.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics. These let biologists judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
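To show what SSEARCH computes, here is a minimal Smith-Waterman local alignment score in Python. It is a simplified sketch. A toy match/mismatch score and a linear gap penalty stand in for the substitution matrices and affine gap penalties that SSEARCH uses, and the parameter values are arbitrary.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below 0, so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("XXACGTXX", "ACGT"))  # 8: the shared ACGT block, 4 matches x 2
```

Only two rows of the matrix are kept, which is enough for the score. Recovering the alignment itself requires the full matrix, or a traceback scheme, to follow back from the best cell.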

4 BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
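The core idea behind BLAST's speed is to find short word matches (seeds) between the query and a database sequence and then extend them. The Python sketch below illustrates the idea under simplifying assumptions. It uses exact k-letter word matches and a +1/-1 identity score with ungapped X-drop extension. BLAST itself uses scored neighbourhood words, substitution matrices such as BLOSUM62, gapped extension and E-values. The function names and parameters are hypothetical.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where the same k-letter word occurs in both."""
    index = defaultdict(list)  # word -> start positions in the query
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


def extend_hit(query: str, subject: str, i: int, j: int, k: int = 3,
               match: int = 1, mismatch: int = -1, x_drop: int = 2) -> int:
    """Extend a seed in both directions without gaps; stop once the score falls x_drop below its best."""
    total = k * match  # the seed itself is an exact match
    for step in (1, -1):  # extend to the right, then to the left
        qi, sj = (i + k, j + k) if step == 1 else (i - 1, j - 1)
        run = best_gain = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            run += match if query[qi] == subject[sj] else mismatch
            if run > best_gain:
                best_gain = run
            elif best_gain - run >= x_drop:
                break
            qi += step
            sj += step
        total += best_gain
    return total


print(word_hits("MKVLAT", "GMKVLAW"))         # [(0, 1), (1, 2), (2, 3)]
print(extend_hit("MKVLAT", "GMKVLAW", 0, 1))  # 5
```

All three seeds in the example lie on the same diagonal, which is why real implementations group hits by diagonal before extending them.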

5 Primary & secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or as a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be shown an intermediate page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility of entering start and end positions in two boxes. By default (i.e. if you leave the two boxes empty) the complete sequence is analyzed.

Among others, it calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
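Two of these parameters follow simple formulas given in the ProtParam documentation: extinction coefficient = 5500 × n(Trp) + 1490 × n(Tyr) + 125 × n(cystine), and aliphatic index = X(Ala) + 2.9 × X(Val) + 3.9 × (X(Ile) + X(Leu)), where X are mole percentages. The Python sketch below implements just these two formulas; the function names are our own, and half-life and instability index (which need lookup tables) are not shown.

```python
def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int, cystines: bool = True) -> int:
    """Molar extinction coefficient at 280 nm in water (M^-1 cm^-1).

    With cystines=True every pair of Cys is assumed to form a disulfide,
    so n_cys // 2 cystines contribute 125 each; otherwise Cys adds nothing.
    """
    eps = 5500 * n_trp + 1490 * n_tyr
    if cystines:
        eps += 125 * (n_cys // 2)
    return eps


def aliphatic_index(counts: dict[str, int], length: int) -> float:
    """Aliphatic index from one-letter residue counts and the sequence length."""

    def mole_percent(aa: str) -> float:
        return 100.0 * counts.get(aa, 0) / length

    return mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (mole_percent("I") + mole_percent("L"))
```

With the Trp, Tyr and Cys counts reported for Q8IVS8 in the results of this report (4, 5 and 5), extinction_coefficient(4, 5, 5) gives 29700 and extinction_coefficient(4, 5, 5, cystines=False) gives 29450, the two values ProtParam reports there.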


Using SOPMA for secondary structure analysis

Recently, a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in the prediction of the secondary structure of proteins. The SOPMA authors report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor in PDB format.

Validated the modeled receptor using the Structure Analysis Validation Server (SAVS).

Verified the model through different parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the structure of the drug molecule.


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and of the template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
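The "~2 Å agreement between matched Cα atoms" quoted above is normally expressed as a root-mean-square deviation (RMSD). A minimal Python sketch of that calculation follows. It assumes the model and the reference structure have already been optimally superimposed (for example with the Kabsch algorithm, not shown) and that the two Cα lists are in matched order; the function name rmsd is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    The coordinates are assumed to be already superimposed and paired
    atom-by-atom (e.g. matched C-alpha atoms of model and template).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))
```

Because the deviations are squared before averaging, a few badly placed atoms (such as a long, template-free loop) can dominate the value.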

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, yielding a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.

Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction (CASP).

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
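The E-value rule of thumb described above can be sketched in a few lines of Python. The cut-off of 1e-5 is an illustrative assumption, not a value from this report, and the Hit record is a hypothetical container for BLAST output; real template selection would also weigh coverage, function and the other factors discussed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """One BLAST hit against the PDB (hypothetical record layout)."""
    pdb_chain: str
    score: float
    evalue: float


def choose_template(hits: list[Hit], max_evalue: float = 1e-5) -> Optional[Hit]:
    """Return the most significant hit under the E-value cut-off, or None."""
    candidates = [h for h in hits if h.evalue <= max_evalue]
    if not candidates:
        return None  # no trustworthy template: consider fold recognition instead
    # Lowest E-value first; break ties by the higher score.
    return min(candidates, key=lambda h: (h.evalue, -h.score))
```

Applied to the five BLAST hits listed in the results of this report, it returns 1QGT-C, while the two weak hits with an E-value of 6.0 are rejected outright.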

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and such interactions are thus said to be rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand and a flexible receptor, or a flexible ligand and a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid-ligand/flexible-receptor complex often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap at the docking interface, energy penalties are incurred. If the van der Waals clashes can be reduced, the energy penalty of the system is minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow a larger ligand to dock than would be possible with a rigid receptor.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: intrinsic movement allows for small conformational adaptations on ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
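The six rigid-body degrees of freedom mentioned above are what a docking search varies for each ligand placement. The Python sketch below represents a pose as three rotation angles plus three translations and applies it to ligand coordinates. The Z-Y-X rotation convention and rotating about the ligand centroid are our assumptions; docking programs may parameterize poses differently.

```python
import math


def rotation_matrix(rx: float, ry: float, rz: float) -> list[list[float]]:
    """R = Rz(rz) @ Ry(ry) @ Rx(rx), angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, pose):
    """Rotate ligand coordinates about their centroid, then translate them.

    pose = (rx, ry, rz, tx, ty, tz): three rotational and three
    translational degrees of freedom.
    """
    rx, ry, rz, tx, ty, tz = pose
    rot = rotation_matrix(rx, ry, rz)
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    shift = (tx, ty, tz)
    moved = []
    for p in coords:
        d = [p[k] - centroid[k] for k in range(3)]
        moved.append(tuple(
            sum(rot[r][k] * d[k] for k in range(3)) + centroid[r] + shift[r]
            for r in range(3)
        ))
    return moved
```

A rigid transform preserves all intramolecular distances, so only the six pose parameters change between trial placements of a rigid ligand.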

Aim of docking

The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking study will be useful in finding a cure for the infectious disease hepatitis B, and will also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A molecule on the surface of a cell (or within it) that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. A substance such as a hormone or a drug binds selectively to the receptor, causing a change in the activity of the cell.

Ligand

A molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through many weak non-covalent interactions with the binding site of a protein, tight binding depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein or enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

Amino acid   Preference
His           0.360
Tyr          -0.040
Asp           0.045
Gly          -0.070
Trp          -0.140
Met           0.025
Val          -0.060
Asn           0.080
Leu          -0.180
Phe          -0.120
Gln           0.050
Cys           0.210
Ile          -0.005
Ala           0.025
Glu           0.050
Arg           0.055
Pro          -0.200
Lys           0.100
Thr           0.100
Ser           0.130
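The text does not give the exact formula behind these numbers. One common way to express such a preference is a log-odds score comparing how often a residue type is observed in contact with bound non-protein atoms against how often it would be expected by chance (its background frequency). The Python sketch below shows that idea under this assumption; it is not claimed to reproduce the table above, and the function name is hypothetical.

```python
import math


def contact_preference(observed: int, total_contacts: int, background_freq: float) -> float:
    """log10(observed contact fraction / background fraction) for one amino acid.

    > 0: more contacts than expected by chance; < 0: fewer; 0: as expected.
    """
    if observed <= 0 or total_contacts <= 0 or not 0 < background_freq <= 1:
        raise ValueError("need positive counts and a background frequency in (0, 1]")
    return math.log10((observed / total_contacts) / background_freq)
```

The sign convention matches the table: a residue seen in contacts exactly as often as its background frequency predicts scores zero.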

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between these two torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
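Each point on a Ramachandran plot comes from two backbone dihedral angles: φ is defined by the atoms C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). A minimal Python sketch of the standard dihedral-angle calculation from four atom positions follows; the helper names are our own.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (between -180 and 180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Called with the four atoms for φ and then for ψ of each residue, it yields the (φ, ψ) pair that is plotted; eclipsed (cis) atoms give 0° and fully staggered (trans) atoms give ±180°.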

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of ‘ideal’ bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file is “cleaned up” by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new. The new file has the atoms labelled in accordance with the IUPAC naming convention.
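PROCHECK reads the coordinate file itself, but it may help to see what that input contains. PDB-format ATOM records are fixed-width; the Python sketch below pulls out the Cα atoms using the standard column positions (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, x, y and z in 31-38, 39-46 and 47-54). It ignores HETATM records, alternate locations and insertion codes, and the function name is illustrative.

```python
def read_ca_coordinates(lines):
    """Extract C-alpha atoms from PDB-format ATOM records.

    The 1-based PDB column ranges become the 0-based slices used below.
    """
    cas = []
    for line in lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            cas.append({
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return cas
```

Because it accepts any iterable of lines, it can be fed an open file directly, e.g. with open("model.pdb") as fh: cas = read_ca_coordinates(fh), where model.pdb is a hypothetical file name.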

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good at releasing local constraints for a residue, but it cannot pass through high energy barriers and so stops in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too close to each other in space with a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
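Gradient-based minimization can be illustrated on a toy system. The Python sketch below performs steepest descent on atoms interacting only through a Lennard-Jones pair potential (a stand-in for a full force field, with illustrative ε = σ = 1 and a fixed step size): at each step every atom is moved along its net force until the largest force component falls below a tolerance. Real minimizers use complete force fields and smarter line searches or conjugate-gradient methods.

```python
import math


def lj_energy_forces(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy and per-atom forces (F = -dE/dx)."""
    n = len(coords)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            de_dr = 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            for k in range(3):
                f = -de_dr * d[k] / r  # force on atom i; atom j gets the opposite
                forces[i][k] += f
                forces[j][k] -= f
    return energy, forces


def steepest_descent(coords, step=0.005, tol=1e-8, max_iter=100_000):
    """Move atoms along their net forces until the largest force component < tol."""
    coords = [list(p) for p in coords]
    for _ in range(max_iter):
        _, forces = lj_energy_forces(coords)
        if max(abs(f) for atom in forces for f in atom) < tol:
            break
        for atom, force in zip(coords, forces):
            for k in range(3):
                atom[k] += step * force[k]
    energy, _ = lj_energy_forces(coords)
    return coords, energy
```

For two atoms started 1.5σ apart, the distance converges to the Lennard-Jones minimum at 2^(1/6)σ ≈ 1.122σ, with energy -ε; like any local minimizer, it would not cross a barrier to reach a lower minimum elsewhere.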


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid                    206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
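The composition figures above are straightforward counts. A minimal Python sketch that produces this kind of summary from a raw sequence follows; as the ProtParam description says, white space and digits are ignored, and the charged-residue totals use Asp + Glu and Arg + Lys as in the report. The function name is illustrative.

```python
from collections import Counter


def composition(sequence: str):
    """Residue counts, percentages and charged-residue totals for a protein sequence."""
    seq = "".join(ch for ch in sequence.upper() if ch.isalpha())  # drop spaces and numbers
    counts = Counter(seq)
    percent = {aa: round(100.0 * n / len(seq), 1) for aa, n in counts.items()}
    negative = counts["D"] + counts["E"]  # Asp + Glu
    positive = counts["R"] + counts["K"]  # Arg + Lys
    return counts, percent, negative, positive
```

Counter returns 0 for residues that never occur, so the charged totals are well defined even for short fragments.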

Atomic composition

Carbon C: 2435
Hydrogen H: 3967
Nitrogen N: 711
Oxygen O: 719
Sulfur S: 17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming all Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming no Cys residues appear as half cystines
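The "Abs 0.1%" values follow directly from the extinction coefficient and the molecular weight (the absorbance of a 1 g/l solution over a 1 cm path is ε / MW), and the same Beer-Lambert relation, A = ε·c·l, gives a concentration from a measured A280. A small Python sketch follows; the function names are illustrative.

```python
def abs_1g_per_l(ext_coefficient: float, molecular_weight: float) -> float:
    """A280 of a 1 g/l protein solution in a 1 cm cell (the 'Abs 0.1%' value)."""
    return ext_coefficient / molecular_weight


def molar_concentration(a280: float, ext_coefficient: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), in mol/l."""
    return a280 / (ext_coefficient * path_cm)
```

With the values above, abs_1g_per_l(29700, 55252.6) ≈ 0.5375 and abs_1g_per_l(29450, 55252.6) ≈ 0.5330, which round to the reported 0.538 and 0.533.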

Secondary structure prediction

By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: window width 17, similarity threshold 8, number of states 4
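The percentages in the SOPMA summary are counts of each state letter in the prediction string divided by the sequence length. A minimal Python sketch follows, restricted to the four states used in this run (h, e, t, c); the function name is illustrative.

```python
from collections import Counter


def state_summary(prediction: str, states: str = "hetc") -> dict[str, tuple[int, float]]:
    """Map each state letter to (count, percentage of residues, 2 decimals)."""
    counts = Counter(prediction)
    n = len(prediction)
    return {s: (counts[s], round(100.0 * counts[s] / n, 2)) for s in states}
```

Concatenating all prediction lines above into one string and passing it to state_summary gives a summary in the same count-and-percentage form as the SOPMA output.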

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65

(45)

2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98

49

6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97

7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
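The ClustalW documentation describes these pairwise scores as percent identities: the number of identities in the pairwise alignment divided by the number of residues compared, with gap positions excluded. Assuming that definition, the Python sketch below computes such a score from two already-aligned sequences. The function name is illustrative, and ClustalW itself first computes its own pairwise alignments.

```python
def pairwise_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned positions, skipping positions where either sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

Excluding gap positions explains why closely related HBeAg sequences of different lengths can still score 97-98 in the table above.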

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
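The guide tree is written in Newick format. Assuming the usual ClustalW layout (a colon before each branch length and five decimal places, both of which were lost in the extracted text and are restored here as an assumption), the minimal parser sketched below reads the tree and sums the branch lengths from the root to each leaf. The helpers `parse_newick` and `leaf_depths` are hypothetical illustrations, not part of ClustalW or any library. Note that a guide tree only fixes the order of progressive alignment, so these path lengths give a rough picture of divergence rather than a rigorously inferred phylogeny.

```python
"""Minimal Newick reader for the ClustalW guide tree (illustrative sketch)."""
from typing import Dict, List


class Node:
    """A tree node: label, length of the branch above it, and child nodes."""

    def __init__(self, name: str, length: float, children: List["Node"]):
        self.name = name
        self.length = length
        self.children = children


def parse_newick(text: str) -> Node:
    """Recursive-descent parser for plain Newick (no quoted labels or comments)."""
    text = "".join(text.split())  # drop whitespace and line breaks
    if not text.endswith(";"):
        text += ";"
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        children: List[Node] = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        start = pos
        while text[pos] not in ",():;":  # read the (possibly empty) label
            pos += 1
        name = text[start:pos]
        length = 0.0
        if text[pos] == ":":  # optional branch length
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return Node(name, length, children)

    return parse_node()


def leaf_depths(node: Node, above: float = 0.0) -> Dict[str, float]:
    """Sum of branch lengths from the root down to every leaf."""
    depth = above + node.length
    if not node.children:
        return {node.name: depth}
    depths: Dict[str, float] = {}
    for child in node.children:
        depths.update(leaf_depths(child, depth))
    return depths


GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

if __name__ == "__main__":
    depths = leaf_depths(parse_newick(GUIDE_TREE))
    for name, depth in sorted(depths.items(), key=lambda item: item[1]):
        print(f"{name:22s} {depth:.5f}")
```

On this tree the human glycerate kinase sequence (Q8IVS8|GLCTK_HUMAN), the only non-hepadnavirus sequence in the set, has by far the longest root-to-leaf path (about 0.955), while the HBV HBeAg sequences all sit within about 0.015 of the root.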

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template, as it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer (DeepView) was launched and the following procedure was carried out.

Steps involved in SPDBV:

Open the template structure (PDB file) from File, then choose SwissModel > Load raw target sequence.

Choose Fit > Fit raw sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose SwissModel > Submit modeling request (a browser window opens with the PDB file loaded; enter the e-mail ID at which the modeled structure is to be received).

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).

Open SwissModel and select the "Load raw sequence" option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences.

Save the file as a project.

Select "Submit modeling request" under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill in these fields:

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Click on the model bars.

Fig. Structure of the template after modeling

Alignment

TARGET  83 LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss
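Sequence identity, reported by SWISS-MODEL as 34% for this model, is essentially the fraction of aligned positions at which target and template carry the same residue. The sketch below assumes identity is counted over columns where neither row has a gap; SWISS-MODEL's exact normalisation may differ, and the function name `percent_identity` is hypothetical. The block starting at TARGET 205 / 2b8nA 151 contains no gaps (both numberings advance by exactly 50 to the next block), so its two rows can be compared column by column.

```python
# Rows of the gapless block "TARGET 205 / 2b8nA 151" of the SWISS-MODEL alignment.
TARGET_205 = "RGATIQELNT" "IRKALSQLKG" "GGLAQAAYPA" "QVVSLILSDV" "VGDPVEVIAS"
TEMPLATE_151 = "sgasieeint" "vrkhlsqvkg" "grfaervfpa" "kvvalvlsdv" "lgdrldvias"


def percent_identity(target: str, template: str) -> float:
    """Percent of aligned (non-gap) columns holding the same residue in both rows."""
    if len(target) != len(template):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns where neither row has a gap; compare case-insensitively
    # because the template row is printed in lower case.
    columns = [(a.upper(), b.upper()) for a, b in zip(target, template)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    print(f"{percent_identity(TARGET_205, TEMPLATE_151):.1f}%")  # 60.0%
```

This 50-column stretch gives 60% identity (30 identical columns), well above the 34% reported for the alignment as a whole, which illustrates how local identity varies along a target-template alignment.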

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative (homology) modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ (phi) against ψ (psi) for each amino acid residue in a protein structure, and so shows which combinations of φ and ψ the polypeptide adopts. Because φ is the rotation about the N-Cα bond and ψ the rotation about the Cα-C bond, steric clashes make only certain regions of the plot favourable; residues falling outside the allowed regions point to possible errors in the model.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling with DeepView and MODELLER is given as input to SAVS.
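The φ and ψ angles on a Ramachandran plot are ordinary torsion (dihedral) angles defined by four consecutive backbone atoms: φ(i) uses C(i-1), N(i), Cα(i), C(i), and ψ(i) uses N(i), Cα(i), C(i), N(i+1). The sketch below computes such an angle from Cartesian coordinates under these standard definitions; it is an illustration, not the code used by SAVS or PROCHECK, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _sub(a: Point, b: Point) -> List[float]:
    return [a[i] - b[i] for i in range(3)]


def _dot(a: Point, b: Point) -> float:
    return sum(a[i] * b[i] for i in range(3))


def _cross(a: Point, b: Point) -> List[float]:
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def dihedral(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)  # first bond, pointing back toward p0
    b1 = _sub(p2, p1)  # central bond: the rotation axis
    b2 = _sub(p3, p2)  # last bond
    length = math.sqrt(_dot(b1, b1))
    axis = [c / length for c in b1]
    # Project the two outer bonds onto the plane perpendicular to the axis.
    v = _sub(b0, [_dot(b0, axis) * c for c in axis])
    w = _sub(b2, [_dot(b2, axis) * c for c in axis])
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev: Point, n: Point, ca: Point, c: Point) -> float:
    """phi(i): torsion C(i-1) - N(i) - CA(i) - C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n: Point, ca: Point, c: Point, n_next: Point) -> float:
    """psi(i): torsion N(i) - CA(i) - C(i) - N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

For example, four points in a planar "U" shape, dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)), give 0 degrees (cis), and moving the last point to (-1, 0, 1) gives 180 degrees (trans). Using atan2 rather than acos keeps the sign, which matters because the left- and right-handed regions of the plot are distinguished only by sign.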

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result

Plot statistics                                          SCORE     %age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127 (59)
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
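The %age column of the PROCHECK table is each region's count divided by the 1156 non-glycine, non-proline residues (glycine and proline are left out because their allowed φ/ψ regions differ from those of other residues). The short check below recomputes the percentages from the counts in the table and applies the "over 90%" rule of thumb quoted above; the dictionary and function names are hypothetical.

```python
from typing import Dict

# Residue counts copied from the PROCHECK summary (non-glycine, non-proline only).
REGION_COUNTS: Dict[str, int] = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}

# Residues that PROCHECK leaves out of the region statistics.
END_RESIDUES, GLYCINES, PROLINES = 8, 127, 60


def region_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each Ramachandran region in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def passes_guideline(percentages: Dict[str, float], threshold: float = 90.0) -> bool:
    """Rule of thumb: a good model has over 90% in the most favoured regions."""
    return percentages["most favoured [A,B,L]"] > threshold


if __name__ == "__main__":
    pct = region_percentages(REGION_COUNTS)
    for region, value in pct.items():
        print(f"{region:34s} {value:5.1f}%")
    total = sum(REGION_COUNTS.values()) + END_RESIDUES + GLYCINES + PROLINES
    print("total residues:", total)  # 1351, matching the table
    print("meets 90% guideline:", passes_guideline(pct))
```

By this measure the model has 85.6% of its residues in the most favoured regions, short of the 90% expected for a good-quality model, and 4.4% in disallowed regions.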

Docking result (Hex software)

Fig. Ligand & Receptor (2B8N)

Fig. After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=0.00)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
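Each "Fractional charge" warning in the Hex log lists the partial charge Hex assigned to every atom of the residue, and the figure in parentheses appears to be their sum, that is, the warning flags residues whose atomic charges do not add up to a whole number. Summing three of the listed residues reproduces the reported values to within 0.01 (the per-atom charges are themselves printed to two decimals). This is an arithmetic check on the log, not a description of Hex's internals; the dictionary and function names are hypothetical.

```python
from typing import Dict, List

# Per-atom partial charges as printed in the Hex "Fractional charge" warnings
# (each value rounded to two decimals in the log).
RESIDUE_CHARGES: Dict[str, List[float]] = {
    # atoms N, CA, C, O, CB, CG, SE, CE, H
    "A 52 MSE": [-0.52, 0.14, 0.53, -0.50, 0.04, 0.09, 0.32, 0.01, 0.25],
    # atoms N, CA, C, O, CB, CG, CD, CE, H
    "A 318 LYS": [-0.52, 0.23, 0.53, -0.50, 0.04, 0.05, 0.05, 0.22, 0.25],
    # atoms N, CA, C, O, CB, H
    "B 81 ASP": [-0.52, 0.25, 0.53, -0.50, -0.21, 0.25],
}

# Values Hex reports in parentheses for the same residues.
REPORTED: Dict[str, float] = {"A 52 MSE": 0.35, "A 318 LYS": 0.34, "B 81 ASP": -0.21}


def residue_charge(charges: List[float]) -> float:
    """Sum of the atomic partial charges, rounded to two decimals."""
    return round(sum(charges), 2)


if __name__ == "__main__":
    for residue, charges in RESIDUE_CHARGES.items():
        total = residue_charge(charges)
        print(f"{residue}: sum = {total:+.2f}, Hex reports {REPORTED[residue]:+.2f}")
```

The sums come out as +0.36, +0.35 and -0.20 against the reported +0.35, +0.34 and -0.21, a constant 0.01 offset consistent with rounding of the printed per-atom values.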

Conclusion

After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in the survival of the virus in different species. It would be interesting to look at the matter more closely at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing HBV gene sequencing project is finished, we hope it will be possible in the near future to draw firm conclusions about the true course of HBV evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of a protein present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

As our study indicates that the HBeAg protein encoded by the HBV genome is one of the secondary causes of HBV pathogenicity, we tried to dock the modelled protein (glycerate kinase) with an appropriate ligand in order to inhibit its activity, on the basis of which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone towards future discoveries; it is a small finding on a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in hepatitis B virus.

The docking results can be used to design new lead compounds and hence aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.


Abbreviations

CSA       Catalytic Site Atlas

EMBOSS    European Molecular Biology Open Software Suite

NCBI      National Center for Biotechnology Information

NDB       Nucleic Acid Database

ORF       Open Reading Frame

OTU       Operational Taxonomic Unit

PDB       Protein Data Bank

PHYLIP    Phylogeny Inference Package


Fig. HBV life cycle

The hepatitis B virion, also known as the Dane particle, is the only infectious particle found in the body of an infected patient. This virion has a diameter of 42 nm, and its outer envelope contains a large quantity of hepatitis B surface proteins. The envelope surrounds the inner nucleocapsid, which is made up of 180 hepatitis B core proteins arranged with icosahedral symmetry. The nucleocapsid also contains at least one hepatitis B polymerase protein (P) along with the HBV genome.

In infected people, virions make up only a small minority of HBV-derived particles. Large numbers of smaller subviral particles are also present, usually outnumbering the virions by about 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to collectively as surface antigen particles. The sphere contains both the middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Since the non-infectious particles present the same antigenic sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces, HBVP).

Hepatitis B Antigens

There are three different types of hepatitis B antigens encoded by the HBV genome:

Hepatitis B surface antigen (HBsAg): There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg) and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the main constituent of all hepatitis B particle forms and appears to be produced by the virus in high quantities. It also contains a highly antigenic epitope that may be responsible for triggering an immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears largely oblivious to their presence.

Hepatitis B core antigen (HBcAg): This is the only HBV antigen that cannot be detected directly by a blood test; it can only be identified by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): The e antigen is named for its early appearance during acute HBV infection. It is thought to be located in the core structure of the virus particle, and it can be detected by a blood test. If found, it usually indicates that complete virus particles are in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen-bearing particles: a spherical 22 nm particle; a 42 nm particle (containing DNA and DNA polymerase) called the Dane particle; and tubular or filamentous particles that vary in length. Of these, only the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or other body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to viral persistence in the setting of an ineffective immune response. Incomplete downregulation of viral gene expression and infection of immunologically privileged tissues may also contribute. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for DNA damage that can cause hepatocellular carcinoma. Elucidating the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies of how the virus evades the immune response are still at an early stage. These include prolonged transient infections with high-titer viremia and lifelong infections with ongoing inflammation of the liver. The role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg) is a secreted protein that is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg) is the core protein and is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence indicates exposure to HBV, whether the infection has resolved or persists in a chronic carrier [4].

Homology or comparative modeling predicts the structure of a query sequence from the structures of one or more structural templates. The procedure first identifies possible templates that have a clear sequence relationship to the query and assembles the model. It then predicts regions of the structure that are likely to differ in conformation from the templates (e.g., loops). Finally, it refines the structure to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives. These rest on the assumption that accurate models can be built for query sequences with greater than 30% sequence identity to their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity (a structural explanation for this observation has been offered by Chung and Subbiah, 1996). Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
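The 30% rule is stated in terms of sequence identity between query and template. The minimal sketch below computes percent identity from a pairwise alignment. It assumes gaps are written as '-' and counts identity only over columns where neither sequence has a gap. Tools differ in the denominator they use, so this is one convention among several. The toy alignment is hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Assumes both strings come from the same alignment and that '-' marks a gap.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(q, t) for q, t in zip(aligned_query, aligned_template)
               if q != "-" and t != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for q, t in columns if q == t)
    return 100.0 * identical / len(columns)


# Hypothetical toy alignment: 4 identical residues out of 5 gap-free columns.
pid = percent_identity("ACDE-F", "ACDKGF")
print(f"{pid:.1f}% identity, above 30%: {pid > 30.0}")
```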

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years. They aim to exploit all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term guided docking is introduced for docking approaches that use some degree of chemical information to actively guide the orientation of the ligand into the binding site. Reflecting this focus on chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, affecting scoring but not the orientation of the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches. This division reflects the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. Current scoring functions have limitations. In view of this, making optimal use of chemical information was generally found to be an efficient knowledge-based strategy for improving protein-ligand docking. The gains cover binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments [7].

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still major challenges ahead, which are also outlined. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of assessing docking program performance are discussed, and a number of successful applications of structure-based virtual screening are described [8].

Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information. All of this serves the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database. It strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

2. Protein sequence

Glycerate kinase (HBeAg-binding protein 4), primary accession number Q8IVS8, EC 2.7.1.31, from human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.

3. FASTA

FASTA is a DNA and protein sequence alignment software package. It was first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA was described in 1988 ("Improved tools for biological sequence comparison"). It added the ability to do DNA:DNA searches and translated protein:DNA searches, and provided a more sophisticated shuffling program for evaluating statistical significance. Several programs in this package allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet; it is an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts) and ordered or unordered peptide searches. Recent versions include special translated search algorithms for comparing nucleotide to protein sequence data. These correctly handle frameshift errors, which six-frame-translated searches do not handle very well.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics. These let biologists judge whether an alignment is likely to have occurred by chance or can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
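The minimal Smith-Waterman sketch below illustrates what SSEARCH computes; it returns only the best local alignment score. It assumes simple match/mismatch scores and a linear gap penalty. SSEARCH itself uses substitution matrices (such as BLOSUM) and affine gap penalties, so its scores will differ.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    # h[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # a[i-1] aligned to a gap in b
            left = h[i][j - 1] + gap    # b[j-1] aligned to a gap in a
            # The floor at 0 is what makes the alignment local.
            h[i][j] = max(0, diagonal, up, left)
            best = max(best, h[i][j])
    return best


print(smith_waterman_score("XXACGTXX", "YYACGTYY"))  # 8: only the shared ACGT core scores
```

The zero floor lets an alignment start anywhere, so unrelated flanking residues do not drag the score down.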

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and identify library sequences that resemble the query sequence above a certain threshold.
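The threshold in a BLAST search is usually expressed as an E-value: the number of hits with at least a given score expected by chance. The minimal sketch below uses the standard relation between bit score and E-value, E = m × n × 2^(-S'). Here m is the query length, n the total database length and S' the bit score. BLAST also applies effective-length corrections, so its reported values will differ somewhat. The lengths in the example are hypothetical.

```python
def expected_chance_hits(query_len: int, db_len: int, bit_score: float) -> float:
    """E-value from a bit score: E = m * n * 2**(-S').

    Ignores the effective-length corrections that BLAST applies.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical lengths: a 1024-residue query against a 1024-residue database.
for bits in (20, 30, 40):
    print(bits, expected_chance_hits(1024, 1024, bits))
```

Each extra 10 bits lowers E by a factor of 1024 (2^10). This is why strong hits report E-values many orders of magnitude below 1.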

5. Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information about the protein is required. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or as a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, an intermediate page lets you select the portion of the sequence to analyze. You can choose mature chains, peptides or domains from the Swiss-Prot feature table by clicking on their positions. Alternatively, you can enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters:
extinction coefficient
half-life
instability index
aliphatic index
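Two of the parameters listed above have simple closed-form definitions, sketched below.

- The aliphatic index (Ikai, 1980) is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue.
- The extinction coefficient at 280 nm in water is estimated from counts of Trp (5500 each), Tyr (1490 each) and cystines (125 each).

The sketch follows the formulas documented for ProtParam. The instability index and half-life need lookup tables (dipeptide weights and N-end rule data) that are not reproduced here. The toy sequence is hypothetical, not the GLCTK_HUMAN sequence.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index = X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = seq.upper()

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficients(seq: str) -> tuple[int, int]:
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1) in water.

    Returns (all Cys paired as cystines, all Cys reduced).
    """
    seq = seq.upper()
    base = 5500 * seq.count("W") + 1490 * seq.count("Y")
    cystines = seq.count("C") // 2  # two cysteines form one cystine
    return base + 125 * cystines, base


toy = "AVILWYCC"  # hypothetical toy sequence
print(round(aliphatic_index(toy), 2), extinction_coefficients(toy))
```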

Using SOPMA for secondary structure analysis

A method called the self-optimized prediction method (SOPM) has been described to improve the success rate of protein secondary structure prediction. Its authors reported improvements obtained by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved method (SOPMA) uses a three-state description of secondary structure (α-helix, β-sheet and coil). On a database of 126 chains of non-homologous proteins (less than 25% identity), it correctly predicts 69.5% of amino acids. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
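The accuracy figures quoted for SOPMA are three-state (Q3) scores: the percentage of residues whose predicted class matches the observed one. The minimal sketch below assumes predicted and observed states are equal-length strings using H (helix), E (strand) and C (coil). SOPMA's own output letters may differ and would need mapping to these three classes first. The example strings are hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state class (H, E, C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must be the same length")
    correct = sum(1 for p, o in zip(predicted.upper(), observed.upper()) if p == o)
    return 100.0 * correct / len(observed)


# Hypothetical 8-residue example: one helix residue mispredicted as coil.
print(q3_accuracy("HHHCEECC", "HHHHEECC"))  # 87.5
```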

PROTOCOL FOLLOWED

Obtained the receptor (target protein) from literature references, journals available online and PubMed literature on the HBV strain.
Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.
Retrieved the PDB ID of the template structure (PDB ID 2B8N) using a BLAST similarity search.
Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor.
Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).
Verified the model through different parameters available in SAVS, such as the Ramachandran plot.
Selected the best ligand for HBV disease from the KEGG database.
Ran HEX and obtained the docked structure of the drug molecule.
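The second step of the protocol retrieves the sequence in FASTA format. The minimal sketch below reads such text, assuming the usual layout: a '>' header line followed by one or more sequence lines. The example header mimics a UniProt-style line, but the residues are placeholders, not the real Swiss-Prot record.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record ID (first word after '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    record_id = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if record_id is not None:
                records[record_id] = "".join(chunks)
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            chunks = []
        elif record_id is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if record_id is not None:
        records[record_id] = "".join(chunks)
    return records


# Placeholder record: UniProt-style header, made-up residues.
example = ">sp|Q8IVS8|GLCTK_HUMAN Glycerate kinase\nACDEFGHIK\nLMNPQRST\n"
print(parse_fasta(example))
```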

Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on two things. The first is identifying one or more known protein structures (templates or parent structures) likely to resemble the structure of the query sequence. The second is an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and the template structure. Two kinds of gaps can complicate the approach. Alignment gaps (commonly called indels) indicate a structural region present in the target but not in the template. Structure gaps in the template arise from poor resolution in the experimental procedure used to solve the structure (usually X-ray crystallography). Model quality declines with decreasing sequence identity. A typical model has ~2 Å agreement between matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% identity. Regions built without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity. Variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these atomic-position errors are significant. They impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction. Even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can support qualitative conclusions about the biochemistry of the query sequence. This is especially true for hypotheses about why certain residues are conserved, which may in turn lead to experiments to test them. For example, the spatial arrangement of conserved residues may suggest why a residue is conserved: to stabilize the fold, to bind a small molecule, or to foster association with another protein or nucleic acid.
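The ~2 Å and 4-5 Å figures above are root-mean-square deviations (RMSD) between matched Cα atoms of a model and the true structure. The minimal sketch below computes RMSD, assuming the two coordinate lists are already matched atom-for-atom and optimally superposed. A real comparison would first superpose them, for example with the Kabsch algorithm, which is not shown. The coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """RMSD (in the coordinates' units) between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical Cα positions (Å): every model atom is shifted 1 Å along z.
model = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
print(rmsd(model, reference))  # 1.0
```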

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, giving a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.

Homology modeling can produce high-quality structural models when the target and template are closely related. This has inspired the formation of a structural genomics consortium dedicated to producing representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is identifying the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods are based on multiple sequence alignment, of which PSI-BLAST is the most common example. They iteratively update their position-specific scoring matrix to identify successively more distantly related homologs. These methods have been shown to find more potential templates, and better templates for sequences only distantly related to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used to identify templates for traditional homology modeling. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value. Such hits are considered close enough in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available. It may well have a wrong structure, leading to a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers. These improve on individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

These approaches often identify several candidate template structures. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Choosing the best template from among the candidates is therefore a key step and can significantly affect the final accuracy of the structure. Several factors guide this choice: the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.
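The selection criteria above can be sketched as a filter-then-rank step. The following are all hypothetical choices for illustration, not the procedure used in this project:

- the E-value cut-off (≤ 1e-5)
- the ordering (coverage first, then identity)
- the candidate records

In practice, weighing these factors is a judgement call.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    name: str          # hypothetical template label
    evalue: float      # E-value of the query-template alignment
    identity: float    # percent identity over the aligned region
    coverage: float    # fraction of the query covered by the alignment (0-1)


def rank_templates(hits, max_evalue: float = 1e-5):
    """Drop hits with a poor E-value, then prefer coverage, then identity."""
    usable = [h for h in hits if h.evalue <= max_evalue]
    return sorted(usable, key=lambda h: (h.coverage, h.identity), reverse=True)


candidates = [
    TemplateHit("tmpl_A", 1e-30, 45.0, 0.60),
    TemplateHit("tmpl_B", 1e-20, 35.0, 0.90),
    TemplateHit("tmpl_C", 2.0, 90.0, 1.00),  # poor E-value: rejected despite looking best
]
print([h.name for h in rank_templates(candidates)])
```

The E-value filter runs first, which mirrors the advice above that a template with a poor E-value should not be chosen however attractive its other numbers look.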

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-protein docking interactions

Protein-protein interactions occur between two proteins of similar size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: their interfaces cannot alter their conformation to improve binding and ease movement. Conformational changes are limited by steric constraints, so these interactions are said to be rigid.

Fig. Protein-protein docking

Protein receptor-ligand docking

Protein receptor and ligand motifs fit together tightly, an arrangement often described as a lock-and-key mechanism. These interfaces show both high specificity and induced fit, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor, or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

In a rigid ligand-flexible receptor complex, the native structure often maximizes the interface area between the molecules. The molecules move relative to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger-than-usual ligand. Normally, ligand overlap at the docking interface incurs energy penalties. If the van der Waals repulsion can be decreased, the energy loss in the system will be minimized. This can be achieved by allowing flexibility in the receptor: a flexible receptor can dock a larger ligand than a rigid receptor could.

Flexible Ligand with a Rigid Receptor

When the fit between ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: intrinsic movement allows small conformational adaptations for ligand binding. Protein movement has six degrees of freedom (three rotational, three translational). When these are taken into account, the receptor is allowed even more inherent flexibility. This further offsets any energy penalty between receptor and ligand, allowing easier, more energetically favorable binding between the two.
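The six degrees of freedom mentioned above (three rotations and three translations) are what a rigid-body docking search varies when placing a ligand. The minimal sketch below applies one such pose to ligand coordinates. It assumes rotations are angles in radians about the x, y and z axes, applied in that order about the ligand's centroid, followed by the translation. Docking programs parameterize rotations in different ways, so this is only an illustration. The two-atom "ligand" is hypothetical.

```python
import math


def _matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(rx: float, ry: float, rz: float):
    """R = Rz @ Ry @ Rx: rotate about x, then y, then z (angles in radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    mx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    my = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    mz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return _matmul(mz, _matmul(my, mx))


def apply_pose(coords, pose):
    """Apply a 6-DOF pose (rx, ry, rz, tx, ty, tz) to a list of (x, y, z) points.

    Rotation is about the centroid of coords; the translation is added afterwards.
    """
    rx, ry, rz, tx, ty, tz = pose
    r = rotation_matrix(rx, ry, rz)
    n = len(coords)
    gx, gy, gz = (sum(p[k] for p in coords) / n for k in range(3))
    placed = []
    for x, y, z in coords:
        v = (x - gx, y - gy, z - gz)
        rot = [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]
        placed.append((rot[0] + gx + tx, rot[1] + gy + ty, rot[2] + gz + tz))
    return placed


# Hypothetical two-atom ligand given a quarter turn about z and no translation.
ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
print(apply_pose(ligand, (0.0, 0.0, math.pi / 2, 0.0, 0.0, 0.0)))
```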

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking study will be useful in finding a cure for hepatitis B. They will also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A receptor is a molecular structure on the surface of a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule on or in a cell to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g., a receptor). A ligand binds through many weak noncovalent bonds formed with the binding site of a protein. Tight binding therefore depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in making and breaking bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction. This produces the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this depends greatly on the type of function. With that in mind, the table below gives preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values do not include protein-protein or protein-peptide interactions. In those interactions, many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

Amino acid   Preference
His    0.360
Cys    0.210
Ser    0.130
Lys    0.100
Thr    0.100
Asn    0.080
Arg    0.055
Gln    0.050
Glu    0.050
Asp    0.045
Met    0.025
Ala    0.025
Ile   -0.005
Tyr   -0.040
Val   -0.060
Gly   -0.070
Phe   -0.120
Trp   -0.140
Leu   -0.180
Pro   -0.200
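As a rough illustration of how the preference table can be used, the sketch below stores the values and averages them over a list of residues, for example those lining a putative pocket. The values assume the decimal points lost in extraction are restored as three-decimal numbers. The averaging is a hypothetical screening heuristic, not a validated active-site predictor, and the example residue list is invented.

```python
# Functional-site preferences transcribed from the table above (decimal points restored).
SITE_PREFERENCE = {
    "His": 0.360, "Cys": 0.210, "Ser": 0.130, "Lys": 0.100, "Thr": 0.100,
    "Asn": 0.080, "Arg": 0.055, "Gln": 0.050, "Glu": 0.050, "Asp": 0.045,
    "Met": 0.025, "Ala": 0.025, "Ile": -0.005, "Tyr": -0.040, "Val": -0.060,
    "Gly": -0.070, "Phe": -0.120, "Trp": -0.140, "Leu": -0.180, "Pro": -0.200,
}


def mean_site_preference(residues) -> float:
    """Average preference over three-letter residue names (hypothetical heuristic)."""
    if not residues:
        raise ValueError("need at least one residue")
    return sum(SITE_PREFERENCE[r] for r in residues) / len(residues)


# Hypothetical pocket residues.
print(round(mean_site_preference(["His", "Cys", "Ser", "Leu"]), 3))
```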

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram) was developed by Gopalasamudram Narayana Ramachandran. It is a way to visualize the backbone dihedral angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate. These rotations are described by the torsion angles φ and ψ, which are plotted against each other. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ, with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii. Angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
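The φ and ψ angles plotted on a Ramachandran plot are ordinary dihedral (torsion) angles, each defined by four consecutive backbone atoms:

- φ by C(i-1), N, Cα, C
- ψ by N, Cα, C, N(i+1)

The minimal sketch below computes a dihedral angle from Cartesian coordinates, using the usual atan2 formulation and the IUPAC sign convention. The coordinates in the example are a made-up geometric test case, not atoms from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Dihedral angle in degrees (between -180 and 180) for atoms p0-p1-p2-p3."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up geometry: looking down p1->p2, p3 sits 90 degrees counter-clockwise of p0.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))  # -90.0
```

For a real residue, the same function would be called with the four backbone atom positions listed above to obtain φ or ψ.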

SAVS (Structure analysis and validation server)

SAVS is a server for checking the validity of protein structures and assessing how correct they are. Depending on how many programs are selected, the server can take several minutes to run. The run time also depends on how many residues the submitted protein contains.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is. The comparison is against stereochemical parameters derived from well-refined, high-resolution structures. The checks also use 'ideal' bond lengths and bond angles, derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the first of its programs "cleans up" the coordinate file. The clean-up corrects any mislabelled atoms and creates a new coordinate file with the extension .new, in which atoms are labelled according to the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory; they have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension. It lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degree of freedom in a molecule (ie bonds angels and

dihedrals)Energy minimization can repair distorted geometries by moving atoms release

internal constraints Energy minimization is good to release local constraints for a

residue but it will not pass through high energy barriers and stop in a local minima

The potential energy, calculated by summing the energies of the various interactions, is a single numerical value for one conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms that are too close to each other in space and therefore have a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
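To see how a single bad contact can dominate the total energy, the sketch below sums 12-6 Lennard-Jones pair energies. The parameters (epsilon = 0.2 kcal/mol, sigma = 3.4 Å) are illustrative values chosen for this example, not taken from any force field used in this project.

```python
def lennard_jones(r: float, epsilon: float = 0.2, sigma: float = 3.4) -> float:
    """12-6 Lennard-Jones pair energy (kcal/mol) at separation r (angstrom)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


SIGMA = 3.4
R_MIN = 2.0 ** (1.0 / 6.0) * SIGMA   # separation at the energy minimum (about 3.82 A)

good_contacts = [R_MIN] * 100        # 100 ideal contacts, each worth -epsilon
clash = 2.0                          # one pair squeezed to 2.0 A

e_good = sum(lennard_jones(r) for r in good_contacts)
e_clash = lennard_jones(clash)
print(f"100 good contacts: {e_good:.1f} kcal/mol")
print(f"one clash at 2.0 A: {e_clash:.1f} kcal/mol")
print(f"total: {e_good + e_clash:.1f} kcal/mol")
```

With these numbers the 100 well-packed contacts contribute -20 kcal/mol, while the single clash contributes roughly +447 kcal/mol, so the total is positive even though almost every contact is ideal.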


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information: Entry name GLCTK_HUMAN; Primary accession number Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4

Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910

From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp    27    6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I               27    6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE   Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
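The composition figures above can be reproduced from the sequence itself. A minimal Python sketch follows; the sequence is transcribed from the GLCTK_HUMAN row of the ClustalW2 alignment in this report, and the helper names are hypothetical (this is not ProtParam's own code).

```python
from collections import Counter

# GLCTK_HUMAN (Q8IVS8), transcribed from the ClustalW2 alignment, 60 residues per line
GLCTK_HUMAN = (
    "MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS"
    "LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM"
    "ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS"
    "ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI"
    "LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT"
    "CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT"
    "PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG"
    "GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI"
    "ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR"
)

STANDARD = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> dict:
    """Map each standard residue to (count, percentage of sequence length, 1 decimal)."""
    counts = Counter(seq)
    n = len(seq)
    return {aa: (counts[aa], round(100.0 * counts[aa] / n, 1)) for aa in STANDARD}


def charged_residues(seq: str) -> tuple:
    """Return (Asp + Glu, Arg + Lys): negatively and positively charged residue totals."""
    counts = Counter(seq)
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


comp = composition(GLCTK_HUMAN)
print(len(GLCTK_HUMAN), comp["W"], charged_residues(GLCTK_HUMAN))
```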

Atomic composition

Carbon     C  2435
Hydrogen   H  3967
Nitrogen   N   711
Oxygen     O   719
Sulfur     S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
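ProtParam's documented method estimates the molar extinction coefficient at 280 nm from the numbers of Trp, Tyr and cystine (disulfide-paired Cys), using 5500, 1490 and 125 M-1 cm-1 respectively; Abs 0.1% is that coefficient divided by the molecular weight. The sketch below reproduces the two values reported above from Trp = 4, Tyr = 5 and Cys = 5 with MW = 55252.6; the function names are hypothetical.

```python
EXT_TRP = 5500       # M-1 cm-1 at 280 nm per Trp
EXT_TYR = 1490       # per Tyr
EXT_CYSTINE = 125    # per cystine (one disulfide-bonded Cys pair)


def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int, all_cys_paired: bool = True) -> int:
    """Molar extinction coefficient; if all Cys form cystines, n_cys // 2 pairs contribute."""
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def abs_01_percent(ext: float, mol_weight: float) -> float:
    """Absorbance of a 1 g/l (0.1%) solution: extinction coefficient / molecular weight."""
    return ext / mol_weight


for paired in (True, False):
    ext = extinction_coefficient(4, 5, 5, all_cys_paired=paired)
    print(paired, ext, round(abs_01_percent(ext, 55252.6), 3))
```

The difference between the two reported values (29700 vs 29450) is exactly two cystines times 125, since five Cys residues can form at most two disulfide bonds.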

Secondary structure prediction

By SOPMA: result for UNK_158250

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
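SOPMA's summary percentages are the share of each state letter in the per-residue prediction string (h, e, t, c and so on). A minimal sketch of that tally, assuming the prediction is available as one lowercase string; the function and dictionary names are hypothetical and this is not SOPMA's own code.

```python
from collections import Counter

STATE_NAMES = {
    "h": "Alpha helix", "g": "3-10 helix", "i": "Pi helix", "b": "Beta bridge",
    "e": "Extended strand", "t": "Beta turn", "s": "Bend region", "c": "Random coil",
}


def state_summary(prediction: str) -> dict:
    """Map each state name to (residue count, percentage of length, 2 decimals)."""
    counts = Counter(prediction)
    n = len(prediction)
    return {name: (counts[code], round(100.0 * counts[code] / n, 2))
            for code, name in STATE_NAMES.items()}


for name, (count, pct) in state_summary("hhhhcceet").items():
    print(f"{name}: {count} is {pct:.2f}%")
```

Applied to the full 523-letter prediction string above, the same tally gives the helix, strand, turn and coil percentages in the SOPMA summary.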

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
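The scores in the table are ClustalW2 pairwise scores, commonly read as percent identity. The sketch below computes percent identity for two already-aligned rows, counting only columns where neither sequence has a gap; whether ClustalW2 uses exactly this denominator is an assumption, so values computed this way may not match the table exactly. The function name is hypothetical.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns in which the two residues are identical."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a.upper() == b.upper())
    return 100.0 * identical / len(columns)


# Two rows from the second alignment block (HBVA4 vs HBVA5, leading gaps omitted)
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"
print(f"{percent_identity(hbva4, hbva5):.1f}%")
```

Here 30 columns are gap-free in both rows and only one (V versus F) differs, giving about 96.7% for this short stretch.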

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818)
:0.00110,P0C692|HBEAG_HBVA2:0.00445)
:0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
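The guide tree is written in Newick notation: parentheses group sister clades, commas separate siblings, and the number after each colon is a branch length. A minimal recursive-descent reader, written for illustration and not part of ClustalW2, can list the leaves and their branch lengths; the tree string in the code is this run's guide tree in standard Newick form, and the function name is hypothetical.

```python
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(tree: str) -> list:
    """Return (leaf name, branch length) pairs in the order they appear in the tree."""
    text = "".join(tree.split())   # drop line breaks and spaces
    pos = 0
    leaves = []

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def read_length():
        nonlocal pos
        if pos < len(text) and text[pos] == ":":
            pos += 1
            return float(read_until(",();"))
        return None

    def read_node() -> None:
        nonlocal pos
        if text[pos] == "(":           # internal node: (child, child, ...)
            pos += 1
            read_node()
            while text[pos] == ",":
                pos += 1
                read_node()
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            read_until(":,();")         # optional internal-node label (unused)
            read_length()               # branch length of the clade (unused here)
        else:                           # leaf: name[:length]
            name = read_until(":,();")
            leaves.append((name, read_length()))

    read_node()
    return leaves


for name, length in parse_newick(GUIDE_TREE):
    print(f"{name:20s} {length}")
```

The long branch on GLCTK_HUMAN (0.59519) compared with the very short branches among the HBVA sequences mirrors the low versus high pairwise scores in the table.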

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as the template, since it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure from File (PDB file), then choose Swiss-Model and load the raw target sequence.

Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss-Model > Submit Modeling Request (a new browser window opens, loading the PDB file; give the e-mail ID for receiving the modeled structure).

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).

Open the Swiss-Model menu and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, in order to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under Swiss-Model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill in these fields:

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Fig: Structure of the template after modeling

Alignment

TARGET  83                               LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for the amino acid residues in a protein structure: each residue appears as one point, so the plot shows which combinations of φ and ψ the polypeptide adopts. Because φ is the torsion angle about the N-Cα bond and ψ the torsion angle about the Cα-C bond, the plot reveals whether the backbone conformation around each α-carbon lies in a sterically allowed region.

Software

SAVS: http://nihserver.mbi.ucla.edu/SAVS/

Procedure: The target protein structure obtained after homology modeling using Deep View and Modeller is given as input to SAVS.
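The φ and ψ angles on a Ramachandran plot are ordinary torsion angles defined by four consecutive backbone atoms: φ uses C(i-1), N(i), Cα(i), C(i) and ψ uses N(i), Cα(i), C(i), N(i+1). The sketch below computes them from Cartesian coordinates with the standard atan2 formulation; it is an illustration, not the code used by SAVS or PROCHECK, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees (-180, 180] defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)                    # unit vector along the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))   # b0 projected off the central bond
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))   # b2 projected off the central bond
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c) -> float:
    """Backbone phi of residue i: C(i-1)-N(i)-CA(i)-C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next) -> float:
    """Backbone psi of residue i: N(i)-CA(i)-C(i)-N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

Plotting (phi, psi) for every non-terminal residue of a model gives the scatter that PROCHECK classifies into favoured, allowed and disallowed regions.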

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result:

Plot statistics                                          No. of residues   %-tage
Residues in most favoured regions [A,B,L]                          990       85.6%
Residues in additional allowed regions [a,b,l,p]                   104        9.0%
Residues in generously allowed regions [~a,~b,~l,~p]                11        1.0%
Residues in disallowed regions                                      51        4.4%
                                                                  ----      ------
Number of non-glycine and non-proline residues                    1156      100.0%
Number of end-residues (excl. Gly and Pro)                           8
Number of glycine residues (shown as triangles)                    127
Number of proline residues                                          60
                                                                  ----
Total number of residues                                          1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
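The percentages in the plot statistics are each region's count divided by the number of non-glycine, non-proline residues (1156). A short sketch of that calculation and of the 90% guideline check, using the counts above; the function and variable names are hypothetical.

```python
def region_percentages(counts: dict) -> dict:
    """Percentage (1 decimal) of non-Gly, non-Pro residues falling in each region."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
pct = region_percentages(counts)
meets_guideline = pct["most favoured"] > 90.0   # PROCHECK's "over 90%" expectation
print(pct, "meets 90% guideline:", meets_guideline)
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% expectation quoted above.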

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059

Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS

63

Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025

64

Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050

65

MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

66

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI

67

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052

68

MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005

69

LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025


ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds


>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 kJ/mol (Eshape=8621255 Eforce=000)


R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600 R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275 R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 kJ/mol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
 1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
 1 1 000000 00 00 00 00 00 00 -1 -100


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to look at the matter more closely by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing projects on HBV are finished, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of a protein present in the different species, we used homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

Our study indicates that glycerate kinase (HBeAg-binding protein 4) is one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand to inhibit its activity, which could serve as a basis for drug development.


Future prospects

The work presented in this report may be just a stepping stone toward such discoveries; it is a small finding within a large problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting those results as new biological information.
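Phylogenetic trees are often built from a matrix of pairwise distances between sequences. The sketch below is a minimal UPGMA (average-linkage) clustering in Python, offered only as an illustration of one such distance method; it assumes the distances have already been computed (for example, from an alignment), returns only the tree topology without branch lengths, and the function name and the four-taxon distances in the usage note are hypothetical rather than results from this report.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a rooted tree topology with UPGMA (average linkage).

    names: list of taxon (OTU) labels
    dist:  dict mapping frozenset({a, b}) -> distance between taxa a and b
    Returns a Newick-style string without branch lengths.
    """
    # Active clusters: label -> number of leaves the cluster contains
    size = {name: 1 for name in names}
    d = dict(dist)  # working copy of cluster-to-cluster distances

    while len(size) > 1:
        # Join the closest pair of active clusters (ties broken by sorted order)
        a, b = min(combinations(sorted(size), 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        # Size-weighted average distance from the new cluster to each remaining one
        for c in size:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    d[frozenset((a, c))] * size[a] + d[frozenset((b, c))] * size[b]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"
```

For hypothetical distances A-B = 2, A-C = B-C = 6 and every distance to D = 10, the closest pair (A, B) is joined first, then C, then D, giving `(((A,B),C),D);`. UPGMA assumes a constant rate of evolution across lineages; methods such as neighbour joining relax that assumption.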

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is only a small part of a large problem, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.


[2] F. V. Chisari and C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger and W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.

Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com


Abbreviations


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Large numbers of smaller subviral particles are also present, usually outnumbering the virions by a ratio of 100:1. These two subviral particles, the hepatitis B filament and the hepatitis B sphere, are often referred to as a group named surface antigen particles. The sphere contains both the middle and small surface proteins, whereas the filament also includes the large hepatitis B surface protein. The absence of the hepatitis B core, polymerase, and genome makes these particles non-infectious. High levels of these non-infectious particles can be found during the acute phase of the infection. Because the non-infectious particles present the same antigenic sites as the virion, they induce a significant immune response and are thought to be disadvantageous for the virus. However, it is also believed that high levels of non-infectious particles may allow the infectious viral particles to travel through the bloodstream undetected by antibodies (Garces, HBVP).

Hepatitis B Antigens

Three different types of hepatitis B antigen are encoded by the HBV genome:

Hepatitis B Surface Antigen (HBsAg): There are three different types of hepatitis B surface antigen: small hepatitis B surface antigen (HBsAg or SHBsAg), middle hepatitis B surface antigen (MHBsAg), and large hepatitis B surface antigen (LHBsAg). HBsAg is the smallest of the hepatitis B surface proteins and has historically been known as the Australia antigen (Au antigen). It is very hydrophobic, containing four transmembrane-spanning regions. This protein is the prime constituent of all hepatitis B particle forms and appears to be manufactured by the virus in high quantities. It also contains a highly antigenic epitope that may be responsible for triggering an immune response. Despite the high antigenicity and prevalence of these particles, the immune system appears largely oblivious to their presence.


Hepatitis B Core Antigen (HBcAg): The only HBV antigen that cannot be detected directly by a blood test; it can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e Antigen (HBeAg): The e antigen is named for its early appearance during an acute HBV infection. Thought to be located in the core structure of the virus particle, this antigen can be detected by a blood test; if found, it usually indicates that complete virus particles are in circulation (Strauss 2002).


REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigenic particles: a spherical 22 nm particle; a 42 nm particle containing DNA and DNA polymerase, called the Dane particle; and tubular or filamentous particles that vary in length. The Dane particle is the infective form of the virus. Hepatitis B is normally transmitted through blood transfusion, contaminated equipment, unsterile needles shared by drug users, or body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid, and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field has therefore been reviewed, with an emphasis on past accomplishments as well as goals for the future [3].

Antigen

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg) is a secreted protein shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM (anti-HBc IgM) rises early in infection and indicates recent infection.

4) Core IgG (anti-HBc IgG) rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence therefore indicates exposure to HBV, whether or not the person has become a chronic carrier [4].
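The serum markers listed above can be read as a small lookup from a positive marker to its meaning. The sketch below encodes only the statements made in this section; the dictionary keys (for example `anti-HBc IgM` for core IgM) and the function name are hypothetical labels, and the code illustrates the listed rules rather than a clinical interpretation scheme.

```python
# Meanings copied from the antigen and antibody lists in this section.
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection (not found in chronic carriers)",
    "anti-HBe": "viral replication is falling; low infectivity in a carrier",
    "anti-HBc IgM": "recent infection",
    "anti-HBc IgG": "exposure to HBV (persists for life)",
}
# HBcAg is deliberately absent: the text states it is not found in blood.


def interpret(positive_markers):
    """Return (marker, meaning) pairs for each positive serum marker, in input order."""
    unknown = [m for m in positive_markers if m not in MARKER_MEANING]
    if unknown:
        raise ValueError(f"no serum interpretation listed for: {unknown}")
    return [(m, MARKER_MEANING[m]) for m in positive_markers]
```

Passing `["HBsAg", "anti-HBc IgM"]` returns both meanings in order; passing `"HBcAg"` raises an error because the lists give it no serum interpretation.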


Homology or comparative modeling involves predicting the structure of a query sequence from the structures of one or more structural templates. The procedure involves identifying possible templates that have a clear sequence relationship to the query, assembling the model, predicting regions of the structure that are likely to have different conformations from the templates (e.g., loops), and, ultimately, refining the structure in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity (a structural explanation for this observation has been offered by Chung and Subbiah 1996). Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
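Because the 30% rule is stated in terms of sequence identity, it helps to be explicit about how identity is computed from an alignment. The sketch below assumes two gapped sequences of equal length (gaps written as `-`) and divides identical positions by the number of columns where neither sequence has a gap; other denominators (for example, the length of the shorter sequence) are also in use, so values from different tools may differ. The function names are hypothetical, and the 30% threshold is the rule of thumb quoted above, not a hard limit.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (q, t)
        for q, t in zip(aligned_query.upper(), aligned_template.upper())
        if q != "-" and t != "-"
    ]
    if not columns:
        return 0.0
    identical = sum(q == t for q, t in columns)
    return 100.0 * identical / len(columns)


def template_is_suitable(aligned_query, aligned_template, threshold=30.0):
    """Apply the ~30% identity rule of thumb for choosing a modelling template."""
    return percent_identity(aligned_query, aligned_template) > threshold
```

For example, `percent_identity("ACDE-G", "ACDFHG")` ignores the gapped column and finds 4 identical residues out of 5 aligned columns, i.e. 80.0.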

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands, and from protein-ligand complexes. In this respect, the term guided docking has been introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect this focus on the use of chemical information, a classification scheme for guided docking approaches has been proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, affecting scoring but not the orientation of the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based, and ligand-based approaches, reflecting the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches has been discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information is an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions, and virtual screening enrichments obtained from protein-ligand docking [7].

A review of ligand-receptor docking [8] introduces the basic underlying concepts and provides an overview of different approaches and algorithms. Although the application of docking and scoring has led to some remarkable successes, major challenges remain; these are outlined there, together with approaches to address some of them and the latest developments in the area. Some aspects of assessing docking program performance are also discussed, and a number of successful applications of structure-based virtual screening are described.
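Many of the scoring functions discussed above sum pairwise interaction terms between receptor and ligand atoms. The sketch below shows the simplest force-field style version, a 12-6 Lennard-Jones term plus a Coulomb term. It is a generic illustration, not the shape and electrostatic correlation used by Hex (whose log reports Eshape and Eforce). The parameters `epsilon`, `sigma`, `dielectric` and `cutoff` are hypothetical placeholders, and 332.06 is the approximate factor converting e²/Å to kcal/mol.

```python
import math

COULOMB_KCAL = 332.06  # approx. conversion of e^2/Angstrom to kcal/mol


def pair_energy(r, q1, q2, epsilon=0.1, sigma=3.4, dielectric=1.0):
    """Lennard-Jones 12-6 plus Coulomb energy (kcal/mol) for one atom pair r Angstrom apart."""
    sr6 = (sigma / r) ** 6
    lj = 4.0 * epsilon * (sr6 * sr6 - sr6)  # repulsion minus dispersion
    coulomb = COULOMB_KCAL * q1 * q2 / (dielectric * r)
    return lj + coulomb


def interaction_score(receptor, ligand, cutoff=8.0):
    """Sum pair energies over receptor/ligand atom pairs within the cutoff distance.

    receptor, ligand: lists of (x, y, z, partial_charge) tuples.
    """
    total = 0.0
    for *pos_a, q_a in receptor:
        for *pos_b, q_b in ligand:
            r = math.dist(pos_a, pos_b)
            if 0.0 < r <= cutoff:
                total += pair_energy(r, q_a, q_b)
    return total
```

Lower (more negative) scores indicate more favourable poses. A docking program evaluates a score of this kind for each candidate orientation and keeps the best-ranked ones; real scoring functions add terms for desolvation, hydrogen bonding and entropy.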


Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins.

Bioinformatics is limited to the sequence, structural, and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis, and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

A curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.


2. Protein sequence of glycerate kinase (HBeAg-binding protein 4): primary accession number Q8IVS8; EC 2.7.1.31; from human; subcellular location: cytoplasm; catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate; etc.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
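SSEARCH computes the optimal local alignment score with the Smith-Waterman algorithm. The sketch below shows the core dynamic-programming recurrence with a simple match/mismatch score and a linear gap penalty. The scoring values are hypothetical; SSEARCH itself uses substitution matrices (BLOSUM-type) with affine gap penalties and also reports statistical significance, which this sketch omits.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    # prev/curr hold one row of H, where H[i][j] is the best score of a
    # local alignment ending at a[i-1] and b[j-1]
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap   # b[j-1] aligned to a gap in a
            # Flooring at 0 lets a new local alignment start anywhere
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

With match = 3, mismatch = -3 and gap = -2, `smith_waterman("TGTTACGG", "GGTTGACTA", 3, -3, -2)` returns 13, from the local alignment GTT-AC / GTTGAC. Only two rows are kept in memory because the score alone is needed; recovering the alignment itself would require the full matrix and a traceback.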

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and to identify library sequences that resemble the query sequence above a certain threshold.
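BLAST is fast because it first looks for short words shared by the query and a database sequence, and only extends those seed hits into alignments. The sketch below shows that seeding step using exact word matches, assuming a word size `k` of 4 for illustration. Real BLAST programs use larger default words for DNA and score-based neighbourhood words for proteins, then extend and statistically evaluate the hits; none of that is shown here, and the function names are hypothetical.

```python
from collections import defaultdict


def word_index(seq, k):
    """Map every length-k word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_hits(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs where an exact k-letter word is shared."""
    subject_index = word_index(subject, k)
    hits = []
    for q_pos in range(len(query) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict
        for s_pos in subject_index.get(query[q_pos:q_pos + k], []):
            hits.append((q_pos, s_pos))
    return hits
```

For example, `seed_hits("ACGTACGT", "TTACGTAA")` finds the shared words ACGT, CGTA and TACG at `[(0, 2), (1, 3), (3, 1), (4, 2)]`. Indexing the database once and scanning the query against it is what keeps the search roughly linear in sequence length.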

5. Primary and secondary structure analysis

Using ProtParam for primary structure: ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
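Two of these parameters have simple published formulas that can be checked by hand. The sketch below computes the aliphatic index (Ikai's formula, with coefficients a = 2.9 and b = 3.9 applied to the mole percentages of Ala, Val, Ile and Leu) and the molar extinction coefficient at 280 nm in water (1490 per Tyr, 5500 per Trp, 125 per cystine), the values ProtParam documents. The half-life and instability index depend on lookup tables and are not reproduced. The function names are hypothetical, and the code illustrates the formulas rather than replacing the ProtParam server.

```python
def aliphatic_index(seq):
    """Aliphatic index = X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    mole_pct = {aa: 100.0 * seq.count(aa) / n for aa in "AVIL"}
    return mole_pct["A"] + 2.9 * mole_pct["V"] + 3.9 * (mole_pct["I"] + mole_pct["L"])


def extinction_coefficient(seq):
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1) in water.

    Returns (value if all Cys form cystines, value if all Cys are reduced).
    """
    seq = seq.upper()
    base = 1490 * seq.count("Y") + 5500 * seq.count("W")
    cystines = seq.count("C") // 2  # two cysteines form one cystine
    return base + 125 * cystines, base
```

For the toy sequence "AVIL" each residue is 25 mol%, so the aliphatic index is 25 + 2.9 × 25 + 3.9 × 50 = 292.5. For "WYCC" the extinction coefficient is 7115 with the cystine and 6990 without it.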


Using SOPMA for secondary structure analysis

A method called the self-optimized prediction method (SOPM) was described to improve the success rate in predicting the secondary structure of proteins. Its authors then reported improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet, and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by email to deleage@ibcp.fr or on a web page (http://www.ibcp.fr/predict.html).
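The accuracy figures quoted for SOPMA are three-state (Q3) scores: the percentage of residues whose predicted state (helix, sheet or coil) matches the observed one. The sketch below computes Q3 for two equal-length strings, assuming a one-letter encoding H/E/C. The function name and the example strings are hypothetical.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted H/E/C state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    allowed = set("HEC")  # H = helix, E = sheet (strand), C = coil
    if not set(predicted) <= allowed or not set(observed) <= allowed:
        raise ValueError("states must be H, E or C")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For a prediction "HHHECCCC" against an observed "HHHHCCCE", 6 of 8 positions agree, so Q3 = 75.0. Q3 does not reveal which states are confused, so per-state accuracies are often reported alongside it.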

PROTOCOL FOLLOWED

Obtained the receptor (target protein) from literature references, journals available online, and PubMed literature on the HBV strain.

Retrieved the FASTA sequence of the HBeAg-binding protein (glycerate kinase, Q8IVS8) from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) by a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor.

Validated the modeled receptor using the Structure Analysis Validation Server (SAVS).

Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the docked structure of the drug molecule.
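The Ramachandran plot used in the validation step places each residue at its backbone torsion angles phi (C(i-1)-N-CA-C) and psi (N-CA-C-N(i+1)). The sketch below computes these angles from atomic coordinates with the standard four-point dihedral formula in plain Python. It assumes the coordinate tuples come from a PDB file parsed elsewhere, and the function names are hypothetical.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone (phi, psi) of one residue, using the neighbouring C and N atoms."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)
```

Plotting `phi_psi` for every residue with both neighbours present gives the Ramachandran plot. Residues falling outside the allowed regions point to parts of the model that deserve closer inspection.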


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
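Because model accuracy tracks sequence identity so closely, it helps to see how identity is obtained from an alignment. The sketch below is a minimal illustration that assumes identity is counted only over columns where neither sequence has a gap; programs differ in the denominator they use (aligned columns, shorter sequence or full alignment length), so this is not necessarily the definition used by BLAST, ClustalW or SWISS-MODEL. The function name `percent_identity` is hypothetical, and the two test strings are residues 1-30 of the HBVA2 and ASHV rows from the ClustalW alignment in this report.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Residues 1-30 of two HBeAg sequences, copied from the alignment in this report
hbva2 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
print(round(percent_identity(hbva2, ashv), 1))  # 23 identical of 29 compared -> 79.3
```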

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and side chain dihedral angles are transferred from the templates to the target. Thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.
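The second step in the caption (transferring spatial features from template to target) can be sketched for the simplest feature, Cα-Cα distances. This is a minimal illustration under stated assumptions: the alignment is given as a dictionary mapping target residue numbers to template residue numbers, template Cα coordinates are (x, y, z) tuples in Å, and the 8 Å cutoff is arbitrary. Real restraint-based programs also transfer hydrogen bonds and dihedral angles and weight restraints statistically. All names and coordinates here are hypothetical.

```python
import math
from itertools import combinations


def ca_distance_restraints(template_ca, target_to_template, cutoff=8.0):
    """Return (target_i, target_j, distance) for aligned residue pairs whose
    template C-alpha atoms lie within `cutoff` angstroms of each other."""
    restraints = []
    for ti, tj in combinations(sorted(target_to_template), 2):
        pi = template_ca[target_to_template[ti]]
        pj = template_ca[target_to_template[tj]]
        d = math.dist(pi, pj)  # Euclidean distance (Python 3.8+)
        if d <= cutoff:
            restraints.append((ti, tj, round(d, 2)))
    return restraints


# Toy template: four C-alpha atoms spaced 3.8 A apart along x
template_ca = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0), 4: (11.4, 0.0, 0.0)}
# Toy alignment: target residues 10, 11, 12 align to template residues 1, 2, 4
target_to_template = {10: 1, 11: 2, 12: 4}
print(ca_distance_restraints(template_ca, target_to_template))
```

Residue pair (10, 12) is dropped because its template atoms are 11.4 Å apart, beyond the assumed cutoff.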


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
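The E-value rule of thumb can be applied mechanically to a BLAST hit list. The sketch below uses the five PDB hits listed in the Results section of this report (E-values entered as listed there). The cutoff of 1e-5 is an illustrative assumption, not a universal standard, and marginal hits still need the judgement described in the paragraph above. The function name is hypothetical.

```python
# BLAST hits against PDB as (PDB chain, score, E-value), as listed in this report's results
hits = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 60.0),
    ("1AW9-A", 27, 60.0),
]


def candidate_templates(hits, max_evalue=1e-5):
    """Keep hits whose E-value is at or below the cutoff, lowest E-value first."""
    kept = [hit for hit in hits if hit[2] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[2])


for pdb_chain, score, evalue in candidate_templates(hits):
    print(pdb_chain, score, evalue)
```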

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
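Coverage, one of the selection factors above, is the fraction of the query that the aligned template region spans. A minimal sketch, assuming coverage is measured as one contiguous inclusive residue range; the example uses the modelled residue range (83 to 514) and query length (523 residues) reported later for GLCTK_HUMAN. The function name is hypothetical.

```python
def query_coverage(start: int, end: int, query_length: int) -> float:
    """Fraction of the query covered by the inclusive residue range start..end."""
    if not 1 <= start <= end <= query_length:
        raise ValueError("range must lie within the query")
    return (end - start + 1) / query_length


print(f"{query_coverage(83, 514, 523):.1%}")  # 432 of 523 residues -> 82.6%
```

A template covering only part of the query leaves the remaining residues to be modelled without a template, which is where accuracy is lowest.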

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interactions are said to be rigid.

Fig Protein-Protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often referred to as a lock and key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand and a flexible receptor, or a flexible ligand with a rigid receptor.

Fig Protein Ligand-Receptor Docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid-ligand, flexible-receptor complex often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface. This allows for binding of a receptor with a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty to the system will be minimized. This can be accomplished by allowing flexibility in the receptor. Flexible receptors allow for docking of a larger ligand than would be allowed for with a rigid receptor.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller in size than the receptor interface. No docking is completely rigid, though; there is intrinsic movement, which allows for small conformational adaptation for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing for easier, more energetically favorable binding between the two.
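The six rigid-body degrees of freedom can be made concrete: a docking pose is three rotation angles plus a three-component translation applied to one molecule's coordinates. This is a generic sketch with hypothetical function names and a z-y-x rotation convention chosen for illustration; it is not how Hex parametrizes or searches poses (Hex uses its own correlation-based search).

```python
import math


def rotation_matrix(alpha, beta, gamma):
    """R = Rz(alpha) @ Ry(beta) @ Rx(gamma); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]


def apply_pose(coords, angles, translation):
    """Apply a rigid-body move: rotate about the origin, then translate."""
    r = rotation_matrix(*angles)
    tx, ty, tz = translation
    return [
        (r[0][0] * x + r[0][1] * y + r[0][2] * z + tx,
         r[1][0] * x + r[1][1] * y + r[1][2] * z + ty,
         r[2][0] * x + r[2][1] * y + r[2][2] * z + tz)
        for x, y, z in coords
    ]


# A toy two-atom "ligand", rotated 90 degrees about z and shifted 10 A along x
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
pose = apply_pose(ligand, angles=(math.pi / 2, 0.0, 0.0), translation=(10.0, 0.0, 0.0))
print([tuple(round(c, 3) for c in atom) for atom in pose])
```

Because the move is rigid, interatomic distances inside the ligand are unchanged; only its position and orientation relative to the receptor vary.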

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease hepatitis B, and will also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process.

Receptor

A molecule on the surface of the cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule within a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through the interaction of many weak noncovalent bonds formed with the binding site of a protein, tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
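To connect a lowered activation barrier to rate enhancement: under simple transition-state theory (an assumption here, since the text above gives no rate law), the rate constant scales as exp(-ΔG‡/RT), so lowering the activation free energy by ΔΔG‡ multiplies the rate by exp(ΔΔG‡/RT). A minimal sketch; the function name is hypothetical.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def rate_enhancement(delta_delta_g: float, temperature: float = 298.15) -> float:
    """Factor by which a rate constant grows when the activation free energy is
    lowered by delta_delta_g (J/mol), assuming k is proportional to exp(-dG/RT)."""
    return math.exp(delta_delta_g / (R * temperature))


# Lowering the barrier by about 5.7 kJ/mol at 25 degrees C gives roughly a 10-fold speed-up
print(round(rate_enhancement(5.7e3), 1))
```

Because the dependence is exponential, each further ~5.7 kJ/mol of transition-state stabilization at room temperature adds roughly another factor of ten.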

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

Amino acid   Preference
His           0.360
Tyr          -0.040
Asp           0.045
Gly          -0.070
Trp          -0.140
Met           0.025
Val          -0.060
Asn           0.080
Leu          -0.180
Phe          -0.120
Gln           0.050
Cys           0.210
Ile          -0.005
Ala           0.025
Glu           0.050
Arg           0.055
Pro          -0.200
Lys           0.100
Thr           0.100
Ser           0.130
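The sign convention above (positive means more contacts than expected by chance) is exactly what a log-odds score produces. The sketch below shows one common way to compute such a score from observed and expected contact counts. The source does not state the formula behind the table, so this illustrates the idea only and does not reproduce those numbers; the counts are hypothetical.

```python
import math


def log_odds_propensity(observed: int, expected: float) -> float:
    """log10(observed / expected): > 0 means more contacts than expected by chance."""
    if observed <= 0 or expected <= 0:
        raise ValueError("counts must be positive")
    return math.log10(observed / expected)


# Hypothetical contact counts (observed, expected); NOT the data behind the table above
toy_counts = {"His": (46, 20.0), "Leu": (13, 20.0), "Ala": (20, 20.0)}
for residue, (observed, expected) in toy_counts.items():
    print(residue, round(log_odds_propensity(observed, expected), 3))
```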

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ against ψ of amino acid residues in protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between these two torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
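The φ and ψ values plotted on a Ramachandran plot are ordinary dihedral (torsion) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). A minimal pure-Python sketch of the standard atan2-based dihedral formula; the function and helper names are hypothetical, and the test points are toy coordinates rather than real backbone atoms.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (between -180 and 180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: rotation about the z axis between the first and last bonds
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))   # 90 degrees
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)), 1))  # trans, 180 degrees
```

Computing (φ, ψ) this way for every residue and scattering the pairs gives the plot described above.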

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the file extension .new; the new file has the atoms labelled in accordance with the IUPAC naming convention.

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but with different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good for releasing local constraints on a residue, but it will not pass through high energy barriers and will stop in a local minimum. The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
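Gradient optimization can be shown on the smallest possible system: two atoms interacting through a Lennard-Jones (van der Waals) potential. The sketch starts from a bad contact (atoms too close, strongly repulsive) and repeatedly moves the separation along the force until the force is nearly zero. Reduced units (ε = σ = 1), the step size and the force tolerance are illustrative assumptions; real minimizers act on all coordinates of a full force field, typically with steepest descent or conjugate gradients. Function names are hypothetical.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy E(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along r, F = -dE/dr (positive = repulsive, pushes atoms apart)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def minimize_distance(r, step=0.01, force_tol=1e-8, max_iter=10_000):
    """Steepest-descent minimization of a single pair separation r."""
    for _ in range(max_iter):
        f = lj_force(r)
        if abs(f) < force_tol:
            break  # forces are essentially zero: a minimum has been reached
        r += step * f  # moving along the force lowers the energy
    return r


r0 = 0.95  # start from a bad contact with a large repulsive energy
r_min = minimize_distance(r0)
print(round(lj_energy(r0), 3), round(r_min, 4), round(lj_energy(r_min), 4))
```

The separation settles at the potential minimum, 2^(1/6)σ ≈ 1.1225, where the energy is -ε; with more atoms the same procedure would stop in whichever local minimum lies downhill from the start.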


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences

Db   AC       Description                                                       Score  E-value
pdb  1QGT-C   Chain C, (Hbcag) Human Hepatitis B Viral Capsid                     206  6e-54
pdb  2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina               197  2e-51
pdb  2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                      192  6e-50
pdb  1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp     27  60
pdb  1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I                27  60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon     C  2435
Hydrogen   H  3967
Nitrogen   N   711
Oxygen     O   719
Sulfur     S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
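The two extinction coefficients follow directly from the Trp, Tyr and Cys counts in the composition table. The sketch assumes the molar absorptivities documented for ProtParam at 280 nm (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹), that every pair of Cys forms one cystine, and a molecular weight of 55252.6 Da for GLCTK_HUMAN. The function names are hypothetical.

```python
MOLAR_ABS_280 = {"W": 5500, "Y": 1490, "cystine": 125}  # M^-1 cm^-1 at 280 nm


def extinction_coefficients(n_trp: int, n_tyr: int, n_cys: int):
    """Return (all Cys paired as cystines, no cystines) at 280 nm in water."""
    base = n_trp * MOLAR_ABS_280["W"] + n_tyr * MOLAR_ABS_280["Y"]
    with_cystines = base + (n_cys // 2) * MOLAR_ABS_280["cystine"]  # 5 Cys -> 2 cystines
    return with_cystines, base


def abs_0_1_percent(ext_coeff: float, mol_weight: float) -> float:
    """Absorbance of a 1 g/l solution over a 1 cm path = extinction coefficient / MW."""
    return ext_coeff / mol_weight


# Trp, Tyr and Cys counts and molecular weight for GLCTK_HUMAN (Q8IVS8)
ext_all, ext_none = extinction_coefficients(n_trp=4, n_tyr=5, n_cys=5)
mw = 55252.6
print(ext_all, round(abs_0_1_percent(ext_all, mw), 3))    # 29700 0.538
print(ext_none, round(abs_0_1_percent(ext_none, mw), 3))  # 29450 0.533
```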

Secondary structure prediction

By SOPMA: result for UNK_158250

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix      (Hh): 235 is 44.93%
3₁₀ helix        (Gg):   0 is  0.00%
Pi helix         (Ii):   0 is  0.00%
Beta bridge      (Bb):   0 is  0.00%
Extended strand  (Ee):  80 is 15.30%
Beta turn        (Tt):  36 is  6.88%
Bend region      (Ss):   0 is  0.00%
Random coil      (Cc): 172 is 32.89%
Ambiguous states  (?):   0 is  0.00%
Other states         :   0 is  0.00%

Parameters: window width 17, similarity threshold 8, number of states 4
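The SOPMA summary percentages are the count of each state letter divided by the sequence length. A minimal sketch, using a short hypothetical state string with the one-letter codes from the output above (h helix, e extended strand, t beta turn, c random coil); the function name is hypothetical.

```python
from collections import Counter

# One-letter state codes as they appear in the SOPMA output
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_summary(states: str) -> dict:
    """Map each state name to (count, percentage of the sequence length)."""
    total = len(states)
    counts = Counter(states)
    return {STATE_NAMES.get(code, code): (n, round(100.0 * n / total, 2))
            for code, n in counts.items()}


print(state_summary("hhhhhhcceeeett"))
# The same arithmetic on the report's totals: 235 helix residues out of 523
print(round(100 * 235 / 523, 2))  # 44.93
```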

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

Sequences:
 1  Q8IVS8|GLCTK_HUMAN  523 aa
 2  Q64896|HBEAG_ASHV   217 aa
 3  P03154|HBEAG_DHBV1  305 aa
 4  P0C6J9|HBEAG_DHBV3  305 aa
 5  P03153|HBEAG_GSHV   217 aa
 6  P0C692|HBEAG_HBVA2  214 aa
 7  P0C625|HBEAG_HBVA3  214 aa
 8  P17099|HBEAG_HBVA4  214 aa
 9  Q81105|HBEAG_HBVA5  214 aa
10  Q91C37|HBEAG_HBVA6  214 aa

Pairwise scores (SeqA in rows, SeqB in columns):

SeqA    2    3    4    5    6    7    8    9   10
   1    3    3    3    3    3    3    3    3    2
   2        21   21   91   66   65   65   65   65
   3             97   24   26   27   25   26   26
   4                  25   26   27   25   26   26
   5                       70   69   69   69   69
   6                            98   98   98   98
   7                                 98   97   98
   8                                      97   98
   9                                           97

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);

Phylogram

Tertiary structure prediction

PDB 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure from File (PDB file), then choose Swiss model > Load the raw target sequence.

Choose Fit > Fit raw sequence, then Magic fit, then Iterative fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss model > Submit modeling request. (A new browser window opens, loading the PDB file; give the email ID at which to receive the modeled structure.)

Open the new structure received by email and remove the template by selecting the target.

Choose File > Save > Layer (PDB).

Open Swiss model and select the load raw sequence option to load the target molecule.

Perform Magic fit and Iterative fit, provided under Fit, in order to fit the two sequences.

Save the file as the project.

Select "Submit modeling request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields:

Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Fig structure of template after modeling

Alignment

TARGET  83   LV GFGKAVLGMA AAAEELLGQH
2b8nA    4   peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105   LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET 155   ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205   RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET 255   GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET 305   LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355   ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405   LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455   DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394   ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative (homology) modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ (rotation about the N-Cα bond) and ψ (rotation about the Cα-C bond) of the amino acid residues in a protein structure. Each residue appears as one (φ, ψ) point, so the plot shows which backbone conformations occur and whether they fall in sterically favoured, allowed or disallowed regions. The plot describes torsion angles only; distances between successive α-carbon atoms and bond angles are not read from it.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling with DeepView and MODELLER is given as input to SAVS.
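The φ and ψ values on a Ramachandran plot are ordinary torsion (dihedral) angles. φ of residue i is defined by the backbone atoms C(i-1), N, Cα, C, and ψ by N, Cα, C, N(i+1). The sketch below computes a signed torsion angle in degrees, assuming the atom coordinates have already been read from the PDB file as (x, y, z) tuples. The names `dihedral` and `phi_psi` are illustrative and are not part of SAVS or PROCHECK.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b0 = _sub(p0, p1)   # bond vector from p1 back to p0
    b1 = _sub(p2, p1)   # central bond p1 -> p2 (the rotation axis)
    b2 = _sub(p3, p2)   # bond p2 -> p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the axis.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for one residue."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)), 1))   # 0.0
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)), 1))  # 180.0
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))   # 90.0
```

A planar "cis" arrangement of the four atoms gives 0°, and a planar "trans" arrangement gives 180°. Each residue's (φ, ψ) pair computed this way is what a validation tool places into the favoured, allowed or disallowed regions of the plot.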

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics (No. of residues, %-tage)

Residues in most favoured regions [A,B,L]: 990 (85.6%)
Residues in additional allowed regions [a,b,l,p]: 104 (9.0%)
Residues in generously allowed regions [~a,~b,~l,~p]: 11 (1.0%)
Residues in disallowed regions: 51 (4.4%)
Number of non-glycine and non-proline residues: 1156 (100.0%)

Number of end-residues (excl. Gly and Pro): 8
Number of glycine residues (shown as triangles): 127
Number of proline residues: 60
Total number of residues: 1351

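Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions. By this criterion, the model falls below the benchmark. It has 85.6% of its non-glycine, non-proline residues in the most favoured regions and 4.4% in disallowed regions.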
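In the PROCHECK table, the percentages are taken over the non-glycine, non-proline residues only (1156 here). This is why the glycine, proline and end residues are listed separately. The sketch below recomputes the percentages from the region counts and applies the 90% benchmark quoted above. The helper name `ramachandran_summary` is hypothetical.

```python
def ramachandran_summary(counts, benchmark=90.0):
    """Return (percentages, passes) for PROCHECK-style region counts.

    Percentages are relative to the sum of the counts, i.e. the
    non-glycine, non-proline residues; `passes` is True when the
    most-favoured share exceeds the benchmark.
    """
    total = sum(counts.values())
    pct = {region: round(100.0 * n / total, 1) for region, n in counts.items()}
    return pct, 100.0 * counts["most favoured"] / total > benchmark


# Region counts copied from the PROCHECK plot statistics.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
pct, passes = ramachandran_summary(counts)
print(sum(counts.values()))                     # 1156
print(pct["most favoured"], pct["disallowed"])  # 85.6 4.4
print(passes)                                   # False
```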

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253:LYS
Warning: Can't add all hydrogens to incomplete residue A 316:HIS
Warning: Can't add all hydrogens to incomplete residue A 318:LYS
Warning: Can't add all hydrogens to incomplete residue A 393:LYS
Warning: Can't add all hydrogens to incomplete residue B 8:LYS
Warning: Can't add all hydrogens to incomplete residue B 9:LYS
Warning: Can't add all hydrogens to incomplete residue B 17:LYS
Warning: Can't add all hydrogens to incomplete residue B 34:LYS
Warning: Can't add all hydrogens to incomplete residue B 36:ASN
Warning: Can't add all hydrogens to incomplete residue B 62:LYS
Warning: Can't add all hydrogens to incomplete residue B 65:ARG
Warning: Can't add all hydrogens to incomplete residue B 66:LYS
Warning: Can't add all hydrogens to incomplete residue B 316:HIS
Warning: Can't add all hydrogens to incomplete residue B 370:LYS
Warning: Can't add all hydrogens to incomplete residue B 380:TYR
Warning: Can't add all hydrogens to incomplete residue B 404:THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)

Warning: Fractional charge (0.35) for non-terminal residue A 52:MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82:ASP
ASP:N Radius = 1.40 Charge = -0.52; ASP:CA Radius = 1.50 Charge = 0.25; ASP:C Radius = 1.40 Charge = 0.53; ASP:O Radius = 1.50 Charge = -0.50; ASP:CB Radius = 1.70 Charge = -0.21; ASP:CG Radius = 1.40 Charge = 0.62; ASP:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291:MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318:LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375:MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8:LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9:LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17:LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52:MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62:LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66:LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81:ASP
ASP:N Radius = 1.40 Charge = -0.52; ASP:CA Radius = 1.50 Charge = 0.25; ASP:C Radius = 1.40 Charge = 0.53; ASP:O Radius = 1.50 Charge = -0.50; ASP:CB Radius = 1.70 Charge = -0.21; ASP:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291:MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370:LYS
LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375:MSE
MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404:THR
THR:N Radius = 1.40 Charge = -0.52; THR:CA Radius = 1.50 Charge = 0.27; THR:C Radius = 1.40 Charge = 0.53; THR:O Radius = 1.50 Charge = -0.50; THR:CB Radius = 1.50 Charge = 0.21; THR:H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV


Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27*812*1] x FFT[64*24*48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)

R = 0.00, R = 0.75, R = 1.50, R = 2.25, R = 3.00, R = 3.75, R = 4.50, R = 5.25, R = 6.00, R = 6.75, R = 7.50, R = 8.25, R = 9.00, R = 9.75, R = 10.50, R = 11.25, R = 12.00, R = 12.75, R = 13.50, R = 14.25, R = 15.00, R = 15.75, R = 16.50, R = 17.25, R = 18.00, R = 18.75, R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00, R = 0.40, R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
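Several figures in the Hex log above are related by simple arithmetic, which can help when reading it. The sketch below reproduces a few of these relationships. It rests on two assumptions. First, decimal points lost in extraction follow the two-decimal format seen elsewhere in the log. Second, the `Iterate[...]` and `FFT[...]` fields read as 27*812*1 and 64*24*48. Both are interpretations of the printed digits, not values documented by Hex.

```python
# Cross-checks on figures printed in the Hex 5.0 docking log
# (decimal placement and the Iterate/FFT factors are assumed readings).

# Translational scan: R = 0.00 to 19.50 in steps of 0.75.
step, r_max = 0.75, 19.50
n_r = round(r_max / step) + 1
r_values = [round(i * step, 2) for i in range(n_r)]

# Search space: distances x receptor rotations x ligand rotations gives the
# number of 3D FFTs; each FFT is assumed to have 64*24*48 points.
n_ffts = n_r * 812 * 1
orientations = n_ffts * (64 * 24 * 48)

# Timing breakdown of the initial search, in seconds.
timings = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
total_s = sum(timings.values())

# Skin grid: cell count times the 0.60 A cell volume vs the reported volumes.
skin_cells = 201019 + 216848
skin_volume = 43420.10 + 46839.17

# Net formal charge from the counted charged residues.
net_charge = 104 - 114

print(n_r, r_values[-1])                           # 27 19.5
print(n_ffts, orientations)                        # 21924 1616412672
print(round(total_s))                              # 420, i.e. "7 min 0 sec"
print(skin_cells, round(skin_volume / 0.60 ** 3))  # 417867 417867
print(net_charge)                                  # -10
print(round(orientations / total_s))               # close to the logged 3848574/s
```

Under these readings, the numbers agree with each other. The 27 distance steps times 812 receptor rotations give the 21924 FFTs. The four timing components sum to the reported 7 minutes. The skin volumes match the number of 0.60 Å grid cells.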

Conclusion

After analyzing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be worthwhile to examine the matter more closely at the gene level, where a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope that in the near future it will be possible to draw conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we used homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein coded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for drug development.

Future prospects

The work presented in this report may be just a stepping stone toward future discoveries; it is a small finding on a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and will prove fruitful to anyone working in a similar area.

BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801. [PubMed]

Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395-408. [PubMed]

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402. [PubMed]

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Hepatitis B core antigen (HBcAg): the only HBV antigen that cannot be detected directly by a blood test; it can only be isolated by analyzing an infected hepatocyte. This 185-amino-acid protein is expressed in the cytoplasm of infected cells and is closely associated with nucleocapsid assembly (Strauss 2002).

Hepatitis B e antigen (HBeAg): the e antigen is named for its early appearance during acute HBV infection. Thought to be located in the core structure of the virus particle, this antigen can be detected by a blood test; if found, it usually indicates complete virus particles in circulation (Strauss 2002).

REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded, circular DNA virus of complex structure. HBV is classified in the genus Orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles:
- a spherical 22 nm particle;
- a 42 nm particle containing DNA and DNA polymerase, called the Dane particle;
- tubular or filamentous particles that vary in length.

Of these, the Dane particle is the infective form of the virus. Hepatitis B is normally transmitted by blood transfusion, contaminated equipment, drug users' unsterile needles or other body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. The humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles. The cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. Neonatal tolerance probably plays an important role in viral persistence in patients infected at birth. The basis for poor responsiveness in adult-onset infection, however, is not well understood and requires further analysis. In the setting of an ineffective immune response, several factors may contribute to the worsening of viral persistence:
- viral evasion by epitope inactivation;
- T cell receptor antagonism;
- incomplete downregulation of viral gene expression;
- infection of immunologically privileged tissues.

Chronic liver cell injury, together with the attendant inflammatory and regenerative responses, creates the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidating the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies of how the virus evades the immune response are still at an early stage. Such evasion causes prolonged transient infections with high-titer viremia and lifelong infections with ongoing inflammation of the liver. The role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

Antigen

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence indicates exposure to HBV [4].


Homology or comparative modeling involves predicting the structure of a query sequence from the structures of one or more structural templates. The procedure involves identifying possible templates that have a clear sequence relationship to the query, assembling the model, predicting regions of the structure that are likely to have conformations different from the templates (e.g., loops), and ultimately refining the structure to account for inherent differences between the template and query structures. As mentioned above, homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity (a structural explanation for this observation has been offered by Chung and Subbiah, 1996). Advances in the accuracy of sequence alignments using structure-based profile methods, such as those described above, should result in continuing improvements in the quality of homology models [5, 6].
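The 30% rule is a threshold on the percent identity between query and template in their alignment. Below is a minimal Python sketch of that measure, assuming identity is counted only over columns where neither sequence has a gap; other tools divide by the full alignment length or by the shorter sequence, so values are not directly interchangeable. The function names are hypothetical. The example pair is a short HBeAg fragment copied from the ClustalW alignment in this report, so its identity is higher than the full-length ClustalW score for the same two sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences contribute a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def meets_30_percent_rule(aligned_query: str, aligned_template: str) -> bool:
    """Rule-of-thumb check only: identity above 30% suggests a usable template."""
    return percent_identity(aligned_query, aligned_template) > 30.0


# Aligned HBeAg fragments (HBVA4 and ASHV) from the ClustalW alignment in this report
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
print(round(percent_identity(hbva4, ashv), 1))  # 79.3 (23 identical of 29 gap-free columns)
print(meets_30_percent_rule(hbva4, ashv))       # True
```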

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands, and from protein-ligand complexes. In this respect, the term guided docking is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect this focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, affecting scoring but not the orientation of the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based, and ligand-based approaches, reflecting the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information is an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions, and virtual screening enrichments obtained from protein-ligand docking [7].

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed, and a number of successful applications of structure-based virtual screening are described [8].


Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins.

In this report, bioinformatics is limited to sequence, structural, and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis, and molecular functional analysis.

1. NCBI (National Center for Biotechnology Information)

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.


2. Protein sequence

Glycerate kinase (HBeAg-binding protein 4); primary accession number: Q8IVS8; EC 2.7.1.31; from human; subcellular location: cytoplasm; catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved tools for biological sequence comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
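SSEARCH is built on the Smith-Waterman local alignment recurrence. The sketch below illustrates only that recurrence in Python; it is not SSEARCH itself, which scores with substitution matrices such as BLOSUM50, uses affine gap penalties, and adds statistical estimates. The simple match/mismatch/linear-gap scores are assumptions chosen for illustration, the function name is hypothetical, and only the optimal score (not the alignment) is returned.

```python
def smith_waterman_score(a: str, b: str, match: int = 3, mismatch: int = -3, gap: int = -2) -> int:
    """Best local alignment score of a and b using a linear gap penalty."""
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap     # a[i-1] aligned against a gap
            left = H[i][j - 1] + gap   # b[j-1] aligned against a gap
            # The floor at 0 lets an alignment start anywhere: this makes it local
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("TGTTACGG", "GGTTGACTA"))  # 13, from GTT-AC aligned to GTTGAC
```

Because every cell is floored at zero, unrelated flanking regions never drag the score down, which is why the local score of two unrelated sequences is 0 rather than negative.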

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is an algorithm for comparing primary biological sequence information, such as the amino acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and to identify library sequences that resemble the query sequence above a certain threshold.

5. Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
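The aliphatic index in the list above is defined (Ikai, 1980) as the relative volume occupied by aliphatic side chains: AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue. A minimal Python sketch of that formula follows; it assumes a plain one-letter sequence and, like ProtParam's raw-sequence input, skips white space and digits. The function name is hypothetical and this is an illustration, not ProtParam's own code.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index (Ikai, 1980): X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu))."""
    # Like ProtParam's raw-sequence input, ignore white space and digits
    seq = "".join(ch for ch in sequence.upper() if ch.isalpha())
    if not seq:
        raise ValueError("sequence contains no residues")
    # X(aa) is the mole percent (100 x mole fraction) of each residue type
    x = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return x["A"] + 2.9 * x["V"] + 3.9 * (x["I"] + x["L"])


# First 20 residues of Q8IVS8, entered with a space as a user might paste them
print(aliphatic_index("MAAALQVLPR LARAPLHPLL"))
```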


Using SOPMA for secondary structure analysis

A method called the self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. Its authors then reported improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
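The 69.5% figure is a three-state accuracy, commonly called Q3: the percentage of residues whose predicted state (helix, strand or coil) matches the observed one. A minimal Python sketch follows. It assumes one-letter state strings in SOPMA's output alphabet (h, e, t, c) and, as a labeled simplification, folds beta turns (t) into coil; published benchmarks reduce observed states with their own mapping, so this is not SOPMA's exact evaluation protocol. The names are hypothetical.

```python
# Assumption: SOPMA's four output letters are folded into three classes,
# with beta turns (t) counted as coil; benchmark studies define their own reduction.
TO_THREE_STATE = {"h": "h", "e": "e", "t": "c", "c": "c"}


def to_three_state(states: str) -> str:
    """Reduce a SOPMA-style h/e/t/c string to helix/strand/coil (h/e/c)."""
    return "".join(TO_THREE_STATE[s] for s in states.lower())


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state class is predicted correctly."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty strings of equal length")
    p, o = to_three_state(predicted), to_three_state(observed)
    correct = sum(1 for a, b in zip(p, o) if a == b)
    return 100.0 * correct / len(p)


print(q3("hhhecc", "hhheec"))  # 5 of 6 residues agree
```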

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor.

Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).

Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the docked structure of the drug molecule.
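The second protocol step works from a sequence in FASTA format: a ">" header line followed by one or more sequence lines. A minimal Python parser is sketched below; the function name is hypothetical, and the sample header only imitates the UniProt style, wrapping the first 60 residues of Q8IVS8 as they appear in the ClustalW alignment later in this report.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {header: sequence} for every record in FASTA-formatted text."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no data
        if line.startswith(">"):
            if header is not None:
                # a repeated header overwrites the earlier record
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>sp|Q8IVS8|GLCTK_HUMAN Glycerate kinase (first 60 residues only)
MAAALQVLPRLARAPLHPLLWRGSVARLAS
SMALAEQARQLFESAVGAVLPGPMLHRALS
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```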


6. Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and of the template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the fold, to participate in binding some small molecule, or to foster association with another protein or nucleic acid (29).
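The "~2 Å agreement between the matched Cα atoms" quoted above is normally expressed as a root-mean-square deviation (RMSD) over matched Cα pairs. A minimal Python sketch of that calculation follows. It assumes the model and reference residues have already been paired by the alignment and that the two structures have already been optimally superposed (for example with the Kabsch algorithm, which is not shown); without that superposition the value is not meaningful as a model-quality measure. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """RMSD over matched C-alpha pairs; no superposition is performed here."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
                  for a, b in zip(model, reference))
    return math.sqrt(squared / len(model))


print(ca_rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 5.0
```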

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, yielding a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than the interfaces in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions cannot alter their conformation to improve binding and ease movement. Conformational changes are limited by steric constraints, and such interfaces are thus said to be rigid.

Fig. Protein-Protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig. Protein Ligand-Receptor Docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid ligand with a flexible receptor often maximizes the interface area between the molecules. They move with respect to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger-than-usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow a larger ligand to dock than a rigid receptor would.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement that allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
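A rigid-body pose has exactly the six degrees of freedom mentioned above: three rotations and three translations. The Python sketch below applies such a pose to ligand coordinates, as a rigid-ligand search would do while sampling orientations in a binding site. The Euler-angle convention (rotate about x, then y, then z, around the ligand centroid) and the function names are assumptions chosen for illustration; HEX and other docking programs use their own parameterizations, and no scoring is performed here.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotation_matrix(rx: float, ry: float, rz: float) -> List[List[float]]:
    """R = Rz(rz) @ Ry(ry) @ Rx(rx): rotate about x, then y, then z (radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def place_ligand(coords: Sequence[Vec3], pose: Sequence[float]) -> List[Vec3]:
    """Apply a six-degree-of-freedom pose (rx, ry, rz, tx, ty, tz) to a rigid ligand.

    The ligand is rotated about its own centroid and then translated, so its
    internal geometry (all interatomic distances) is unchanged.
    """
    rx, ry, rz, tx, ty, tz = pose
    R = rotation_matrix(rx, ry, rz)
    n = len(coords)
    centre = [sum(p[k] for p in coords) / n for k in range(3)]
    placed = []
    for p in coords:
        d = [p[k] - centre[k] for k in range(3)]
        r = [sum(R[i][k] * d[k] for k in range(3)) for i in range(3)]
        placed.append((r[0] + centre[0] + tx, r[1] + centre[1] + ty, r[2] + centre[2] + tz))
    return placed


# A two-atom "ligand" turned 90 degrees about z around its centroid
print(place_ligand([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], (0.0, 0.0, math.pi / 2, 0.0, 0.0, 0.0)))
```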

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking study will be useful in the search for a cure for the infectious disease hepatitis B, and will also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A structure on the surface of a cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule within a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through many weak noncovalent interactions with the binding site of a protein, tight binding depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are the preferences of the 20 amino acids for lying within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not cover protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate; these rotations are the torsion angles φ and ψ that form the two axes of the plot. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
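Each point on a Ramachandran plot is a pair of torsion angles computed from backbone atom coordinates: φ from C(i-1), N(i), Cα(i), C(i) and ψ from N(i), Cα(i), C(i), N(i+1). The Python sketch below computes a signed torsion angle from four points with a standard atan2 formulation (clockwise rotation viewed along the central bond is positive). Reading coordinates from a PDB file is left out, the function name is hypothetical, and this is not the code used by PROCHECK or SAVS.

```python
import math


def torsion_angle(p0, p1, p2, p3) -> float:
    """Signed torsion angle in degrees about the bond p1-p2, from four (x, y, z) points."""
    def sub(a, b):
        return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the two outer bonds onto the plane perpendicular to the central bond
    v = [b0[i] - dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - dot(b2, b1) * b1[i] for i in range(3)]
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# phi = torsion_angle(C_prev, N, CA, C); psi = torsion_angle(N, CA, C, N_next)
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```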

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues the submitted protein contains.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new, in which the atoms are labelled in accordance with the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles, and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good at releasing local constraints on a residue, but it will not pass through high energy barriers and will stop in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
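Gradient-based minimization can be illustrated on the smallest possible case: two atoms interacting through a Lennard-Jones (van der Waals) potential. The Python sketch below starts from a too-close, strongly repulsive distance (a single "bad interaction") and repeatedly moves along the force until the force is nearly zero, ending at the potential minimum r = 2^(1/6)σ. The reduced units (ε = σ = 1), the step size, the cap on displacement per step, and the function names are assumptions for illustration; real minimizers work on all 3N coordinates of a full force field.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Force along the separation, F = -dE/dr (positive pushes the atoms apart)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def minimize_pair_distance(r: float, step: float = 0.01, max_move: float = 0.05,
                           tol: float = 1e-8, max_iter: int = 10000) -> float:
    """Steepest descent on the interatomic distance until the force is ~0."""
    for _ in range(max_iter):
        f = lj_force(r)
        if abs(f) < tol:
            break
        # Move downhill (along the force), but cap the displacement so a huge
        # repulsive force from a bad contact cannot throw the atoms far apart.
        r += max(-max_move, min(max_move, step * f))
    return r


start = 0.9  # atoms far too close: large van der Waals repulsion
end = minimize_pair_distance(start)
print(lj_energy(start), lj_energy(end), end)  # end is close to 2**(1/6), about 1.1225
```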


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information: entry name GLCTK_HUMAN; primary accession number Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase; synonyms: EC 2.7.1.31, HBeAg-binding protein 4

Gene name: GLYCTK; synonyms: HBEBP4; ORF names: LP5910

From: Homo sapiens (Human) [TaxID: 9606]

Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2 (evidence at transcript level)

BLAST result

List of potentially matching sequences (query sequence included):

Db    AC        Description                                                        Score   E-value
pdb   1QGT-C    Chain C, (HBcAg) Human Hepatitis B Viral Capsid                      206     6e-54
pdb   2QIJ-C    Chain C, Hepatitis B Capsid Protein With An N-Termina                197     2e-51
pdb   2G33-C    CAPSD_HBVD1, Chain C, Human T4 Capsid Strain Ad                      192     6e-50
pdb   1TA3-B    XIP1_WHEAT, Chain B, Crystal Structure Of Xylanase (Gh10) In Comp     27     60
pdb   1AW9-A    Chain A, Structure Of Glutathione S-Transferase Iii I                 27     60
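The template-selection advice in the methods (prefer hits with a sufficiently low E-value) can be applied mechanically to the hits listed above. The Python sketch below transcribes the five hits and keeps those at or below an E-value cutoff. The cutoff of 1e-5 is an assumption for illustration, not a value used in this work, the names are hypothetical, and in marginal cases the other factors discussed earlier (function, coverage, plausibility of the model) still have to be weighed by hand.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    pdb_chain: str
    score: float   # score column as reported
    evalue: float


# Hits transcribed from the BLAST result table above
hits: List[Hit] = [
    Hit("1QGT-C", 206, 6e-54),
    Hit("2QIJ-C", 197, 2e-51),
    Hit("2G33-C", 192, 6e-50),
    Hit("1TA3-B", 27, 60),
    Hit("1AW9-A", 27, 60),
]


def candidate_templates(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits whose E-value is at or below the cutoff, best (lowest) first."""
    return sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue)


for h in candidate_templates(hits):
    print(h.pdb_chain, h.evalue)
```

With this cutoff only the three HBV capsid chains remain; the two hits with E-value 60 are no better than chance matches.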

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:

Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%

Total number of negatively charged residues (Asp + Glu): 49

Total number of positively charged residues (Arg + Lys): 43

Atomic composition:

Carbon     C   2435
Hydrogen   H   3967
Nitrogen   N    711 (41)
Oxygen     O    719
Sulfur     S     17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient: 29700; Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450; Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
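Both extinction coefficients above follow from the Trp, Tyr and Cys counts in the amino acid composition (W = 4, Y = 5, C = 5). ProtParam's documentation gives the coefficient as a sum of per-residue contributions at 280 nm in water (Trp 5500, Tyr 1490, cystine 125), and the Abs 0.1% value as the coefficient divided by the molecular weight; the Python sketch below reproduces that arithmetic. With five cysteines at most two disulfide-bonded cystines can form, which is why the two values differ by 250. The function names are hypothetical.

```python
# Per-residue molar absorptivities at 280 nm in water, as documented for ProtParam
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125   # per disulfide-bonded Cys pair, not per Cys


def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int, all_cys_as_cystines: bool) -> int:
    """Molar extinction coefficient (M^-1 cm^-1) at 280 nm from residue counts."""
    eps = n_trp * EXT_TRP + n_tyr * EXT_TYR
    if all_cys_as_cystines:
        eps += (n_cys // 2) * EXT_CYSTINE   # 5 Cys can form at most 2 cystines
    return eps


def abs_0_1_percent(eps: float, molecular_weight: float) -> float:
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm cell: eps / MW."""
    return eps / molecular_weight


# Counts and molecular weight from the ProtParam output for Q8IVS8
for cystines in (True, False):
    eps = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, all_cys_as_cystines=cystines)
    print(eps, round(abs_0_1_percent(eps, 55252.6), 3))
# 29700 0.538
# 29450 0.533
```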

Secondary structure prediction

By SOPMA: result for UNK_158250


        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
(42)

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix          (Hh): 235 is 44.93%
3-10 helix           (Gg):   0 is  0.00%
Pi helix             (Ii):   0 is  0.00%
Beta bridge          (Bb):   0 is  0.00%
Extended strand      (Ee):  80 is 15.30%
Beta turn            (Tt):  36 is  6.88%
Bend region          (Ss):   0 is  0.00%
Random coil          (Cc): 172 is 32.89%
Ambiguous states     (?):    0 is  0.00%
Other states:                0 is  0.00%
(43)

Parameters: window width: 17; similarity threshold: 8; number of states: 4
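The percentages in the SOPMA summary are each state's residue count divided by the sequence length (for example, 235 helical residues out of 523 gives 44.93%). A small Python sketch of that tally follows; it assumes the prediction is available as one string in SOPMA's lowercase state alphabet, and the function and constant names are hypothetical. Fed a string with the same state counts as the summary, it reproduces the 44.93%, 15.30%, 6.88% and 32.89% figures.

```python
from collections import Counter

SOPMA_STATES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_percentages(prediction: str) -> dict:
    """Map each state letter to (count, percentage of residues rounded to 2 decimals)."""
    if not prediction:
        raise ValueError("empty prediction")
    counts = Counter(prediction)
    n = len(prediction)
    return {s: (counts[s], round(100.0 * counts[s] / n, 2)) for s in SOPMA_STATES}


# A string with the same state counts as the summary above (order does not matter)
summary_like = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
for state, (count, pct) in state_percentages(summary_like).items():
    print(SOPMA_STATES[state].ljust(16), count, f"{pct:.2f}%")
```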

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
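ClustalW's pairwise score is usually read as percent identity, and these identities are converted into distances before the guide tree is built. The Python sketch below assumes the simple conversion d = 1 - identity/100 (ClustalW's internal formula may differ). SCORES is a hand-copied subset of the table above, and identity_to_distance and closest_pair are illustrative names, not part of any ClustalW interface.

```python
# Hand-copied subset of the ClustalW pairwise scores (percent identity).
# Only five of the 45 pairs are included to keep the sketch short.
SCORES = {
    ("Q8IVS8|GLCTK_HUMAN", "Q64896|HBEAG_ASHV"): 3,
    ("Q64896|HBEAG_ASHV", "P03153|HBEAG_GSHV"): 91,
    ("P03154|HBEAG_DHBV1", "P0C6J9|HBEAG_DHBV3"): 97,
    ("P03154|HBEAG_DHBV1", "P0C692|HBEAG_HBVA2"): 26,
    ("P0C692|HBEAG_HBVA2", "P0C625|HBEAG_HBVA3"): 98,
}


def identity_to_distance(score):
    """Map a percent-identity score to a distance in [0, 1].

    Assumption: distance = 1 - identity/100, a simple conversion commonly
    used to turn pairwise identities into a guide-tree distance matrix.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return (100 - score) / 100


def closest_pair(scores):
    """Return the most similar pair (smallest distance) and that distance."""
    pair = min(scores, key=lambda p: identity_to_distance(scores[p]))
    return pair, identity_to_distance(scores[pair])


if __name__ == "__main__":
    for (a, b), score in SCORES.items():
        print(f"{a:<20} {b:<20} score={score:3d} "
              f"distance={identity_to_distance(score):.2f}")
    print("closest:", closest_pair(SCORES))
```

The very small distances among the HBV strains (about 0.02) and the large distance to the human glycerate kinase (about 0.97) match the structure of the guide tree that follows.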

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4      ------------------------------------------------------------
Q91C37|HBEAG_HBVA6      ------------------------------------------------------------
P0C692|HBEAG_HBVA2      ------------------------------------------------------------
P0C625|HBEAG_HBVA3      ------------------------------------------------------------
Q81105|HBEAG_HBVA5      ------------------------------------------------------------
Q64896|HBEAG_ASHV       ------------------------------------------------------------
P03153|HBEAG_GSHV       ------------------------------------------------------------
P03154|HBEAG_DHBV1      ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3      ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN      MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5      ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV       ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV       ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1      ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3      ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN      LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4      DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6      DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2      DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3      DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5      DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV       DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV       DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1      DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3      DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN      ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5      --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV       --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV       --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1      --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3      --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN      ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4      QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6      ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2      QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3      QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5      QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV       QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV       QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1      QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3      QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN      LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4      -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3      -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV       -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV       -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1      TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3      TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN      CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV       NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV       NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1      DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3      DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN      PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420


P17099|HBEAG_HBVA4      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3      RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV       RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV       RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1      RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3      RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN      GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4      -------------------------------------------
Q91C37|HBEAG_HBVA6      -------------------------------------------
P0C692|HBEAG_HBVA2      -------------------------------------------
P0C625|HBEAG_HBVA3      -------------------------------------------
Q81105|HBEAG_HBVA5      -------------------------------------------
Q64896|HBEAG_ASHV       -------------------------------------------
P03153|HBEAG_GSHV       -------------------------------------------
P03154|HBEAG_DHBV1      -------------------------------------------
P0C6J9|HBEAG_DHBV3      -------------------------------------------
Q8IVS8|GLCTK_HUMAN      ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
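Clustal output normally has a consensus line under each block, and this printout does not include one. The Python sketch below reproduces only the '*' marker, which flags columns where every sequence has the same residue and no gap. conservation_line is a hypothetical helper, and Clustal's ':' and '.' similarity marks are deliberately left out.

```python
def conservation_line(rows, gap="-"):
    """Return a Clustal-style line with '*' under fully conserved, gap-free columns.

    Only the '*' symbol is reproduced; Clustal's ':' and '.' marks for
    strongly and weakly similar residue groups are not implemented here.
    """
    lengths = {len(row) for row in rows}
    if len(lengths) != 1:
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        residues = set(column)
        conserved = len(residues) == 1 and gap not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


if __name__ == "__main__":
    # First 22 columns after the leading gaps of the second block
    # (HBVA4, HBVA5 and ASHV rows only).
    block = [
        "MQLFHLCLIISCT-CPTVQASK",
        "MQLFHLCLIISCT-CPTFQASK",
        "MYLFHLCLVFACVSCPTVQASK",
    ]
    for row in block:
        print(row)
    print(conservation_line(block))
```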

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
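To read branch lengths off the guide tree programmatically, the sketch below re-types the tree as a Newick string and sums branch lengths from the outermost node to each leaf. The placement of the colons, decimal points and commas lost in extraction is an editorial assumption. parse_newick and root_to_tip are hypothetical helpers. ClustalW guide trees are not rooted on biological grounds, so these depths only describe the tree as written.

```python
# Guide tree re-typed with the ':' / '.' / ',' characters that the text
# extraction dropped; where they belong is an editorial assumption.
NEWICK = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a well-formed Newick string into (name, branch_length, children) tuples."""
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip the closing ')'
        name = read_until(",():;")
        length = 0.0
        if text[pos] == ":":
            pos += 1
            length = float(read_until(",();"))
        return name, length, children

    return node()


def root_to_tip(tree, depth=0.0, out=None):
    """Return {leaf name: summed branch length from the outermost node}."""
    if out is None:
        out = {}
    name, length, children = tree
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        root_to_tip(child, depth, out)
    return out


if __name__ == "__main__":
    depths = root_to_tip(parse_newick(NEWICK))
    for leaf, dist in sorted(depths.items(), key=lambda kv: kv[1]):
        print(f"{leaf:<20} {dist:.5f}")
```

Sorting by depth places the HBV strains closest to the outermost node and the human glycerate kinase (Q8IVS8) furthest away, which is consistent with its very low pairwise scores.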

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure (PDB file) from File, then choose SwissModel > load the raw target sequence.

Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose SwissModel > Submit Modeling Request. A new browser window opens with the PDB file loaded; enter the e-mail address at which the modeled structure should be received.

Open the new structure received by e-mail and remove the template, keeping only the target.

Choose File > Save > Layer (PDB).


Open SwissModel and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, found under the Fit menu, to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under SwissModel to submit the project for modeling.

Homology modeling

Optimise Mode request submission form

Please fill in these fields:

Your e-mail address: Gunjan300@gmail.com

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52
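The reported residue range can be checked against the returned model file. The sketch below assumes the model is a standard PDB file and reads CA atoms from the fixed PDB columns. ca_residue_range, coverage_percent and the demo_atom helper are hypothetical, and the demo records are made up. By the figures above, residues 83 to 514 span 432 of the 523 target residues, about 82.6%.

```python
def ca_residue_range(pdb_text, chain=None):
    """Return (first, last, count) of residue numbers that carry a CA atom.

    Reads fixed PDB columns: atom name (13-16), chain ID (22), residue number (23-26).
    """
    numbers = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if chain is not None and line[21] != chain:
            continue
        numbers.add(int(line[22:26]))
    if not numbers:
        raise ValueError("no CA atoms found")
    return min(numbers), max(numbers), len(numbers)


def coverage_percent(first, last, target_length):
    """Share of the full target sequence spanned by residues first..last, in percent."""
    return 100.0 * (last - first + 1) / target_length


def demo_atom(serial, name, res_name, chain, res_seq):
    """Hypothetical helper: build a minimal PDB ATOM record with zeroed coordinates."""
    return (f"ATOM  {serial:5d} {name:<4} {res_name:>3} {chain}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


if __name__ == "__main__":
    # Made-up demo records, not taken from the real model file.
    demo = "\n".join([
        demo_atom(1, "N", "LEU", "A", 83),
        demo_atom(2, "CA", "LEU", "A", 83),
        demo_atom(3, "CA", "VAL", "A", 84),
        demo_atom(4, "CA", "ASP", "A", 514),
    ])
    first, last, count = ca_residue_range(demo)
    print(first, last, count)  # 83 514 3
    print(f"{coverage_percent(83, 514, 523):.1f}% of the 523-residue target")
```

On a real model, a CA count smaller than last - first + 1 would reveal gaps in the modelled chain.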


Click on the model bars.

Fig: Structure of the template after modeling

Alignment

TARGET  83                               LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss
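SWISS-MODEL reports 34% identity over the whole target/template alignment. Identity can also be counted block by block, as the sketch below does for the TARGET 155 block. It compares case-insensitively, because the template row is printed in lower case, and skips gap columns. percent_identity is a hypothetical helper. A single block is not expected to reproduce the global figure; this block comes out at 40%.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over aligned columns where neither row has a gap.

    Spaces (used for 10-residue grouping in the printout) are ignored, and the
    comparison is case-insensitive because the template row is lower case.
    """
    a = row_a.replace(" ", "").upper()
    b = row_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("rows differ in length after removing spaces")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    target = "ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA"
    template = "trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk"
    print(f"{percent_identity(target, template):.1f}% identical in this 50-column block")
```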

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot is used for this check: it plots the backbone dihedral angle φ against ψ for every residue in the protein structure and shows which φ/ψ combinations are possible for a polypeptide. Residues that fall outside the allowed regions indicate strained or possibly mis-modelled backbone geometry.

Software:

SAVS: http://nihserver.mbi.ucla.edu/SAVS

Procedure: The target protein structure obtained after homology modeling with Deep View and the modeller is given as input to SAVS.
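Each point on a Ramachandran plot is a pair of backbone dihedral angles: φ is defined by atoms C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). A minimal pure-Python sketch of the standard atan2 dihedral formula follows. dihedral is a hypothetical helper name; validation servers such as PROCHECK compute these angles internally.

```python
import math


def _sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))


def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], defined by four 3D points.

    For phi pass C(i-1), N, CA, C; for psi pass N, CA, C, N(i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates (not from the model): cis, gauche and trans arrangements.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```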

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT


Result

Plot statistics                                         No. of residues   %-tage
Residues in most favoured regions [A,B,L]                      990          85.6
Residues in additional allowed regions [a,b,l,p]               104           9.0
Residues in generously allowed regions [~a,~b,~l,~p]            11           1.0
Residues in disallowed regions                                  51           4.4
                                                              ----         -----
Number of non-glycine and non-proline residues                1156         100.0
Number of end-residues (excl. Gly and Pro)                       8
Number of glycine residues (shown as triangles)                127 (59)
Number of proline residues                                      60
                                                              ----
Total number of residues                                      1351


Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions.
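The PROCHECK percentages follow directly from the residue counts in the table. The sketch below recomputes them and applies the 90% most-favoured guideline. ramachandran_summary is a hypothetical helper.

```python
REGIONS = ("most favoured", "additional allowed", "generously allowed", "disallowed")


def ramachandran_summary(counts, threshold=90.0):
    """Return (total, per-region percentages, core-region check).

    `counts` are numbers of non-glycine, non-proline residues per region,
    in the order given by REGIONS.
    """
    if len(counts) != len(REGIONS):
        raise ValueError("expected one count per region")
    total = sum(counts)
    percentages = {name: round(100.0 * n / total, 1) for name, n in zip(REGIONS, counts)}
    return total, percentages, percentages["most favoured"] > threshold


if __name__ == "__main__":
    total, pct, good = ramachandran_summary((990, 104, 11, 51))
    print(total, pct)
    print("meets the 90% guideline" if good else "below the 90% guideline")
```

With 85.6% of residues in the most favoured regions, the model falls short of the 90% guideline. The 4.4% in disallowed regions would be the first residues to inspect.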

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)

Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40 Charge = -0.52
  ASP CA  Radius = 1.50 Charge = 0.25
  ASP C   Radius = 1.40 Charge = 0.53
  ASP O   Radius = 1.50 Charge = -0.50
  ASP CB  Radius = 1.70 Charge = -0.21
  ASP CG  Radius = 1.40 Charge = 0.62
  ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40 Charge = -0.52
  ASP CA  Radius = 1.50 Charge = 0.25
  ASP C   Radius = 1.40 Charge = 0.53
  ASP O   Radius = 1.50 Charge = -0.50
  ASP CB  Radius = 1.70 Charge = -0.21
  ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40 Charge = -0.52
  THR CA  Radius = 1.50 Charge = 0.27
  THR C   Radius = 1.40 Charge = 0.53
  THR O   Radius = 1.50 Charge = -0.50
  THR CB  Radius = 1.50 Charge = 0.21
  THR H   Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A

72

Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds

Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)

73

R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600 R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275 R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 kJ/mol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at this by studying them at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing projects on HBV, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structure of the protein present in the different species, we performed homology modelling and, as a step forward, present a theoretical model built with available online modelling tools.

Since our study indicates that glycerate kinase (HBeAg-binding protein 4) is one of the factors contributing to the pathogenicity of HBV, we tried to dock this protein with an appropriate ligand in order to inhibit its activity, on the basis of which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B and may open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can hence aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same approach can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented here is just a small part of a big issue, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed.

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796-14801.

Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395-408.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com


Abbreviation


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


REVIEW OF LITERATURE

Approximately 5% of the world population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded, circular DNA virus of complex structure. HBV is classified in the genus Orthohepadnavirus within the family Hepadnaviridae. Serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle; a 42 nm particle containing DNA and DNA polymerase, called the Dane particle; and tubular or filamentous particles that vary in length. The Dane particle is the infective form of the virus. Hepatitis B is normally transmitted through blood transfusion, contaminated equipment, unsterile needles shared by drug users, or body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

Antigen

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence indicates exposure to HBV [4].
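The antigen and antibody list above amounts to a lookup from a positive marker to its meaning. The Python sketch below only restates that list in code and is not a diagnostic tool; the marker spellings used as keys and the function name `interpret` are illustrative choices, not a standard.

```python
# Hypothetical lookup paraphrasing the antigen/antibody list above.
# Real serological interpretation weighs combinations of markers and clinical context.
MARKER_MEANING = {
    "HBsAg": "virus replication is occurring in the liver",
    "HBeAg": "a high level of viral replication is occurring in the liver",
    "anti-HBs": "immunity following infection",
    "anti-HBe": "viral replication is falling; low infectivity in a carrier",
    "IgM anti-HBc": "recent infection",
    "IgG anti-HBc": "exposure to HBV (chronic carrier or cleared infection)",
}


def interpret(positive_markers):
    """Return (marker, meaning) pairs for each positive marker, in input order.

    Unknown marker names raise KeyError so that typos are not silently ignored.
    """
    return [(marker, MARKER_MEANING[marker]) for marker in positive_markers]


if __name__ == "__main__":
    for marker, meaning in interpret(["HBsAg", "HBeAg", "IgM anti-HBc"]):
        print(f"{marker}: {meaning}")
```

HBcAg is deliberately absent from the lookup, since the list notes that the core protein is not found in blood.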


Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g. loops) and, ultimately, the refinement of the structure in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity (a structural explanation for this observation has been offered by Chung and Subbiah, 1996). Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed over the last years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term guided docking is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].

A review of ligand-receptor docking introduces the basic underlying concepts and gives an overview of different approaches and algorithms. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which the review also outlines, together with approaches to address some of them and the latest developments in the area. It discusses some aspects of the assessment of docking program performance and describes a number of successful applications of structure-based virtual screening [8].


Material and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1 NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.


2 Protein sequence

Glycerate kinase (HBeAg-binding protein 4), primary accession number Q8IVS8, EC 2.7.1.31, from human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.

3 FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts) and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
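The Smith-Waterman algorithm behind SSEARCH, and the shuffle-based way of judging whether a score could arise by chance, can be sketched in a few lines of Python. This is a teaching sketch, not the FASTA package's code: it assumes simple match/mismatch scores and a linear gap penalty instead of a substitution matrix with affine gaps, and `shuffle_z_score` is a hypothetical helper that compares the real score with the scores of shuffled copies of the second sequence.

```python
import random


def smith_waterman(a, b, match=3, mismatch=-3, gap=-2):
    """Return (best local score, aligned a, aligned b) with a linear gap penalty.

    Plain match/mismatch scores stand in for a substitution matrix such as BLOSUM62.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay at 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_pos
    while H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def shuffle_z_score(a, b, n_shuffles=100, seed=0, **scoring):
    """How many standard deviations the real score lies above shuffled-sequence scores."""
    rng = random.Random(seed)
    real = smith_waterman(a, b, **scoring)[0]
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same composition as b, order destroyed
        scores.append(smith_waterman(a, "".join(letters), **scoring)[0])
    mean = sum(scores) / n_shuffles
    var = sum((s - mean) ** 2 for s in scores) / n_shuffles
    return float("inf") if var == 0 else (real - mean) / var ** 0.5
```

With the default scores (+3 match, -3 mismatch, -2 per gap), `smith_waterman("TGTTACGG", "GGTTGACTA")` gives a best local score of 13, corresponding to GTT-AC aligned with GTTGAC. A large z-score suggests the alignment is unlikely to be a chance match; the FASTA programs themselves estimate significance from the distribution of library scores rather than from a plain z-score.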

4 BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
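The threshold mentioned above is usually expressed as an E-value. Under the Karlin-Altschul statistics used by BLAST, a raw score S gives E = K·m·n·e^(−λS), where m and n are the query and database lengths; equivalently, with the bit score S' = (λS − ln K)/ln 2, E = m·n·2^(−S'). The Python sketch below assumes λ = 0.267 and K = 0.041 (values of the kind BLAST reports for gapped BLOSUM62 searches) and ignores BLAST's effective-length correction, so its numbers are illustrative only.

```python
import math

# Assumed Karlin-Altschul parameters; a real search takes lambda and K from BLAST's own output.
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalised score S' = (lambda * S - ln K) / ln 2, in bits."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance hits E = K * m * n * exp(-lambda * S).

    Uses the raw lengths; BLAST itself corrects m and n for edge effects.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Hypothetical 300-residue query against a hypothetical 5e7-residue database.
    for s in (40, 60, 80):
        print(s, round(bit_score(s), 1), f"{e_value(s, 300, 5e7):.2e}")
```

Because E falls exponentially with the score, each extra bit of score halves the expected number of chance hits.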

5 Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
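Two of the parameters listed above follow from simple formulas, sketched here in Python. As ProtParam documents it, the extinction coefficient at 280 nm is computed from the numbers of Trp (5500 M⁻¹cm⁻¹), Tyr (1490) and cystines (125) in the sequence, and the aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)) with X in mole percent. The instability index and half-life need lookup tables (dipeptide instability weights and the N-end rule) that are not reproduced here. The function names are illustrative; this is not ProtParam's code.

```python
# Molar absorption values at 280 nm (M^-1 cm^-1) as used by ProtParam for proteins in water.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficients(seq):
    """Return (all Cys paired as cystines, all Cys reduced) for a one-letter sequence."""
    s = seq.upper()
    n_trp, n_tyr, n_cys = s.count("W"), s.count("Y"), s.count("C")
    reduced = n_trp * EXT_TRP + n_tyr * EXT_TYR
    return reduced + (n_cys // 2) * EXT_CYSTINE, reduced


def aliphatic_index(seq):
    """X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), with X in mole percent."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * s.count(residue) / len(s)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

A sequence with no Trp, Tyr or Cys gets an extinction coefficient of zero, which is why ProtParam warns that such proteins cannot be quantified reliably by absorbance at 280 nm.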


Using SOPMA for secondary structure analysis

The self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. Its authors then reported improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a web page (http://www.ibcp.fr/predict.html).
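The 69.5% figure above is a three-state accuracy (Q3): the share of residues whose predicted state (helix, sheet or coil) matches the observed one. A minimal Python sketch, assuming states encoded as the letters H, E and C in strings of equal length; SOPMA's output can also include other states, such as beta turns, which would first have to be mapped onto these three.

```python
from collections import Counter

STATES = "HEC"  # H = alpha-helix, E = extended strand (beta-sheet), C = coil


def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("state strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def composition(states):
    """Percentage of residues assigned to each of the three states."""
    counts = Counter(states)
    return {state: 100.0 * counts[state] / len(states) for state in STATES}
```

For example, `composition("HHEC")` returns 50% helix, 25% strand and 25% coil, the kind of summary SOPMA reports for a whole sequence.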

PROTOCOL FOLLOWED

Obtained the receptor (target protein) from literature references, journals available online and PubMed literature on the HBV strain.

Retrieved the FASTA sequence of the protein (glycerate kinase, HBeAg-binding protein 4) from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor.

Validated the modeled receptor using the Structure Analysis and Verification Server (SAVES).

Verified the model through different parameters, such as the Ramachandran plot and others available in SAVES.

Selected the best ligand for HBV disease from the KEGG database.

Ran Hex and obtained the docked structure of the drug molecule.
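The first protocol steps retrieve the protein in FASTA format from Swiss-Prot. The Python sketch below reads FASTA text and splits a UniProtKB header of the form `sp|accession|entry_name description`; the function names are illustrative, and the sequence in the example is a short placeholder, not the real Q8IVS8 sequence.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


def uniprot_ids(header):
    """Split 'db|accession|entry_name description' into its parts."""
    ident, _, description = header.partition(" ")
    db, accession, entry_name = ident.split("|")
    return {"db": db, "accession": accession,
            "entry_name": entry_name, "description": description}


if __name__ == "__main__":
    # Placeholder record: real header layout, made-up sequence.
    record = ">sp|Q8IVS8|GLCTK_HUMAN Glycerate kinase\nMAAA\nLLKK\n"
    for header, sequence in parse_fasta(record):
        print(uniprot_ids(header)["accession"], len(sequence))
```

In Swiss-Prot headers the `sp` prefix marks reviewed entries, while TrEMBL entries use `tr`.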


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
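The identity levels quoted above (70% and 25% here, and the 30% rule of thumb from the literature review) presuppose a way to measure identity from an alignment. One common definition, assumed in this Python sketch, counts identical residues over the aligned columns where neither sequence has a gap; other tools divide by the alignment length or by the shorter sequence instead, so the numbers are not interchangeable. The function names are illustrative.

```python
def percent_identity(aligned_query, aligned_template):
    """Identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(q == t for q, t in pairs)
    return 100.0 * identical / len(pairs)


def above_30_percent_rule(identity):
    """Rule of thumb: accurate models need more than ~30% identity to the best template."""
    return identity > 30.0
```

For the toy alignment `ACD-EF` / `ACDKEY`, the gapped column is skipped and 4 of the remaining 5 columns match, giving 80%.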

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, yielding a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
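The selection criteria above can be turned into a simple ranking. The Python sketch below is one hypothetical policy, not a published method: it discards hits whose E-value exceeds an assumed cut-off of 1e-5, then prefers higher coverage and, among equal coverage, higher identity. The `Candidate` fields and the template identifiers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    template_id: str  # hypothetical identifier
    evalue: float     # BLAST E-value of the hit
    identity: float   # percent identity over the aligned region
    coverage: float   # fraction of the query covered by the alignment (0 to 1)


def rank_templates(candidates, max_evalue=1e-5):
    """Drop hits with a poor E-value, then sort by coverage and then identity (best first)."""
    usable = [c for c in candidates if c.evalue <= max_evalue]
    return sorted(usable, key=lambda c: (c.coverage, c.identity), reverse=True)


if __name__ == "__main__":
    hits = [
        Candidate("TEMPLATE_A", 1e-40, 45.0, 0.60),
        Candidate("TEMPLATE_B", 1e-25, 38.0, 0.95),
        Candidate("TEMPLATE_C", 0.5, 90.0, 0.20),
    ]
    print([c.template_id for c in rank_templates(hits)])  # ['TEMPLATE_B', 'TEMPLATE_A']
```

TEMPLATE_C is rejected despite its high identity because its E-value is poor, in line with the advice that such a template should generally not be chosen.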

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aiming at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and the interactions are thus said to be rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often referred to as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of the rigid ligand-flexible receptor complex often maximizes the interface area between the molecules. They move with respect to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than would be possible with a rigid receptor.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement, which allows for small conformational adaptations upon ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing for easier, more energetically favorable binding between the two.
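The six rigid-body degrees of freedom mentioned above (three rotations, three translations) are what a docking search samples. The Python sketch below applies one such pose to ligand coordinates and scores it with a deliberately crude steric term: atom pairs within 4.5 Å count as contacts and pairs closer than 2.0 Å are penalized as clashes. The cut-offs, the penalty and the function names are assumptions for illustration; programs such as Hex, whose log appears earlier, use far more elaborate shape and electrostatic correlation functions.

```python
import math


def rotation_matrix(alpha, beta, gamma):
    """Z-Y-Z Euler rotation R = Rz(alpha) Ry(beta) Rz(gamma), angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    rz1 = [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]
    ry = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    rz2 = [[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]]

    def matmul(p, q):
        return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

    return matmul(matmul(rz1, ry), rz2)


def transform(coords, angles, translation):
    """Apply one rigid-body pose: rotate about the origin, then translate."""
    r = rotation_matrix(*angles)
    return [
        tuple(sum(r[i][k] * p[k] for k in range(3)) + translation[i] for i in range(3))
        for p in coords
    ]


def steric_score(receptor, ligand, clash_cutoff=2.0, contact_cutoff=4.5, clash_penalty=10.0):
    """Toy score: +1 per atom pair in contact, minus a penalty per clashing pair."""
    score = 0.0
    for a in receptor:
        for b in ligand:
            d = math.dist(a, b)
            if d < clash_cutoff:
                score -= clash_penalty
            elif d < contact_cutoff:
                score += 1.0
    return score
```

A docking search would evaluate `steric_score` over many `(angles, translation)` pairs and keep the best-scoring poses; the FFT-based approach in Hex exists precisely to avoid evaluating each pose one by one.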

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B and may open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can hence aid the drug discovery process.

Receptor

A receptor is a molecule in or on the surface of a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components, and to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through many weak noncovalent interactions with the binding site of a protein, tight binding depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
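To make the link between a lower activation free energy and rate enhancement concrete, here is a small sketch. It assumes simple transition-state (Eyring) theory with the same pre-exponential factor for the catalysed and uncatalysed reactions, so the rate ratio depends only on how much the enzyme lowers ΔG‡. The function name is hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def rate_enhancement(delta_delta_g: float, temperature: float = 298.15) -> float:
    """k_cat / k_uncat when the enzyme lowers the activation free energy by
    delta_delta_g kJ/mol, assuming identical Eyring pre-factors: exp(ddG / RT)."""
    return math.exp(delta_delta_g / (R * temperature))


# At 25 °C every ~5.7 kJ/mol of transition-state stabilisation gives roughly a 10-fold speed-up.
for ddg in (5.7, 11.4, 34.2):
    print(f"{ddg:5.1f} kJ/mol -> {rate_enhancement(ddg):.3g}x faster")
```

The exponential dependence is why modest changes in binding at the active site can translate into very large rate enhancements.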

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

Amino acid   Propensity
His           0.360
Tyr          -0.040
Asp           0.045
Gly          -0.070
Trp          -0.140
Met           0.025
Val          -0.060
Asn           0.080
Leu          -0.180
Phe          -0.120
Gln           0.050
Cys           0.210
Ile          -0.005
Ala           0.025
Glu           0.050
Arg           0.055
Pro          -0.200
Lys           0.100
Thr           0.100
Ser           0.130
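The propensity values can be ranked programmatically. This short sketch stores the values from the table in a dictionary and splits residues into those enriched (positive) and depleted (negative) at functional sites, strongest effect first. It only re-sorts the listed numbers and does not reproduce how they were derived.

```python
# Propensity of each amino acid to contact bound non-protein atoms (values from the table above).
PROPENSITY = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070, "Trp": -0.140,
    "Met": 0.025, "Val": -0.060, "Asn": 0.080, "Leu": -0.180, "Phe": -0.120,
    "Gln": 0.050, "Cys": 0.210, "Ile": -0.005, "Ala": 0.025, "Glu": 0.050,
    "Arg": 0.055, "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def split_by_propensity(table: dict[str, float]) -> tuple[list[str], list[str]]:
    """Return (enriched, depleted) residue names, each sorted from the strongest effect down."""
    enriched = sorted((aa for aa, v in table.items() if v > 0), key=lambda aa: -table[aa])
    depleted = sorted((aa for aa, v in table.items() if v < 0), key=lambda aa: table[aa])
    return enriched, depleted


enriched, depleted = split_by_propensity(PROPENSITY)
print("Enriched at functional sites:", ", ".join(enriched))
print("Depleted at functional sites:", ", ".join(depleted))
```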

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between these two torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
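The φ and ψ angles on a Ramachandran plot are ordinary dihedral (torsion) angles, each defined by four consecutive backbone atoms. φ uses C(i-1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1). The sketch below computes a dihedral from four coordinate triples using a standard vector formula. Reading the coordinates from a PDB file is left out, and the helper names are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Usage for residue i (coordinates as (x, y, z) tuples):
# phi = dihedral(c_prev, n, ca, c); psi = dihedral(n, ca, c, n_next)
```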

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of ‘ideal’ bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be “cleaned up” by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the file-extension .new. The new file will have the atoms labelled in accordance with the IUPAC naming conventions.

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good at releasing local constraints for a residue, but it will not pass through high energy barriers and so stops in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
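A toy version of gradient-based minimization is shown below. A single pair of atoms interacts through a Lennard-Jones (van der Waals) potential, and the interatomic distance is moved downhill along the negative gradient (steepest descent) until the force is essentially zero. Real force fields sum many bonded and non-bonded terms over all atoms. The parameters here (σ = ε = 1, step size, tolerance) are arbitrary assumptions chosen for illustration.

```python
def lj_energy(r: float, sigma: float = 1.0, epsilon: float = 1.0) -> float:
    """Lennard-Jones pair energy 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, sigma: float = 1.0, epsilon: float = 1.0) -> float:
    """dE/dr; the force along the pair separation is its negative."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def steepest_descent(r: float, step: float = 0.01, tol: float = 1e-8, max_iter: int = 100_000):
    """Move r against the gradient until |dE/dr| < tol; returns (r, energy, iterations)."""
    for i in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            return r, lj_energy(r), i
        r -= step * g
    return r, lj_energy(r), max_iter


r_min, e_min, steps = steepest_descent(1.5)
print(r_min, e_min)    # close to 2**(1/6) and -1.0 for sigma = epsilon = 1
print(lj_energy(0.8))  # a single close contact: large positive energy
```

Because every step goes downhill, the same loop started on a surface with several wells would stop in whichever minimum is nearest. This is the local-minimum behaviour described above.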


Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences:

Db    AC       Description                                                      Score   E-value
pdb   1QGT-C   Chain C, (HBcAg) Human Hepatitis B Viral Capsid                   206     6e-54
pdb   2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina             197     2e-51
pdb   2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                    192     6e-50
pdb   1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     60
pdb   1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I              27     60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition

Carbon     C  2435
Hydrogen   H  3967
Nitrogen   N   711
Oxygen     O   719
Sulfur     S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
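The two ProtParam extinction coefficients follow directly from the Trp, Tyr and Cys counts in the composition table. The sketch assumes the per-residue molar extinction values documented for ProtParam (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹ at 280 nm) and the molecular weight reported above (55252.6 Da). The function name is hypothetical.

```python
TRP_EXT, TYR_EXT, CYSTINE_EXT = 5500, 1490, 125  # M^-1 cm^-1 at 280 nm


def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int, all_cys_paired: bool) -> int:
    """Molar extinction coefficient at 280 nm; one cystine needs two Cys residues."""
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return n_trp * TRP_EXT + n_tyr * TYR_EXT + n_cystine * CYSTINE_EXT


MW = 55252.6  # Da, molecular weight reported by ProtParam for Q8IVS8

for paired in (True, False):
    ext = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, all_cys_paired=paired)
    print(ext, round(ext / MW, 3))  # Abs 0.1% (1 g/l) = extinction coefficient / MW
# prints 29700 0.538, then 29450 0.533
```

With five Cys residues, at most two cystines can form, which is why the two coefficients differ by 2 × 125 = 250.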

Secondary structure prediction

By SOPMA: result for UNK_158250


MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: window width 17; similarity threshold 8; number of states 4
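The SOPMA summary percentages are state counts divided by the sequence length. This sketch counts the one-letter states in a prediction string (h helix, e extended strand, t turn, c coil, plus the other SOPMA letters) and reports percentages to two decimals. The function name is hypothetical.

```python
from collections import Counter

STATES = {"h": "Alpha helix", "g": "3-10 helix", "i": "Pi helix", "b": "Beta bridge",
          "e": "Extended strand", "t": "Beta turn", "s": "Bend region", "c": "Random coil"}


def state_percentages(prediction: str) -> dict[str, float]:
    """Percentage of residues assigned to each secondary-structure state."""
    counts = Counter(prediction.lower())
    n = len(prediction)
    return {name: round(100.0 * counts[code] / n, 2) for code, name in STATES.items()}


# Only the totals matter, so a string with the reported counts (235 h, 80 e, 36 t, 172 c)
# reproduces the SOPMA summary for this 523-residue sequence.
summary = state_percentages("h" * 235 + "e" * 80 + "t" * 36 + "c" * 172)
for name, pct in summary.items():
    print(f"{name:16s} {pct:6.2f} %")
```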

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

Pairwise scores between each row sequence (SeqA) and each column sequence (SeqB):

No.  Name                 Len(aa)    1    2    3    4    5    6    7    8    9
 1   Q8IVS8|GLCTK_HUMAN     523
 2   Q64896|HBEAG_ASHV      217      3
 3   P03154|HBEAG_DHBV1     305      3   21
 4   P0C6J9|HBEAG_DHBV3     305      3   21   97
 5   P03153|HBEAG_GSHV      217      3   91   24   25
 6   P0C692|HBEAG_HBVA2     214      3   66   26   26   70
 7   P0C625|HBEAG_HBVA3     214      3   65   27   27   69   98
 8   P17099|HBEAG_HBVA4     214      3   65   25   25   69   98   98
 9   Q81105|HBEAG_HBVA5     214      3   65   26   26   69   98   97   97
10   Q91C37|HBEAG_HBVA6     214      2   65   26   26   69   98   98   98   97
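The scores table comes from ClustalW2's initial pairwise-alignment stage, and each score is commonly described as a percent identity. A minimal sketch of a percent-identity calculation on two already-aligned sequences is shown below. It counts identical residues over the columns where neither sequence has a gap. This is one common convention and may not match ClustalW's exact normalisation. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues / columns where neither sequence has a gap, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * identical / len(pairs)


# First 30 aligned columns of HBEAG_HBVA4 and HBEAG_ASHV from the second alignment block.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
print(round(percent_identity(hbva4, ashv), 1))
```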

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
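The guide tree is written in Newick format, where each leaf name is followed by ":branch length". A quick way to inspect the leaf branch lengths is to pull them out with a regular expression. This assumes only that leaf names contain letters, digits, "_" and "|", as they do here. A full Newick parser (for example the one in Biopython's Bio.Phylo) would be needed to work with the internal nodes. The tree string below is the guide tree above with its decimal points restored.

```python
import re

GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a name made of letters, digits, '_' or '|' followed by ':<length>'.
# Internal nodes are unnamed (')' precedes their ':'), so they are not matched.
LEAF = re.compile(r"([A-Za-z0-9_|]+):([0-9.]+)")


def leaf_branch_lengths(newick: str) -> dict[str, float]:
    return {name: float(length) for name, length in LEAF.findall(newick)}


lengths = leaf_branch_lengths(GUIDE_TREE)
for name, length in sorted(lengths.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {length:.5f}")
```

Sorting by length puts the human glycerate kinase sequence first, on by far the longest leaf branch. This is consistent with its low pairwise scores against the HBeAg sequences.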

Phylogram

Tertiary structure prediction

PDB entry 1QGT-C was selected as the template, showing around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

1. Open the template structure (PDB file) from File, then choose SwissModel > Load Raw Target Sequence.
2. Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.
3. Choose File > Save > Layer (PDB).
4. Choose File > Save > Project (PDB).
5. Choose SwissModel > Submit Modeling Request. A new browser window opens with the PDB file loaded; enter an e-mail ID for receiving the modelled structure.
6. Open the new structure (received by e-mail) and remove the template by selecting the target.
7. Choose File > Save > Layer (PDB).

Open SwissModel and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit (under FIT) to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode: Request submission form

Please fill these fields:

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044, Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52


Fig: Structure of template after modeling

Alignment

TARGET 83   LV GFGKAVLGMA AAAEELLGQH
2b8nA  4   peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA  53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET 155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET 255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET 305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394  ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide, displaying the phi and psi backbone conformational angles for each residue in a protein. The conformation of the backbone between successive α-carbon atoms in the desired protein can therefore be assessed using this plot.

Software

SAVS: http://nihserver.mbi.ucla.edu/SAVS

Procedure

The target protein structure obtained after homology modeling using Deep View and the modeler is given as input to SAVS.

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Plot statistics                                           No. of residues   %-tage
Residues in most favoured regions [A,B,L]                      990           85.6
Residues in additional allowed regions [a,b,l,p]               104            9.0
Residues in generously allowed regions [~a,~b,~l,~p]            11            1.0
Residues in disallowed regions                                  51            4.4
                                                              ----          -----
Number of non-glycine and non-proline residues                1156          100.0
Number of end-residues (excl. Gly and Pro)                       8
Number of glycine residues (shown as triangles)                127
Number of proline residues                                      60
                                                              ----
Total number of residues                                      1351

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
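The PROCHECK percentages are region counts divided by the number of non-glycine, non-proline residues (1156). The sketch below recomputes them from the counts in the table and applies the 90% guideline quoted by PROCHECK. The function name and the returned boolean flag are hypothetical conveniences, not PROCHECK output.

```python
def ramachandran_summary(counts: dict[str, int], threshold: float = 90.0):
    """Percent of non-Gly/non-Pro residues per region, plus a 'good model' flag."""
    total = sum(counts.values())
    percents = {region: round(100.0 * n / total, 1) for region, n in counts.items()}
    return percents, percents["most favoured"] > threshold


counts = {"most favoured": 990, "additional allowed": 104,
          "generously allowed": 11, "disallowed": 51}
percents, good = ramachandran_summary(counts)
print(percents)  # {'most favoured': 85.6, 'additional allowed': 9.0, 'generously allowed': 1.0, 'disallowed': 4.4}
print("over 90% in most favoured regions:", good)  # False
```

By this criterion the model, with 85.6% of residues in the most favoured regions, falls below the 90% level expected of a good-quality structure.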

Docking result by Hex software

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059

Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS

63

Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025

64

Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050

65

MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models).
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33.
Check threefold = 0.
Docking search mode = 6D rotation + translation (optimal).

Using intermolecular distance R12 = 0.00, rounded to 0.00.
Setting distance range = 0.00 to 19.50 with steps of 0.75.
Calculating surface skins: Grid = 0.60 Å.

Contouring surface for molecule 2B8N: Polar probe = 1.40 Å, Apolar probe = 1.40 Å.
Gaussian sampling over 6149 atoms done in 2.86 seconds.
Contoured 338888 triangles (169444 vertices) in 1.30 seconds.
Culled 128559 short edges in 6 cycles in 4.34 seconds.
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments.
Primary surface: Area = 2635096, Volume = 15792176.
Culled 0 small segments in 0.27 seconds.
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices).
Total contouring time 6.14 seconds.

Contouring surface for molecule 2B8N: Polar probe = 1.40 Å, Apolar probe = 1.40 Å.
Gaussian sampling over 6149 atoms done in 2.81 seconds.
Contoured 338888 triangles (169444 vertices) in 1.30 seconds.
Culled 128559 short edges in 6 cycles in 4.36 seconds.
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments.
Primary surface: Area = 2635096, Volume = 15792176.
vm 5000 MB
Culled 0 small segments in 0.27 seconds.
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices).
Total contouring time 6.14 seconds.

Sampling surface and interior volumes for molecule 2B8N: Generated 201019 exterior and 216848 interior skin grid cells.
Exterior skin volume = 4342010, interior skin volume = 4683917.
Volume sampling done in 2.34 seconds.
Sampling surface and interior volumes for molecule 2B8N: Generated 201019 exterior and 216848 interior skin grid cells.
Exterior skin volume = 4342010, interior skin volume = 4683917.
Volume sampling done in 1.36 seconds.

Calculating skin coefficients to N = 25.
Integration applied to 417867 cells, 46.4 per cent of the total grid volume.
Skin integration to N = 25 done in 43.95 seconds.

Docking will output a maximum of 500 solutions per pair.
------------------------------------------------------------------------------
Docking 1 pair of starting orientations.
Docking receptor 2B8N and ligand 2B8N.
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27 Mb).
Total 6D space: Iterate[278121] x FFT[642448] = 1616412672.
Initial rotational increments (N=16): Receptor 812 (19 Mb), Ligand 1 (1 Mb).
Loading all coefficient vectors into memory.
Coefficient rotations done in 0.91 seconds.
Starting 3D FFT, N=16.
Using Kiss FFT for multi-dimensional DFTs.
3D FFT setup: 0.00 s, 66 Mb memory.
Estart = 8621255 kJ/mol (Eshape = 8621255, Eforce = 000)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess.
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s).
Best start orientation [alpha=0] (Energy = 000) is at 11.
Energy range: Emin = 000, Emax = 000.
Top 1 orientations -> 3 after distance sub-sampling.
Working buffer for 3 orientations (1 Mb).
Surviving rotational steps (N=25): Receptor 1 (1 Mb), Ligand 1 (1 Mb).
Loading all coefficient vectors into memory.
Coefficient rotations done in 0.00 seconds.
Starting docking search with N=25, Nalpha=6464.
Estart = 14350136 kJ/mol (Eshape = 14350136, Eforce = 000)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess.
Main pass done in 0 min 0 sec (1761/s).
Starting orientation [alpha=0] (Energy = 14350136) ranked 5 in the search.
Docked structures 2B8N:2B8N in a total of 7 min 5 sec.
Best start orientation [alpha=0] (Energy = 000) is at 11.
Energy range: Emin = 000, Emax = 000.

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal     Eshape     Eforce     Eair       RMS    Bumps
----   ---------  ---------  ---------  ---------  -----  -----
------------------------------------------------------------------------------
Saving top 500 orientations.
Docking done in a total of 8 min 11 sec.
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds.

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  00      00      00      00      00      00     -1  -100
---------------------------------------------------------------------------
   1    1 000000  00      00      00      00      00      00     -1  -100

Conclusion

After analyzing the protein sequences of hepatitis B virus strains, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing projects on HBV are finished, we hope it will be possible to draw conclusive inferences about the true picture of HBV evolution in the near future, and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we used homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

From our study of the literature, glycerate kinase (HBeAg-binding protein 4) appears to be one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for developing drugs.

Future prospects

The work presented in this report may be a stepping stone towards such discoveries, although it is only a small finding within a large problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can thus aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, bringing us closer to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and prove useful to anyone working in a similar area.

BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Chapter: Human diseases caused by viruses.

[2] F. V. Chisari and C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger and W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B, Sheinerman FB and Honig B (2001). Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796-14801.

Aloy P, Querol E, Aviles FX and Sternberg MJ (2001). Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395-408.

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.

Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. (2000). InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics.

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

Phylip Phylogeny Inference Package


REVIEW OF LITERATURE

Approximately 5% of the world's population is infected by the hepatitis B virus (HBV), which causes a necroinflammatory liver disease of variable duration and severity. Chronically infected patients with active liver disease carry a high risk of developing cirrhosis and hepatocellular carcinoma.

Hepatitis B is caused by the hepatitis B virus (HBV), a partially double-stranded circular DNA virus of complex structure. HBV is classified as an orthohepadnavirus within the family Hepadnaviridae. The serum of individuals infected with hepatitis B contains three distinct antigen particles: a spherical 22 nm particle; a 42 nm particle containing DNA and DNA polymerase, called the Dane particle; and tubular or filamentous particles that vary in length. The Dane particle is the infectious form of the virus. Hepatitis B is normally transmitted through blood transfusion, contaminated equipment, unsterile needles shared by drug users, or other body secretions [1].

The immune response to HBV-encoded antigens is responsible both for viral clearance and for disease pathogenesis during this infection. While the humoral antibody response to viral envelope antigens contributes to the clearance of circulating virus particles, the cellular immune response to the envelope, nucleocapsid and polymerase antigens eliminates infected cells.

The dominant cause of viral persistence during HBV infection is the development of a weak antiviral immune response to the viral antigens. While neonatal tolerance probably plays an important role in viral persistence in patients infected at birth, the basis for poor responsiveness in adult-onset infection is not well understood and requires further analysis. Viral evasion by epitope inactivation and T cell receptor antagonism may contribute to the worsening of viral persistence in the setting of an ineffective immune response, as can the incomplete downregulation of viral gene expression and the infection of immunologically privileged tissues. Chronic liver cell injury and the attendant inflammatory and regenerative responses create the mutagenic and mitogenic stimuli for the development of DNA damage that can cause hepatocellular carcinoma. Elucidation of the immunological and virological basis for HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with an ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

Antigens

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in the blood.

Antibodies

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life in both chronic carriers and those who clear the infection. Its presence therefore indicates exposure to HBV, whether or not the person has become a chronic carrier [4].

Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g. loops), and ultimately the refinement of the structure in an attempt to account for inherent differences between the template and query structures. Homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity (a structural explanation for this observation has been offered by Chung and Subbiah, 1996). Advances in the accuracy of sequence alignments using structure-based profile methods should result in continuing improvements in the quality of homology models [5, 6].
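The 30% rule depends on how sequence identity is counted. As a minimal sketch, assuming a pairwise alignment is already available as two equal-length strings with '-' for gaps, identity can be computed over the columns where both sequences carry a residue. The function names and gap character are illustrative choices; other tools divide by the full alignment length or by the shorter sequence, so reported identities can differ.

```python
def percent_identity(query_aln: str, template_aln: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither row has a gap."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both the query and the template carry a residue
    pairs = [(q, t) for q, t in zip(query_aln, template_aln) if q != gap and t != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for q, t in pairs if q.upper() == t.upper())
    return 100.0 * identical / len(pairs)


def passes_30_percent_rule(query_aln: str, template_aln: str, cutoff: float = 30.0) -> bool:
    """True if the template lies above the ~30% identity level discussed in the text."""
    return percent_identity(query_aln, template_aln) > cutoff


# Hypothetical toy alignment: 5 gap-free columns, 4 of them identical
print(percent_identity("MKV-LA", "MRVALA"))        # 80.0
print(passes_30_percent_rule("MKV-LA", "MRVALA"))  # True
```

The cutoff is a rule of thumb rather than a hard limit; a template just below it may still give a usable model if the alignment is reliable.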

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed over the last years with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term "guided docking" is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].

The review cited here [8] gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts, with an overview of different approaches and algorithms. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which the review also outlines, together with approaches to address some of these challenges and the latest developments in the area. It also discusses some aspects of the assessment of docking program performance and describes a number of successful applications of structure-based virtual screening.

Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics, as defined here, is limited to the sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.

2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)

Primary accession number: Q8IVS8
EC 2.7.1.31, from human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved tools for biological sequence comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "fast-aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
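To illustrate what "optimal Smith-Waterman" local alignment means, the sketch below computes a textbook Smith-Waterman score. It assumes simple match/mismatch scores and a linear gap penalty, with illustrative parameter values; SSEARCH itself typically uses a substitution matrix (such as BLOSUM50) and affine gap penalties, so its scores will differ from this sketch.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = H[i][j - 1] + gap  # b[j-1] aligned to a gap
            # Flooring at 0 lets an alignment start anywhere: this is what makes it local
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8: the shared ACGT
print(smith_waterman_score("ACGTACGT", "ACGACGT"))   # 12: 7 matches, 1 gap
```

The dynamic-programming table costs time proportional to the product of the two sequence lengths, which is why heuristic programs such as FASTA and BLAST are used for large database searches.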

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
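BLAST gains its speed by first finding short word matches ("seeds") between the query and each library sequence, and only then extending them into alignments. The sketch below covers only that first step and assumes exact matches of length k. Real BLAST also accepts similar "neighbourhood" words scoring above a threshold under a substitution matrix, and then extends and statistically scores the hits; none of that is attempted here. The function name is hypothetical, and k=3 is a common protein word size.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """(query_pos, subject_pos) pairs where both sequences share an exact k-letter word."""
    # Index every k-letter word of the query by its start position(s)
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = []
    # Slide along the subject and look each of its words up in the query index
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


hits = word_hits("MKVLA", "AAMKVL")
print(hits)  # [(0, 2), (1, 3)]
# Hits on the same diagonal (subject_pos - query_pos) are candidates for extension
print({j - i for i, j in hits})  # {2}
```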

5. Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters, among others:

extinction coefficient

half-life

instability index

aliphatic index
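Two of these parameters follow simple published formulas that ProtParam documents: the aliphatic index of Ikai (1980), AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)) with X in mole percent, and the molar extinction coefficient at 280 nm in water, built from 5500 per Trp, 1490 per Tyr and 125 per cystine. The sketch below implements both, assuming a plain one-letter sequence. It is not ProtParam's code, and the function names are hypothetical; half-life and instability index are omitted because they depend on lookup tables.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index (Ikai 1980): X(A) + 2.9*X(V) + 3.9*(X(I) + X(L)), X in mole percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (mole_percent("I") + mole_percent("L"))


def extinction_coefficients_280(seq: str) -> tuple[int, int]:
    """Molar extinction coefficients at 280 nm in water (M^-1 cm^-1).

    Returns (value assuming all Cys pairs form cystines, value assuming all Cys are reduced).
    """
    seq = seq.upper()
    base = 5500 * seq.count("W") + 1490 * seq.count("Y")
    cystines = seq.count("C") // 2  # each disulfide bond needs two Cys residues
    return base + 125 * cystines, base


print(round(aliphatic_index("AVIL"), 2))    # 292.5
print(extinction_coefficients_280("WYCC"))  # (7115, 6990)
```

Returning both extinction values mirrors the fact that the cystine contribution depends on whether the protein's cysteines are actually disulfide-bonded, which the sequence alone cannot tell.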


Using SOPMA for secondary structure analysis

Recently, a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in the prediction of the secondary structure of proteins. The SOPMA authors report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for the 74% of amino acids that are co-predicted. Predictions are available by e-mail to deleage@ibcp.fr or on a web page (http://www.ibcp.fr/predict.html).
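The 69.5% and 82.2% figures are per-residue, three-state accuracies. Under our reading of the SOPMA abstract, the sketch below shows how such numbers are computed. It assumes predicted and observed structures are equal-length strings over H (helix), E (sheet) and C (coil). The joint score is restricted to residues on which two predictors agree, and the fraction of such residues corresponds to the "co-predicted" percentage. Function names are hypothetical, not SOPMA's.

```python
def q3(predicted: str, observed: str) -> float:
    """Percent of residues whose predicted state (H, E or C) equals the observed state."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("need non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def joint_q3(pred_a: str, pred_b: str, observed: str) -> tuple[float, float]:
    """Accuracy on residues where two predictors agree, and the percent of residues they agree on."""
    if not observed or not (len(pred_a) == len(pred_b) == len(observed)):
        raise ValueError("need non-empty strings of equal length")
    agreed = [i for i in range(len(observed)) if pred_a[i] == pred_b[i]]
    if not agreed:
        return 0.0, 0.0
    correct = sum(pred_a[i] == observed[i] for i in agreed)
    return 100.0 * correct / len(agreed), 100.0 * len(agreed) / len(observed)


observed = "HHHCCEEE"
print(q3("HHHCCEEC", observed))                    # 87.5
print(joint_q3("HHHCCEEC", "HHCCCEEE", observed))  # (100.0, 75.0)
```

Restricting to co-predicted residues raises accuracy at the cost of coverage, which is the trade-off the two quoted percentages express.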

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV study from literature references, journals available online, and PubMed.

Retrieved the FASTA sequence of the target protein, glycerate kinase (HBeAg-binding protein 4), from the Swiss-Prot database.

Identified a template structure (PDB ID 2B8N) by a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modelled the receptor (output in PDB format).

Validated the modelled receptor using the Structure Analysis and Verification Server (SAVES).

Verified the model through different parameters available in SAVES, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the docked structure of the drug molecule.


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and the template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s).

Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
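Because model accuracy tracks the sequence identity between target and template, identity is the first number to check on any alignment. A minimal sketch, assuming two already-aligned sequences of equal length with "-" for gaps; tools differ in the denominator they use, and this sketch divides by the number of columns in which neither sequence has a gap. The example rows are the HBVA4 and HBVA6 segments from the ClustalW2 alignment in the results.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
print(percent_identity(hbva4, hbva6))  # 93.75
```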

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target; thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures, and, perhaps most importantly, by the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.
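The selection logic described in this subsection (reject hits with a poor E-value, then prefer high identity and good coverage of the query) can be sketched as follows. This is only an illustration: the hit records are made up, and both the 1e-5 cutoff and the identity × coverage ranking key are assumptions of the sketch, not a standard rule.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    name: str        # hypothetical template identifier
    evalue: float    # BLAST E-value of the hit
    identity: float  # percent identity over the aligned region
    coverage: float  # fraction of the query covered by the alignment (0-1)


def rank_templates(hits, max_evalue=1e-5):
    """Discard hits with a poor E-value, then rank the rest by identity x coverage."""
    usable = [h for h in hits if h.evalue <= max_evalue]
    return sorted(usable, key=lambda h: h.identity * h.coverage, reverse=True)


hits = [
    TemplateHit("tmpl_A", 1e-40, 45.0, 0.90),
    TemplateHit("tmpl_B", 1e-30, 60.0, 0.50),
    TemplateHit("tmpl_C", 2.0, 90.0, 1.00),  # poor E-value: rejected despite high identity
]
print([h.name for h in rank_templates(hits)])  # ['tmpl_A', 'tmpl_B']
```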

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aiming at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interactions are said to be rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often referred to as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand and a flexible receptor, or a flexible ligand and a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid ligand/flexible receptor complex often maximizes the interface area between the molecules. The two molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty in the system is minimized, and this can be accomplished by allowing flexibility in the receptor. Flexible receptors allow docking of a larger ligand than would be allowed with a rigid receptor.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement that allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
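The six rigid-body degrees of freedom mentioned above (three rotations, three translations) are how a candidate ligand pose is commonly parameterized in docking. A minimal sketch, assuming Cartesian coordinates, rotation about the ligand centroid, and a z-y-x rotation order chosen here for illustration; individual docking programs use their own parameterizations, and the function names are hypothetical.

```python
import math


def rotation_matrix(rx: float, ry: float, rz: float):
    """R = Rz(rz) @ Ry(ry) @ Rx(rx), angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, rx, ry, rz, tx, ty, tz):
    """Rotate a rigid ligand about its centroid (3 DOF), then translate it (3 DOF)."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    r = rotation_matrix(rx, ry, rz)
    moved = []
    for c in coords:
        d = [c[k] - centroid[k] for k in range(3)]
        rotated = [sum(r[i][j] * d[j] for j in range(3)) for i in range(3)]
        moved.append([rotated[i] + centroid[i] + t for i, t in enumerate((tx, ty, tz))])
    return moved


ligand = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]]
pose = apply_pose(ligand, 0.3, -1.1, 2.0, 5.0, -2.0, 1.0)
# A rigid-body move preserves internal distances.
print(round(math.dist(ligand[0], ligand[2]), 6), round(math.dist(pose[0], pose[2]), 6))  # 2.12132 2.12132
```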

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking may be useful in finding a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A receptor is a molecule on the cell surface or within the cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. A substance such as a hormone or a drug binds to it selectively, causing a change in the activity of the cell.

Ligand

A molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds to the binding site of a protein through many weak noncovalent interactions, tight binding depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are the preferences of the 20 amino acids to lie within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values do not cover protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
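As a hypothetical illustration of how these preference values might be used (this is not part of the analysis reported here), one could average them over the residues lining a candidate pocket; a higher mean suggests a composition more typical of ligand-contacting sites. The dictionary restates the preference values listed above; the dictionary name, the function name and the averaging rule are assumptions of this sketch.

```python
# Per-residue preference values restated from the list above (three-letter codes).
SITE_PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def mean_site_preference(residues):
    """Average preference over a list of three-letter residue names."""
    if not residues:
        raise ValueError("no residues given")
    return sum(SITE_PREFERENCE[r] for r in residues) / len(residues)


print(round(mean_site_preference(["His", "Cys", "Ser"]), 3))  # 0.233
```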

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate; φ is the torsion angle about the N-Cα bond and ψ the torsion angle about the Cα-C bond, and the plot is drawn between these two angles. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
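φ and ψ are ordinary torsion angles: φ is defined by the backbone atoms C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). A minimal sketch that computes a torsion angle from four atom positions with the common atan2 formulation; reading the coordinates from a PDB file is left out, and the toy coordinates below are chosen only to give easily checked angles.

```python
import math


def _sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane p0-p1-p2
    n2 = _cross(b2, b3)  # normal of the plane p1-p2-p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


print(dihedral([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 1, 0]))            # 0.0 (cis)
print(round(dihedral([0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 0, 1]), 6))  # 90.0
```

For a real residue, p0 to p3 would be the C(i-1), N, Cα and C coordinates for φ, and N, Cα, C and N(i+1) for ψ.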

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the file extension .new; the new file has the atoms labelled in accordance with the IUPAC naming convention.

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good for releasing local constraints on a residue, but it will not pass over high energy barriers, and it stops in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
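The gradient-based minimization described above can be illustrated with steepest descent on a toy system. This sketch assumes a single Lennard-Jones pair term in reduced units (ε = σ = 1) instead of a full force field, and caps the per-step displacement, as practical minimizers commonly do, so that a bad close contact does not push the atoms far apart. For two atoms, the minimum should lie at r = 2^(1/6)σ ≈ 1.1225σ with energy -ε. All function names are illustrative.

```python
import math


def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force_magnitude(r, epsilon=1.0, sigma=1.0):
    """-dE/dr: positive values are repulsive, negative values attractive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def forces(coords):
    """Net Lennard-Jones force on every atom, summed over all pairs."""
    f = [[0.0, 0.0, 0.0] for _ in coords]
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(x * x for x in d))
            mag = lj_force_magnitude(r)
            for k in range(3):
                f[i][k] += mag * d[k] / r  # pushes i away from j when repulsive
                f[j][k] -= mag * d[k] / r
    return f


def steepest_descent(coords, step=0.01, max_disp=0.05, f_tol=1e-6, max_iter=10_000):
    """Move atoms along the force (the negative gradient) until the largest force < f_tol."""
    coords = [list(c) for c in coords]
    for _ in range(max_iter):
        f = forces(coords)
        f_max = max(abs(x) for fi in f for x in fi)
        if f_max < f_tol:
            break
        # Cap the largest displacement so a clash cannot blow the structure apart.
        scale = step if step * f_max <= max_disp else max_disp / f_max
        for i, fi in enumerate(f):
            for k in range(3):
                coords[i][k] += scale * fi[k]
    return coords


final = steepest_descent([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
r = math.dist(final[0], final[1])
print(f"r = {r:.4f}, E = {lj_energy(r):.4f}")  # r = 1.1225, E = -1.0000
```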


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information: entry name GLCTK_HUMAN; primary accession number Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase; synonyms: EC 2.7.1.31, HBeAg-binding protein 4

Gene name: GLYCTK; synonyms: HBEBP4; ORF names: LP5910

From: Homo sapiens (Human) [TaxID 9606]

Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db   AC      Description                                                     Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid                   206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina             197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                    192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27    60
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27    60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:

Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49

Total number of positively charged residues (Arg + Lys): 43

Atomic composition:

Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient: 29700; Abs 0.1% (=1 g/l): 0.538, assuming all Cys residues appear as half cystines

Ext. coefficient: 29450; Abs 0.1% (=1 g/l): 0.533, assuming no Cys residues appear as half cystines
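These two figures can be reproduced by hand. ProtParam's extinction coefficient is additive over Trp, Tyr and cystine (5500, 1490 and 125 M⁻¹ cm⁻¹ respectively), and the Abs 0.1% value is the extinction coefficient divided by the molecular weight. The sketch below uses the counts from the composition table above (4 Trp, 5 Tyr, 5 Cys) and the reported molecular weight; it assumes that at most floor(nCys/2) cystines can form, and the function names are ours.

```python
def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int, cystines: bool = True) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from residue counts."""
    n_cystine = n_cys // 2 if cystines else 0  # two Cys residues form one cystine
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


def abs_01_percent(ext_coeff: float, mol_weight: float) -> float:
    """Absorbance of a 1 g/l (0.1%) solution: extinction coefficient / molecular weight."""
    return ext_coeff / mol_weight


mw = 55252.6  # molecular weight reported by ProtParam for Q8IVS8
for label, flag in (("all Cys as half cystines", True), ("no cystines", False)):
    e = extinction_coefficient(4, 5, 5, cystines=flag)
    print(label, e, round(abs_01_percent(e, mw), 3))
# all Cys as half cystines 29700 0.538
# no cystines 29450 0.533
```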

Secondary structure prediction

By SOPMA: result for UNK_158250

Predicted states (upper line: sequence, 70 residues per line; lower line: predicted state, with h = alpha helix, e = extended strand, t = beta turn, c = random coil):

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: window width 17; similarity threshold 8; number of states 4
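The summary percentages in the SOPMA output are simple counts over the predicted state string. A minimal sketch, assuming the lower-case state letters shown above (h, e, t, c) have been concatenated into one string; the function name is ours. Feeding it 235 h, 80 e, 36 t and 172 c states reproduces the 44.93%, 15.30%, 6.88% and 32.89% reported above.

```python
from collections import Counter


def state_percentages(states: str) -> dict:
    """Percentage of residues in each predicted secondary-structure state."""
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states)
    total = len(states)
    return {state: round(100.0 * n / total, 2) for state, n in sorted(counts.items())}


print(state_percentages("hhhceect"))  # {'c': 25.0, 'e': 25.0, 'h': 37.5, 't': 12.5}
```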

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65

(45)

2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98

49

6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97

7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);

Phylogram

Tertiary structure prediction

PDB 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure from File (PDB file), then choose Swiss model > load the raw target sequence.

Choose Fit > fit raw sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss model > Submit Modeling Request (a new browser window opens with the PDB file loaded; give the e-mail ID for receiving the modeled structure).

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).


Open Swiss model and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, in order to fit the two sequences.


Save the file as the project.

Select "Submit Modeling Request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode request submission form:

Your e-mail address: Gunjan300@gmail.com

Your name: Gunjan

Request title: Gunjan project

Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit P000044 TitleQ8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52


Fig: Structure of template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ (phi) against ψ (psi) for each amino acid residue in a protein structure, showing which combinations of φ and ψ are possible for a polypeptide. Because φ and ψ describe rotation about the N–Cα and Cα–C bonds of each residue, the plot reveals the backbone conformation of every residue in the protein, and residues that fall in disallowed regions flag possible errors in the model.

Software

SAVS: http://nihserver.mbi.ucla.edu/SAVS/

Procedure

The target protein structure obtained after homology modeling using Deep View and modeler is given as input to SAVS.
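To make the φ/ψ definition in the introduction concrete, the sketch below computes a signed backbone dihedral angle from four atomic coordinates using the standard atan2 formulation, with φ taken over the atoms C(i-1), N, Cα, C and ψ over N, Cα, C, N(i+1). It is a minimal plain-Python illustration, not how SAVS or PROCHECK compute their plots, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)      # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Backbone phi of residue i: rotation about the N-CA bond."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Backbone psi of residue i: rotation about the CA-C bond."""
    return dihedral(n, ca, c, n_next)


# Toy geometry (not real protein coordinates): outer bonds at right angles.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))  # -90.0
```

Collecting (φ, ψ) for every residue in this way gives the points that a Ramachandran plot scatters over its allowed and disallowed regions.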

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result
-----
Plot statistics                                        No. of residues   %-tage
Residues in most favoured regions [A,B,L]                         990     85.6%
Residues in additional allowed regions [a,b,l,p]                  104      9.0%
Residues in generously allowed regions [~a,~b,~l,~p]               11      1.0%
Residues in disallowed regions                                     51      4.4%
                                                                 ----    ------
Number of non-glycine and non-proline residues                   1156    100.0%
Number of end-residues (excl. Gly and Pro)                          8
Number of glycine residues (shown as triangles)                   127 (59)
Number of proline residues                                         60
                                                                 ----
Total number of residues                                         1351

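Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions. With 85.6% of its non-glycine, non-proline residues in the most favoured regions and 4.4% in disallowed regions, the present model falls short of this benchmark.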
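The percentages in the PROCHECK table follow directly from the residue counts: each region's count is divided by the 1156 non-glycine, non-proline residues. The short Python sketch below reproduces that arithmetic from the reported counts and checks the result against the 90% most-favoured guideline; the dictionary layout and variable names are illustrative only.

```python
# Residue counts reported by PROCHECK for the model (non-Gly, non-Pro residues only).
region_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

total = sum(region_counts.values())
percentages = {region: round(100 * n / total, 1) for region, n in region_counts.items()}

# PROCHECK's guideline: a good-quality model has over 90% in the most favoured regions.
meets_benchmark = percentages["most favoured"] > 90.0

# The remaining residue classes in the table; together they account for every residue.
all_residues = total + 8 + 127 + 60   # + end-residues, glycines, prolines

print(total, percentages, meets_benchmark, all_residues)
```

Only non-glycine, non-proline residues enter the denominator because glycine and proline have unusual backbone freedom and are plotted separately.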

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models).
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N  Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C  Radius = 1.40 Charge = 0.53
  ASP O  Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N  Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C  Radius = 1.40 Charge = 0.53
  ASP O  Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N  Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C  Radius = 1.40 Charge = 0.53
  THR O  Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H  Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 2635096, Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 2635096, Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 46.4 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N: Tag = 2B8N

Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N 2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
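The search-space figures in the Hex log can be cross-checked by hand. The Python sketch below assumes that the garbled bracketed values in the log read Iterate[27,812,1] (27 distance steps, 812 receptor rotations, 1 ligand rotation) and FFT[64,24,48]; under that reading the products reproduce the 21924 3D FFTs and 1616412672 orientations reported, and the four timing components add up to the reported 7 min 0 sec. This interpretation of the bracketed fields is an assumption, not documented Hex behaviour.

```python
# Distance scan: 0.00 to 19.50 in steps of 0.75 (from "Setting distance range").
distances = [round(i * 0.75, 2) for i in range(int(19.50 / 0.75) + 1)]

receptor_rotations = 812   # "Initial rotational increments (N=16) Receptor 812"
ligand_rotations = 1       # "... Ligand 1"
fft_grid = 64 * 24 * 48    # assumed reading of the garbled FFT[...] field

n_ffts = len(distances) * receptor_rotations * ligand_rotations
n_orientations = n_ffts * fft_grid

# Timing components printed after the distance scan, in seconds.
timings = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
total_seconds = sum(timings.values())

print(len(distances), distances[-1])   # 27 19.5
print(n_ffts, n_orientations)          # 21924 1616412672
print(f"{total_seconds:.2f} s")        # 419.99 s, i.e. about 7 min 0 sec
```

Because these totals match the log exactly, the reading of the bracketed fields is at least self-consistent.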

Conclusion

After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing projects on HBV, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis. A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structure of a protein present in the different species, we performed homology modelling and took a step forward by presenting a theoretical model built with available online modelling tools.

Since, according to our study, the HBeAg (glycerate kinase) protein coded by this gene is one of the secondary causes of HBV pathogenicity, we tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for developing drugs.

Future prospects

The work presented in this report might be just a stepping stone towards such discoveries. The present work might be a small finding on a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in finding a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.

BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801. [PubMed]

Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408. [PubMed]

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402. [PubMed]

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


HBV persistence may yield immunotherapeutic and antiviral strategies to terminate chronic HBV infection and reduce the risk of its life-threatening sequelae [2].

Hepadnaviruses (hepatitis B viruses) cause transient and chronic infections of the liver. Transient infections run a course of several months, and chronic infections are often lifelong. Chronic infections can lead to liver failure with cirrhosis and hepatocellular carcinoma. The replication strategy of these viruses has been described in great detail, but the virus-host interactions leading to acute and chronic disease are still poorly understood. Studies on how the virus evades the immune response to cause prolonged transient infections with high-titer viremia, and lifelong infections with ongoing inflammation of the liver, are still at an early stage, and the role of the virus in liver cancer is still elusive. The state of knowledge in this very active field is therefore reviewed with an emphasis on past accomplishments as well as goals for the future [3].

1) Surface antigen (HBsAg) is secreted in excess into the blood as 22 nm spheres and tubules. Its presence in serum indicates that virus replication is occurring in the liver.

2) e antigen (HBeAg), a secreted protein, is shed in small amounts into the blood. Its presence in serum indicates that a high level of viral replication is occurring in the liver.

3) Core antigen (HBcAg), the core protein, is not found in blood.

Antibody

1) Surface antibody (anti-HBs) becomes detectable late in convalescence and indicates immunity following infection. It remains detectable for life and is not found in chronic carriers.

2) e antibody (anti-HBe) becomes detectable as viral replication falls. It indicates low infectivity in a carrier.

3) Core IgM rises early in infection and indicates recent infection.

4) Core IgG rises soon after IgM and remains present for life, both in chronic carriers and in those who clear the infection. Its presence therefore indicates exposure to HBV, whether or not the person has become a chronic carrier [4].

Homology or comparative modeling involves the prediction of the structure of a query sequence from the structures of one or more structural templates. The procedure involves the identification of possible templates that have a clear sequence relationship to the query, the assembly of the model, the prediction of regions of the structure that are likely to have different conformations than the templates (e.g. loops), and ultimately the refinement of the structure in an attempt to account for inherent differences between the template and query structures. As mentioned above, homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity. (A structural explanation for this observation has been offered by Chung and Subbiah, 1996.) Advances in the accuracy of sequence alignments using structure-based profile methods, such as those described above, should result in continuing improvements in the quality of homology models [5, 6].
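The 30% rule of thumb is stated in terms of pairwise sequence identity. The sketch below computes percent identity from two already-aligned sequences, for illustration only. Its convention is an assumption: identical columns are divided by the columns in which neither sequence has a gap, and tools differ in the denominator they use. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Convention (an assumption): identical columns divided by the columns in
    which neither sequence has a gap ('-').
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x.upper() == y.upper())
    return 100.0 * identical / len(columns)


def above_identity_threshold(identity: float, threshold: float = 30.0) -> bool:
    """Rule-of-thumb check against the ~30% identity threshold."""
    return identity > threshold


# Two short HBeAg fragments taken from the ClustalW alignment in this report.
hbva4 = "MQLFHLCLIISCT"
ashv = "MYLFHLCLVFACV"
pid = percent_identity(hbva4, ashv)
print(f"{pid:.1f}% identity, above 30%: {above_identity_threshold(pid)}")
```

For these two fragments, 8 of the 13 columns are identical (about 61.5%), which is well above the 30% zone. Real template searches compute identity over the full alignment that the search tool reports.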

The number of protein-ligand complexes available in the Protein Data Bank is constantly growing, and structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed in recent years. Their aim is to exploit as fully as possible all the structural and chemical information that can be derived from proteins, from ligands, and from protein-ligand complexes.

In this respect, the term guided docking is introduced for docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect this focus on chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches:

- Indirect approaches incorporate chemical information implicitly. It affects scoring but not the orientation of the ligand during sampling.
- Direct approaches incorporate chemical information explicitly, and so actively guide the orientation of the ligand during sampling. They can be further divided into protein-based, mapping-based, and ligand-based approaches, according to the source used to derive the features that capture the chemical information inside the protein cavity.

Within each category, a representative list of docking approaches is discussed. Current scoring functions have limitations. In view of this, it was generally found that making optimal use of chemical information is an efficient knowledge-based strategy for improving three results of protein-ligand docking: binding affinity estimates, ligand binding-mode predictions, and virtual screening enrichments [7].

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, some major challenges remain, and these are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of assessing docking program performance are discussed, and a number of successful applications of structure-based virtual screening are described [8].


Materials and Methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It uses computers to store, retrieve, manipulate, and distribute information related to biological macromolecules such as DNA, RNA, and proteins.

Bioinformatics is limited to sequence, structural, and functional analysis of genes and genomes and their corresponding products, and it is often considered computational molecular biology. It consists of two subfields:

- the development of computational tools and databases;
- the application of these tools and databases to generate biological knowledge and better understand living systems.

These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis, and molecular functional analysis.

1. NCBI

NCBI (the National Center for Biotechnology Information) was established in 1988 as a national resource for molecular biology information. It creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information. All of this serves a better understanding of the molecular processes that affect human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database. It aims to provide:

- a high level of annotation, such as descriptions of a protein's function, its domain structure, post-translational modifications, and variants;
- a minimal level of redundancy;
- a high level of integration with other databases.


2. Protein sequence

The protein used is glycerate kinase (HBeAg-binding protein 4) from human.

- Primary accession number: Q8IVS8
- EC number: 2.7.1.31
- Subcellular location: cytoplasm
- Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate

3. FASTA

FASTA is a software package for aligning DNA and protein sequences. It was first described (as FASTP) by David J. Lipman and William R. Pearson in 1985, in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching.

FASTA was described in 1988 ("Improved Tools for Biological Sequence Comparison"). It added the ability to do DNA:DNA searches and translated protein:DNA searches. It also provided a more sophisticated shuffling program for evaluating statistical significance. Several programs in the package allow the alignment of protein sequences and DNA sequences.

FASTA is pronounced "fast-A" and stands for "FAST-All", because it works with any alphabet. It extends FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for:

- protein:protein searches;
- DNA:DNA searches;
- protein:translated DNA searches (with frameshifts);
- ordered or unordered peptide searches.

Recent versions of the FASTA package include special translated search algorithms for comparing nucleotide to protein sequence data. These correctly handle frameshift errors, which six-frame-translated searches do not handle very well.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics. These let biologists judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
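The Smith-Waterman algorithm used by SSEARCH finds the best-scoring local alignment by dynamic programming. The sketch below is a minimal teaching version with assumed scores: a simple match/mismatch score and a linear gap penalty. SSEARCH itself uses substitution matrices such as BLOSUM and affine gap penalties, and it computes significance statistics that this sketch does not.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Scores are floored at 0: this is what makes the alignment "local".
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("TTACGTTT", "GGACGTGG")
print(score)   # 8
print(top)     # ACGT
print(bottom)  # ACGT
```

Because every cell is floored at zero, unrelated flanking residues do not drag the score down. The alignment therefore reports only the shared core "ACGT".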

4. BLAST

In bioinformatics, BLAST (Basic Local Alignment Search Tool) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search compares a query sequence with a library or database of sequences. It then identifies the library sequences that resemble the query above a certain threshold.
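BLAST speeds up this comparison in two stages. It first looks for short "words" shared by the query and each library sequence, and only then extends those seeds into alignments. The sketch below illustrates only the seeding idea, using exact word matches, and ranks library sequences by their number of word hits. It omits BLAST's neighborhood words, seed extension, and E-value statistics. The function names and the word length are assumptions.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, k: int = 3):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = defaultdict(list)  # word -> list of positions in the subject
    for pos in range(len(subject) - k + 1):
        index[subject[pos:pos + k]].append(pos)
    hits = []
    for qpos in range(len(query) - k + 1):
        for spos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, spos))
    return hits


def rank_library(query: str, library: dict, k: int = 3):
    """Rank library sequences by number of shared words (a crude prefilter)."""
    scores = {name: len(word_hits(query, seq, k)) for name, seq in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Short N-terminal fragments from the ClustalW alignment in this report's Results.
library = {"HBEAG_HBVA4": "MQLFHLCLIISCT", "HBEAG_DHBV1": "MWNLRITPLSFGA"}
print(rank_library("MYLFHLCLVFACV", library))  # [('HBEAG_HBVA4', 4), ('HBEAG_DHBV1', 0)]
```

Indexing the subject's words once makes each query word a dictionary lookup. This is the reason word seeding is much faster than running a full dynamic-programming alignment against every library entry.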

5. Primary & secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information about the protein is required.

The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or as a raw sequence. White space and numbers in a raw sequence are ignored.

If you provide the accession number of a Swiss-Prot/TrEMBL entry, an intermediary page lets you select the portion of the sequence to analyze. You can choose:

- mature chains, peptides, or domains from the Swiss-Prot feature table, by clicking on their positions;
- a start and end position, entered in two boxes.

By default (i.e., if you leave the two boxes empty), the complete sequence is analyzed.

Among other parameters, it calculates the following:

extinction coefficient

half-life

instability index

aliphatic index
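Two of the listed parameters follow simple published formulas, as documented for ProtParam:

- The extinction coefficient at 280 nm is estimated from the numbers of Trp, Tyr, and cystine residues, at 5500, 1490, and 125 M⁻¹ cm⁻¹ each.
- The aliphatic index is X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is a mole percent.

The sketch below reimplements these two formulas for illustration. It is not ProtParam's own code, and the function names are hypothetical. With the Trp, Tyr, and Cys counts reported for GLCTK_HUMAN in the Results (4, 5, and 5), it reproduces the two extinction coefficients listed there.

```python
def extinction_coefficients(n_trp: int, n_tyr: int, n_cys: int):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Returns (all Cys paired as cystines, no cystines).
    """
    no_cystines = n_trp * 5500 + n_tyr * 1490
    all_cystines = no_cystines + (n_cys // 2) * 125  # one cystine per Cys pair
    return all_cystines, no_cystines


def abs_01_percent(ext_coefficient: float, mol_weight: float) -> float:
    """Absorbance of a 0.1% (1 g/l) solution: ext. coefficient / molecular weight."""
    return ext_coefficient / mol_weight


def aliphatic_index(counts: dict, length: int) -> float:
    """Aliphatic index from residue counts, using mole percents of A, V, I and L."""
    def mole_percent(aa: str) -> float:
        return 100.0 * counts.get(aa, 0) / length
    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Counts and molecular weight for GLCTK_HUMAN, taken from the ProtParam output in the Results.
e_all, e_none = extinction_coefficients(n_trp=4, n_tyr=5, n_cys=5)
print(e_all, round(abs_01_percent(e_all, 55252.6), 3))    # 29700 0.538
print(e_none, round(abs_01_percent(e_none, 55252.6), 3))  # 29450 0.533
print(round(aliphatic_index({"A": 74, "V": 38, "I": 15, "L": 81}, 523), 1))
```

An odd number of Cys residues leaves one unpaired, so five Cys give only two cystines. This is why the two coefficients differ by exactly 250.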


Using SOPMA for secondary structure analysis

Recently, a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in predicting the secondary structure of proteins. In this paper, we report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family.

This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet, and coil). The test set was a whole database of 126 chains of non-homologous proteins (less than 25% identity). Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids.

Predictions are available by email to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online, and PubMed.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.

Submitted the target sequence to SWISS-MODEL as a raw sequence and modeled the receptor, obtaining the model in PDB format.

Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).

Verified the model through parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the docked structure of the drug molecule.

Homology modeling

In protein structure prediction, homology modeling (also known as comparative modeling) is a class of methods for building an atomic-resolution model of a protein from its amino acid sequence, called the query sequence or target. Almost all homology modeling techniques rely on two things:

- identifying one or more known protein structures (templates, or parent structures) that are likely to resemble the structure of the query sequence;
- producing an alignment that maps residues in the query sequence to residues in the template sequence.

The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and of the template structure. The approach can be complicated by two kinds of gap:

- alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template;
- structure gaps in the template, which arise from poor resolution in the experimental procedure used to solve the structure (usually X-ray crystallography).

Model quality declines with decreasing sequence identity. A typical model shows about 2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model built without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity. Variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2].

Taken together, these atomic-position errors are significant. They impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction. Even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s).

Nevertheless, homology models can support qualitative conclusions about the biochemistry of the query sequence. They are especially useful for forming hypotheses about why certain residues are conserved, which may in turn lead to experiments that test those hypotheses. For example, the spatial arrangement of conserved residues may suggest why a residue is conserved: to stabilize the fold, to bind a small molecule, or to support association with another protein or nucleic acid.
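The "about 2 Å" and "4-5 Å" figures above are coordinate deviations between matched Cα atoms. They are usually summarized as a root-mean-square deviation (RMSD) after the model and reference are optimally superposed. The sketch below computes Cα RMSD using the Kabsch (SVD-based) superposition. It assumes that NumPy is available and that the two coordinate arrays are already paired residue by residue. The function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(model, reference) -> float:
    """Cα RMSD after optimal rigid superposition (Kabsch algorithm).

    model, reference: (N, 3) arrays of matched Cα coordinates in ångström.
    """
    P = np.asarray(model, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Toy check: a rotated and translated copy superposes with an RMSD of ~0.
ref = np.array([[0, 0, 0], [3.8, 0, 0], [3.8, 3.8, 0], [0, 3.8, 3.8], [1.9, 1.9, 1.9]])
theta = np.pi / 3
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0],
                  [np.sin(theta), np.cos(theta), 0],
                  [0, 0, 1]])
moved = ref @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(round(kabsch_rmsd(moved, ref), 6))
```

Superposing before measuring matters. Without it, a model that is correct but positioned differently in space would appear to have a large error.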

Figure: Homology modeling proceeds in three steps.

1. The known template 3D structures are aligned with the target sequence to be modelled.
2. Spatial features are transferred from the templates to the target, such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles. This yields a set of spatial restraints on the target's structure.
3. The 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related. This has inspired the formation of a structural genomics consortium dedicated to producing representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, come from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment: the Critical Assessment of Techniques for Protein Structure Prediction (CASP).

Template selection and sequence alignment

The critical first step in homology modeling is to identify the best template structure, if indeed any are available. Several search techniques can be used:

- The simplest method relies on serial pairwise sequence alignments, aided by database search techniques such as FASTA and BLAST.
- More sensitive methods are based on multiple sequence alignment, of which PSI-BLAST is the most common example. They iteratively update their position-specific scoring matrix to identify successively more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates, and to find better templates for sequences that are only distantly related to any solved structure.
- Protein threading, also known as fold recognition or 3D-1D alignment, can also be used to identify templates for traditional homology modeling.

When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value. Such hits are considered close enough in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available. Its structure may well be wrong, which would produce a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers. These improve on individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

These approaches often identify several candidate template structures. Some methods can generate hybrid models from multiple templates, but most rely on a single template. Choosing the best template from among the candidates is therefore a key step, and it can significantly affect the final accuracy of the structure. The choice is guided by several factors:

- the similarity of the query and template sequences;
- the similarity of their functions;
- the similarity of the predicted query secondary structure to the observed template secondary structure;
- the coverage of the aligned regions, i.e., the fraction of the query sequence structure that can be predicted from the template;
- the plausibility of the resulting model.

Coverage and plausibility are perhaps the most important of these. As a result, several homology models are sometimes produced for a single query sequence, and the most likely candidate is chosen only in the final step.
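The criteria described above can be expressed as a simple filter-and-sort over search hits: a low E-value first, then coverage of the query and sequence identity. The sketch below is only an illustration. The `Hit` record, the cut-offs (E-value 1e-5 and 50% coverage), and the sort order are assumptions, not a standard procedure. Real template choice also weighs agreement in function and secondary structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    template_id: str   # e.g. a PDB chain identifier
    evalue: float      # BLAST/PSI-BLAST expectation value
    identity: float    # percent identity over the aligned region
    coverage: float    # fraction of the query covered by the alignment (0-1)


def rank_templates(hits: List[Hit], max_evalue: float = 1e-5,
                   min_coverage: float = 0.5) -> List[Hit]:
    """Drop poor hits, then prefer low E-value, high coverage, high identity."""
    usable = [h for h in hits if h.evalue <= max_evalue and h.coverage >= min_coverage]
    return sorted(usable, key=lambda h: (h.evalue, -h.coverage, -h.identity))


# Hypothetical hits, for illustration only (not results from this study).
hits = [
    Hit("tmplA", 1e-50, 45.0, 0.90),
    Hit("tmplB", 1e-3, 80.0, 0.95),   # E-value too poor, despite high identity
    Hit("tmplC", 1e-40, 60.0, 0.30),  # covers too little of the query
    Hit("tmplD", 1e-50, 40.0, 0.95),
]
print([h.template_id for h in rank_templates(hits)])  # ['tmplD', 'tmplA']
```

Filtering on E-value before considering identity reflects the warning above: a hit with high identity but a poor E-value is still an unreliable template.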

The sequence alignment generated by the database search can be used as the basis for building the model. However, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking Interactions

Protein-protein interactions occur between two proteins of similar size. The interfaces between the two molecules tend to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: their interfaces cannot alter their conformation to improve binding and ease movement. Conformational changes are limited by steric constraints, so these interfaces are said to be rigid.

Fig.: Protein-protein docking

Protein Receptor-Ligand Docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. These interfaces show both high specificity and induced fit, and specificity increases with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor, or a flexible ligand with a rigid receptor.

Fig.: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

In the native structure of a rigid ligand with a flexible receptor, the interface area between the molecules is often maximized. The two molecules move relative to each other in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, ligand overlap at the docking interface incurs energy penalties. These penalties are minimized if the van der Waals repulsion can be reduced, which can be achieved by allowing flexibility in the receptor. A flexible receptor can therefore dock a larger ligand than a rigid receptor could.
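The energy penalty for overlapping atoms mentioned above typically comes from the repulsive part of a van der Waals term. The sketch below uses a 12-6 Lennard-Jones potential, one common form of that term. It shows why letting a receptor atom move away from an overlapping ligand atom lowers the interaction energy. The ε and σ values are illustrative assumptions, not parameters of any particular force field or of HEX.

```python
import math


def lennard_jones(r: float, epsilon: float = 0.2, sigma: float = 3.4) -> float:
    """12-6 Lennard-Jones energy (kcal/mol) for an atom pair at distance r (Å)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def interaction_energy(ligand, receptor, epsilon=0.2, sigma=3.4) -> float:
    """Sum the pairwise LJ energy over every ligand-receptor atom pair."""
    return sum(lennard_jones(math.dist(a, b), epsilon, sigma)
               for a in ligand for b in receptor)


ligand = [(0.0, 0.0, 0.0)]
rigid_receptor = [(2.5, 0.0, 0.0)]    # atom held inside the ligand's radius: steric clash
relaxed_receptor = [(3.8, 0.0, 0.0)]  # same atom after the receptor "gives way"
print(interaction_energy(ligand, rigid_receptor))    # large positive penalty
print(interaction_energy(ligand, relaxed_receptor))  # small negative (attractive)
```

The r⁻¹² term grows very steeply at short range, so moving an atom by about 1 Å can change the interaction from a strong penalty to a weak attraction.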

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can stay rigid while the free energy of the system is maintained. For successful docking, the parameters of the ligand must be maintained, and the ligand must be slightly smaller than the receptor interface.

No docking is completely rigid, however. Intrinsic movement allows small conformational adaptations when the ligand binds. Taking into account the six degrees of freedom for protein movement (three rotational, three translational) increases the inherent flexibility allowed to the receptor even further. This further offsets any energy penalty between the receptor and ligand, making binding between the two easier and more energetically favorable.
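In rigid-body docking, each trial placement of a molecule is described by exactly these six degrees of freedom: three rotation angles and three translations. The sketch below applies such a pose to a set of coordinates, rotating about the molecule's centroid. The Euler-angle convention (rotation about x, then y, then z) and the function names are assumptions chosen for illustration. Docking programs parameterize orientation in various other ways.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def rotation_matrix(rx: float, ry: float, rz: float) -> List[List[float]]:
    """R = Rz @ Ry @ Rx (angles in radians): rotate about x, then y, then z."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords: Sequence[Vec], pose: Sequence[float]) -> List[Vec]:
    """Apply a 6-DOF pose (rx, ry, rz, tx, ty, tz), rotating about the centroid."""
    rx, ry, rz, tx, ty, tz = pose
    R = rotation_matrix(rx, ry, rz)
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    moved = []
    for p in coords:
        d = [p[k] - centroid[k] for k in range(3)]                   # shift to the centroid
        r = [sum(R[i][k] * d[k] for k in range(3)) for i in range(3)]  # rotate
        moved.append((r[0] + centroid[0] + tx,                       # shift back, translate
                      r[1] + centroid[1] + ty,
                      r[2] + centroid[2] + tz))
    return moved


ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
pose = (0.0, 0.0, math.pi / 2, 0.0, 0.0, 5.0)  # 90° about z, then 5 Å along z
print(apply_pose(ligand, pose))
```

The transform is rigid, so interatomic distances are unchanged. Only the position and orientation that the scoring function sees are different.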

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking study may help in finding a cure for the infectious disease hepatitis B. They may also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Receptor

A receptor is a structure on the cell surface that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule on or within a cell to which a substance, such as a hormone or a drug, selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g., a receptor). A ligand binds through many weak noncovalent interactions formed with the protein's binding site. Tight binding therefore depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly take part in making and breaking bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction. This lowering produces the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, because this depends greatly on the type of function. With that in mind, the values below give the preference of each of the 20 amino acids for lying within functional regions of proteins. They were derived from how often each amino acid contacts bound non-protein atoms in protein three-dimensional structures. A positive value means that the amino acid makes more contacts than expected by chance, and a negative value means that it makes fewer. The values do not cover protein-protein or protein-peptide interactions. In those interactions, many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram) was developed by Gopalasamudram Narayana Ramachandran. It visualizes the backbone dihedral angle φ (phi) against ψ (psi) for the amino acid residues in a protein structure, and so shows the possible φ/ψ conformations of a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and φ and ψ are the torsion angles about these bonds.

Ramachandran used computer models of small polypeptides to vary φ and ψ systematically, with the aim of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii. Angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
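φ and ψ are ordinary torsion angles, each defined by four consecutive backbone atoms. φ uses C(i-1), N(i), Cα(i), and C(i); ψ uses N(i), Cα(i), C(i), and N(i+1). The sketch below computes a signed torsion angle from four 3D points with the standard atan2 formula (IUPAC sign convention, between -180° and 180°). Reading coordinates from a PDB file is left out, and the helper names are hypothetical.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# phi(i) = dihedral(C[i-1], N[i], CA[i], C[i]); psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # 0.0 (cis)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # 180.0 (trans)
```

Using atan2 instead of an arccos of the normals keeps the sign. The sign is needed to tell, for example, the right-handed helix region (negative φ) from the left-handed region (positive φ) on the plot.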

SAVS (Structure analysis and validation server)

SAVS is a server that analyzes protein structures for validity and assesses how correct they are. Its run time can be several minutes. This depends on how many programs are selected and on how many residues the submitted protein contains.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the residue geometry of a given protein structure is. It compares the structure with stereochemical parameters derived from well-refined, high-resolution structures. The checks also use 'ideal' bond lengths and bond angles, taken from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. As a by-product of running PROCHECK, the first of its programs "cleans up" the coordinate file. The clean-up corrects any mislabelled atoms and writes a new coordinate file with the extension .new. In this new file, the atoms are labelled according to the IUPAC naming convention.

OUTPUT

The output consists of the plots together with a detailed residue-by-residue listing. PROCHECK writes a number of output files to the default directory. They have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension. It is a printable ASCII text file that lists all the computed stereochemical properties for each residue.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles, and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It works well for releasing local constraints on a residue. However, it cannot cross high energy barriers, so it stops in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a single numerical value for one conformation. This number can be used to evaluate a particular conformation, but it may be a poor measure because a few bad interactions can dominate it. For instance, a large molecule can have a large overall energy because of one bad interaction, even if nearly all of its atoms are in an excellent conformation. An example is two atoms so close together in space that they have a huge van der Waals repulsion energy.

It is therefore often preferable to carry out energy minimization to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom, which makes it an excellent starting point for molecular dynamics simulations.
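The gradient optimization described above can be illustrated with steepest descent. Every atom moves a small step along its force (the negative energy gradient). The step is accepted if the energy drops; otherwise, the step size is shrunk. The sketch below minimizes a toy system of atoms that interact only through a Lennard-Jones potential in reduced units. The adaptive step rule, the tolerances, and the function names are assumptions. Real minimizers use full force fields and usually better algorithms, such as conjugate gradients.

```python
from typing import List, Tuple

Coords = List[List[float]]


def lj_energy_forces(x: Coords, eps: float = 1.0, sigma: float = 1.0) -> Tuple[float, Coords]:
    """Total 12-6 Lennard-Jones energy and the force on each atom."""
    n = len(x)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [x[i][k] - x[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            sr6 = (sigma * sigma / r2) ** 3
            energy += 4.0 * eps * (sr6 * sr6 - sr6)
            # Force on atom i is -dE/dr along the i-j direction: 24 eps (2 sr12 - sr6) / r^2 * d
            f = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += f * d[k]
                forces[j][k] -= f * d[k]
    return energy, forces


def steepest_descent(x0: Coords, step: float = 0.01, max_iter: int = 10000,
                     f_tol: float = 1e-6) -> Tuple[Coords, float]:
    """Move atoms along their forces until the largest force component is tiny."""
    x = [list(p) for p in x0]
    energy, forces = lj_energy_forces(x)
    for _ in range(max_iter):
        if max(abs(c) for f in forces for c in f) < f_tol:
            break
        trial = [[x[i][k] + step * forces[i][k] for k in range(3)] for i in range(len(x))]
        e_trial, f_trial = lj_energy_forces(trial)
        if e_trial < energy:   # downhill: accept, then try a larger step
            x, energy, forces = trial, e_trial, f_trial
            step *= 1.2
        else:                  # uphill: reject and halve the step
            step *= 0.5
    return x, energy


# Two atoms start too far apart; the minimum is at r = 2**(1/6) * sigma, with E = -eps.
atoms, e_min = steepest_descent([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
r = sum((a - b) ** 2 for a, b in zip(atoms[0], atoms[1])) ** 0.5
print(round(r, 4), round(e_min, 4))
```

The result is the nearest minimum. Here that is also the global one, but in a real protein the same procedure stops in whichever local minimum lies downhill from the starting conformation.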


Results and Discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information: entry name GLCTK_HUMAN; primary accession number Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase; synonyms: EC 2.7.1.31, HBeAg-binding protein 4

Gene name: GLYCTK; synonyms: HBEBP4; ORF names: LP5910

From: Homo sapiens (Human) [TaxID 9606]

Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2 (evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db  | AC     | Description                                                      | Score | E-value
pdb | 1QGT-C | Chain C, (HBcAg) Human Hepatitis B Viral Capsid                  | 206   | 6e-54
pdb | 2QIJ-C | Chain C, Hepatitis B Capsid Protein With An N-Termina            | 197   | 2e-51
pdb | 2G33-C | CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                   | 192   | 6e-50
pdb | 1TA3-B | XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp | 27    | 60
pdb | 1AW9-A | Chain A, Structure Of Glutathione S-Transferase Iii I            | 27    | 60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE: Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4). The computation was carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:

Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49

Total number of positively charged residues (Arg + Lys): 43

Atomic composition:

Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm, measured in water.

Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines

Secondary structure prediction

By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:

Alpha helix          (Hh): 235 is 44.93%
3₁₀ helix            (Gg):   0 is  0.00%
Pi helix             (Ii):   0 is  0.00%
Beta bridge          (Bb):   0 is  0.00%
Extended strand      (Ee):  80 is 15.30%
Beta turn            (Tt):  36 is  6.88%
Bend region          (Ss):   0 is  0.00%
Random coil          (Cc): 172 is 32.89%
Ambiguous states     (?):    0 is  0.00%
Other states:                0 is  0.00%

Parameters: window width 17, similarity threshold 8, number of states 4
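Each percentage in the SOPMA summary is the count of that state letter in the prediction string, divided by the sequence length. The sketch below recomputes such a summary from a prediction string. The letter-to-state mapping follows the SOPMA output above; the function name is hypothetical. The final check uses only the counts reported above (235 h, 80 e, 36 t, 172 c) to confirm the reported percentages.

```python
from collections import Counter

# State letters as they appear in the SOPMA output above.
SOPMA_STATES = {
    "h": "Alpha helix", "g": "3-10 helix", "i": "Pi helix", "b": "Beta bridge",
    "e": "Extended strand", "t": "Beta turn", "s": "Bend region", "c": "Random coil",
}


def state_composition(prediction: str) -> dict:
    """Map each state name to (count, percent of residues), SOPMA-summary style."""
    if not prediction:
        raise ValueError("empty prediction string")
    counts = Counter(prediction.lower())
    n = len(prediction)
    return {name: (counts[code], round(100.0 * counts[code] / n, 2))
            for code, name in SOPMA_STATES.items()}


# Rebuild a string with the reported counts; the order of letters does not affect the totals.
summary = state_composition("h" * 235 + "e" * 80 + "t" * 36 + "c" * 172)
for state in ("Alpha helix", "Extended strand", "Beta turn", "Random coil"):
    print(state, summary[state])
```

The four reported states add up to 523 residues, which matches the sequence length. So no residue was left in an ambiguous or other state, consistent with the zero counts in the summary.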

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name                Len(aa) SeqB Name                Len(aa) Score
===========================================================================
1    Q8IVS8|GLCTK_HUMAN  523     2    Q64896|HBEAG_ASHV   217     3
1    Q8IVS8|GLCTK_HUMAN  523     3    P03154|HBEAG_DHBV1  305     3
1    Q8IVS8|GLCTK_HUMAN  523     4    P0C6J9|HBEAG_DHBV3  305     3
1    Q8IVS8|GLCTK_HUMAN  523     5    P03153|HBEAG_GSHV   217     3
1    Q8IVS8|GLCTK_HUMAN  523     6    P0C692|HBEAG_HBVA2  214     3
1    Q8IVS8|GLCTK_HUMAN  523     7    P0C625|HBEAG_HBVA3  214     3
1    Q8IVS8|GLCTK_HUMAN  523     8    P17099|HBEAG_HBVA4  214     3
1    Q8IVS8|GLCTK_HUMAN  523     9    Q81105|HBEAG_HBVA5  214     3
1    Q8IVS8|GLCTK_HUMAN  523     10   Q91C37|HBEAG_HBVA6  214     2
2    Q64896|HBEAG_ASHV   217     3    P03154|HBEAG_DHBV1  305     21
2    Q64896|HBEAG_ASHV   217     4    P0C6J9|HBEAG_DHBV3  305     21
2    Q64896|HBEAG_ASHV   217     5    P03153|HBEAG_GSHV   217     91
2    Q64896|HBEAG_ASHV   217     6    P0C692|HBEAG_HBVA2  214     66
2    Q64896|HBEAG_ASHV   217     7    P0C625|HBEAG_HBVA3  214     65
2    Q64896|HBEAG_ASHV   217     8    P17099|HBEAG_HBVA4  214     65
2    Q64896|HBEAG_ASHV   217     9    Q81105|HBEAG_HBVA5  214     65
2    Q64896|HBEAG_ASHV   217     10   Q91C37|HBEAG_HBVA6  214     65
3    P03154|HBEAG_DHBV1  305     4    P0C6J9|HBEAG_DHBV3  305     97
3    P03154|HBEAG_DHBV1  305     5    P03153|HBEAG_GSHV   217     24
3    P03154|HBEAG_DHBV1  305     6    P0C692|HBEAG_HBVA2  214     26
3    P03154|HBEAG_DHBV1  305     7    P0C625|HBEAG_HBVA3  214     27
3    P03154|HBEAG_DHBV1  305     8    P17099|HBEAG_HBVA4  214     25
3    P03154|HBEAG_DHBV1  305     9    Q81105|HBEAG_HBVA5  214     26
3    P03154|HBEAG_DHBV1  305     10   Q91C37|HBEAG_HBVA6  214     26
4    P0C6J9|HBEAG_DHBV3  305     5    P03153|HBEAG_GSHV   217     25
4    P0C6J9|HBEAG_DHBV3  305     6    P0C692|HBEAG_HBVA2  214     26
4    P0C6J9|HBEAG_DHBV3  305     7    P0C625|HBEAG_HBVA3  214     27
4    P0C6J9|HBEAG_DHBV3  305     8    P17099|HBEAG_HBVA4  214     25
4    P0C6J9|HBEAG_DHBV3  305     9    Q81105|HBEAG_HBVA5  214     26
4    P0C6J9|HBEAG_DHBV3  305     10   Q91C37|HBEAG_HBVA6  214     26
5    P03153|HBEAG_GSHV   217     6    P0C692|HBEAG_HBVA2  214     70
5    P03153|HBEAG_GSHV   217     7    P0C625|HBEAG_HBVA3  214     69
5    P03153|HBEAG_GSHV   217     8    P17099|HBEAG_HBVA4  214     69
5    P03153|HBEAG_GSHV   217     9    Q81105|HBEAG_HBVA5  214     69
5    P03153|HBEAG_GSHV   217     10   Q91C37|HBEAG_HBVA6  214     69
6    P0C692|HBEAG_HBVA2  214     7    P0C625|HBEAG_HBVA3  214     98
6    P0C692|HBEAG_HBVA2  214     8    P17099|HBEAG_HBVA4  214     98
6    P0C692|HBEAG_HBVA2  214     9    Q81105|HBEAG_HBVA5  214     98
6    P0C692|HBEAG_HBVA2  214     10   Q91C37|HBEAG_HBVA6  214     98
7    P0C625|HBEAG_HBVA3  214     8    P17099|HBEAG_HBVA4  214     98
7    P0C625|HBEAG_HBVA3  214     9    Q81105|HBEAG_HBVA5  214     97
7    P0C625|HBEAG_HBVA3  214     10   Q91C37|HBEAG_HBVA6  214     98
8    P17099|HBEAG_HBVA4  214     9    Q81105|HBEAG_HBVA5  214     97
8    P17099|HBEAG_HBVA4  214     10   Q91C37|HBEAG_HBVA6  214     98
9    Q81105|HBEAG_HBVA5  214     10   Q91C37|HBEAG_HBVA6  214     97
===========================================================================
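The Score column in the ClustalW pairwise stage is a percent identity between each pair of sequences. The sketch below shows one common way to compute such a score from two sequences that are already aligned. It assumes identity is counted only over columns where neither sequence has a gap. ClustalW's exact normalisation may differ, and `percent_identity` is an illustrative name, not a ClustalW function.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Assumed convention: columns where either sequence has a gap ('-')
    are ignored, and identities are divided by the remaining columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped column: not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Aligned N-terminal segments of HBVA4 and HBVA5 taken from the multiple
# alignment in this report; they differ only at one V/F position.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"
print(round(percent_identity(hbva4, hbva5), 1))  # 96.7
```

Over this short 30-residue stretch a single substitution already costs more than 3 points, which is why the full-length scores above (97 to 98) are more informative than any one segment.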

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
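ClustalW normally prints a conservation line under each alignment block, with '*' marking columns where every sequence has the same residue. That line is missing from the alignment as reproduced here. The sketch below shows how the '*' marks could be recomputed from the aligned rows. It deliberately leaves out ClustalW's ':' and '.' marks, which depend on its residue-similarity groups, and `conservation_line` is a hypothetical helper.

```python
def conservation_line(rows: list[str]) -> str:
    """Clustal-style line with '*' under fully conserved columns.

    Only identity ('*') is computed here; ClustalW's ':' and '.' marks
    are omitted. Columns containing a gap or any mismatch get a space.
    """
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        residues = {c.upper() for c in column}
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


# The segment after the leading gaps in the second block, for the five
# human HBV rows (HBVA5 carries F where the others carry V).
block = [
    "MQLFHLCLIISCT-CPTVQASKLCLGWLWG",
    "MQLFHLCLIISCT-CPTVQASKLCLGWLWG",
    "MQLFHLCLIISCT-CPTVQASKLCLGWLWG",
    "MQLFHLCLIISCT-CPTVQASKLCLGWLWG",
    "MQLFHLCLIISCT-CPTFQASKLCLGWLWG",
]
print(block[0])
print(conservation_line(block))
```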

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
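The guide tree is written in Newick format. Nested parentheses group sequences, and the number after each ':' is a branch length. The sketch below parses the tree and reports each sequence's total branch length from the root. It assumes a plain Newick string with no quoted labels or comments. `parse_newick` and `leaf_depths` are hypothetical helpers, not ClustalW functions. In this tree the human GLCTK sequence sits furthest from the root, which matches its very low pairwise scores.

```python
def parse_newick(text: str) -> dict:
    """Parse a Newick tree into nested dicts (name, length, children).

    Assumes no quoted labels or comments; all whitespace is removed
    first, so labels must not contain spaces.
    """
    s = "".join(text.split())
    pos = 0

    def node() -> dict:
        nonlocal pos
        children = []
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume '('
            children.append(node())
            while pos < len(s) and s[pos] == ",":
                pos += 1
                children.append(node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
        start = pos
        while pos < len(s) and s[pos] not in ":,();":
            pos += 1
        name = s[start:pos]
        length = None
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",();":
                pos += 1
            length = float(s[start:pos])
        return {"name": name, "length": length, "children": children}

    tree = node()
    if pos < len(s) and s[pos] == ";":
        pos += 1
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaf_depths(tree: dict, depth: float = 0.0) -> dict:
    """Map each leaf name to the summed branch lengths from the root."""
    here = depth + (tree["length"] or 0.0)
    if not tree["children"]:
        return {tree["name"]: here}
    depths = {}
    for child in tree["children"]:
        depths.update(leaf_depths(child, here))
    return depths


GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

depths = leaf_depths(parse_newick(GUIDE_TREE))
for name, d in sorted(depths.items(), key=lambda item: -item[1]):
    print(f"{name:20s} {d:.5f}")
```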

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template because it showed about 85.6% identity with the target sequence. The template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

1. Open the template structure (PDB file) from File, then choose Swiss model > Load raw target sequence.

2. Choose Fit > Fit raw sequence, then Magic fit, then Iterative fit.

3. Choose File > Save > Layer (PDB).

4. Choose File > Save > Project (PDB).

5. Choose Swiss model > Submit modeling request. A new browser window opens with the PDB file loaded; enter the e-mail ID at which the modeled structure is to be received.

6. Open the new structure received by e-mail and remove the template by selecting the target.

7. Choose File > Save > Layer (PDB).
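The last steps above remove the template and keep only the target structure. Outside Swiss-PdbViewer, a comparable clean-up can be scripted when template and target end up in one PDB-format file as separate chains. That layout is an assumption here, since Swiss-PdbViewer normally keeps them as separate layers, and `keep_chain` is a hypothetical helper. The sketch relies only on the fixed-column PDB format, in which the chain identifier sits in column 22 of ATOM, HETATM and TER records.

```python
def keep_chain(pdb_text: str, chain_id: str) -> str:
    """Keep only the coordinate records of one chain from PDB-format text.

    Assumption: template and target are stored as different chains in
    one file. The chain identifier is PDB column 22, i.e. index 21 of a
    0-based Python string, for ATOM, HETATM and TER records.
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "HETATM", "TER"):
            if len(line) > 21 and line[21] == chain_id:
                kept.append(line)
    kept.append("END")  # close the filtered coordinate section
    return "\n".join(kept) + "\n"


# Two illustrative ATOM records with placeholder coordinates:
# chain A stands for the template, chain B for the target.
example = "\n".join([
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00  0.00           N",
    "ATOM      2  N   MET B   1      12.560  14.331   3.412  1.00  0.00           N",
    "END",
])
print(keep_chain(example, "B"))
```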

Open Swiss model and select the Load raw sequence option to load the target molecule.

Perform Magic fit and Iterative fit, provided under Fit, to fit the two sequences.

Save the file as the project.

Select "Submit modeling request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill in these fields:

Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information:

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Click on the model bars

Fig: Structure of template after modeling

Alignment

TARGET   83                               LV GFGKAVLGMA AAAEELLGQH
2b8nA     4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET  155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET  255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET  305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394 ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of protein structure and sequence. Stereochemical validation of model structures is an important part of the comparative (homology) modeling process. The Ramachandran plot visualizes the backbone dihedral angles φ (phi) and ψ (psi) of each amino-acid residue in a protein structure by plotting φ against ψ. It therefore shows which combinations of the two angles are possible for a polypeptide. The plot does not give bond lengths or the distance between successive alpha-carbon atoms. Instead, residues whose (φ, ψ) pairs fall outside the allowed regions point to sterically strained or poorly modeled parts of the backbone.

Software

SAVS: http://nihserver.mbi.ucla.edu/SAVS

Procedure

The target protein structure obtained by homology modeling with Deep View and Modeler is given as input to SAVS.
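Each point on a Ramachandran plot is a pair of backbone torsion angles. φ is the dihedral C(i-1)-N(i)-CA(i)-C(i), and ψ is N(i)-CA(i)-C(i)-N(i+1). The pure-Python sketch below shows the standard four-atom dihedral calculation. The helper names are illustrative, and the example coordinates are toy values, not atoms from the model.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (between -180 and 180).

    Measured about the p1-p2 bond with the IUPAC sign convention
    (clockwise rotation of the near bond is positive).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Backbone phi: C(i-1) - N(i) - CA(i) - C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Backbone psi: N(i) - CA(i) - C(i) - N(i+1)."""
    return dihedral(n, ca, c, n_next)


# Toy coordinates giving 0 (cis), 180 (trans) and -90 degrees.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
```

Applying `phi` and `psi` to consecutive backbone atoms of every residue gives the point cloud that PROCHECK plots.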

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result

Plot statistics
---------------
                                                           Residues       %
Residues in most favoured regions [A,B,L]                      990     85.6%
Residues in additional allowed regions [a,b,l,p]               104      9.0%
Residues in generously allowed regions [~a,~b,~l,~p]            11      1.0%
Residues in disallowed regions                                  51      4.4%
                                                              ----    ------
Number of non-glycine and non-proline residues                1156    100.0%
Number of end-residues (excl. Gly and Pro)                       8
Number of glycine residues (shown as triangles)                127
Number of proline residues                                      60
                                                              ----
Total number of residues                                      1351

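Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good-quality model would be expected to have over 90% of its residues in the most favoured regions. With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the present model falls short of this benchmark.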
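Each percentage in the PROCHECK summary is a region's count divided by the 1156 non-glycine, non-proline residues. Glycine, proline and end residues are excluded from that denominator. The short sketch below reproduces the arithmetic and compares the most-favoured fraction with the 90% benchmark. The function and dictionary names are illustrative.

```python
def region_percentages(counts: dict) -> dict:
    """Percentage of non-Gly, non-Pro residues in each Ramachandran region."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


# Residue counts from the PROCHECK summary above.
counts = {
    "most favoured": 990,
    "additionally allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
pct = region_percentages(counts)
print(sum(counts.values()))  # 1156 non-glycine, non-proline residues
print(pct)
print("meets 90% benchmark:", pct["most favoured"] > 90.0)
```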

Docking result by Hex software

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)

Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N  Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C  Radius = 1.40 Charge = 0.53
  ASP O  Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N  Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C  Radius = 1.40 Charge = 0.53
  ASP O  Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N  Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C  Radius = 1.40 Charge = 0.53
  THR O  Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H  Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27*812*1] x FFT[64*24*48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040


R = 075
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS  Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing projects on HBV, we hope it will be possible in the near future to draw conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we used homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

Our study indicates that HBeAg-binding protein 4 (glycerate kinase), encoded by the GLYCTK gene, is one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report might be just a stepping stone toward such discoveries; it is a small finding within a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.

Finally, a similar process can be applied to other pathogens, and possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, so we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. Still, we hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.


[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B, Sheinerman FB and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796–14801. [PubMed]

Aloy P, Querol E, Aviles FX and Sternberg MJ. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395–408. [PubMed]

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402. [PubMed]

Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com


Abbreviation


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Homology or comparative modeling involves predicting the structure of a query sequence from the structures of one or more structural templates. The procedure involves identifying possible templates that have a clear sequence relationship to the query, assembling the model, predicting regions of the structure that are likely to have different conformations than the templates (e.g., loops), and ultimately refining the structure in an attempt to account for inherent differences between the template and query structures. As mentioned above, homology modeling figures heavily as a rationale for structural genomics initiatives, under the stated assumption that accurate models can be built for query sequences that have greater than 30% sequence identity with their best template.

The quality of the alignment of the query to the template sequence is a major factor in determining the quality of homology models. This is one of the sources of the 30% rule, because alignment quality usually decreases dramatically below about 30% sequence identity (a structural explanation for this observation has been offered by Chung and Subbiah, 1996). Advances in the accuracy of sequence alignments using structure-based profile methods, such as those described above, should result in continuing improvements in the quality of homology models [56].
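The 30% rule depends on how sequence identity is counted. As a minimal sketch, we assume identity is computed over aligned columns where neither sequence has a gap. Other tools divide by the alignment length or by the shorter sequence, so their values differ. The function name `percent_identity` is our own illustration and is not taken from any tool used in this work.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-').

    Illustrative helper (hypothetical name); other identity definitions exist.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # gap columns are excluded under this definition
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# 4 identical residues out of 5 gap-free columns
print(percent_identity("ACDE-G", "ACNEKG"))  # 80.0
```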

With the number of protein-ligand complexes available in the Protein Data Bank constantly growing, structure-based approaches to drug design and screening have become increasingly important. Alongside this explosion of structural information, a number of molecular docking methods have been developed over the last years, with the aim of maximally exploiting all available structural and chemical information that can be derived from proteins, from ligands and from protein-ligand complexes. In this respect, the term guided docking is introduced to refer to docking approaches that incorporate some degree of chemical information to actively guide the orientation of the ligand into the binding site. To reflect the focus on the use of chemical information, a classification scheme for guided docking approaches is proposed. In general terms, guided docking approaches can be divided into indirect and direct approaches. Indirect approaches incorporate chemical information implicitly, having an effect on scoring but not on orienting the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly, thus actively guiding the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, to reflect the source used to derive the features capturing the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information represents an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].

This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed, and a number of successful applications of structure-based virtual screening are described [8].


Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

A curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.


2. Protein sequence of glycerate kinase (HBeAg-binding protein 4): primary accession number Q8IVS8, EC 2.7.1.31, from human. Subcellular location: cytoplasm. Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved tools for biological sequence comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "fast-A" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
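SSEARCH finds the optimal local alignment score with the Smith-Waterman dynamic-programming recurrence. The sketch below illustrates that recurrence in simplified form only. It assumes a linear gap penalty and a toy scoring scheme (match +2, mismatch -1, gap -2, all arbitrary choices). SSEARCH itself uses substitution matrices such as BLOSUM and affine gap penalties, so its scores will differ.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty, toy scores)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere: this is what makes it local
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman("XXXACGXXX", "ACG"))  # 6: only the shared ACG core scores
```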

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
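This threshold is usually expressed as an expectation value (E-value). BLAST uses Karlin-Altschul statistics, which are derived for ungapped local alignments; for gapped searches the parameters are fitted empirically. Under these statistics, the expected number of chance hits with a score of at least S is:

$$E = K\,m\,n\,e^{-\lambda S}$$

$$S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad E = m\,n\,2^{-S'}$$

Here m is the query length, n is the total length of the database, λ and K are constants that depend on the scoring system, and S' is the bit score. Substituting S' into the last expression recovers the first one. A hit with a higher bit score therefore has an exponentially smaller E-value.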

5. Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e., if you leave the two boxes empty), the complete sequence will be analyzed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
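Two of these parameters have short closed forms. The sketch below follows the formulas given in the ProtParam documentation, as we understand them. The aliphatic index of Ikai (1980) is X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), using mole percentages. The extinction coefficient at 280 nm in water, after Pace et al. (1995), counts 5500 per Trp, 1490 per Tyr and 125 per cystine. The function names are hypothetical. The instability index and half-life are omitted because they depend on large lookup tables.

```python
def aliphatic_index(seq: str) -> float:
    """Ikai aliphatic index from mole percentages of Ala, Val, Ile and Leu (hypothetical name)."""
    seq = seq.upper()
    n = len(seq)
    mole_pct = {aa: 100.0 * seq.count(aa) / n for aa in "AVIL"}
    return mole_pct["A"] + 2.9 * mole_pct["V"] + 3.9 * (mole_pct["I"] + mole_pct["L"])


def extinction_coefficients(seq: str) -> tuple:
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1) in water (hypothetical name).

    Returns (all Cys paired as cystines, all Cys reduced).
    """
    seq = seq.upper()
    base = 5500 * seq.count("W") + 1490 * seq.count("Y")
    cystines = seq.count("C") // 2  # two Cys residues form one cystine
    return base + 125 * cystines, base


demo = "AVILWYCC"  # toy sequence, not a real protein
print(round(aliphatic_index(demo), 2))  # 146.25
print(extinction_coefficients(demo))    # (7115, 6990)
```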


Using SOPMA for secondary structure analysis

Recently, a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in the prediction of the secondary structure of proteins. In this paper we report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
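The 69.5% figure is a three-state accuracy (Q3): the percentage of residues whose predicted state (H, E or C) matches the observed state. The sketch below assumes both assignments are given as equal-length strings. The function name is hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted H/E/C state equals the observed one (hypothetical name)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(round(q3_accuracy("HHHCCEE", "HHCCCEE"), 1))  # 85.7 (6 of 7 residues)
```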

PROTOCOL FOLLOWED

Obtained the receptor (target protein) from literature references, journals available online and PubMed literature for the HBV strain.

Retrieved the FASTA sequence of the protein HBeAg-binding protein 4 (glycerate kinase) from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) using a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modelled the receptor in PDB format.

Validated the modelled receptor using the Structure Analysis Validation Server (SAVS).

Verified our model through different parameters, such as the Ramachandran plot and others available in SAVS.

Selected the best ligand for HBV disease from the KEGG database.

Ran Hex and obtained the structure of the drug molecule.


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and the template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
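The "~2 Å agreement between the matched Cα atoms" is normally reported as a root-mean-square deviation (RMSD). The sketch below assumes that the model and reference Cα coordinates are already paired residue by residue and optimally superimposed. Finding that superposition, for example with the Kabsch algorithm, is a separate step and is not shown. The function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (in the input units, e.g. Å) between paired, pre-superimposed points (hypothetical name)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]      # toy Cα positions
reference = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]  # each shifted by 1 Å
print(rmsd(model, reference))  # 1.0
```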

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target. Thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
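The E-value screen described above can be sketched as a simple filter and sort. This is an illustration only. The cutoff of 1e-5 is an assumed value, and the hit tuples reuse the three PDB chains and E-values listed in the BLAST result in the Results section of this report. The other criteria mentioned in the text (coverage, function, secondary structure) are not modelled. The function name is hypothetical.

```python
# (template chain, BLAST score, E-value) as listed in the Results section
hits = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
]

E_VALUE_CUTOFF = 1e-5  # assumed threshold; choose per project


def rank_templates(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits below the E-value cutoff, best (lowest E-value) first (hypothetical name)."""
    passing = [h for h in hits if h[2] < cutoff]
    return sorted(passing, key=lambda h: h[2])


for chain, score, evalue in rank_templates(hits):
    print(chain, score, evalue)
```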

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step, and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-protein docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interfaces are said to be rigid.

Fig. Protein-protein docking
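A rigid-body search can be illustrated with the grid-correlation scoring idea of Katchalski-Katzir and co-workers. Receptor cells on the surface score +1, receptor interior cells carry a large penalty, and each ligand cell adds the receptor value beneath it. Surface contact is therefore rewarded and interpenetration is punished. The toy 2D sketch below evaluates this score directly for every translation. FFT-based programs compute all translations at once, and Hex itself uses a spherical polar Fourier representation rather than this Cartesian grid. The grid sizes, the penalty of -15 and the function names are assumptions.

```python
RHO = -15  # assumed penalty for ligand cells that overlap the receptor interior


def correlation(receptor, ligand, shift):
    """Score of the ligand grid translated by shift=(dy, dx), wrapping periodically."""
    rows, cols = len(receptor), len(receptor[0])
    dy, dx = shift
    score = 0
    for y in range(rows):
        for x in range(cols):
            # value of the translated ligand at cell (y, x)
            score += receptor[y][x] * ligand[(y - dy) % rows][(x - dx) % cols]
    return score


def best_translation(receptor, ligand):
    """Exhaustive search over all translations; returns (score, (dy, dx))."""
    rows, cols = len(receptor), len(receptor[0])
    return max(
        (correlation(receptor, ligand, (dy, dx)), (dy, dx))
        for dy in range(rows)
        for dx in range(cols)
    )


# Toy receptor: two empty rows, one surface row, two interior rows
receptor = [[0] * 6, [0] * 6, [1] * 6, [RHO] * 6, [RHO] * 6]
# Toy ligand: a two-cell fragment at the grid origin
ligand = [[1, 1, 0, 0, 0, 0]] + [[0] * 6 for _ in range(4)]

print(best_translation(receptor, ligand))  # (2, (2, 5)): contact with the surface row
```

Ties between equally good translations are broken here simply by tuple ordering. A real program would keep a ranked list of many solutions, as the Hex log above does.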

Protein receptor-ligand docking

Protein receptor-ligand motifs fit together tightly and are often referred to as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand and a flexible receptor, or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid ligand with a flexible receptor often maximizes the interface area between the molecules. They move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than would be possible with a rigid receptor.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement that allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
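The six rigid-body degrees of freedom mentioned above (three rotational, three translational) can be made concrete as a rotation followed by a translation of every atom. The sketch below uses Rodrigues' rotation formula about an arbitrary axis through the origin. The function name and the axis-angle parameterisation are our own illustrative choices; docking programs often use Euler angles or quaternions instead.

```python
import math


def rigid_transform(points, axis, angle_deg, translation):
    """Rotate points about an axis through the origin, then translate them.

    Illustrative helper (hypothetical name) using Rodrigues' formula:
    v' = v cos t + (k x v) sin t + k (k . v)(1 - cos t), with k the unit axis.
    """
    norm = math.sqrt(sum(c * c for c in axis))
    kx, ky, kz = (c / norm for c in axis)  # unit rotation axis
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = []
    for x, y, z in points:
        cross = (ky * z - kz * y, kz * x - kx * z, kx * y - ky * x)  # k x v
        dot = kx * x + ky * y + kz * z                               # k . v
        rotated = [
            v * cos_t + c * sin_t + k * dot * (1 - cos_t)
            for v, c, k in zip((x, y, z), cross, (kx, ky, kz))
        ]
        out.append(tuple(r + d for r, d in zip(rotated, translation)))
    return out


# 90 degree turn about z, then a shift of (1, 2, 3): (1, 0, 0) ends near (1, 3, 3)
print(rigid_transform([(1.0, 0.0, 0.0)], (0, 0, 1), 90, (1, 2, 3)))
```

Because the transform is rigid, interatomic distances within the ligand are unchanged, which is what keeps a rigid-body search to just six parameters.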

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.

Receptor

A molecular structure on the surface of the cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule on the surface of, or within, a cell to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

The molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through the interaction of many weak non-covalent bonds formed with the binding site of a protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
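The link between a lower activation free energy and rate enhancement can be made quantitative with transition-state theory. Assume the pre-exponential factor is the same for the enzyme-catalysed and uncatalysed reactions, and let ΔΔG‡ = ΔG‡(uncatalysed) - ΔG‡(catalysed). The ratio of the rate constants is then:

$$\frac{k_{\text{enz}}}{k_{\text{uncat}}} = \exp\!\left(\frac{\Delta\Delta G^{\ddagger}}{RT}\right)$$

$$RT = (8.314\ \text{J mol}^{-1}\,\text{K}^{-1})(298\ \text{K}) \approx 2.48\ \text{kJ mol}^{-1}, \qquad \ln(10)\,RT \approx 5.7\ \text{kJ mol}^{-1}$$

So, at room temperature, every ~5.7 kJ/mol of transition-state stabilisation by the active site gives roughly a tenfold rate enhancement.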

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The table does not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between these two torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
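The φ and ψ values plotted on the map are dihedral angles, each defined by four consecutive backbone atoms. φ is defined by C(i-1), N, Cα and C, and ψ by N, Cα, C and N(i+1). Below is a minimal sketch of the standard atan2-based dihedral calculation. It follows the IUPAC sign convention, and the function names are our own.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in the range [-180, 180], for four points (hypothetical name)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry around a central bond along z
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis arrangement
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans arrangement
```

For φ, pass the C(i-1), N(i), Cα(i) and C(i) coordinates in that order; for ψ, pass N(i), Cα(i), C(i) and N(i+1).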

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new. The new file will have the atoms labelled in accordance with the IUPAC naming convention.
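PROCHECK begins by reading the PDB coordinate file. To illustrate that input and one kind of check it performs, the sketch below reads ATOM records using the fixed columns of the PDB format. It then compares backbone N-Cα and Cα-C bond lengths with the commonly quoted Engh and Huber ideal values (1.458 Å and 1.525 Å). This is not PROCHECK's own algorithm. The function names are hypothetical, alternate locations and insertion codes are ignored, and the ideal values should be treated as assumptions.

```python
import math

# Commonly quoted Engh & Huber ideal backbone bond lengths in Å (assumed values)
IDEAL = {("N", "CA"): 1.458, ("CA", "C"): 1.525}


def read_atoms(pdb_lines):
    """Map (chain, residue number) -> {atom name: (x, y, z)} from ATOM records.

    Hypothetical helper; uses PDB fixed columns and ignores altLoc/iCode.
    """
    residues = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()   # columns 13-16: atom name
        chain = line[21]             # column 22: chain identifier
        res_seq = int(line[22:26])   # columns 23-26: residue number
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault((chain, res_seq), {})[name] = xyz
    return residues


def bond_deviations(residues):
    """Yield (chain, residue, bond, length, length - ideal) for N-CA and CA-C."""
    for (chain, res_seq), atoms in sorted(residues.items()):
        for (a, b), ideal in IDEAL.items():
            if a in atoms and b in atoms:
                length = math.dist(atoms[a], atoms[b])
                yield chain, res_seq, f"{a}-{b}", length, length - ideal


def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal ATOM record (columns 1-54) for testing (hypothetical helper)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


demo = [
    atom_line(1, "N", "GLY", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "CA", "GLY", "A", 1, 1.6, 0.0, 0.0),  # N-CA deliberately stretched
    atom_line(3, "C", "GLY", "A", 1, 1.6, 1.525, 0.0),
]
for row in bond_deviations(read_atoms(demo)):
    print(row)
```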

OUTPUT


The output comprises the plots, together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at relieving local strain around a residue, but it will not pass over high energy barriers, so it stops in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a single numerical value for one conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of quality because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can still have a large overall energy because of a single bad interaction, such as two atoms that are too close to each other in space and therefore have a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
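Gradient-based minimization can be illustrated on a toy system. This is a minimal sketch, assuming two atoms interacting only through a Lennard-Jones potential in reduced units (ε = σ = 1, hypothetical parameters) rather than a real force field. Starting from a clash at r = 0.9, steepest descent moves the atoms along the force until it vanishes at the energy minimum r = 2^(1/6)σ. The step cap `max_move` is an added assumption that stops the very large repulsive force of the initial clash from flinging the atoms apart, which echoes the point above that one bad contact can dominate the energy.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy E(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along r, F = -dE/dr; positive values push the atoms apart."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def steepest_descent(r, step=0.01, max_move=0.05, tol=1e-6, max_iter=10_000):
    """Move the interatomic distance downhill until the residual force is below tol."""
    for _ in range(max_iter):
        force = lj_force(r)
        if abs(force) < tol:
            break  # converged: forces are (almost) zero at the minimum
        move = step * force
        move = max(-max_move, min(max_move, move))  # cap each displacement
        r += move
    return r


if __name__ == "__main__":
    start = 0.9  # atoms too close together: a steric clash
    end = steepest_descent(start)
    print(f"E(start) = {lj_energy(start):.2f}, E(end) = {lj_energy(end):.4f}, r_end = {end:.5f}")
```

Real minimizers apply the same idea to all 3N atomic coordinates, usually with steepest descent followed by conjugate gradients, and they stop in the nearest local minimum exactly as described above.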


Results and discussion

1. Swiss-Prot entry: protein sequence of Glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences

Db   AC      Description                                                    Score  E-value
pdb  1QGT-C  Chain C (HBcAg) Human Hepatitis B Viral Capsid >gi|5             206  6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina             197  2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                    192  6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp   27  60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I              27  60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
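ProtParam's composition table is simply the count of each residue divided by the sequence length, and the charged-residue totals are Asp + Glu and Arg + Lys. A minimal sketch, assuming the counts reported above; `REPORTED_COUNTS` is a transcription of that table (not data fetched from UniProt), and `composition` and `charged_residues` are hypothetical helpers. For a raw sequence string, `collections.Counter` produces the same kind of count table.

```python
from collections import Counter

# Residue counts for GLCTK_HUMAN (Q8IVS8) as reported by ProtParam above.
REPORTED_COUNTS = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28,
    "G": 51, "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10,
    "P": 28, "S": 27, "T": 22, "W": 4, "Y": 5, "V": 38,
}


def composition(counts):
    """Return {residue: (count, percent)} with percent rounded to one decimal."""
    total = sum(counts.values())
    return {aa: (n, round(100.0 * n / total, 1)) for aa, n in counts.items()}


def charged_residues(counts):
    """Return (negative, positive) = (Asp + Glu, Arg + Lys)."""
    negative = counts.get("D", 0) + counts.get("E", 0)
    positive = counts.get("R", 0) + counts.get("K", 0)
    return negative, positive


if __name__ == "__main__":
    print(sum(REPORTED_COUNTS.values()))            # 523
    print(composition(REPORTED_COUNTS)["L"])        # (81, 15.5)
    print(charged_residues(REPORTED_COUNTS))        # (49, 43)
    print(charged_residues(Counter("MAAALQVLPR")))  # (0, 1)
```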

Atomic composition:
Carbon     C  2435
Hydrogen   H  3967
Nitrogen   N   711
Oxygen     O   719
Sulfur     S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849
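The atom total and the molecular weight can both be cross-checked from the elemental formula. A minimal sketch, assuming standard textbook atomic weights (`ATOMIC_WEIGHT` is not taken from ProtParam, so the mass agrees with the reported 55252.6 only to within about one dalton); `total_atoms` and `formula_mass` are hypothetical helpers.

```python
# Standard atomic weights in g/mol (assumed textbook values, not taken from ProtParam).
ATOMIC_WEIGHT = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}

# Elemental formula reported above for GLCTK_HUMAN: C2435 H3967 N711 O719 S17.
FORMULA = {"C": 2435, "H": 3967, "N": 711, "O": 719, "S": 17}


def total_atoms(formula):
    """Total number of atoms in an elemental formula."""
    return sum(formula.values())


def formula_mass(formula):
    """Average molecular mass (Da) obtained by summing atomic weights."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


if __name__ == "__main__":
    print(total_atoms(FORMULA))             # 7849
    print(round(formula_mass(FORMULA), 1))  # close to ProtParam's 55252.6
    # Every sulfur atom comes from a Cys or Met side chain: 5 Cys + 12 Met = 17.
    print(FORMULA["S"] == 5 + 12)           # True
```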

Extinction coefficients

Extinction coefficients are in units of M^-1 cm^-1, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
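Both extinction coefficients follow from the Trp, Tyr and Cys counts in the composition table. A minimal sketch, assuming the per-residue values given in the ProtParam documentation (Trp 5500, Tyr 1490, cystine 125 M^-1 cm^-1); the helper names are hypothetical. With 5 Cys, "all Cys as half cystines" means two cystines, and the odd Cys contributes nothing. The Abs 0.1% value is the coefficient divided by the molecular weight.

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, cys_paired):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    cys_paired=True assumes every possible Cys pair forms a cystine;
    an odd leftover Cys contributes nothing.
    """
    n_cystine = n_cys // 2 if cys_paired else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


def absorbance_0_1_percent(ext_coeff, mol_weight):
    """Absorbance of a 1 g/l (0.1%) solution: molar coefficient divided by molecular weight."""
    return ext_coeff / mol_weight


if __name__ == "__main__":
    mw = 55252.6  # molecular weight reported by ProtParam above
    for paired in (True, False):
        eps = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, cys_paired=paired)
        print(eps, round(absorbance_0_1_percent(eps, mw), 3))
    # prints "29700 0.538" and then "29450 0.533"
```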

Secondary structure prediction

By SOPMA: result for UNK_158250


        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: Window width: 17, Similarity threshold: 8, Number of states: 4
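The percentages in the SOPMA summary are state counts divided by the sequence length (for example 235 / 523 = 44.93% helix). A minimal sketch, assuming the prediction string uses only the four states h, e, t and c that appear in the output above; `state_summary` is a hypothetical helper, not part of SOPMA.

```python
from collections import Counter

# One-letter state codes used in the SOPMA prediction string above.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_summary(prediction):
    """Return {state name: (count, percent)} for a SOPMA-style prediction string."""
    counts = Counter(prediction)
    total = len(prediction)
    return {name: (counts[code], round(100.0 * counts[code] / total, 2))
            for code, name in STATE_NAMES.items()}


if __name__ == "__main__":
    print(state_summary("hhhheect"))
    # Cross-check against the reported totals: 235 helix residues out of 523.
    print(round(100.0 * 235 / 523, 2))  # 44.93
```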

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
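The pairwise scores in this table are percent identities. A minimal sketch of that calculation for two already-aligned sequences, assuming the ClustalW convention of counting identities only over columns where neither sequence has a gap; `percent_identity` is a hypothetical helper, and ClustalW itself first computes the pairwise alignments before scoring them. The example uses the aligned HBEAG_HBVA4 and HBEAG_HBVA5 segments from the alignment that follows.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences, ignoring columns that contain a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gap columns are not compared
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
    hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"
    print(round(percent_identity(hbva4, hbva5), 1))  # 96.7 (29 of 30 residues identical)
```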

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
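The guide tree is written in Newick format: nested parentheses group sequences, and the number after each colon is a branch length. A minimal recursive-descent parser is sketched below. `GUIDE_TREE` is the tree above with the stripped colons, commas and decimal points restored, which is an assumption about the original punctuation; `parse_newick` and `leaves` are hypothetical helpers that ignore quoted labels and comments. For real work a library reader such as Biopython's `Bio.Phylo` is the usual choice.

```python
def parse_newick(text):
    """Parse a Newick tree into nested (children, name, branch_length) tuples."""
    text = "".join(text.split())  # drop line breaks and spaces
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        name = read_until(",():;")
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_until(",();"))
        return (children, name, length)

    return parse_node()


def leaves(node):
    """List (name, branch_length) for every leaf below node."""
    children, name, length = node
    if not children:
        return [(name, length)]
    return [leaf for child in children for leaf in leaves(child)]


GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

if __name__ == "__main__":
    tips = dict(leaves(parse_newick(GUIDE_TREE)))
    print(len(tips))                   # 10
    print(tips["Q8IVS8|GLCTK_HUMAN"])  # 0.59519
```

The long branch to GLCTK_HUMAN (0.59519) compared with the short branches among the HBV sequences matches the low identity scores (2 to 3) in the scores table.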

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

1. Open the template structure from File (PDB file), then choose SwissModel - Load Raw Target Sequence.

2. Choose Fit - Fit Raw Sequence, then Magic Fit, then Iterative Fit.

3. Choose File - Save - Layer (PDB).

4. Choose File - Save - Project (PDB).

5. Choose SwissModel - Submit Modeling Request. A new browser window opens and loads the PDB file; give the e-mail address at which the modeled structure is to be received.

6. Open the new structure (received by e-mail) and remove the template by selecting the target.

7. Choose File - Save - Layer (PDB).

Open SwissModel and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under the Fit menu, to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52


Fig: Structure of the template after modeling

Alignment

TARGET  83 LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of protein structures and sequences. Stereochemical validation of model structures is an important part of the comparative (homology) modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ for the amino acid residues in a protein structure. φ is the torsion angle about the N-Cα bond and ψ the torsion angle about the Cα-C bond; each residue appears as one point, so the plot shows which (φ, ψ) combinations the polypeptide adopts and whether they fall inside the sterically allowed regions.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling with Deep View and Modeler was given as input to SAVS.
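To make the φ/ψ definition concrete: each angle is a torsion over four consecutive backbone atoms, φ(i) over C(i-1), N(i), Cα(i), C(i) and ψ(i) over N(i), Cα(i), C(i), N(i+1). A minimal sketch of the standard vector formula is given below, assuming coordinates are plain (x, y, z) tuples; the helper names are hypothetical and the test points are simple geometric cases, not atoms from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, range (-180, 180], IUPAC sign convention."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)), 1))  # 0.0 (cis)
    print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))  # -90.0
```

Computing this pair of angles for every residue and plotting ψ against φ gives exactly the Ramachandran plot that SAVS/PROCHECK produces.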

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result

Plot statistics                                           Score    %age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%

Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures with an R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions.
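The region percentages follow directly from the counts: each region's count divided by the 1156 non-glycine, non-proline residues. A small sketch, assuming the 90% guideline quoted above as the threshold; `REGION_COUNTS` is a transcription of the plot statistics and the helper names are hypothetical. By this rule of thumb the model's 85.6% in the most favoured regions falls short of the guideline.

```python
# Residue counts from the PROCHECK plot statistics above.
REGION_COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues in each Ramachandran region (one decimal)."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_procheck_guideline(counts, threshold=90.0):
    """True if more than `threshold` percent of residues lie in the most favoured regions."""
    total = sum(counts.values())
    return 100.0 * counts["most favoured"] / total > threshold


if __name__ == "__main__":
    print(sum(REGION_COUNTS.values()))              # 1156
    print(region_percentages(REGION_COUNTS))        # 85.6, 9.0, 1.0 and 4.4 percent
    print(meets_procheck_guideline(REGION_COUNTS))  # False
```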

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N   Radius = 1.40 Charge = -0.52
  ASP CA  Radius = 1.50 Charge = 0.25
  ASP C   Radius = 1.40 Charge = 0.53
  ASP O   Radius = 1.50 Charge = -0.50
  ASP CB  Radius = 1.70 Charge = -0.21
  ASP CG  Radius = 1.40 Charge = 0.62
  ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N   Radius = 1.40 Charge = -0.52
  ASP CA  Radius = 1.50 Charge = 0.25
  ASP C   Radius = 1.40 Charge = 0.53
  ASP O   Radius = 1.50 Charge = -0.50
  ASP CB  Radius = 1.70 Charge = -0.21
  ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N   Radius = 1.40 Charge = -0.52
  LYS CA  Radius = 1.50 Charge = 0.23
  LYS C   Radius = 1.40 Charge = 0.53
  LYS O   Radius = 1.50 Charge = -0.50
  LYS CB  Radius = 1.70 Charge = 0.04
  LYS CG  Radius = 1.70 Charge = 0.05
  LYS CD  Radius = 1.70 Charge = 0.05
  LYS CE  Radius = 1.70 Charge = 0.22
  LYS H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N   Radius = 1.40 Charge = -0.52
  MSE CA  Radius = 1.50 Charge = 0.14
  MSE C   Radius = 1.40 Charge = 0.53
  MSE O   Radius = 1.50 Charge = -0.50
  MSE CB  Radius = 1.70 Charge = 0.04
  MSE CG  Radius = 1.70 Charge = 0.09
  MSE SE  Radius = 1.90 Charge = 0.32
  MSE CE  Radius = 1.90 Charge = 0.01
  MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N   Radius = 1.40 Charge = -0.52
  THR CA  Radius = 1.50 Charge = 0.27
  THR C   Radius = 1.40 Charge = 0.53
  THR O   Radius = 1.50 Charge = -0.50
  THR CB  Radius = 1.50 Charge = 0.21
  THR H   Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052

68

MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln   Etotal    Eshape    Eforce      Eair              RMS  Bumps
 ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce    Eair  Vshape  Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at this by studying them at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is complete, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we performed homology modelling. We went a step further and presented a theoretical model built with available online modelling tools.

Our study suggests that the glycerate kinase protein (HBeAg-binding protein 4) is one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand that could inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries. It is a small finding within a much larger problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for the infectious disease hepatitis B. They may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so we can look forward to a better, disease-free world.

The work presented here is only one part of a larger effort. Much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We nevertheless hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B, Sheinerman FB and Honig B (2001). Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796-14801. [PubMed]

Aloy P, Querol E, Aviles FX and Sternberg MJ (2001). Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395-408. [PubMed]

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402. [PubMed]

Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD et al. (2000). InterPro - An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com


Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


incorporate chemical information implicitly, which affects scoring but not the orientation of the ligand during sampling. In contrast, direct approaches incorporate chemical information explicitly and so actively guide the orientation of the ligand during sampling. Direct approaches can be further divided into protein-based, mapping-based and ligand-based approaches, according to the source used to derive the features that capture the chemical information inside the protein cavity. Within each category, a representative list of docking approaches is discussed. In view of the limitations of current scoring functions, it was generally found that making optimal use of chemical information is an efficient knowledge-based strategy for improving binding affinity estimations, ligand binding-mode predictions and virtual screening enrichments obtained from protein-ligand docking [7].
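Virtual screening enrichment, mentioned above, is commonly summarised as an enrichment factor. This is the hit rate among the top-ranked fraction of a screened library divided by the hit rate of the whole library. The Python sketch below assumes a list of booleans (active or not) that is already sorted by docking score, best first. The function name and the example library are hypothetical and are not taken from [7].

```python
def enrichment_factor(ranked_is_active, fraction):
    """Enrichment factor for the top `fraction` of a ranked screening list.

    ranked_is_active: booleans sorted by docking score, best-scored first
                      (True = known active, False = decoy or inactive).
    fraction: portion of the list inspected, e.g. 0.01 for EF at 1%.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_top = max(1, round(fraction * n_total))   # size of the inspected top slice
    hits_top = sum(ranked_is_active[:n_top])    # actives found in that slice
    return (hits_top / n_top) / (n_actives / n_total)


# Hypothetical library: 100 compounds, 5 actives, 2 of them in the top 10.
ranked = [True, False, True] + [False] * 7 + [True] * 3 + [False] * 87
print(enrichment_factor(ranked, 0.10))   # 4.0, i.e. (2/10) / (5/100)
```

A value of 1 means the docking ranking does no better than random selection. Values well above 1 mean actives are concentrated near the top of the list.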

A review of ligand-receptor docking [8] introduces the basic underlying concepts and gives an overview of different approaches and algorithms. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which that review also outlines. It presents approaches to some of these challenges and the latest developments in the area, discusses aspects of the assessment of docking program performance, and describes a number of successful applications of structure-based virtual screening [8].


Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to the sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data and disseminates biomedical information, all for a better understanding of the molecular processes affecting human health and disease.

Swiss-Prot

Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.
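Sequences retrieved from Swiss-Prot come in FASTA format: a header line starting with ">" followed by one or more sequence lines. Below is a minimal Python sketch of a parser, plus a helper that splits a UniProtKB-style header (`db|accession|entry_name description`). The function names are illustrative. The sequence in the example is a short placeholder, not the real Q8IVS8 sequence.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                          # skip blank lines
        if line.startswith(">"):
            if header is not None:            # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.replace(" ", "").upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def uniprot_ids(header):
    """Split a UniProtKB header such as 'sp|Q8IVS8|GLCTK_HUMAN ...'."""
    db, accession, entry_name = header.split()[0].split("|")
    return db, accession, entry_name


example = """>sp|Q8IVS8|GLCTK_HUMAN Glycerate kinase (placeholder sequence)
MKTAYIAKQR
QISFVKSHFS
"""
header, seq = parse_fasta(example)[0]
print(uniprot_ids(header), len(seq))
```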


2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)

Primary accession number: Q8IVS8
EC number: 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved tools for biological sequence comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches. It also provided a more sophisticated shuffling program for evaluating statistical significance. Several programs in this package allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "fast-aye" and stands for "FAST-All", because it works with any alphabet, extending the "FAST-P" (protein) and "FAST-N" (nucleotide) alignment programs.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts) and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors when comparing nucleotide to protein sequence data. Six-frame-translated searches do not handle such errors very well.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
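The Python sketch below illustrates two ideas from the description above: the Smith-Waterman local alignment used by SSEARCH, and judging significance against shuffled sequences. It is a simplified illustration under stated assumptions. It uses a hypothetical match/mismatch scheme (+2/-1) and a linear gap penalty (-2), instead of the substitution matrices (such as BLOSUM50) and affine gap penalties the real FASTA programs use. It also reports a plain z-score against shuffled copies of the second sequence rather than FASTA's own statistics.

```python
import random


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)               # DP row i-1
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)           # DP row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets a local alignment start afresh at any cell
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def shuffle_z_score(a, b, n_shuffles=100, seed=0, **scoring):
    """z-score of the real score against scores of `a` vs. shuffled copies of `b`."""
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles")
    rng = random.Random(seed)
    real = smith_waterman(a, b, **scoring)
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)                # same composition, order destroyed
        scores.append(smith_waterman(a, "".join(letters), **scoring))
    mean = sum(scores) / n_shuffles
    sd = (sum((s - mean) ** 2 for s in scores) / (n_shuffles - 1)) ** 0.5
    return float("inf") if sd == 0 else (real - mean) / sd


print(smith_waterman("GATTACA", "TTAC"))    # 8: TTAC aligns with no gaps or mismatches
print(round(shuffle_z_score("MKTAYIAKQRQISFVKSHFSRQ", "MKTAYIAKQRQISFVKSHFSRQ"), 1))
```

Shuffling keeps the amino acid composition but destroys the order. A real alignment that scores many standard deviations above the shuffled scores is therefore unlikely to be explained by composition alone.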

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
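The heuristic idea behind BLAST-style searching is to first find short words shared by the query and a library sequence, and only then examine those regions more closely. The Python sketch below shows only exact-word seeding and counting of hits per diagonal, which is closer to FASTA's word-matching step than to full BLAST. Real BLAST also accepts "neighbourhood" words that score above a threshold with a substitution matrix, and extends seeds into scored alignments. The word length k = 3 and the sequences are illustrative assumptions.

```python
from collections import Counter, defaultdict


def word_index(seq, k):
    """Map every k-letter word in `seq` to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def word_hits(query, subject, k=3):
    """(query_pos, subject_pos) pairs where an identical k-letter word occurs."""
    index = word_index(subject, k)
    return [(i, j)
            for i in range(len(query) - k + 1)
            for j in index.get(query[i:i + k], [])]


def best_diagonal(hits):
    """Diagonal (subject_pos - query_pos) carrying the most word hits, with its count."""
    if not hits:
        return None
    return Counter(j - i for i, j in hits).most_common(1)[0]


hits = word_hits("MKTAYIAK", "GGMKTAYIAKGG")
print(best_diagonal(hits))   # (2, 6): six shared words, all offset by 2
```

Many hits on one diagonal indicate an ungapped region of similarity. This is where a full search program would then spend its effort extending and scoring an alignment.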

5. Primary & secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you are shown an intermediary page that lets you select the portion of the sequence on which you would like to perform the analysis. The choice includes mature chains, peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
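Two of these parameters are simple enough to compute directly from the sequence. ProtParam's documentation gives the aliphatic index as A + 2.9·V + 3.9·(I + L), where each letter is the mole percent of that amino acid (Ikai, 1980). It gives the molar extinction coefficient at 280 nm as 5500·nTrp + 1490·nTyr + 125·nCystine. The Python sketch below follows those formulas. It returns the extinction coefficient both with all cysteines paired as cystines and with all of them reduced, which is how ProtParam reports it. Half-life and the instability index depend on large lookup tables (N-end rule half-lives and dipeptide instability weights) and are not reproduced here.

```python
def aliphatic_index(seq):
    """Aliphatic index (Ikai, 1980): A + 2.9*V + 3.9*(I + L), in mole percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficients(seq):
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1) in water.

    Returns (all Cys paired as cystines, all Cys reduced), using
    Trp = 5500, Tyr = 1490 and cystine = 125.
    """
    seq = seq.upper()
    base = 5500 * seq.count("W") + 1490 * seq.count("Y")
    return base + 125 * (seq.count("C") // 2), base


print(round(aliphatic_index("AVIL"), 2))   # 292.5
print(extinction_coefficients("WYCC"))     # (7115, 6990)
```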


Using SOPMA for secondary structure analysis

The self-optimized prediction method (SOPM) was described to improve the success rate in predicting the secondary structure of proteins. The paper introducing SOPMA reports improvements obtained by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database of 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
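The percentages above are three-state accuracies, usually called Q3. Q3 is the fraction of residues whose predicted class (helix, strand or coil) matches the observed class. The Python sketch below assumes that both strings have already been reduced to the letters H, E and C. SOPMA's own output can contain other states, such as turns, and how those are mapped onto the three classes is a separate choice not made here.

```python
def q3(predicted, observed):
    """Q3: percentage of residues whose three-state class (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3("HHHECCCC", "HHHEECCC"))   # 87.5: 7 of 8 residues agree
```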

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

Retrieved the FASTA sequence of the protein, HBeAg-binding protein 4 (glycerate kinase), from the Swiss-Prot database.

Used a BLAST similarity search to retrieve the PDB ID of the template structure: 2B8N.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modelled the receptor in PDB format.

Validated the modelled receptor using the Structure Analysis Validation Server (SAVS).

Verified the model through different parameters, such as the Ramachandran plot, available in SAVS.

Selected the best ligand for HBV disease from the KEGG database.

Ran Hex and obtained the docked structure of the drug molecule.
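Checking a model with a Ramachandran plot requires the backbone torsion angles φ and ψ of every residue. Each is a dihedral angle defined by four consecutive backbone atoms: φ uses C(i-1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1). The Python sketch below computes a signed dihedral angle from four atom positions. The coordinates in the example are toy points chosen to give cis, +90° and trans arrangements, not atoms from the modelled receptor.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180 to 180) defined by four points."""
    def sub(a, b):
        return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)      # normals of the two planes
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))    # 0.0 (cis)
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))    # 90.0
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))   # 180.0 (trans)
```

Using atan2 rather than an arccosine keeps the sign of the angle. This matters for a Ramachandran plot, where, for example, right-handed helices cluster at negative φ.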


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on two things. The first is the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence. The second is an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of a homology model depends on the quality of the sequence alignment and of the template structure. The approach can be complicated by alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template. It can also be complicated by structure gaps in the template, which arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these atomic-position errors are significant. They impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the fold, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
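The figures quoted above rest on two measurements: the percent sequence identity of the target-template alignment, and the agreement between matched Cα atoms, usually reported as a root-mean-square deviation (RMSD). The Python sketch below makes two assumptions. First, identity is counted over alignment columns in which neither sequence has a gap ("-"); other definitions divide by the shorter sequence length or by the full alignment length. Second, the two Cα sets are already superposed. A real comparison would first find the optimal superposition, for example with the Kabsch algorithm, which is omitted here.

```python
import math


def percent_identity(aln_a, aln_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def ca_rmsd(coords_a, coords_b):
    """RMSD between matched C-alpha (x, y, z) coordinates that are ALREADY
    superposed; no fitting is done here."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal, non-empty lists of (x, y, z) tuples")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


print(percent_identity("ACD-EF", "ACDGEY"))                      # 80.0
print(ca_rmsd([(0, 0, 0), (3, 0, 0)], [(0, 0, 2), (3, 0, 2)]))   # 2.0
```

`math.dist` requires Python 3.8 or later.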

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, giving a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related. This has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates for traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered close enough in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure and lead to a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Choosing the best template from among the candidates is therefore a key step, and it can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.

It is possible to use the sequence alignment generated by the database search technique as the basis for subsequent model production; however, more sophisticated approaches have also been explored.
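The template-selection rules described above (reject hits with a poor E-value, then prefer high coverage of the query and high identity) can be written as a small filter-and-sort step. The Python sketch below is one simple reading of that advice, not a standard procedure. The cut-off of 1e-5, the ordering of coverage before identity, and the template names and numbers are all hypothetical. It also shows the Karlin-Altschul expression E = K·m·n·e^(-λS), which gives the E-value of a local alignment score S for query and database lengths m and n. BLAST derives λ and K from the scoring system, and the values used here are placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class TemplateHit:
    template_id: str
    e_value: float
    identity: float    # percent identity over the aligned region
    coverage: float    # fraction of the query covered by the alignment, 0 to 1


def karlin_altschul_evalue(score, m, n, lam, k):
    """Expected number of chance hits scoring >= score: E = K*m*n*exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * score)


def rank_templates(hits, max_e_value=1e-5):
    """Drop hits above the E-value cut-off, then prefer coverage, then identity."""
    usable = [h for h in hits if h.e_value <= max_e_value]
    return sorted(usable, key=lambda h: (h.coverage, h.identity), reverse=True)


hits = [
    TemplateHit("tmplA", 1e-40, 35.0, 0.95),
    TemplateHit("tmplB", 1e-60, 48.0, 0.60),
    TemplateHit("tmplC", 0.2, 90.0, 0.30),   # poor E-value: rejected
]
print([h.template_id for h in rank_templates(hits)])   # ['tmplA', 'tmplB']
```

Note that tmplC has the highest identity but is still rejected, which mirrors the warning above against choosing a template with a poor E-value.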

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins of similar size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions cannot alter their conformation to improve binding and ease movement. Conformational changes are limited by steric constraints, and such interfaces are therefore said to be rigid.

Fig. Protein-protein docking

Protein receptor-ligand docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. These interfaces show both high specificity and induced fit, with specificity increasing with rigidity. Receptor-ligand docking can involve either a rigid ligand with a flexible receptor, or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

In the native structure of a rigid ligand-flexible receptor complex, the interface area between the molecules is often maximized. The molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, when the ligand overlaps with the receptor at the docking interface, energy penalties are incurred. If this van der Waals repulsion can be reduced, energy loss in the system is minimized. This can be achieved by allowing flexibility in the receptor: a flexible receptor can dock a larger ligand than a rigid receptor could.
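The energy penalty for overlap described above is often first approximated by simply counting steric clashes, meaning receptor-ligand atom pairs that are closer than their van der Waals radii allow. The Hex summary table in this report has a similar "Bumps" column, but Hex's exact criterion is not reproduced here. The Python sketch below uses a brute-force pair loop and a hypothetical tolerance of 0.5 Å. The atoms are toy (x, y, z, radius) tuples.

```python
import math
from itertools import product


def count_clashes(receptor, ligand, tolerance=0.5):
    """Count receptor-ligand atom pairs that overlap sterically.

    Atoms are (x, y, z, vdw_radius) tuples in angstroms. A pair clashes when
    its distance is below the sum of the two radii minus `tolerance`.
    """
    clashes = 0
    for (x1, y1, z1, r1), (x2, y2, z2, r2) in product(receptor, ligand):
        if math.dist((x1, y1, z1), (x2, y2, z2)) < r1 + r2 - tolerance:
            clashes += 1
    return clashes


receptor = [(0.0, 0.0, 0.0, 1.7)]
ligand = [(1.0, 0.0, 0.0, 1.5), (10.0, 0.0, 0.0, 1.5)]
print(count_clashes(receptor, ligand))   # 1: only the atom 1.0 A away overlaps
```

Allowing receptor flexibility amounts to letting receptor atoms move so that such counts drop. The all-pairs loop is fine for illustration but too slow for full proteins, where a spatial grid would normally be used.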

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: intrinsic movement allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into account, the receptor is allowed even more inherent flexibility. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
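The six rigid-body degrees of freedom mentioned above (three rotations and three translations) are what a rigid docking search varies when placing a ligand against a receptor. The Python sketch below applies one such placement to a list of ligand coordinates. It rotates the ligand about its centroid using rotations about x, then y, then z, and then translates it. This angle convention is an assumption chosen for illustration. Hex uses its own rotational parametrisation and searches the space with FFT-based correlations, as its log in this report shows, rather than by placing poses one at a time like this.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (angles in radians): R = Rz . Ry . Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy,     cy * sx,                cy * cx],
    ]


def place_ligand(coords, angles, translation):
    """Apply the six rigid-body degrees of freedom to ligand coordinates.

    The ligand is rotated about its own centroid by `angles` (rx, ry, rz)
    and then shifted by `translation` (tx, ty, tz).
    """
    if not coords:
        raise ValueError("no coordinates given")
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    rot = rotation_matrix(*angles)
    placed = []
    for p in coords:
        d = [p[i] - centroid[i] for i in range(3)]                        # centre the atom
        r = [sum(rot[i][j] * d[j] for j in range(3)) for i in range(3)]   # rotate it
        placed.append(tuple(r[i] + centroid[i] + translation[i] for i in range(3)))
    return placed


ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
moved = place_ligand(ligand, (0.0, 0.0, math.pi / 2), (0.0, 0.0, 5.0))
print([tuple(round(c, 6) for c in p) for p in moved])
```

Here a 90° turn about z swings the two-atom ligand from the x axis onto the y axis, and the translation lifts it by 5 units along z.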

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for the infectious disease hepatitis B. They may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Receptor

A structure on the surface of a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule within a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through many weak non-covalent interactions with the binding site of a protein, tight binding depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
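To make the link between a lower ΔG‡ and rate enhancement concrete, here is a small worked sketch based on transition-state theory, k = (k_B·T/h)·exp(-ΔG‡/RT). It assumes, for illustration, that the catalysed and uncatalysed reactions share the same prefactor, so the ratio of rate constants depends only on how much ΔG‡ is lowered. The values passed in are illustrative, not measured for any enzyme in this report.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def rate_enhancement(ddg_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """k_enzyme / k_uncatalysed when the enzyme lowers the activation free energy by ddg.

    From k = (k_B*T/h) * exp(-dG_act / (R*T)); with an identical prefactor for both
    paths (an assumption), the ratio reduces to exp(ddg / (R*T)).
    """
    return math.exp(ddg_kj_per_mol * 1000.0 / (R * temperature_k))


for ddg in (5.7, 20.0, 40.0):
    print(f"dG_act lowered by {ddg:4.1f} kJ/mol -> about {rate_enhancement(ddg):.3g}-fold faster")
```

Because the dependence is exponential, each further ~5.7 kJ/mol of stabilisation of the transition state at room temperature multiplies the rate by roughly another factor of ten.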

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are the preferences of the 20 amino acids for lying within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not cover protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
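The report does not state the exact formula behind these values. A common way to express such a preference is a log-odds score: the log of the fraction of ligand-contacting residues that are of type X, divided by the fraction of all residues that are of type X. The sketch below assumes that definition (natural logarithm) and uses made-up counts, so its numbers are not expected to reproduce the table above; only the sign convention (positive means more contacts than chance) carries over.

```python
import math


def contact_propensity(contacts: dict[str, int], background: dict[str, int]) -> dict[str, float]:
    """Assumed log-odds propensity per amino acid:
    ln( share of contacting residues that are X / share of all residues that are X ).
    Positive -> X touches bound non-protein atoms more often than expected by chance."""
    total_contacts = sum(contacts.values())
    total_background = sum(background.values())
    scores = {}
    for aa, bg_count in background.items():
        observed = contacts.get(aa, 0) / total_contacts
        expected = bg_count / total_background
        if observed > 0:  # residues never seen in contact are skipped to avoid log(0)
            scores[aa] = math.log(observed / expected)
    return scores


# Hypothetical counts for three residue types.
background = {"His": 200, "Leu": 900, "Ser": 600}
contacts = {"His": 40, "Leu": 60, "Ser": 50}
for aa, score in sorted(contact_propensity(contacts, background).items(), key=lambda kv: -kv[1]):
    print(f"{aa}: {score:+.3f}")
```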

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N–Cα and Cα–C bonds are relatively free to rotate, and the plot is drawn between the corresponding torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
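Each point on a Ramachandran plot is a pair of torsion angles computed from backbone atom coordinates: φ of residue i is the dihedral C(i-1)–N(i)–Cα(i)–C(i), and ψ is N(i)–Cα(i)–C(i)–N(i+1). A minimal pure-Python sketch of the dihedral calculation (IUPAC sign convention, result in degrees) is shown below; the helper names are illustrative, and programs such as PROCHECK perform the equivalent calculation internally.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev: Vec, n: Vec, ca: Vec, c: Vec) -> float:
    """phi of residue i: C(i-1) - N(i) - CA(i) - C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n: Vec, ca: Vec, c: Vec, n_next: Vec) -> float:
    """psi of residue i: N(i) - CA(i) - C(i) - N(i+1)."""
    return dihedral(n, ca, c, n_next)


print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
```

The toy points in the last line form a right-angle twist about the y axis; real φ/ψ values come from the N, CA and C records of a PDB file.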

SAVS (Structure Analysis and Validation Server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues the submitted protein contains.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is “cleaned up” by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the extension .new, in which the atoms are labelled in accordance with the IUPAC naming convention.
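The coordinate file PROCHECK reads is in PDB format, where ATOM and HETATM records use fixed columns. The sketch below reads the main fields using the column layout from the wwPDB format description (serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54). It is a simplified reader for illustration: it ignores alternate locations, insertion codes, occupancy and B-factors, and the `Atom` class and `parse_atoms` function are hypothetical names, not part of PROCHECK.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Read ATOM/HETATM records using fixed PDB columns (0-based slices below)."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                serial=int(line[6:11]),
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms


sample = "\n".join([
    "HEADER    HYPOTHETICAL EXAMPLE",
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "END",
])
for atom in parse_atoms(sample):
    print(atom)
```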

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints for a residue, but it will not pass through high energy barriers, and it may stop in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
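The gradient idea ("atoms are moved so as to reduce the net forces on them") can be shown with a minimal steepest-descent sketch. It assumes the simplest possible system, two atoms interacting through a Lennard-Jones (van der Waals) potential in reduced units (σ = ε = 1), and a fixed step size; real force fields add bond, angle and dihedral terms, and practical minimizers (e.g. conjugate gradient) choose steps more cleverly.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy E(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Force along the separation, F = -dE/dr (positive means the atoms are pushed apart)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def steepest_descent(r: float, step: float = 0.01, f_tol: float = 1e-8,
                     max_iter: int = 100_000) -> tuple[float, float, int]:
    """Move the separation along the force until the residual force is below f_tol."""
    for iteration in range(max_iter):
        force = lj_force(r)
        if abs(force) < f_tol:
            break
        r += step * force  # following the force means going downhill in energy
    return r, lj_energy(r), iteration


# Start from a stretched pair; the analytic minimum is at r = 2**(1/6), E = -1.
r_min, e_min, steps = steepest_descent(1.5)
print(f"r = {r_min:.6f} (analytic {2 ** (1 / 6):.6f}), E = {e_min:.6f}, steps = {steps}")
```

With a single well there is only one minimum; in a protein the same downhill procedure stops at whichever local minimum is nearest, which is why minimization alone cannot cross high energy barriers.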


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information: Entry name GLCTK_HUMAN; Primary accession number Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase; Synonyms: EC 2.7.1.31, HBeAg-binding protein 4

Gene name: GLYCTK; Synonyms: HBEBP4; ORF names: LP5910

From: Homo sapiens (Human) [TaxID 9606]

Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences:

Db   AC       Description                                                          Score  E-value
pdb  1QGT-C   Chain C (Hbcag), Human Hepatitis B Viral Capsid >gi|5                  206  6e-54
pdb  2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina                  197  2e-51
pdb  2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                         192  6e-50
pdb  1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp        27  6.0
pdb  1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I                   27  6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49

Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon     C  2435
Hydrogen   H  3967
Nitrogen   N   711
Oxygen     O   719
Sulfur     S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849
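As a consistency check on the ProtParam figures, the molecular weight can be recomputed from the atomic formula. The sketch uses standard abridged average atomic masses; this is an assumption, since ProtParam works from its own residue-mass table, so agreement to within a fraction of a dalton (not to the last digit) is what to expect.

```python
# Standard abridged average atomic masses (g/mol); ProtParam's own mass table may differ slightly.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def formula_mass(counts: dict[str, int]) -> float:
    """Average molecular mass of a molecular formula given as element -> atom count."""
    return sum(ATOMIC_MASS[element] * n for element, n in counts.items())


glyctk = {"C": 2435, "H": 3967, "N": 711, "O": 719, "S": 17}  # formula from ProtParam above
print(sum(glyctk.values()))  # 7849, the total number of atoms
print(round(formula_mass(glyctk), 1))
```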

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming all Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming no Cys residues appear as half cystines
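Both extinction coefficients follow from the residue counts in the composition table. ProtParam's documented estimate is ε(280 nm) = n(Trp)·5500 + n(Tyr)·1490 + n(cystine)·125, with each pair of Cys counted as one cystine; dividing ε by the molecular weight gives the absorbance of a 1 g/l solution. The sketch below reproduces the two reported values from W = 4, Y = 5, C = 5 and MW = 55252.6.

```python
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125  # M^-1 cm^-1 at 280 nm in water


def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int, cys_as_cystines: bool) -> int:
    """ProtParam-style estimate; each pair of Cys counts as one cystine when cys_as_cystines."""
    n_cystine = n_cys // 2 if cys_as_cystines else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


mw = 55252.6  # Da, from the ProtParam output above
for paired in (True, False):
    eps = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, cys_as_cystines=paired)
    print(eps, round(eps / mw, 3))  # coefficient and Abs 0.1% (=1 g/l)
```

This prints 29700 0.538 and then 29450 0.533, matching the two lines above; the odd fifth Cys cannot form a cystine, so only two pairs contribute.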

Secondary structure prediction

By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix       (Hh):  235 is 44.93%
3₁₀ helix         (Gg):    0 is  0.00%
Pi helix          (Ii):    0 is  0.00%
Beta bridge       (Bb):    0 is  0.00%
Extended strand   (Ee):   80 is 15.30%
Beta turn         (Tt):   36 is  6.88%
Bend region       (Ss):    0 is  0.00%
Random coil       (Cc):  172 is 32.89%
Ambiguous states  (?):     0 is  0.00%
Other states:              0 is  0.00%

Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
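The SOPMA summary is simply a tally of the one-letter state string shown under the sequence, each count divided by the sequence length. A small sketch of that tally follows; the state-letter-to-name mapping mirrors the labels in the summary above, and the demonstration string is a made-up 10-residue example, not the GLYCTK prediction.

```python
from collections import Counter

STATE_NAMES = {"h": "Alpha helix", "g": "3-10 helix", "i": "Pi helix", "b": "Beta bridge",
               "e": "Extended strand", "t": "Beta turn", "s": "Bend region", "c": "Random coil"}


def summarize_states(states: str) -> dict[str, tuple[int, float]]:
    """Count each one-letter state and its percentage of the sequence length."""
    counts = Counter(states.lower())
    n = len(states)
    return {name: (counts.get(code, 0), round(100.0 * counts.get(code, 0) / n, 2))
            for code, name in STATE_NAMES.items()}


# Illustrative 10-residue prediction string.
for name, (count, pct) in summarize_states("hhhhcceeet").items():
    print(f"{name:16s} {count:3d} {pct:6.2f}%")
```

Feeding it any 523-letter string with 235 h, 80 e, 36 t and 172 c gives the 44.93 / 15.30 / 6.88 / 32.89% split reported by SOPMA.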

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
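The pairwise scores above behave like percent identities (e.g. 97 to 98 among the HBV genotype sequences, 2 to 3 between GLYCTK and the viral proteins). A minimal sketch of percent identity over an aligned pair is shown below, assuming the definition of identities divided by the positions where neither sequence has a gap; whether ClustalW2's score uses exactly this denominator is an assumption here. The two fragments are copied from the second block of the alignment that follows, where HBVA4 and HBVA5 differ at a single position (V/F).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identities divided by aligned positions where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"
print(round(percent_identity(hbva4, hbva5), 1))
```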

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);

Phylogram

Tertiary structure prediction

PDB entry 1QGT chain C was selected as the template, as it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

1. Open the template structure from File (PDB file); choose SwissModel > Load raw target sequence.
2. Choose Fit > Fit raw sequence, then Magic fit, then Iterative fit.
3. Choose File > Save > Layer (PDB).
4. Choose File > Save > Project (PDB).
5. Choose SwissModel > Submit modeling request (a new browser window opens with the PDB file loaded; give the e-mail ID for receiving the modeled structure).
6. Open the new structure (received by e-mail) and remove the template by selecting the target.
7. Choose File > Save > Layer (PDB).


Open SwissModel and select the Load raw sequence option to load the target molecule.

Perform Magic fit and Iterative fit, provided under Fit, in order to fit the two sequences.

Save the file as the project.

Select “Submit modeling request” under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Your e-mail address: Gunjan300@gmail.com

Your name: Gunjan

Request title: Gunjan project

Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Fig: Structure of template after modeling

Alignment

TARGET 83   LV GFGKAVLGMA AAAEELLGQH
2b8nA 4     peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET 155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET 255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET 305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394   ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure; it shows the possible conformations of the φ and ψ angles for a polypeptide by displaying the phi and psi backbone conformational angles of each residue in a protein. Because backbone bond lengths and bond angles are nearly fixed, these two torsion angles largely determine the local conformation of the backbone at each residue.

Software

SAVS: http://nihserver.mbi.ucla.edu/SAVS/

Procedure

The target protein structure obtained after homology modeling using DeepView and Modeler is given as input to SAVS.

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result

Plot statistics                                          No. of residues   %-age
Residues in most favoured regions [A,B,L]                      990          85.6%
Residues in additional allowed regions [a,b,l,p]               104           9.0%
Residues in generously allowed regions [~a,~b,~l,~p]            11           1.0%
Residues in disallowed regions                                  51           4.4%
                                                              ----         ------
Number of non-glycine and non-proline residues                1156         100.0%
Number of end-residues (excl. Gly and Pro)                        8
Number of glycine residues (shown as triangles)                 127
Number of proline residues                                       60
                                                              ----
Total number of residues                                      1351

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
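The percentages in the PROCHECK table are each region's residue count divided by the 1156 non-glycine, non-proline residues, and the quality guideline compares the most-favoured share with 90%. A short sketch of that arithmetic, using the counts reported above:

```python
def ramachandran_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of non-glycine, non-proline residues in each Ramachandran region (one decimal)."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


counts = {"most favoured [A,B,L]": 990, "additional allowed [a,b,l,p]": 104,
          "generously allowed [~a,~b,~l,~p]": 11, "disallowed": 51}
pct = ramachandran_percentages(counts)
for region, value in pct.items():
    print(f"{region:34s} {value:5.1f}%")

GOOD_MODEL_THRESHOLD = 90.0  # PROCHECK guideline quoted above
print("meets 90% guideline:", pct["most favoured [A,B,L]"] > GOOD_MODEL_THRESHOLD)
```

By this criterion the model (85.6% in the most favoured regions, 4.4% disallowed) falls short of the 90% guideline, which suggests further refinement, for example energy minimization, before relying on it.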

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS


Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25


Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00  R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N:2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal     Eshape     Eforce     Eair       RMS               Bumps
----  ---------  ---------  ---------  ---------  ----------------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  00      00      00      00      00      00     -1  -100
---------------------------------------------------------------------------
   1    1 000000  00      00      00      00      00      00     -1  -100


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level, where a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope that in the near future it will be possible to draw conclusive inferences about the true picture of its evolution, and that the genes responsible for pathogenesis can also be identified.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

As we studied, HBeAg-binding protein 4 (glycerate kinase) is one of the secondary causes of the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for developing drugs.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and hence aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. Nevertheless, we hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Chapter: Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.

Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

Bioinformatics is limited to the sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot

A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)

Primary accession number: Q8IVS8

EC 2.7.1.31, from human

Subcellular location: cytoplasm

Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance, or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
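As a rough illustration of what SSEARCH optimizes, the Smith-Waterman recurrence can be sketched in a few lines of Python. This is a minimal sketch, not code from the FASTA package: it assumes a simple match/mismatch score and a linear gap penalty (the scoring values are arbitrary placeholders, whereas real protein searches use substitution matrices such as BLOSUM and affine gap penalties), and it returns only the best local alignment score, not the alignment itself.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b (linear gap penalty)."""
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # gap inserted in b
            left = H[i][j - 1] + gap  # gap inserted in a
            # Clamping at zero lets an alignment start anywhere: this makes it local
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # "ESLKKL" occurs exactly inside the first sequence: 6 matches x 2 = 12
    print(smith_waterman("PESLKKLAIE", "ESLKKL"))
```

Because every cell is floored at zero, unrelated flanking residues never drag the score down, which is why a short exact motif can score well inside a longer sequence.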

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
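The first stage of a BLAST search, finding short words shared by the query and a database sequence, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses exact word matches of length k = 3 only, whereas protein BLAST also accepts high-scoring "neighbourhood" words from a substitution matrix, then extends the seeds into alignments and assigns E-values, none of which is shown here. The function names are hypothetical.

```python
from collections import defaultdict


def word_index(query: str, k: int = 3) -> dict[str, list[int]]:
    """Map every length-k word of the query to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def find_seeds(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where an exact k-word is shared."""
    index = word_index(query, k)
    seeds = []
    for j in range(len(subject) - k + 1):
        # .get does not create new entries in the defaultdict
        for i in index.get(subject[j:j + k], []):
            seeds.append((i, j))
    return seeds


if __name__ == "__main__":
    print(find_seeds("PESLKK", "XXESLKYY"))
```

In this example the two seeds, (1, 2) and (2, 3), lie on the same diagonal (subject position minus query position = 1); a full search would extend such hits into a longer local alignment and then judge it against the threshold mentioned above.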

5. Primary & secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters, among others:

extinction coefficient

half-life

instability index

aliphatic index
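Two of these parameters have simple closed forms that can be sketched directly. The code below assumes the formulas given in the ProtParam documentation: the molar extinction coefficient at 280 nm as 1490 × (number of Tyr) + 5500 × (number of Trp) + 125 × (number of cystines), and the aliphatic index as X(Ala) + 2.9 × X(Val) + 3.9 × (X(Ile) + X(Leu)), where X is the mole percent. It is an illustrative re-implementation, not ProtParam itself, and it does not cover the half-life or the instability index, which rely on lookup tables.

```python
def extinction_coefficient(seq: str, all_cys_form_cystines: bool = True) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) in water, ProtParam-style.

    If all_cys_form_cystines is True, every pair of Cys is counted as one cystine;
    otherwise all Cys are assumed reduced and contribute nothing.
    """
    seq = seq.upper()
    cystines = seq.count("C") // 2 if all_cys_form_cystines else 0
    return seq.count("Y") * 1490 + seq.count("W") * 5500 + cystines * 125


def aliphatic_index(seq: str) -> float:
    """Relative volume occupied by aliphatic side chains (A, V, I, L)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Mole percent of each aliphatic residue
    x = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return x["A"] + 2.9 * x["V"] + 3.9 * (x["I"] + x["L"])
```

ProtParam itself reports the extinction coefficient twice, once assuming all Cys pairs form cystines and once assuming all Cys are reduced, which is why the sketch exposes that choice as a flag.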


Using SOPMA for secondary structure analysis

The self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. Its improved version, SOPMA, brings further gains by predicting all the sequences of a set of aligned proteins belonging to the same family. SOPMA correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
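The "69.5% of amino acids for a three-state description" is a three-state accuracy (often called Q3): the percentage of residues whose predicted state matches the observed one. A minimal sketch, assuming both assignments are already given as equal-length strings over H (helix), E (sheet) and C (coil); mapping any additional states that a predictor outputs onto these three is an assumption not covered here.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percent of residues whose predicted 3-state label matches the observed one.

    Both strings hold one letter per residue: H (helix), E (sheet), C (coil).
    """
    if not observed or len(predicted) != len(observed):
        raise ValueError("assignments must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted.upper(), observed.upper()))
    return 100.0 * correct / len(observed)
```

Pooled over all residues of a benchmark set of chains, this ratio is the kind of figure the 69.5% summarizes.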

PROTOCOL FOLLOWED

Obtained the receptor (target protein) from literature references, journals available online and PubMed literature for the HBV strain.

Retrieved the FASTA sequence of the protein (HBeAg-binding protein 4) from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modelled the receptor in PDB format.

Validated the modelled receptor using the Structure Analysis Validation Server (SAVS).

Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the docked structure of the drug molecule.

Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the "query" sequence or "target"). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as "templates" or "parent structures") likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and of the template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the fold, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
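The "~2 Å agreement between the matched Cα atoms" is usually expressed as a root-mean-square deviation (RMSD) after optimal superposition of model and reference. A minimal sketch using the Kabsch algorithm with NumPy, assuming the Cα coordinates are already paired residue by residue (for example through the target-template alignment); the function does not perform that pairing itself, and the name `kabsch_rmsd` is only illustrative.

```python
import numpy as np


def kabsch_rmsd(model: np.ndarray, reference: np.ndarray) -> float:
    """RMSD (in the input units, e.g. Å) after optimal rigid superposition.

    Both inputs are (N, 3) coordinates of matched atoms, e.g. Cα atoms.
    """
    P = np.asarray(model, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched coordinates")
    # Remove translation by centring both sets on their centroids
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the covariance matrix gives the optimal rotation (Kabsch)
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate the model onto the reference and average the squared distances
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

A model that differs from its reference only by a rotation and a translation gives an RMSD of (numerically) zero; real homology models typically land in the ranges quoted above, depending on sequence identity.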

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target; thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.

Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step, and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
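The first-pass template choice described above can be sketched as a filter followed by a ranking: keep BLAST hits with a sufficiently low E-value and adequate coverage, then prefer the hit covering the largest fraction of the query, breaking ties by sequence identity. The `Hit` record, its field names, the thresholds (E ≤ 1e-5, coverage ≥ 50%) and the template IDs in the example are all hypothetical illustrations, not values used in this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    template_id: str  # e.g. a PDB ID (hypothetical values in the example)
    evalue: float     # BLAST E-value of the hit
    q_start: int      # first aligned query residue (1-based, inclusive)
    q_end: int        # last aligned query residue (1-based, inclusive)
    identity: float   # percent identity over the aligned region


def coverage(hit: Hit, query_length: int) -> float:
    """Fraction of the query sequence covered by the aligned region."""
    return (hit.q_end - hit.q_start + 1) / query_length


def choose_template(hits: list[Hit], query_length: int,
                    max_evalue: float = 1e-5, min_coverage: float = 0.5) -> Optional[Hit]:
    """Pick the best template, or None if no hit passes both filters."""
    candidates = [h for h in hits
                  if h.evalue <= max_evalue and coverage(h, query_length) >= min_coverage]
    if not candidates:
        return None  # no template is better than a misleading one
    # Coverage first, identity as tie-breaker
    return max(candidates, key=lambda h: (coverage(h, query_length), h.identity))


if __name__ == "__main__":
    hits = [Hit("TPL1", 1e-40, 1, 400, 35.0),
            Hit("TPL2", 1e-60, 1, 200, 60.0),   # too little coverage
            Hit("TPL3", 0.5, 1, 500, 20.0)]     # E-value too poor
    print(choose_template(hits, query_length=500))
```

Returning `None` instead of the least-bad hit mirrors the advice that a template with a poor E-value should not be used even when it is the only one available.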

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-protein docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and the interactions are thus said to be rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand Docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. These interfaces show both high specificity and induced fit, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid ligand-flexible receptor complex often maximizes the interface area between the molecules. The two molecules move relative to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger-than-usual ligand. Normally, ligand overlap in the docking interface incurs energy penalties. If the van der Waals repulsion can be decreased, energy loss in the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors can dock a larger ligand than a rigid receptor would accommodate.
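The overlap penalty can be illustrated with a Lennard-Jones 12-6 potential, U(r) = 4ε[(σ/r)^12 - (σ/r)^6]. This is a common functional form for van der Waals interactions, used here as an assumption. The ε and σ values below are illustrative placeholders, not parameters from any specific force field.

```python
def lennard_jones(r: float, epsilon: float = 0.2, sigma: float = 3.4) -> float:
    """12-6 Lennard-Jones energy for one atom pair at distance r.

    epsilon: well depth (arbitrary energy units); sigma: distance where U = 0.
    Both defaults are illustrative placeholders.
    """
    if r <= 0:
        raise ValueError("distance must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


if __name__ == "__main__":
    sigma = 3.4
    r_min = 2 ** (1 / 6) * sigma  # distance of the energy minimum (U = -epsilon)
    for r in (0.8 * sigma, 0.9 * sigma, sigma, r_min, 2 * sigma):
        print(f"r = {r:5.2f}  U = {lennard_jones(r):9.3f}")
```

The energy rises steeply once two atoms come closer than σ; this is the overlap penalty described above. Letting receptor side chains move apart increases r for the clashing pairs and removes most of that penalty.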

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: intrinsic movement allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
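Rigid-body docking programs commonly encode a ligand pose with exactly these six degrees of freedom: three rotation angles and a three-component translation. A minimal sketch, assuming a z-y-x Euler-angle convention (one of several in use) and plain tuples for coordinates:

```python
import math


def rotation_matrix(alpha, beta, gamma):
    """Return R = Rz(alpha) @ Ry(beta) @ Rx(gamma) as a 3x3 nested list (angles in radians)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]


def apply_pose(coords, angles, translation):
    """Rotate coords about their centroid, then translate: 3 + 3 = 6 degrees of freedom."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    R = rotation_matrix(*angles)
    moved = []
    for p in coords:
        d = [p[i] - centroid[i] for i in range(3)]  # position relative to the centroid
        rotated = [sum(R[r][c] * d[c] for c in range(3)) for r in range(3)]
        moved.append(tuple(rotated[i] + centroid[i] + translation[i] for i in range(3)))
    return moved


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]  # toy 3-atom "ligand"
    pose = apply_pose(ligand, angles=(0.3, -0.2, 1.1), translation=(5.0, 0.0, -2.0))
    # A rigid-body move changes position and orientation but not internal distances.
    print(math.dist(ligand[0], ligand[2]), math.dist(pose[0], pose[2]))
```

A docking search samples these six numbers and scores each resulting pose. Flexible docking adds further internal degrees of freedom (for example, side-chain or ligand torsions) on top of them.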

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A molecule on the surface of a cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. More generally, a receptor is a molecule within a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through many weak noncovalent interactions formed with the binding site of a protein, tight binding of a ligand depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
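Under transition-state theory, a rate constant scales as exp(-ΔG‡/RT), where ΔG‡ is the activation free energy. Lowering the barrier by ΔΔG‡ therefore multiplies the rate by exp(ΔΔG‡/RT). A short sketch of that relationship; the 298.15 K temperature and the example barrier reductions are assumptions for illustration:

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def rate_enhancement(ddg_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """Factor k_cat / k_uncat when the activation free energy drops by ddg (kJ/mol)."""
    return math.exp(ddg_kj_per_mol * 1000.0 / (R * temperature_k))


if __name__ == "__main__":
    for ddg in (5.7, 11.4, 34.2):
        print(f"barrier lowered by {ddg:5.1f} kJ/mol -> rate x {rate_enhancement(ddg):,.0f}")
```

At room temperature, each ~5.7 kJ/mol (RT ln 10) of barrier reduction corresponds to roughly a ten-fold rate increase. This is why modest stabilisation of the transition state gives large rate enhancements.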

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, because this depends greatly on the type of function. With that in mind, below are the preferences of the 20 amino acids for lying within functional regions of proteins. They were worked out by counting how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than expected by chance; negative values mean that it makes fewer. The values below do not cover protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070   Trp -0.140

Met  0.025   Val -0.060   Asn  0.080   Leu -0.180   Phe -0.120

Gln  0.050   Cys  0.210   Ile -0.005   Ala  0.025   Glu  0.050

Arg  0.055   Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
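The propensity values above can be used directly. For example, they can list which residues are enriched or depleted at functional sites, or give a crude score for a set of putative site residues. A minimal sketch; the averaging `site_score` is a hypothetical illustration, not a validated site predictor:

```python
# Active-site propensities from the table above (positive = enriched at functional sites).
PROPENSITY = {
    "His": 0.360, "Cys": 0.210, "Ser": 0.130, "Lys": 0.100, "Thr": 0.100,
    "Asn": 0.080, "Arg": 0.055, "Gln": 0.050, "Glu": 0.050, "Asp": 0.045,
    "Met": 0.025, "Ala": 0.025, "Ile": -0.005, "Tyr": -0.040, "Val": -0.060,
    "Gly": -0.070, "Phe": -0.120, "Trp": -0.140, "Leu": -0.180, "Pro": -0.200,
}


def enriched(threshold: float = 0.0):
    """Residues whose propensity exceeds threshold, highest first."""
    return sorted((aa for aa, v in PROPENSITY.items() if v > threshold),
                  key=lambda aa: PROPENSITY[aa], reverse=True)


def site_score(residues):
    """Mean propensity of a list of three-letter residue names (hypothetical score)."""
    return sum(PROPENSITY[r] for r in residues) / len(residues)


if __name__ == "__main__":
    print(enriched()[:3])
    print(round(site_score(["His", "Asp", "Ser"]), 3))
```

His, Cys, and Ser head the enriched list, consistent with their frequent catalytic roles. As the text notes, a low score does not rule out a protein-protein interaction site.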

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the corresponding torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
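Each point on a Ramachandran plot is a pair of backbone dihedral angles. φ is defined by the atoms C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below computes a signed dihedral from four Cartesian positions using a standard atan2 formulation. The coordinates in the usage line are toy values, not taken from any structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in degrees, in (-180, 180], e.g. C-N-CA-C for phi."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy points in a trans (anti) arrangement around the central bond.
    print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
```

Applying this to every residue's backbone atoms gives the (φ, ψ) pairs plotted on the map. Validation tools then check each pair against allowed regions.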

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues the submitted protein contains.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also use 'ideal' bond lengths and bond angles derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the first of its programs "cleans up" the coordinate file. The clean-up corrects any mislabelled atoms and creates a new coordinate file with the extension .new. In the new file, the atoms are labelled according to the IUPAC naming convention.
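The coordinate file PROCHECK reads is a PDB file, in which each atom is an ATOM or HETATM record with fixed column positions. The following parser illustrates what such an input contains; it is not PROCHECK's own code. It is a minimal sketch based on the column layout in the PDB format specification.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    record: str
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_pdb_atoms(lines):
    """Return Atom objects for ATOM/HETATM records (PDB fixed-column layout)."""
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END, ...
        atoms.append(Atom(
            record=record,
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21:22].strip(),     # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


if __name__ == "__main__":
    # One hand-built record (illustrative values, not from the modelled structure).
    line = f"{'ATOM':<6}{1:>5} {'CA':^4} {'ALA':>3} {'A'}{1:>4}    {1.0:8.3f}{2.0:8.3f}{3.0:8.3f}"
    print(parse_pdb_atoms(["HEADER    EXAMPLE", line]))
```

Because the format is column-based rather than whitespace-delimited, slicing by position is more robust than splitting on spaces. Fields such as negative coordinates can run together with no separating space.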

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK writes a number of output files to the default directory; they have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

The energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles, and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints on a residue, but it will not pass through high energy barriers and can stop in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure, because a few bad interactions can dominate it. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
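Gradient-based minimization, and its tendency to stop in the nearest local minimum, can be seen on a toy one-dimensional energy surface. The sketch below uses plain steepest descent. The double-well function, step size, and tolerance are arbitrary choices for illustration, not a molecular force field.

```python
def energy(x: float) -> float:
    """Toy double-well energy: deeper minimum near x = -1.04, shallower one near x = 0.96."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x: float) -> float:
    """Analytical derivative dE/dx; the 'force' is its negative."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def steepest_descent(x: float, step: float = 0.01, tol: float = 1e-8,
                     max_iter: int = 100_000) -> float:
    """Move downhill along -gradient until the force is (almost) zero."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


if __name__ == "__main__":
    for start in (2.0, -2.0):
        x_min = steepest_descent(start)
        print(f"start {start:+.1f} -> x = {x_min:+.4f}, E = {energy(x_min):+.4f}")
```

Starting from x = 2.0, the search stops in the shallower right-hand well. A lower minimum exists on the other side of the barrier near x = 0.08, but the search cannot cross it. This is the local-minimum behaviour noted in the text.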


Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information

Entry name: GLCTK_HUMAN

Primary accession number: Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase

Synonyms: EC 2.7.1.31; HBeAg-binding protein 4

Gene name: GLYCTK

Synonyms: HBEBP4

ORF names: LP5910

From: Homo sapiens (Human) [TaxID: 9606]

Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences

Db   AC       Description                                                      Score  E-value
pdb  1QGT-C   Chain C (HBcAg), Human Hepatitis B Viral Capsid                  206    6e-54
pdb  2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina            197    2e-51
pdb  2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                   192    6e-50
pdb  1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp 27     6.0
pdb  1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I            27     6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49

Total number of positively charged residues (Arg + Lys): 43
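The composition percentages and charged-residue totals reported by ProtParam follow directly from the residue counts. A short check that recomputes them from the counts listed above; rounding to one decimal place is assumed to match the displayed values.

```python
# Residue counts for GLCTK_HUMAN copied from the ProtParam composition table.
COUNTS = {
    "Ala": 74, "Arg": 33, "Asn": 11, "Asp": 21, "Cys": 5, "Gln": 32, "Glu": 28,
    "Gly": 51, "His": 16, "Ile": 15, "Leu": 81, "Lys": 10, "Met": 12, "Phe": 10,
    "Pro": 28, "Ser": 27, "Thr": 22, "Trp": 4, "Tyr": 5, "Val": 38,
}


def composition(counts):
    """Return (sequence length, {residue: percentage rounded to 0.1})."""
    length = sum(counts.values())
    return length, {aa: round(100.0 * n / length, 1) for aa, n in counts.items()}


if __name__ == "__main__":
    length, percent = composition(COUNTS)
    negative = COUNTS["Asp"] + COUNTS["Glu"]  # negatively charged residues
    positive = COUNTS["Arg"] + COUNTS["Lys"]  # positively charged residues
    print(length, percent["Ala"], percent["Leu"], negative, positive)
```

The excess of negatively over positively charged residues (49 vs 43) is consistent with the slightly acidic theoretical pI of 6.25.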

Atomic composition:

Carbon      C   2435
Hydrogen    H   3967
Nitrogen    N    711
Oxygen      O    719
Sulfur      S     17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
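Both extinction coefficients can be reproduced from the Trp, Tyr, and Cys counts. ProtParam documents the Pace et al. contributions: 5500 per Trp, 1490 per Tyr, and 125 per cystine (M⁻¹ cm⁻¹ at 280 nm). The sketch applies these values and divides by the molecular weight to obtain Abs 0.1%.

```python
def extinction_280(n_trp: int, n_tyr: int, n_cys: int, all_cys_paired: bool) -> int:
    """Molar extinction coefficient (M^-1 cm^-1, 280 nm, water) from residue counts."""
    n_cystines = n_cys // 2 if all_cys_paired else 0  # each cystine = two paired Cys
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystines


def abs_01_percent(ext_coeff: float, mol_weight: float) -> float:
    """Absorbance of a 1 g/l (0.1%) solution: epsilon divided by molecular weight."""
    return ext_coeff / mol_weight


if __name__ == "__main__":
    mw = 55252.6  # ProtParam molecular weight for GLCTK_HUMAN (Da)
    for paired in (True, False):
        eps = extinction_280(n_trp=4, n_tyr=5, n_cys=5, all_cys_paired=paired)
        print(eps, round(abs_01_percent(eps, mw), 3))
```

With 5 Cys, at most two cystines can form, so the two reported coefficients differ by 2 × 125 = 250.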

Secondary structure prediction

By SOPMA: result for UNK_158250

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: window width 17, similarity threshold 8, number of states 4
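The SOPMA summary percentages are the state counts divided by the sequence length. A minimal sketch that tallies a SOPMA-style state string (h = helix, e = extended strand, t = turn, c = coil). Any other letter is counted under its own key, and the name mapping below is an assumption limited to these four states.

```python
from collections import Counter

STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def summarize_states(states: str) -> dict:
    """Map each state letter to (count, percentage of the sequence length, 2 decimals)."""
    counts = Counter(states.lower())
    total = len(states)
    return {s: (n, round(100.0 * n / total, 2)) for s, n in counts.items()}


if __name__ == "__main__":
    # Synthetic string with the same per-state counts as the SOPMA result above.
    demo = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
    for state, (n, pct) in summarize_states(demo).items():
        print(f"{STATE_NAMES.get(state, state)}: {n} is {pct:.2f}%")
```

Applied to the full 523-character state string, the same function should reproduce the summary lines, with roughly 45% helix and a third random coil.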

Multiple sequence alignment

ClustalW2 Results

1 Number of sequences: 10

2 Alignment score: 28565

3 Sequence format: Pearson

4 Sequence type: aa

5 Output file: clustalw2-20080510-09552541.output

6 Alignment file: clustalw2-20080510-09552541.aln

7 Guide tree file: clustalw2-20080510-09552541.dnd

8 Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
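Each Score in the table is a percent identity between two sequences. ClustalW2 computes it from its own pairwise alignments, so this sketch is only a simplified stand-in and will not reproduce every value exactly. It counts identical residues over the aligned columns where neither sequence has a gap. The usage example takes a 38-column window of HBEAG_HBVA4 and HBEAG_ASHV from the ClustalW alignment in this report.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # 38 aligned columns of HBEAG_HBVA4 and HBEAG_ASHV copied from the alignment.
    hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
    ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M"
    print(f"{percent_identity(hbva4, ashv):.1f}% identity over gap-free columns")
```

The window gives 80% identity, higher than the full-length score of 65 in the table. This is expected, because this N-terminal stretch is among the better-conserved regions of the two proteins.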

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
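The guide tree is written in Newick format: nested parentheses group sequences, and each `:number` is a branch length. A minimal recursive-descent reader that lists each leaf with its terminal branch length is sketched below. It ignores internal-node labels and does not handle quoted names, comments, or embedded whitespace.

```python
def parse_newick_leaves(text):
    """Return [(leaf_name, branch_length), ...] for every leaf in a Newick string."""
    text = text.strip().rstrip(";")
    leaves = []
    i = 0

    def read_label():
        nonlocal i
        start = i
        while i < len(text) and text[i] not in ",():":
            i += 1
        return text[start:i]

    def read_length():
        nonlocal i
        if i < len(text) and text[i] == ":":
            i += 1
            start = i
            while i < len(text) and text[i] not in ",()":
                i += 1
            return float(text[start:i])
        return None

    def parse_subtree():
        nonlocal i
        if text[i] == "(":
            i += 1                    # consume '('
            parse_subtree()
            while text[i] == ",":     # further children of this node
                i += 1
                parse_subtree()
            i += 1                    # consume ')'
            read_label()              # optional internal label (ignored)
            read_length()             # internal branch length (ignored)
        else:
            name = read_label()
            leaves.append((name, read_length()))

    parse_subtree()
    return leaves


TREE = ("((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
        "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
        "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
        ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
        ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);")

if __name__ == "__main__":
    for name, length in sorted(parse_newick_leaves(TREE), key=lambda t: -t[1]):
        print(f"{name:20s} {length:.5f}")
```

Sorting by terminal branch length puts Q8IVS8|GLCTK_HUMAN first by a wide margin. This matches its very low pairwise scores (2-3) against every HBeAg sequence in the scores table.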

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

Open the template structure (PDB file) from File, then choose SwissModel > Load raw target sequence.

Choose Fit > Fit raw sequence, then magic fit, then iterative fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose SwissModel > Submit modeling request. A new browser window opens and loads the PDB file; give the e-mail ID at which the modeled structure should be received.

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).

Open SwissModel and select the load raw sequence option to load the target molecule.

Perform magic fit and iterative fit, provided under FIT, to fit the two sequences.

Save the file as the project.

Select "Submit modeling request" under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode request submission form

Please fill in these fields:

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52
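Coverage is one of the template-selection criteria discussed in the homology modeling section. It can be read off the model information: residues 83 to 514 of the 523-residue query were modelled. A short sketch of the arithmetic:

```python
def model_coverage(start: int, end: int, query_length: int) -> float:
    """Percent of the query covered by the inclusive residue range start..end."""
    return 100.0 * (end - start + 1) / query_length


if __name__ == "__main__":
    start, end, length = 83, 514, 523  # values from the SWISS-MODEL model information
    covered = end - start + 1
    print(f"{covered} of {length} residues modelled "
          f"({model_coverage(start, end, length):.1f}%); "
          f"{length - covered} residues outside the model")
```

About 82.6% of the sequence is covered. The 91 unmodelled residues are the N-terminal 82 and the C-terminal 9, so conclusions about those regions cannot rest on this model.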


Fig. Structure of template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of the amino acid residues in a protein structure; it shows the possible conformations of the φ and ψ angles for a polypeptide. The plot displays the ψ and φ backbone conformational angles for each residue in a protein. These are the rotations about the N-Cα and Cα-C bonds of the backbone, which together determine the local backbone conformation around each Cα atom.

Software

SAVS: http://nihserver.mbi.ucla.edu/SAVS/

Procedure

The target protein structure obtained after homology modeling using Deep View and Modeler is given as input to SAVS.

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics                                          SCORE   %age

Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%

Number of end-residues (excl. Gly and Pro)                   8

Number of glycine residues (shown as triangles)            127

Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
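The region percentages, the residue total, and the comparison with the 90% guideline can all be recomputed from the counts in the plot statistics. A short check using this report's numbers:

```python
COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Percent of non-Gly, non-Pro residues in each Ramachandran region (1 decimal)."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_guideline(counts, threshold=90.0):
    """True if the most-favoured percentage exceeds the quoted 90% guideline."""
    total = sum(counts.values())
    return 100.0 * counts["most favoured"] / total > threshold


if __name__ == "__main__":
    print(region_percentages(COUNTS), meets_guideline(COUNTS))
    # Total residues = non-Gly/non-Pro + end residues + Gly + Pro.
    print(sum(COUNTS.values()) + 8 + 127 + 60)
```

The counts add up to 1156 non-glycine, non-proline residues and 1351 residues overall. The most-favoured fraction of 85.6% falls below the 90% guideline, and 51 residues (4.4%) lie in disallowed regions.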

Docking result (by Hex software)

Fig. Ligand & receptor (2B8N)

Fig. After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI

67

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052

68

MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005

69

LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025

70

ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds

71

gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N, Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096, Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N, Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096, Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27 x 812 x 1] x FFT[64 x 24 x 48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 kJ/mol (Eshape = 8621255, Eforce = 0.00)

R = 0.00, 0.75, 1.50, 2.25, 3.00, 3.75, 4.50, 5.25, 6.00, 6.75, 7.50, 8.25, 9.00, 9.75, 10.50, 11.25, 12.00, 12.75, 13.50, 14.25, 15.00, 15.75, 16.50, 17.25, 18.00, 18.75, 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold, but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=6464
Estart = 14350136 kJ/mol (Eshape = 14350136, Eforce = 0.00)
R = 0.00, 0.40, 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold, but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00

Conclusion


After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

As discussed in this study, HBeAg-binding protein 4 (glycerate kinase) is one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand that could inhibit its activity, as a basis on which drugs can be developed.


Future prospects

The work presented in this report may be just a stepping stone toward such discoveries; it is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in finding a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in HBV.

The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so we can look forward to a healthier, disease-free world.

The work presented is only a small part of a big issue, and much work still needs to be done to establish a reliable phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.


[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B., Sheinerman F. B. and Honig B. (2001). Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.

Aloy P., Querol E., Aviles F. X. and Sternberg M. J. (2001). Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.

Altschul S. F., Madden T. L., Schaffer A. A., Zhang J., Zhang Z., Miller W. and Lipman D. J. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Apweiler R., Attwood T. K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M. D., et al. (2000). InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romanokroemersanofi-aventiscom


Abbreviations


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Materials and methods

Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science. It involves the technology that uses computers for the storage, retrieval, manipulation and distribution of information related to biological macromolecules such as DNA, RNA and proteins.

In the narrower sense used here, bioinformatics is limited to the sequence, structural and functional analysis of genes and genomes and their corresponding products, and is often considered computational molecular biology. It consists of two subfields: the development of computational tools and databases, and the application of these tools and databases in generating biological knowledge to better understand living systems. These tools are used in three areas of genomic and molecular biological research: molecular sequence analysis, molecular structural analysis and molecular functional analysis.

1. NCBI

Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.

Swiss-Prot: a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.


2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)
Primary accession number: Q8IVS8
EC 2.7.1.31
Organism: human
Subcellular location: cytoplasm
Catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "fast-aye" and stands for "FAST-All", because it works with any alphabet, an extension of FAST-P (protein) and FAST-N (nucleotide) alignment.

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.
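Translated searches compare nucleotide data against proteins by translating the DNA in all six reading frames, three on each strand. As an illustration only, the sketch below performs a plain six-frame translation with the standard genetic code. The function names are hypothetical, ambiguous bases become `X`, and it does not implement FASTA's frameshift-aware comparison.

```python
# Standard genetic code: codons enumerated with bases in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...), one amino acid letter per codon; '*' = stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Return the translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames
```

For example, frame +1 of `ATGGCCTAA` translates to `MA*`, with `*` marking the stop codon. A nucleotide sequencing error that inserts or deletes one base shifts every downstream codon, which is why frame-by-frame translation handles frameshifts poorly.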

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
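The Smith-Waterman recurrence that SSEARCH implements can be sketched in a few lines. This minimal version returns only the best local alignment score. The match, mismatch and linear gap values are illustrative assumptions; real protein searches use a substitution matrix such as BLOSUM62 with affine gap penalties, and the function name is hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b, using a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # a[i-1] aligned against a gap
            left = curr[j - 1] + gap   # b[j-1] aligned against a gap
            curr[j] = max(0, diag, up, left)  # clamp at zero: local alignment
            best = max(best, curr[j])
        prev = curr
    return best
```

Clamping each cell at zero is what makes the alignment local: a poorly matching prefix is discarded instead of dragging the score down, so `XXACGTXX` and `YYACGTYY` still score the full value of the shared `ACGT` core.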

4. BLAST

In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and to identify library sequences that resemble the query sequence above a certain threshold.
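BLAST gains its speed by first looking for short word matches ("seeds") shared by the query and a database sequence, and only then extending them into alignments. The sketch below shows a simplified seeding step with exact word matches of an assumed length k = 3; real BLAST also accepts neighbourhood words that score above a threshold under a substitution matrix, then extends and scores each hit. The function name is hypothetical.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, k: int = 3) -> list:
    """Return (query_pos, subject_pos) pairs where the same length-k word occurs in both."""
    index = defaultdict(list)  # word -> positions where it starts in the query
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits
```

Indexing the query once and then scanning each database sequence keeps the lookup linear in the database size; each hit (i, j) lies on the diagonal i - j, along which the extension step would proceed.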

5. Primary and secondary structure analysis

Using ProtParam for primary structure: ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or as a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you are shown an intermediary page that allows you to select the portion of the sequence on which to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence is analyzed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
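Two of these parameters have simple closed forms. The sketch below assumes the formulas described in the ProtParam documentation: the aliphatic index of Ikai (1980), X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)) with X the mole percent of each residue, and the molar extinction coefficient at 280 nm from 5500 per Trp, 1490 per Tyr and 125 per cystine. The function names are hypothetical; the instability index and half-life depend on lookup tables and are left out.

```python
def aliphatic_index(seq: str) -> float:
    """Ikai (1980) aliphatic index: A + 2.9*V + 3.9*(I + L), using mole percents."""
    seq = seq.upper()
    n = len(seq)
    pct = {aa: 100.0 * seq.count(aa) / n for aa in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


def extinction_coefficients(seq: str) -> tuple:
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1) in water.

    Returns (value if all Cys pairs form cystines, value if all Cys are reduced).
    One cystine consists of two Cys residues.
    """
    seq = seq.upper()
    base = 5500 * seq.count("W") + 1490 * seq.count("Y")
    return base + 125 * (seq.count("C") // 2), base
```

A higher aliphatic index reflects a larger share of aliphatic side chains and is read as a sign of greater thermostability; reporting two extinction values covers both oxidation states of the cysteines.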


Using SOPMA for secondary structure analysis

The self-optimized prediction method (SOPM) was developed to improve the success rate in predicting the secondary structure of proteins. Its authors then reported improvements obtained by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved method, SOPMA, correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
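The 69.5% figure is a three-state (Q3) accuracy: the percentage of residues whose predicted state (helix, strand or coil) matches the observed one. A minimal sketch of that measure, assuming predictions and observations are given as strings of equal length using the common one-letter states H, E and C; the function name is hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

Because coil is usually the most frequent state, Q3 is best read alongside per-state measures rather than on its own.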

PROTOCOL FOLLOWED

Identified the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

Retrieved the FASTA sequence of HBeAg-binding protein 4 (glycerate kinase, Q8IVS8) from the Swiss-Prot database.

Identified a template structure (PDB ID 2B8N) by a BLAST similarity search.

Submitted the target sequence to SWISS-MODEL as a raw sequence and modelled the receptor, obtaining the model in PDB format.

Validated the modelled receptor using the Structure Analysis Validation Server (SAVS).

Verified the model through several parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV from the KEGG database.

Ran Hex to dock the ligand and obtain the structure of the drug molecule.
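The Ramachandran plot used in the validation step places each residue by its backbone torsion angles φ (atoms C of the previous residue, N, Cα, C) and ψ (N, Cα, C, N of the next residue). A minimal sketch of the torsion-angle calculation from four atom positions, assuming coordinates are (x, y, z) tuples in Å; the helper names are hypothetical and this is not the SAVS implementation.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))  # atan2 keeps the sign (IUPAC convention)
```

With backbone coordinates read from the model's PDB file, `dihedral(c_prev, n, ca, c)` gives φ and `dihedral(n, ca, c, n_next)` gives ψ; residues whose (φ, ψ) pair falls outside the allowed regions are flagged for inspection.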


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.
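How much a target and template resemble each other is usually summarised as percent sequence identity, computed from their alignment. Several definitions are in use; the sketch below assumes one of them (identical residues divided by the aligned columns in which neither sequence has a gap), with `-` as the gap character and a hypothetical function name.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues as a percentage of aligned columns without a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so identity values from different tools are only comparable when the definition is known.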

The quality of the homology model depends on the quality of the sequence alignment and the template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
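The "agreement between the matched Cα atoms" quoted above is normally expressed as a root-mean-square deviation (RMSD). A minimal sketch, assuming the model and reference Cα coordinates (in Å) are already matched one-to-one and optimally superposed; the superposition step itself (for example the Kabsch algorithm) is not shown, and the function name is hypothetical.

```python
import math


def ca_rmsd(model, reference) -> float:
    """RMSD in Å between matched, already superposed Cα coordinates given as (x, y, z) tuples."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
        for a, b in zip(model, reference)
    )
    return math.sqrt(total / len(model))
```

Computing the RMSD without superposing first would mix genuine modelling error with an arbitrary rigid-body offset, so the superposition step is not optional in practice.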

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and side chain dihedral angles are transferred from the templates to the target. Thus a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.
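The second step in the figure, transferring template features to the target through the alignment, can be illustrated for the simplest feature, Cα-Cα distances. This is only a sketch under stated assumptions: the alignment is given as hypothetical (target index, template index) pairs, template Cα coordinates are in Å, and only pairs closer than an assumed 10 Å cutoff become restraints. Programs such as MODELLER express restraints as probability density functions over many feature types; this keeps only the distance idea.

```python
import math
from itertools import combinations


def transfer_ca_distances(aligned_pairs, template_ca, max_dist=10.0):
    """Map template Cα-Cα distances onto target residue pairs.

    aligned_pairs: list of (target_index, template_index) for aligned residues.
    template_ca: dict template_index -> (x, y, z) Cα coordinate.
    Returns {(target_i, target_j): distance} for pairs within max_dist.
    """
    restraints = {}
    for (ti, si), (tj, sj) in combinations(aligned_pairs, 2):
        d = math.dist(template_ca[si], template_ca[sj])
        if d <= max_dist:
            restraints[(ti, tj)] = d
    return restraints
```

Target residues that fall in alignment gaps receive no restraints from this step, which is one reason unaligned loops end up less accurate than the rest of the model.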


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
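The E-value screening described above can be sketched as a simple filter over search hits. The hit identifiers and E-values are hypothetical, and the cut-off of 1e-5 is an assumed value for illustration only; an appropriate threshold depends on the search program and database size.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    template_id: str  # hypothetical template identifier
    e_value: float    # expectation value reported by the search program

def select_templates(hits, max_e_value=1e-5):
    """Keep hits at or below the E-value cut-off, lowest E-value first."""
    kept = [hit for hit in hits if hit.e_value <= max_e_value]
    return sorted(kept, key=lambda hit: hit.e_value)

hits = [Hit("templ_1", 3e-40), Hit("templ_2", 0.8), Hit("templ_3", 2e-12)]
for hit in select_templates(hits):
    print(hit.template_id, hit.e_value)
```

A hit such as `templ_2` (E-value 0.8) is discarded even if it were the only candidate, in line with the advice above to turn to fold-recognition servers instead.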

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
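Two of the factors above, sequence identity and coverage, can be computed directly from a pairwise alignment. The sketch below assumes the alignment is given as two equal-length gapped strings (query first); it defines identity over aligned non-gap columns and coverage as the fraction of query residues aligned to a template residue. Other definitions are in use, and the candidate alignments and the ranking rule (coverage first, identity as tie-breaker) are hypothetical.

```python
def identity_and_coverage(query_aln, template_aln):
    """Return (percent identity, percent query coverage) for a gapped pairwise alignment."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for q, t in zip(query_aln, template_aln):
        if q != "-" and t != "-":
            aligned += 1
            if q.upper() == t.upper():
                identical += 1
    query_length = sum(1 for q in query_aln if q != "-")
    identity = 100.0 * identical / aligned if aligned else 0.0
    coverage = 100.0 * aligned / query_length if query_length else 0.0
    return identity, coverage

# Hypothetical alignments of the same query (MKVLAGH) against two candidate templates.
candidates = {
    "template_A": ("MKV-LAGH", "MKVELSGH"),
    "template_B": ("MKVLAGH-", "--ILAGHW"),
}
# One possible ordering: coverage first, identity as the tie-breaker.
ranked = sorted(candidates,
                key=lambda name: identity_and_coverage(*candidates[name])[::-1],
                reverse=True)
print(ranked)
```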

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.
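One refinement is to recompute the query-template alignment with full dynamic programming instead of reusing the heuristic search alignment. Below is a minimal Needleman-Wunsch global alignment, assuming a simple linear scoring scheme (match +1, mismatch -1, gap -2); practical modeling pipelines typically use substitution matrices such as BLOSUM62 and affine gap penalties instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("ACGT", "AGT"))
```

When several alignments tie for the best score, the traceback order (diagonal, then up, then left) decides which one is returned.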

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aiming at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking
2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interactions are said to be rigid.

Fig: Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

In the rigid-ligand, flexible-receptor case, the native structure often maximizes the interface area between the molecules, which move relative to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty on the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than a rigid receptor would.
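The energy penalty for overlapping atoms comes mainly from the repulsive part of the van der Waals interaction. A common model for it is the Lennard-Jones 12-6 potential, sketched below with hypothetical parameters ε = 0.2 kcal/mol and σ = 3.4 Å; real force fields assign separate values for each pair of atom types.

```python
def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """12-6 Lennard-Jones energy (kcal/mol) at separation r (Angstrom)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

r_min = 2 ** (1 / 6) * 3.4  # separation with the lowest energy (E = -epsilon)
for r in (3.0, 3.4, r_min, 5.0):
    print(f"r = {r:.2f} A  E = {lennard_jones(r):+.3f} kcal/mol")
```

Because the repulsive term grows as r⁻¹², the energy rises very steeply once two atoms come closer than σ, which is why even a small steric overlap between ligand and receptor is heavily penalized unless the receptor can move out of the way.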

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; there is intrinsic movement that allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
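The six rigid-body degrees of freedom can be made concrete: a docking pose is three rotation angles plus a three-component translation. The sketch below applies such a pose to ligand coordinates, assuming rotations about the x, y and z axes applied in that order (R = Rz·Ry·Rx) about the coordinate origin; other angle conventions are equally valid, and the coordinates and pose values are hypothetical.

```python
import math

def rotation_matrix(ax, ay, az):
    """Rotation matrix R = Rz @ Ry @ Rx for angles in radians."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def apply_pose(coords, angles, translation):
    """Rotate each point about the origin, then translate: x' = R x + t."""
    r = rotation_matrix(*angles)
    return [
        tuple(sum(r[i][k] * p[k] for k in range(3)) + translation[i] for i in range(3))
        for p in coords
    ]

ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
moved = apply_pose(ligand, (0.3, -0.7, 1.1), (4.0, -2.0, 0.5))
print(moved)
```

A rigid-body move leaves every internal distance unchanged; in practice the ligand is usually rotated about its own centroid rather than the origin.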

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking study will be useful in finding a cure for the infectious disease bird flu; they will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.

Receptor

A receptor is a structure on the surface of a cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule within a cell or on a cell surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through many weak noncovalent bonds formed with the binding site of a protein, the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein (enzyme) is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
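The link between a lower activation free energy and rate enhancement can be quantified with transition-state theory, in which the rate constant is proportional to exp(-ΔG‡/RT). The sketch below computes the speed-up for an assumed reduction of ΔG‡; the 10 kJ/mol figure and the temperature of 298.15 K are illustrative choices, not values from this report.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def rate_enhancement(delta_delta_g_kj, temperature_k=298.15):
    """Ratio k_cat / k_uncat when the activation free energy is lowered by the given kJ/mol."""
    return math.exp(delta_delta_g_kj * 1000.0 / (R * temperature_k))

print(round(rate_enhancement(10.0), 1))
```

Lowering ΔG‡ by only 10 kJ/mol already speeds the reaction up roughly 56-fold at room temperature, and the factor grows exponentially with larger reductions.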

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

Amino acid   Preference
His           0.360
Tyr          -0.040
Asp           0.045
Gly          -0.070
Trp          -0.140
Met           0.025
Val          -0.060
Asn           0.080
Leu          -0.180
Phe          -0.120
Gln           0.050
Cys           0.210
Ile          -0.005
Ala           0.025
Glu           0.050
Arg           0.055
Pro          -0.200
Lys           0.100
Thr           0.100
Ser           0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the corresponding torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
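Each point on a Ramachandran plot is a pair of dihedral angles, and each dihedral is computed from four consecutive backbone atoms: C(i-1)-N-Cα-C for φ and N-Cα-C-N(i+1) for ψ. A minimal sketch of the usual atan2-based calculation follows; it returns angles in the range (-180°, 180°] with the standard IUPAC sign convention, and the coordinates used in the example are hypothetical.

```python
import math

def _sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points (central bond p1-p2)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: the last atom is placed at +90 degrees around the x axis.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Applying this to every residue of a model and plotting φ against ψ gives the Ramachandran plot used for validation below.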

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file will be "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the file extension .new. The new file will have the atoms labelled in accordance with the IUPAC naming convention.
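Because PROCHECK and similar tools read the fixed-column PDB coordinate format, a minimal reader for Cα atoms is sketched below. It follows the column layout of ATOM records in the PDB format specification (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, x, y and z in 31-54) and, for simplicity, ignores HETATM records, alternate locations and multiple models. The function name and the file name in the comment are hypothetical.

```python
def read_ca_atoms(lines):
    """Return [(chain, res_seq, res_name, (x, y, z)), ...] for CA atoms in ATOM records."""
    atoms = []
    for line in lines:
        if not line.startswith("ATOM  "):
            continue
        if line[12:16].strip() != "CA":
            continue
        res_name = line[17:20].strip()
        chain = line[21]
        res_seq = int(line[22:26])
        x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))
        atoms.append((chain, res_seq, res_name, (x, y, z)))
    return atoms

# Usage (hypothetical file name):
# with open("model.pdb") as handle:
#     ca_atoms = read_ca_atoms(handle)
```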

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but with different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles, and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good for releasing local constraints on a residue, but it will not pass over high energy barriers and will stop in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
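The statement that minimization stops in a local minimum can be seen on a one-dimensional toy energy surface. The sketch below runs steepest descent (each step moves along the force, i.e. the negative gradient) on the hypothetical double-well function E(x) = (x² - 1)² + 0.3x, whose global minimum lies near x = -1.04 and whose higher local minimum lies near x = 0.96. The step size and tolerance are arbitrary choices.

```python
def energy(x):
    return (x * x - 1.0) ** 2 + 0.3 * x

def gradient(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def steepest_descent(x, step=0.01, tolerance=1e-8, max_steps=100_000):
    """Move x against the gradient until the force is (almost) zero."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tolerance:
            break
        x -= step * g
    return x

for start in (1.5, -1.5):
    x_min = steepest_descent(start)
    print(f"start {start:+.1f} -> x = {x_min:+.3f}, E = {energy(x_min):+.3f}")
```

Starting from x = 1.5 the minimizer settles in the higher local minimum and never crosses the barrier near x = 0, whereas starting from x = -1.5 it reaches the lower minimum; the same behaviour holds, in many dimensions, for a protein structure.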


Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db   AC       Description                                                         Score  E-value
pdb  1QGT-C   Chain C (HBcAg), Human Hepatitis B Viral Capsid >gi|5                206    6e-54
pdb  2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina                197    2e-51
pdb  2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                       192    6e-50
pdb  1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp     27     60
pdb  1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I                27     60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523
Molecular weight: 55252.6
Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
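The molecular weight and charged-residue totals above can be reproduced from the composition table: the molecular weight is the sum of average residue masses plus one water molecule. The residue masses below are standard average masses and are assumed to be close enough to ProtParam's internal values for a one-decimal comparison; the counts are the GLCTK_HUMAN composition listed above.

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "Q": 128.1307, "E": 129.1155, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

def molecular_weight(counts):
    """Average molecular weight (Da) of a single chain given residue counts."""
    return sum(RESIDUE_MASS[aa] * n for aa, n in counts.items()) + WATER

glctk = {"A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28, "G": 51,
         "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10, "P": 28, "S": 27,
         "T": 22, "W": 4, "Y": 5, "V": 38}

print(sum(glctk.values()))                              # number of residues
print(glctk["D"] + glctk["E"], glctk["R"] + glctk["K"])  # negative, positive residues
print(round(molecular_weight(glctk), 1))                 # compare with ProtParam's value
```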

Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17
Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
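Both extinction coefficients follow from the Trp, Tyr and Cys counts. The sketch below uses the per-residue molar coefficients that ProtParam applies at 280 nm (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹, with the number of cystines taken as Cys // 2) and divides by the molecular weight to obtain Abs 0.1%. The function name is hypothetical; the counts and molecular weight are those reported above.

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired=True):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) in water."""
    n_cystine = n_cys // 2 if all_cys_paired else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine

mw = 55252.6  # molecular weight reported by ProtParam
for paired in (True, False):
    eps = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, all_cys_paired=paired)
    print(eps, round(eps / mw, 3))
```

With five cysteines at most two disulfide bonds can form, which is why the two coefficients differ by only 250 M⁻¹ cm⁻¹.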

Secondary structure prediction

By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
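The percentages in the SOPMA summary are simply state counts divided by the sequence length. A minimal sketch, assuming the prediction is available as a string of the one-letter states used above (h, e, t, c); the function name and the short example string are hypothetical.

```python
from collections import Counter

STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}

def state_summary(states):
    """Return {state name: (count, percent)} for a SOPMA-style state string."""
    counts = Counter(states)
    total = len(states)
    return {name: (counts[code], round(100.0 * counts[code] / total, 2))
            for code, name in STATE_NAMES.items()}

print(state_summary("hhhhccceet"))
```

Applied to the full 523-character state string, the same arithmetic gives, for example, 235 helical positions, i.e. 100 × 235 / 523 = 44.93%.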

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
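ClustalW converts pairwise scores like these into distances (roughly 1 - score/100) and builds its guide tree from them with the neighbour-joining method. As a simpler stand-in, the sketch below clusters four of the sequences with UPGMA, which is not the method ClustalW uses but shows how the most similar pairs are merged first. The short labels (HBVA2, HBVA3, ASHV, GSHV) are abbreviations of the table names, and the scores are copied from the table above.

```python
def upgma(names, dist):
    """UPGMA clustering; dist maps frozenset({a, b}) -> distance. Returns a Newick-like string."""
    clusters = list(names)
    size = {name: 1 for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                key = frozenset((clusters[i], clusters[j]))
                if best is None or d[key] < d[best[2]]:
                    best = (i, j, key)
        i, j, _ = best
        a, b = clusters[i], clusters[j]
        merged = f"({a},{b})"
        size[merged] = size[a] + size[b]
        # Size-weighted average distance from the new cluster to every other cluster.
        for other in clusters:
            if other not in (a, b):
                d[frozenset((merged, other))] = (
                    size[a] * d[frozenset((a, other))] + size[b] * d[frozenset((b, other))]
                ) / size[merged]
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return clusters[0]

scores = {("HBVA2", "HBVA3"): 98, ("HBVA2", "ASHV"): 66, ("HBVA2", "GSHV"): 70,
          ("HBVA3", "ASHV"): 65, ("HBVA3", "GSHV"): 69, ("ASHV", "GSHV"): 91}
distances = {frozenset(pair): 1 - s / 100 for pair, s in scores.items()}
print(upgma(["HBVA2", "HBVA3", "ASHV", "GSHV"], distances))
```

The two human HBV genotypes (score 98) are joined first, then the two rodent hepadnaviruses (score 91), matching the grouping seen in the guide tree below.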

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template, showing around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

Open the template structure from file (PDB file); choose Swiss Model - Load Raw Target Sequence.

Choose Fit - Fit Raw Sequence, then Magic Fit, then Iterative Fit.

Choose File - Save - Layer (PDB).

Choose File - Save - Project (PDB).

Choose Swiss Model - Submit Modeling Request (a new browser window opens, loading the PDB file; give the e-mail ID for receiving the modeled structure).

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File - Save - Layer (PDB).

Open Swiss Model and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under FIT, in order to fit the two sequences.

Save the file as the project.

Select "Submit Modeling Request" under Swiss Model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Your e-mail address: Gunjan300@gmail.com (MUST be correct)
Your name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52

Fig: Structure of the template after modeling

Alignment

TARGET  83 LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure; it shows the possible conformations of the φ and ψ angles for a polypeptide by plotting these two backbone conformational angles for each residue in the protein. The backbone conformation around each α-carbon, which is fixed by the rotations about the two bonds flanking that α-carbon, can therefore be assessed for the desired protein using this plot.

Software

SAVS: http://nihserver.mbi.ucla.edu/SAVS/

Procedure

The target protein structure obtained after homology modeling using DeepView and MODELLER is given as input to SAVS.

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics                                        Residues       %
-------------------------------------------------------------------------
Residues in most favoured regions [A,B,L]                   990     85.6
Residues in additional allowed regions [a,b,l,p]            104      9.0
Residues in generously allowed regions [~a,~b,~l,~p]         11      1.0
Residues in disallowed regions                               51      4.4
                                                           ----    -----
Number of non-glycine and non-proline residues             1156    100.0
Number of end-residues (excl. Gly and Pro)                    8
Number of glycine residues (shown as triangles)             127
Number of proline residues                                   60
                                                           ----
Total number of residues                                   1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
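The percentages in the PROCHECK summary are computed over non-glycine, non-proline residues only. The sketch below recomputes them from the counts above and applies the 90% guideline quoted by PROCHECK; the function name and the dictionary layout are illustrative choices.

```python
def ramachandran_summary(favoured, additional, generous, disallowed, threshold=90.0):
    """Percentages of non-Gly/non-Pro residues per region, plus the 90% quality check."""
    total = favoured + additional + generous + disallowed
    percentages = {
        "most favoured": 100.0 * favoured / total,
        "additional allowed": 100.0 * additional / total,
        "generously allowed": 100.0 * generous / total,
        "disallowed": 100.0 * disallowed / total,
    }
    return total, percentages, percentages["most favoured"] > threshold

total, pct, good = ramachandran_summary(990, 104, 11, 51)
print(total)
print({region: round(value, 1) for region, value in pct.items()})
print("over 90% in most favoured regions:", good)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, this model falls short of the 90% guideline.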

Docking result (by Hex software)

Fig: Ligand & receptor (2B8N)

Fig: After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models).
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP: ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050

65

MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

66

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI

67

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP CG Radius = 1.40 Charge = 0.62, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52, THR CA Radius = 1.50 Charge = 0.27, THR C Radius = 1.40 Charge = 0.53, THR O Radius = 1.50 Charge = -0.50, THR CB Radius = 1.50 Charge = 0.21, THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues, Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N, Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096, Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N, Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096, Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=0.00)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair      RMS    Bumps
 ----  --------  --------  --------  --------  -------  -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
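The "3D FFT" passes in the log reflect the idea behind FFT-based docking: instead of scoring one relative translation at a time, all translations are scored at once by evaluating a correlation in Fourier space. The toy Python sketch below illustrates only that correlation theorem on made-up 1D "receptor" and "ligand" profiles, using a naive DFT for brevity. Hex itself works with spherical polar Fourier expansions of the molecular surface skins and also searches rotations and energy terms; none of that is reproduced here, and all names and profiles are illustrative.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (an FFT gives the same result faster)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def correlation_via_dft(receptor, ligand):
    """Circular cross-correlation c[s] = sum_t receptor[(t + s) % n] * ligand[t],
    computed with the correlation theorem: c = IDFT(DFT(receptor) * conj(DFT(ligand)))."""
    R, L = dft(receptor), dft(ligand)
    return [v.real for v in idft([r * l.conjugate() for r, l in zip(R, L)])]


def correlation_direct(receptor, ligand):
    """The same correlation evaluated one shift at a time, for comparison."""
    n = len(receptor)
    return [sum(receptor[(t + s) % n] * ligand[t] for t in range(n)) for s in range(n)]


receptor = [0, 0, 1, 1, 1, 0, 0, 0]  # toy 1D "receptor" profile
ligand = [1, 1, 1, 0, 0, 0, 0, 0]    # toy 1D "ligand" profile
scores = correlation_via_dft(receptor, ligand)
best_shift = max(range(len(scores)), key=lambda s: scores[s])
print(best_shift)  # 2: shifting the ligand by 2 overlaps the receptor fully
```

Both functions give the same scores; the point of the Fourier route is that, with an FFT, every shift is scored in O(n log n) rather than O(n^2) time.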


Conclusion


After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at this by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is completed, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

Our study indicates that glycerate kinase (HBeAg-binding protein 4) is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for developing drugs.


Future prospects

The work presented in this report may be only a stepping stone towards further discoveries; it is a small contribution to a much larger problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as interpreting those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new avenues for further drug development. The findings of our docking study may be useful in finding a cure for hepatitis B, and may also help in finding other possible drug targets in HBV.

The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, bringing us closer to a disease-free world.

The work presented is only a small part of a larger problem, and much work remains to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. Nevertheless, we hope that these findings will go a long way and prove useful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.


[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B, Sheinerman FB and Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796–14801. [PubMed]

Aloy P, Querol E, Aviles FX and Sternberg MJ. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395–408. [PubMed]


Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402. [PubMed]


Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romanokroemersanofi-aventiscom


Abbreviation


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

Phylip Phylogeny Inference Package


2. Protein sequence of glycerate kinase (HBeAg-binding protein 4)

Primary accession number: Q8IVS8; EC 2.7.1.31; source: human; subcellular location: cytoplasm; catalytic activity: ATP + (R)-glycerate = ADP + 3-phospho-(R)-glycerate, etc.

3. FASTA

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The original FASTP program was designed for protein sequence similarity searching. FASTA, described in 1988 ("Improved Tools for Biological Sequence Comparison"), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet, an extension of "FAST-P" (protein) and "FAST-N" (nucleotide) alignment.
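The heuristic that makes FASTA fast is not spelled out above. In outline, it first finds short identical words (k-tuples, set by the ktup parameter) shared by the query and a library sequence and groups them by diagonal, and only later rescores the best regions and optimizes them with a banded alignment. The Python sketch below shows just that first word-lookup step, under simplifying assumptions; the function name and example sequences are illustrative and not part of the FASTA package.

```python
from collections import defaultdict


def ktup_diagonal_hits(query, subject, ktup=2):
    """Count identical k-tuples shared by two sequences, grouped by diagonal.

    The diagonal of a hit is query_position - subject_position; hits on the
    same diagonal belong to the same ungapped alignment region.
    """
    index = defaultdict(list)  # k-tuple -> start positions in the query
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)

    diagonals = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], []):
            diagonals[i - j] += 1
    return dict(diagonals)


hits = ktup_diagonal_hits("HEAGAWGHEE", "PAWHEAE", ktup=2)
best_diagonal = max(hits, key=hits.get)
print(hits)           # {3: 1, -3: 2, 4: 1}
print(best_diagonal)  # -3
```

A larger ktup makes the lookup faster but less sensitive, which is the main speed/sensitivity trade-off of this kind of word-based search.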

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm. A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance, or whether it can be used to infer homology. The FASTA package is available from fasta.bioch.virginia.edu.
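SSEARCH's optimal local alignment rests on the Smith-Waterman recurrence, in which every cell of the dynamic-programming matrix is floored at zero so that an alignment can start and end anywhere. A minimal score-only sketch in Python follows; the match, mismatch and linear gap values are illustrative assumptions, whereas SSEARCH itself scores with a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a new local alignment here
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best


print(smith_waterman("GATTACA", "TTAC"))  # 8: "TTAC" matches exactly, 4 x 2
print(smith_waterman("AAAA", "TTTT"))     # 0: no positive-scoring local alignment
```

Recovering the alignment itself would add a traceback from the highest-scoring cell; only the score is needed to rank database hits.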

4. BLAST

In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and to identify library sequences that resemble the query sequence above a certain threshold.
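The "certain threshold" is normally expressed through BLAST's expectation value (E-value), which follows the Karlin-Altschul statistics E = K * m * n * exp(-lambda * S) for a raw score S, query length m and database length n. The Python sketch below computes the E-value and the equivalent bit score. The lambda and K values are assumptions (figures commonly quoted for BLOSUM62 with gap costs 11/1; a real BLAST run reports its own), the lengths are hypothetical, and no effective-length correction is applied.

```python
import math

# Assumed Karlin-Altschul parameters (commonly quoted for BLOSUM62, gaps 11/1).
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalized score in bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance hits scoring >= S: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


m, n = 300, 1_000_000  # hypothetical query length and database size (residues)
for s in (40, 60, 100):
    print(s, round(bit_score(s), 1), f"{e_value(s, m, n):.2e}")
```

The two quantities are equivalent, since E = m * n * 2^(-S'); bit scores are preferred for comparing searches because they do not depend on lambda and K.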

5. Primary & secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information is required about the protein under consideration. The protein can either be specified as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence will be analyzed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
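Two of the listed parameters have simple closed forms given in the ProtParam documentation: the aliphatic index, X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)) with X in mole percent, and the molar extinction coefficient at 280 nm, 5500 per Trp + 1490 per Tyr + 125 per cystine. The Python sketch below reproduces just these two on a made-up peptide; the instability index and half-life depend on lookup tables and are not attempted, and the function names are illustrative.

```python
def clean(seq):
    """Keep letters only, mirroring ProtParam's rule that white space and numbers are ignored."""
    return "".join(ch for ch in seq.upper() if ch.isalpha())


def aliphatic_index(seq):
    """X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X = mole percent."""
    seq = clean(seq)
    if not seq:
        raise ValueError("empty sequence")

    def mole_pct(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


def extinction_coefficient(seq, all_cys_form_cystines=True):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    With all_cys_form_cystines=True, every pair of Cys is counted as one cystine.
    """
    seq = clean(seq)
    cystines = seq.count("C") // 2 if all_cys_form_cystines else 0
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


toy = "AVIL WYCC 8"  # made-up example; the space and digit are ignored
print(round(aliphatic_index(toy), 2))     # 146.25
print(extinction_coefficient(toy))        # 7115
print(extinction_coefficient(toy, False)) # 6990
```

Like ProtParam, the sketch reports two extinction coefficients, because whether the cysteines are paired in disulfide bonds cannot be deduced from the sequence alone.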


Using SOPMA for secondary structure analysis

Recently, a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in the prediction of the secondary structure of proteins. Its authors report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
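The 69.5% figure is a per-residue three-state accuracy, usually called Q3: the percentage of residues whose predicted state (helix, sheet or coil) matches the state observed in the experimental structure. A minimal Python sketch of that measure, with made-up prediction and observation strings (H = helix, E = strand/sheet, C = coil):

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted three-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up example strings, one character per residue.
predicted = "CCHHHHHCCEEEECC"
observed = "CCHHHHCCCEEEEEC"
print(round(q3(predicted, observed), 1))  # 86.7 (13 of 15 residues agree)
```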

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.


Retrieved the FASTA sequence of the HBeAg-binding protein from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) by a BLAST similarity search.

Submitted the target sequence to SWISS-MODEL as a raw sequence and modelled the receptor, obtaining the model in PDB format.

Validated the modelled receptor using the Structure Analysis Validation Server (SAVS).

Verified our model using the parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran Hex and obtained the docked structure of the drug molecule.
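The Ramachandran plot used in the verification step plots, for every residue, the backbone dihedral angles phi (C(i-1)-N-CA-C) and psi (N-CA-C-N(i+1)). As a sketch of the underlying geometry only, the pure-Python function below computes a signed dihedral angle from four atom positions; reading coordinates from the model's PDB file and classifying residues into regions, as PROCHECK does, are left out, and the test coordinates are arbitrary.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180 to 180) for atoms p0-p1-p2-p3.

    phi uses C(i-1), N, CA, C; psi uses N, CA, C, N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Central bond along z; the fourth atom is rotated 90 degrees from the first.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))  # 90.0
```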


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the "query sequence" or "target"). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as "templates" or "parent structures") likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and of the template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the fold, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
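The Å figures above are usually reported as a root-mean-square deviation (RMSD) over matched Cα atoms. The minimal Python sketch below computes that quantity. It assumes the two coordinate lists have already been superposed and paired residue by residue (a real comparison would first find the optimal superposition, for example with the Kabsch algorithm), and the coordinates are hypothetical.

```python
import math

def ca_rmsd(model, reference):
    """Root-mean-square deviation (Å) between matched C-alpha coordinates.

    Assumes both lists are already in the same frame (superposed) and that
    model[i] corresponds to reference[i]; no fitting is done here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))

# Hypothetical three-residue example: every model atom is shifted 2 Å along y.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 2.0, 0.0), (3.8, 2.0, 0.0), (7.6, 2.0, 0.0)]
print(ca_rmsd(model, reference))
```

A uniform 2 Å displacement gives an RMSD of 2 Å, the kind of agreement quoted above for a model built at about 70% identity.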

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target; in this way a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.
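The selection logic described above can be sketched as a filter followed by a ranking. In this Python sketch the `TemplateHit` fields, the E-value cutoff of 1e-5, and the choice to rank by coverage before identity are illustrative assumptions, not a rule taken from BLAST or SWISS-MODEL; the template IDs are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TemplateHit:
    pdb_id: str       # hypothetical template identifier
    evalue: float     # BLAST-style E-value of the hit
    identity: float   # fraction of identical residues in the aligned region
    coverage: float   # fraction of the query covered by the alignment

def choose_template(hits, max_evalue=1e-5):
    """Discard hits with a poor E-value, then prefer coverage, then identity.

    The cutoff and the ranking order are assumptions for illustration.
    """
    reliable = [h for h in hits if h.evalue <= max_evalue]
    if not reliable:
        return None  # no trustworthy template: try fold recognition instead
    return max(reliable, key=lambda h: (h.coverage, h.identity))

hits = [
    TemplateHit("tmplA", 1e-40, 0.35, 0.90),
    TemplateHit("tmplB", 1e-50, 0.60, 0.40),
    TemplateHit("tmplC", 2.0, 0.80, 0.95),  # poor E-value, rejected despite high identity
]
print(choose_template(hits).pdb_id)  # tmplA
```

Returning `None` when nothing passes the cutoff mirrors the advice above to fall back on fold-recognition servers rather than accept a template with a poor E-value.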

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interactions are said to be rigid.

Fig Protein-Protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. A protein receptor-ligand pair can have either a rigid ligand and a flexible receptor, or a flexible ligand and a rigid receptor.

Fig Protein Ligand-Receptor Docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid-ligand/flexible-receptor complex often maximizes the interface area between the molecules. The two molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty to the system will be minimized; this can be accomplished by allowing flexibility in the receptor. Flexible receptors allow docking of a larger ligand than would be possible with a rigid receptor.
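The energy penalty for ligand overlap comes largely from short-range van der Waals repulsion. One common way to model this term is the 12-6 Lennard-Jones potential, sketched below in Python; the `epsilon` and `sigma` values are illustrative assumptions, not parameters from any particular docking program. Printing a few distances shows why letting receptor atoms move out of the steep repulsive wall lowers the penalty.

```python
def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """12-6 Lennard-Jones energy (kcal/mol) for one atom pair at distance r (Å).

    epsilon (well depth) and sigma (zero-energy distance) are illustrative
    values, not taken from a specific force field.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# The minimum lies at r = 2**(1/6) * sigma (about 3.82 Å here); pushing the
# atoms closer than sigma makes the r**-12 repulsion rise very steeply.
for r in (4.5, 3.82, 3.4, 3.0, 2.5):
    print(f"r = {r:.2f} Å   E = {lennard_jones(r):8.3f} kcal/mol")
```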

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: there is intrinsic movement that allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
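The six degrees of freedom mentioned above are the three rotations and three translations of a rigid body. The Python sketch below generates one such pose for a rigid ligand; the x-then-y-then-z rotation convention and rotating about the ligand centroid are assumptions made for illustration, and real docking programs (Hex, for example) use their own parameterisations and search strategies.

```python
import math

def rotation_matrix(ax, ay, az):
    """3x3 rotation: about x by ax, then y by ay, then z by az (radians)."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    return matmul(rz, matmul(ry, rx))

def place_rigid_ligand(coords, angles, shift):
    """Apply one rigid-body pose: 3 rotations about the centroid, then 3 translations.

    Internal geometry is unchanged, so the ligand stays rigid.
    """
    n = len(coords)
    centroid = [sum(atom[i] for atom in coords) / n for i in range(3)]
    rot = rotation_matrix(*angles)
    posed = []
    for atom in coords:
        v = [atom[i] - centroid[i] for i in range(3)]
        rv = [sum(rot[i][k] * v[k] for k in range(3)) for i in range(3)]
        posed.append(tuple(rv[i] + centroid[i] + shift[i] for i in range(3)))
    return posed

# Hypothetical two-atom "ligand": rotate 90 degrees about z, then move 5 Å along z.
ligand = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
posed = place_rigid_ligand(ligand, (0.0, 0.0, math.pi / 2), (0.0, 0.0, 5.0))
print(posed)
```

A docking search evaluates many such poses, scoring each one against the receptor; allowing receptor flexibility adds further degrees of freedom on top of these six.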

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A molecule on the surface of the cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule on the cell surface (or within the cell) to which a substance, such as a hormone or a drug, selectively binds, causing a change in the activity of the cell.

Ligand

A molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through the interaction of many weak noncovalent bonds formed with the binding site of a protein, tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
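Lowering the activation free energy by ΔΔG‡ speeds a reaction by roughly exp(ΔΔG‡/RT), assuming transition-state-theory behaviour in which the rate constant is proportional to exp(-ΔG‡/RT) and the pre-exponential factor is unchanged. A short Python sketch of that relation; the ΔΔG‡ values are arbitrary examples.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def rate_enhancement(ddg_act_kj_per_mol, temperature_k=298.15):
    """Fold increase in rate when the activation free energy is lowered by
    ddg_act_kj_per_mol (kJ/mol): exp(ddG / RT), other factors held fixed."""
    return math.exp(ddg_act_kj_per_mol * 1000.0 / (R * temperature_k))

for ddg in (5.7, 11.4, 34.2):
    print(f"{ddg:5.1f} kJ/mol lower -> about {rate_enhancement(ddg):,.0f}x faster")
```

At about 298 K, every ~5.7 kJ/mol (RT ln 10) of transition-state stabilization corresponds to roughly a tenfold rate increase.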

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The list below does not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

Residue   Preference
His        0.360
Tyr       -0.040
Asp        0.045
Gly       -0.070
Trp       -0.140
Met        0.025
Val       -0.060
Asn        0.080
Leu       -0.180
Phe       -0.120
Gln        0.050
Cys        0.210
Ile       -0.005
Ala        0.025
Glu        0.050
Arg        0.055
Pro       -0.200
Lys        0.100
Thr        0.100
Ser        0.130
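The text says these values compare observed contacts with those expected by chance. A log-odds ratio is one common way to express such a preference; the Python sketch below assumes a base-10 log of observed over expected contact counts. The source does not state the exact formula behind the table, and the counts used here are hypothetical.

```python
import math

def contact_propensity(observed, expected):
    """Log-odds score log10(observed / expected).

    > 0: the residue contacts bound non-protein atoms more often than chance;
    < 0: less often; 0: exactly as expected.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("counts must be positive")
    return math.log10(observed / expected)

# Hypothetical counts for two residue types.
print(round(contact_propensity(150, 100), 3))  # 0.176
print(round(contact_propensity(80, 100), 3))   # -0.097
```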

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the corresponding torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
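φ and ψ are ordinary torsion angles, each defined by four consecutive backbone atoms (φ: C(i-1), N, Cα, C; ψ: N, Cα, C, N(i+1)). The Python sketch below computes a torsion angle from four coordinates using the standard atan2 formulation, with the usual convention that clockwise rotation, viewed along the central bond, is positive; the test coordinates are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in [-180, 180], defined by four atoms."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: a right-angle twist and a fully extended (trans) case.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

The two calls print approximately -90 and 180 degrees; a Ramachandran plot is simply the (φ, ψ) pair computed this way for every residue.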

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new. The new file has its atoms labelled in accordance with the IUPAC naming conventions.

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions. The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles, and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good for releasing local constraints on a residue, but it will not pass over high-energy barriers and will stop in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
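Gradient-based minimization can be illustrated with a single atom pair, where the only degree of freedom is the interatomic distance. This Python sketch applies steepest descent to a 12-6 Lennard-Jones pair energy; the potential, its parameters, the fixed step size and the tolerance are illustrative assumptions, and real minimizers work on all atomic coordinates with a full force field.

```python
def lj_energy(r, epsilon=0.2, sigma=3.4):
    """12-6 Lennard-Jones energy (kcal/mol) at distance r (Å); illustrative parameters."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=0.2, sigma=3.4):
    """dE/dr of the same potential; the force along r is -dE/dr."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (6.0 * sr6 - 12.0 * sr6 * sr6) / r

def steepest_descent(r, step=0.05, tol=1e-6, max_steps=10_000):
    """Move downhill until the force is ~0; stops at the nearest (local) minimum."""
    for _ in range(max_steps):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        r -= step * g  # step along the force, i.e. against the gradient
    return r

start = 3.0  # atoms too close: strong van der Waals repulsion
end = steepest_descent(start)
print(f"{start:.3f} Å, E = {lj_energy(start):.3f}  ->  {end:.3f} Å, E = {lj_energy(end):.3f}")
```

Starting from 3.0 Å, the pair relaxes to the potential minimum at 2^(1/6)·σ ≈ 3.82 Å. Because each step only follows the local gradient, the same routine on a rugged energy surface would stop in whichever local minimum is nearest, as described above.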


Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information: Entry name GLCTK_HUMAN; Primary accession number Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase; Synonyms: EC 2.7.1.31, HBeAg-binding protein 4

Gene name: GLYCTK; Synonyms: HBEBP4; ORF names: LP5910

From: Homo sapiens (Human) [TaxID 9606]

Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences

Db  | AC     | Description                                                        | Score | E-value
pdb | 1QGT-C | Chain C (HBcAg), Human Hepatitis B Viral Capsid                    | 206   | 6e-54
pdb | 2QIJ-C | Chain C, Hepatitis B Capsid Protein With An N-Termina              | 197   | 2e-51
pdb | 2G33-C | CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     | 192   | 6e-50
pdb | 1TA3-B | XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   | 27    | 60
pdb | 1AW9-A | Chain A, Structure Of Glutathione S-Transferase Iii I              | 27    | 60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).


Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49

Total number of positively charged residues (Arg + Lys): 43
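The summary figures in the ProtParam output can be recomputed from the residue counts. The Python sketch below does only that arithmetic (percent composition and the Asp + Glu and Arg + Lys totals); it is not a reimplementation of ProtParam.

```python
# Residue counts for GLCTK_HUMAN (Q8IVS8), copied from the composition table above.
counts = {
    "Ala": 74, "Arg": 33, "Asn": 11, "Asp": 21, "Cys": 5, "Gln": 32, "Glu": 28,
    "Gly": 51, "His": 16, "Ile": 15, "Leu": 81, "Lys": 10, "Met": 12, "Phe": 10,
    "Pro": 28, "Ser": 27, "Thr": 22, "Trp": 4, "Tyr": 5, "Val": 38,
}

length = sum(counts.values())
percent = {aa: round(100.0 * n / length, 1) for aa, n in counts.items()}
negative = counts["Asp"] + counts["Glu"]  # acidic side chains
positive = counts["Arg"] + counts["Lys"]  # basic side chains (His not counted)

print(length, percent["Leu"], negative, positive)  # 523 15.5 49 43
```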

Atomic composition

Carbon C 2435

Hydrogen H 3967

Nitrogen N 711

Oxygen O 719

Sulfur S 17

Formula C2435H3967N711O719S17

Total number of atoms 7849

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
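The two extinction coefficients follow from the counts of Trp, Tyr and cystine. This Python sketch assumes the per-residue molar absorptivities at 280 nm used by ProtParam (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹) and the molecular weight reported above; with 4 Trp, 5 Tyr and 5 Cys it reproduces both reported values.

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired=True):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) in water.

    If all Cys form cystines, an odd Cys is left unpaired (integer division);
    free Cys is assumed to contribute nothing.
    """
    cystines = n_cys // 2 if all_cys_paired else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * cystines

mw = 55252.6  # Da, molecular weight from the ProtParam output above
for paired in (True, False):
    eps = extinction_coefficient(4, 5, 5, paired)
    # Abs 0.1% (1 g/l) is the molar coefficient divided by the molecular weight.
    print(eps, round(eps / mw, 3))
# 29700 0.538
# 29450 0.533
```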

Secondary structure prediction

By SOPMA result for UNK_158250


MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: window width 17, similarity threshold 8, number of states 4
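The SOPMA percentages are simply state counts divided by sequence length. In the Python sketch below, the four-letter alphabet (h, e, t, c) matches the four-state prediction above; rebuilding a string from the reported counts is only a convenience for checking the arithmetic, since the totals do not depend on residue order.

```python
from collections import Counter

# Four-state alphabet used in the SOPMA prediction string above.
STATES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}

def state_percentages(prediction):
    """Percentage of residues assigned to each state in a SOPMA-style string."""
    counts = Counter(prediction)
    return {name: round(100.0 * counts[code] / len(prediction), 2)
            for code, name in STATES.items()}

# Rebuilt from the reported counts: 235 h, 80 e, 36 t, 172 c.
rebuilt = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
print(state_percentages(rebuilt))
# {'Alpha helix': 44.93, 'Extended strand': 15.3, 'Beta turn': 6.88, 'Random coil': 32.89}
```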

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
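The Score column in the ClustalW2 table is a percent identity for each sequence pair. The Python sketch below computes one common form of it: identities divided by the number of aligned positions where neither sequence has a gap. ClustalW2 derives its scores from its own pairwise alignments, so this sketch, applied here to one segment of the multiple alignment, is not guaranteed to reproduce the table exactly.

```python
def percent_identity(aligned_a, aligned_b):
    """Identical positions / positions where neither sequence has a gap, x 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gapped columns are not counted
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0

# Segment of HBEAG_HBVA4 and HBEAG_HBVA5 from the alignment (leading gaps removed).
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M"
print(round(percent_identity(hbva4, hbva5), 1))  # 96.7
```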

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
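The guide tree is in Newick format, where each leaf name is followed by a colon and its branch length. The Python sketch below pulls out the leaf branch lengths with a regular expression. This simple approach assumes unquoted names and deliberately ignores internal-node lengths, so a full Newick parser (for example Biopython's `Bio.Phylo`) would be the safer choice for general trees.

```python
import re

GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

def leaf_branch_lengths(newick):
    """Map each leaf name to its branch length.

    A leaf is a run of characters other than ( ) , : ; directly followed by
    ':length'. Internal-node lengths follow ')' and are therefore not matched.
    """
    return {name: float(length)
            for name, length in re.findall(r"([^(),:;]+):([0-9.]+)", newick)}

leaves = leaf_branch_lengths(GUIDE_TREE)
print(len(leaves))                  # 10
print(max(leaves, key=leaves.get))  # Q8IVS8|GLCTK_HUMAN
```

The longest leaf branch belongs to Q8IVS8|GLCTK_HUMAN, consistent with its low (2-3) pairwise scores against every HBeAg sequence in the scores table.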

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template, as it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

1) Open the template structure from file (PDB file), then choose Swiss Model > Load Raw Target Sequence.
2) Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.
3) Choose File > Save > Layer (PDB).
4) Choose File > Save > Project (PDB).
5) Choose Swiss Model > Submit Modeling Request. A new browser window opens, loading the PDB file; give the e-mail ID at which the modeled structure should be received.
6) Open the new structure received by e-mail and remove the template by selecting the target.
7) Choose File > Save > Layer (PDB).

Open Swiss Model and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, in order to fit the two sequences.

Save the file as the project.

Select "Submit Modeling Request" under Swiss Model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields


Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52


Fig structure of template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot displays the φ and ψ backbone dihedral angles of each residue in a protein structure against the sterically allowed regions, so residues with strained or unusual backbone geometry can be identified.

Software

SAVS: http://nihserver.mbi.ucla.edu/SAVS

Procedure

The target protein structure obtained after homology modeling using Deep View and modeler is given as input for SAVS.

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT


Result

Plot statistics                                         No. of residues      %
Residues in most favoured regions [A,B,L]                           990   85.6
Residues in additional allowed regions [a,b,l,p]                    104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]                 11    1.0
Residues in disallowed regions                                       51    4.4
                                                                   ----  -----
Number of non-glycine and non-proline residues                     1156  100.0
Number of end-residues (excl. Gly and Pro)                            8
Number of glycine residues (shown as triangles)                     127
Number of proline residues                                           60
                                                                   ----
Total number of residues                                           1351

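Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good-quality model would be expected to have over 90% of its residues in the most favoured regions. By this criterion the present model, with 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, falls short of that benchmark.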
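The percentages in the PROCHECK table are each region's count divided by the 1156 non-glycine, non-proline residues. The minimal sketch below reproduces that arithmetic and applies the 90% guideline quoted above; the function name and the dictionary keys are illustrative assumptions, not PROCHECK's own output format.

```python
from typing import Dict


def ramachandran_summary(counts: Dict[str, int], cutoff: float = 90.0) -> dict:
    """Turn PROCHECK-style region counts into percentages.

    `counts` covers non-glycine, non-proline residues only, keyed by region.
    The model passes if more than `cutoff` per cent lie in the most favoured regions.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues counted")
    percent = {region: 100.0 * n / total for region, n in counts.items()}
    return {
        "total": total,
        "percent": percent,
        "passes": percent["most favoured"] > cutoff,
    }


# Counts copied from the plot statistics table above.
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
summary = ramachandran_summary(counts)
for region, pct in summary["percent"].items():
    print(f"{region:20s} {pct:5.1f}%")
print("total:", summary["total"], "passes 90% rule:", summary["passes"])
```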

Docking result (Hex software)

Fig.: Ligand & Receptor (2B8N)

Fig.: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS


Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25


Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)

Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N. Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N. Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27*812*1] x FFT[64*24*48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=0.00)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex: 53.66 s, GF: 73.01 s, FFT: 277.87 s, Scan: 15.45 s, FFT Rate: 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N/2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
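The log ends with Hex grouping its docking solutions into clusters (here one solution forms one cluster). Hex's exact clustering criteria are not given in this report, so the sketch below is a generic illustration only, not Hex's algorithm: a greedy scheme that visits solutions from best to worst energy and adds each one to the first cluster whose representative lies within an RMS cutoff (3.0 Å, an arbitrary assumption), otherwise starting a new cluster. `Pose`, `pose_rmsd` and `cluster_poses` are hypothetical names, and the toy poses are invented for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Pose:
    energy: float          # docking score; lower (more negative) is better
    coords: List[Coord]    # ligand atom positions in the receptor frame


def pose_rmsd(a: Pose, b: Pose) -> float:
    """RMS deviation between two poses of the same ligand (same atom order).

    No superposition is done: docked poses already share the receptor frame.
    """
    if len(a.coords) != len(b.coords) or not a.coords:
        raise ValueError("poses must list the same, non-zero number of atoms")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(a.coords, b.coords))
    return math.sqrt(sq / len(a.coords))


def cluster_poses(poses: List[Pose], cutoff: float = 3.0) -> List[List[Pose]]:
    """Greedy clustering; the best-scoring member of each cluster is its representative."""
    clusters: List[List[Pose]] = []
    for pose in sorted(poses, key=lambda p: p.energy):
        for cluster in clusters:
            if pose_rmsd(pose, cluster[0]) <= cutoff:
                cluster.append(pose)
                break
        else:  # no existing cluster is close enough
            clusters.append([pose])
    return clusters


def shifted(coords: List[Coord], dx: float) -> List[Coord]:
    """Copy of `coords` translated by dx along x (toy poses for the example)."""
    return [(x + dx, y, z) for x, y, z in coords]


ligand: List[Coord] = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
poses = [Pose(-90.0, shifted(ligand, 0.5)),
         Pose(-100.0, shifted(ligand, 0.0)),
         Pose(-80.0, shifted(ligand, 10.0))]
clusters = cluster_poses(poses)
print(len(clusters), [len(c) for c in clusters])  # 2 [2, 1]
```

Visiting poses in energy order means each cluster is represented by its best-scoring member, which is what one would normally report.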


Conclusion


After analysing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be worthwhile to look at the matter more closely at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene-sequencing project on HBV is complete, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we carried out homology modelling; as a further step, we present a theoretical model built with available online modelling tools.

According to our study, the HBeAg (glycerate kinase) protein, encoded by the corresponding gene, is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand that could inhibit its activity, providing a basis on which drugs can be developed.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study will be useful in the search for a cure for hepatitis B, and will also open new avenues for finding other possible drug targets in the hepatitis B virus.

The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and will prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed.

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F. B. and Honig, B. (2001). Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.

Aloy, P., Querol, E., Aviles, F. X. and Sternberg, M. J. (2001). Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.

Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Apweiler, R., Attwood, T. K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M. D., et al. (2000). InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics.

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com

82

Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


different proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and identify library sequences that resemble the query sequence above a certain threshold.
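BLAST gains its speed by first looking for short exact "word" matches (seeds) between the query and each database sequence, and only then extending the promising ones into alignments. The sketch below shows just that seeding step in simplified form; real protein BLAST also accepts high-scoring neighbourhood words from a substitution matrix, extends seeds into scored segment pairs and reports E-values, none of which is shown. The word length k = 3 and the function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_seeds(query: str, subject: str, k: int = 3) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)  # word -> its start positions in the query
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


def seeds_by_diagonal(hits: List[Tuple[int, int]]) -> Dict[int, int]:
    """Count hits per diagonal (subject_pos - query_pos); busy diagonals suggest an alignment."""
    counts: Dict[int, int] = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    return dict(counts)


hits = find_seeds("GPAWPDSS", "KLGPAWPDSSTS")
print(hits)
print(seeds_by_diagonal(hits))
```

All six seeds fall on the same diagonal, which is the signal a BLAST-like search would then try to extend into a full local alignment.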

5. Primary and secondary structure analysis

Using ProtParam for primary structure

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No additional information about the protein under consideration is required. The protein can be specified either as a Swiss-Prot/TrEMBL accession number or ID, or as a raw sequence; white space and numbers are ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be taken to an intermediary page that allows you to select the portion of the sequence on which you would like to perform the analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end positions in two boxes. By default (i.e. if you leave the two boxes empty), the complete sequence is analysed.

It calculates the following parameters:

extinction coefficient

half-life

instability index

aliphatic index
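Two of these parameters follow simple published formulas that the ProtParam documentation describes: the extinction coefficient at 280 nm is estimated from the numbers of Trp, Tyr and cystine residues (5500, 1490 and 125 M⁻¹ cm⁻¹ each), and the aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X as mole percent. The sketch below is an independent re-implementation under those assumptions, not ProtParam's code; half-life and instability index rely on lookup tables and are omitted.

```python
from collections import Counter
from typing import Dict


def clean_sequence(raw: str) -> str:
    """Upper-case the sequence and drop white space and numbers, as ProtParam does."""
    return "".join(ch for ch in raw.upper() if ch.isalpha())


def extinction_coefficients(seq: str) -> Dict[str, int]:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1), two Cys assumptions."""
    counts = Counter(seq)
    base = 5500 * counts["W"] + 1490 * counts["Y"]
    return {
        "all_cys_form_cystines": base + 125 * (counts["C"] // 2),  # one cystine per Cys pair
        "all_cys_reduced": base,
    }


def aliphatic_index(seq: str) -> float:
    """X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X in mole percent."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    mole_pct = {aa: 100.0 * counts[aa] / len(seq) for aa in "AVIL"}
    return mole_pct["A"] + 2.9 * mole_pct["V"] + 3.9 * (mole_pct["I"] + mole_pct["L"])


seq = clean_sequence("awy ccl 6")
print(seq, extinction_coefficients(seq), round(aliphatic_index(seq), 2))
```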


Using SOPMA for secondary structure analysis

A method called the self-optimized prediction method (SOPM) was described to improve the success rate in the prediction of the secondary structure of proteins. Its authors later reported improvements obtained by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
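The 69.5% figure is a three-state accuracy (Q3): the percentage of residues whose predicted state (helix, sheet or coil) matches the state observed in the experimental structure. A minimal sketch of that measure follows. The reduction of DSSP's eight states to three is one common convention and is an assumption here, since SOPMA's evaluation details are not given in this report; the names are illustrative.

```python
from typing import Dict

# One common reduction of DSSP's eight states to three (assumed here):
# helices H, G, I -> H; strands E, B -> E; everything else -> C (coil).
DSSP_TO_3: Dict[str, str] = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp: str) -> str:
    return "".join(DSSP_TO_3.get(ch, "C") for ch in dssp)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E, C) is predicted correctly."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)


observed = to_three_state("HHHGEEBT-S")
predicted = "HHHHEECCCC"
print(observed, q3(predicted, observed))
```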

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Identified the template structure with a BLAST similarity search; the PDB ID of the template is 2B8N.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor in PDB format.

Validated the modeled receptor using the Structure Analysis and Validation Server (SAVS).

Verified the model through the parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran Hex and obtained the docked structure of the drug molecule.
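The retrieval steps above return sequences in FASTA format: a header line starting with ">" followed by one or more sequence lines. A minimal standard-library reader is sketched below; keying records by the first word of the header and upper-casing residues are assumptions of this sketch, and a full toolkit such as Biopython's SeqIO would normally be used instead. The example records are invented fragments.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each record's ID (first word after '>') to its sequence."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)  # close the previous record
            fields = line[1:].split()
            current = fields[0] if fields else ""
            chunks = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.replace(" ", "").upper())
    if current is not None:
        records[current] = "".join(chunks)
    return records


example = """>2B8N_A fragment
PESLKKLAIE
IVKKSIEAVF
>query
mkt aly
"""
print(parse_fasta(example))
```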


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.
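The target-template alignment described above is also what the sequence identity between the two is computed from. Definitions differ in the denominator; this sketch assumes identical residues divided by the number of columns in which neither sequence has a gap ("-"), which is only one of several conventions. The two examples are ten-column blocks taken from the TARGET/2b8nA alignment elsewhere in this report.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical positions / columns where neither sequence has a gap, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


print(percent_identity("GPTVASSHNV", "gpawpdssts"))  # 30.0
print(percent_identity("GLRAALPRSV", "giets--esv"))  # 37.5
```

Note that the second block scores higher partly because its two gapped columns are left out of the denominator; a convention that counted them would give 30%.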

The quality of the homology model depends on the quality of the sequence alignment and of the template structure. Two things can complicate the approach:

- alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template;
- structure gaps in the template, which arise from poor resolution in the experimental procedure used to solve the structure (usually X-ray crystallography).

Model quality declines with decreasing sequence identity. A typical model shows ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model built without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase as identity falls, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2].

Taken together, these atomic-position errors are significant. They impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction. Even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s).

Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence. They are especially helpful in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the fold, to bind some small molecule, or to foster association with another protein or nucleic acid.
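The ~2 Å and 4-5 Å figures above describe agreement between matched Cα atoms, which is usually reported as a root-mean-square deviation (RMSD). The sketch below computes that quantity. It assumes the residues have already been paired and that the model has already been optimally superposed on the reference (for example, with the Kabsch algorithm). No superposition is performed here.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD between matched C-alpha coordinates (assumed already superposed)."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    total = 0.0
    for p, q in zip(model, reference):
        total += sum((a - b) ** 2 for a, b in zip(p, q))  # squared distance per pair
    return math.sqrt(total / len(model))


# Two matched atoms: one identical, one displaced by 2 angstroms.
print(ca_rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]))
```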

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target; this yields a set of spatial restraints on the target structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.
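The second step in the caption, transferring Cα-Cα distances from a template to aligned target residues, can be illustrated with a toy sketch. The function name, the 8 Å cutoff and the dictionary layout are all illustrative assumptions. Real programs such as MODELLER derive statistical restraints rather than copying fixed distances, so this shows only the basic idea.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def transfer_ca_restraints(
    template_ca: Dict[int, Coord],
    mapping: Dict[int, int],
    cutoff: float = 8.0,
) -> Dict[Tuple[int, int], float]:
    """Copy template C-alpha/C-alpha distances onto aligned target residue pairs.

    template_ca: template residue number -> C-alpha coordinates
    mapping:     target residue number -> aligned template residue number
    Only pairs no farther apart than `cutoff` (angstroms, illustrative) are kept.
    """
    restraints = {}
    targets = sorted(mapping)
    for i, ta in enumerate(targets):
        for tb in targets[i + 1:]:
            d = math.dist(template_ca[mapping[ta]], template_ca[mapping[tb]])
            if d <= cutoff:
                restraints[(ta, tb)] = d
    return restraints


template = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (20.0, 0.0, 0.0)}
print(transfer_ca_restraints(template, {10: 1, 11: 2, 12: 3}))  # only the (10, 11) pair is close enough
```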


Homology modeling can produce high-quality structural models when the target and template are closely related. This has inspired the formation of a structural genomics consortium dedicated to producing representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, come from errors in the initial sequence alignment and from improper template selection. As with other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment, the Critical Assessment of Techniques for Protein Structure Prediction (CASP).

Template selection and sequence alignment

The critical first step in homology modeling is identifying the best template structure, if any is available. The simplest method relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods are based on multiple sequence alignment, PSI-BLAST being the most common example. They iteratively update a position-specific scoring matrix to identify successively more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that are only distantly related to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used to find templates for traditional homology modeling.

When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value; such hits are considered close enough in evolution to support a reliable homology model. Other factors may tip the balance in marginal cases. For example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available. It may well have the wrong structure, which would produce a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers. These improve on individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
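The E-value rule of thumb above can be expressed as a simple filter. This is a minimal sketch in which the hit list and the 1e-5 cutoff are hypothetical, chosen only for illustration. In practice the threshold is a judgment call, and the other factors mentioned above still apply.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (template identifier, E-value)


def select_templates(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits at or below an E-value cutoff, best (lowest) E-value first."""
    kept = [hit for hit in hits if hit[1] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[1])


# Hypothetical hits, for illustration only.
hits = [("tmplA", 1e-40), ("tmplB", 3.0), ("tmplC", 2e-12)]
print(select_templates(hits))  # [('tmplA', 1e-40), ('tmplC', 2e-12)]
```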

These approaches often identify several candidate template structures. Although some methods can generate hybrid models from multiple templates, most rely on a single template. Choosing the best template from among the candidates is therefore a key step, and it can significantly affect the final accuracy of the structure. The choice is guided by the similarity of the query and template in three respects: their sequences, their functions, and the predicted query versus observed template secondary structures. Perhaps most important are two further factors. The first is the coverage of the aligned regions, meaning the fraction of the query sequence whose structure can be predicted from the template. The second is the plausibility of the resulting model. For this reason, several homology models are sometimes produced for a single query sequence, and the most likely candidate is chosen only in the final step.
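Coverage, as defined above, can be computed directly from an alignment. This is a minimal sketch that assumes gapped alignment strings with "-" for gaps. The ranking rule (coverage first, identity as tie-breaker) is a hypothetical heuristic for illustration, not a standard procedure.

```python
from typing import List, Tuple


def query_coverage(aligned_query: str, aligned_template: str, gap: str = "-") -> float:
    """Fraction of query residues that are aligned to a template residue."""
    query_len = sum(1 for q in aligned_query if q != gap)
    covered = sum(
        1 for q, t in zip(aligned_query, aligned_template) if q != gap and t != gap
    )
    return covered / query_len if query_len else 0.0


def rank_candidates(candidates: List[Tuple[str, float, float]]) -> List[str]:
    """Order (name, coverage, identity) tuples: higher coverage first, then identity."""
    ordered = sorted(candidates, key=lambda c: (c[1], c[2]), reverse=True)
    return [name for name, _, _ in ordered]


print(query_coverage("MKV-LA", "MK-GLA"))  # 4 of 5 query residues covered -> 0.8
print(rank_candidates([("t1", 0.6, 40.0), ("t2", 0.9, 30.0), ("t3", 0.9, 35.0)]))
```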

The sequence alignment generated by the database search can serve as the basis for building the model; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins of similar size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are also usually more rigid: their interfaces cannot alter their conformation to improve binding and ease movement. Conformational changes are limited by steric constraints, and such interfaces are therefore described as rigid.

Fig Protein-Protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. These interfaces show both high specificity and induced fit, with specificity increasing with rigidity. Receptor-ligand docking can involve either a rigid ligand with a flexible receptor, or a flexible ligand with a rigid receptor.

Fig Protein Ligand-Receptor Docking

Rigid Ligand with a Flexible Receptor

In the rigid-ligand/flexible-receptor case, the native structure often maximizes the interface area between the molecules. The molecules move relative to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, ligand overlap in the docking interface incurs an energy penalty. If the van der Waals clashes can be reduced, that penalty is minimized. Allowing flexibility in the receptor achieves this: a flexible receptor can dock a larger ligand than a rigid receptor could.

Flexible Ligand with a Rigid Receptor

When the fit between ligand and receptor does not need to be induced, the receptor can stay rigid while the free energy of the system is maintained. For successful docking, the ligand's parameters must be maintained and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though. Intrinsic movement allows small conformational adaptations during ligand binding. When the six degrees of freedom for protein movement are taken into account (three rotational, three translational), the receptor's inherent flexibility is even greater. This further offsets any energy penalty between receptor and ligand, making binding easier and more energetically favorable.
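The six rigid-body degrees of freedom mentioned above can be shown by applying three rotations and three translations to a set of atomic coordinates. This is an illustrative sketch only; docking programs such as Hex use their own internal representations. The choices here (rotation about the origin, angles in degrees, applied about x, then y, then z) are assumptions.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def rotation_matrix(ax: float, ay: float, az: float) -> List[List[float]]:
    """Rotate about x, then y, then z (angles in degrees): R = Rz @ Ry @ Rx."""
    a, b, c = (math.radians(t) for t in (ax, ay, az))
    rx = [[1, 0, 0], [0, math.cos(a), -math.sin(a)], [0, math.sin(a), math.cos(a)]]
    ry = [[math.cos(b), 0, math.sin(b)], [0, 1, 0], [-math.sin(b), 0, math.cos(b)]]
    rz = [[math.cos(c), -math.sin(c), 0], [math.sin(c), math.cos(c), 0], [0, 0, 1]]

    def matmul(m, n):
        return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

    return matmul(rz, matmul(ry, rx))


def rigid_transform(coords: List[Vec], angles: Vec, shift: Vec) -> List[Vec]:
    """Apply the six rigid-body degrees of freedom: 3 rotations + 3 translations."""
    r = rotation_matrix(*angles)
    moved = []
    for p in coords:
        rotated = [sum(r[i][k] * p[k] for k in range(3)) for i in range(3)]
        moved.append(tuple(rotated[i] + shift[i] for i in range(3)))
    return moved


# Rotate (1, 0, 0) by 90 degrees about z, then shift by (1, 2, 3): about (1, 3, 3).
print(rigid_transform([(1.0, 0.0, 0.0)], (0.0, 0.0, 90.0), (1.0, 2.0, 3.0)))
```

Because the transform is rigid, all interatomic distances within the moved molecule are unchanged.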

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu. They will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Receptor

A molecule (typically a protein) on the surface of a cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. It is a molecule in or on the cell surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g., a receptor). A ligand binds through many weak noncovalent bonds formed with the binding site of a protein. Tight binding therefore depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in making and breaking bonds; these residues are called the catalytic groups. In essence, the interaction of enzyme and substrate at the active site promotes formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction. This produces the rate enhancement characteristic of enzyme action.
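The link between lowering ΔG‡ and rate enhancement can be made concrete with transition-state theory. If the pre-exponential factor is the same for the catalysed and uncatalysed reactions (an assumption), the rate ratio is exp(ΔΔG‡/RT). At 25 °C, every ~5.7 kJ/mol of lowering gives roughly a tenfold speed-up. The sketch below computes this ratio.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def rate_enhancement(ddg_activation: float, temperature: float = 298.15) -> float:
    """k_cat / k_uncat when the enzyme lowers the activation free energy by ddg (J/mol).

    Assumes transition-state theory with identical pre-exponential factors
    for the catalysed and uncatalysed reactions.
    """
    return math.exp(ddg_activation / (R * temperature))


print(rate_enhancement(5_700))   # about 10-fold
print(rate_enhancement(34_000))  # several orders of magnitude
```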

Amino acids in protein active sites

It is difficult to generalize about which amino acids are likely to be in a protein's active/functional site, because this depends greatly on the type of function. With that in mind, the values below are preferences for each of the 20 amino acids to lie within functional regions of proteins. They were worked out by counting how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean the amino acid makes more contacts than expected by chance; negative values mean it makes fewer. The list does not cover protein-protein or protein-peptide interactions. In those interactions, many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His 0.360   Tyr -0.040   Asp 0.045   Gly -0.070

Trp -0.140  Met 0.025    Val -0.060  Asn 0.080

Leu -0.180  Phe -0.120   Gln 0.050   Cys 0.210

Ile -0.005  Ala 0.025    Glu 0.050   Arg 0.055

Pro -0.200  Lys 0.100    Thr 0.100   Ser 0.130
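The preference values can be stored in a lookup table and applied to a candidate site. In the sketch below, averaging the values over a site's residues is a hypothetical heuristic used only to show how the table might be used. It is not a published site-prediction method, and, as noted above, it ignores protein-protein and protein-peptide contacts.

```python
# Contact-preference values from the table above (positive = more contacts
# with bound non-protein atoms than expected by chance).
PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def mean_preference(residues):
    """Average preference over a candidate site's residues (3-letter codes, any case)."""
    if not residues:
        raise ValueError("empty residue list")
    return sum(PREFERENCE[r.capitalize()] for r in residues) / len(residues)


print(mean_preference(["HIS", "CYS", "SER"]))
```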

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between these two torsion angles, φ and ψ. Ramachandran used computer models of small polypeptides to vary φ and ψ systematically, with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii. Angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
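Each point on a Ramachandran plot is a pair of torsion angles computed from backbone atom coordinates. The sketch below uses the standard four-point dihedral formula in the IUPAC sign convention. The helper names are illustrative. φ of residue i uses the atoms C(i-1), N(i), Cα(i) and C(i); ψ uses N(i), Cα(i), C(i) and N(i+1).

```python
import math
from typing import Sequence


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0: Sequence[float], p1: Sequence[float],
             p2: Sequence[float], p3: Sequence[float]) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# phi_i = dihedral(C[i-1], N[i], CA[i], C[i]); psi_i = dihedral(N[i], CA[i], C[i], N[i+1])
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))  # 90.0
```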

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures and assessing how correct they are. A run can take several minutes, depending on how many of its programs are selected and on how many residues the submitted protein contains.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is. The comparison is made against stereochemical parameters derived from well-refined, high-resolution structures. The checks also use "ideal" bond lengths and bond angles derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. As a by-product of running PROCHECK, the coordinate file is "cleaned up" by the first of the programs. This cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the file extension .new. In the new file, the atoms are labelled according to the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK writes a number of output files to the default directory; they have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension. It is a printable ASCII text file that lists all the computed stereochemical properties by residue.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints within a residue, but it will not cross high energy barriers and may stop in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a single number for a given conformation. This number can be used to evaluate a conformation, but it may be a poor measure because a few bad interactions can dominate it. For instance, a large molecule with an excellent conformation for nearly all atoms can still have a large overall energy because of one bad interaction, such as two atoms too close in space with a huge van der Waals repulsion energy. It is therefore often preferable to minimize the energy of a conformation to find the best nearby conformation.

Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
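Gradient-based minimization, in which atoms are moved along the forces until the forces become small, can be shown on a toy system. This sketch uses steepest descent with a single Lennard-Jones pair potential standing in for a real force field. The potential, the reduced units (ε = σ = 1), the step size and the tolerance are all assumptions chosen for illustration. Two atoms that start too close together, with a large van der Waals repulsion, relax to the pair minimum at a distance of 2^(1/6) σ with energy -ε.

```python
import math


def lj_energy_and_forces(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy of all atom pairs and the force on each atom."""
    n = len(coords)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(x * x for x in d))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # dE/dr for this pair; the force on atom i is -dE/dr along (r_i - r_j) / r.
            de_dr = 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            for k in range(3):
                f = -de_dr * d[k] / r
                forces[i][k] += f
                forces[j][k] -= f
    return energy, forces


def steepest_descent(coords, step=0.01, force_tol=1e-6, max_iter=10_000):
    """Move every atom along its force until the largest force component is small."""
    coords = [list(c) for c in coords]
    energy, forces = lj_energy_and_forces(coords)
    for _ in range(max_iter):
        if max(abs(f) for atom in forces for f in atom) < force_tol:
            break
        for atom, force in zip(coords, forces):
            for k in range(3):
                atom[k] += step * force[k]
        energy, forces = lj_energy_and_forces(coords)
    return coords, energy


start = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]  # two atoms closer than the energy minimum
final, e_min = steepest_descent(start)
print(math.dist(final[0], final[1]), e_min)  # approaches 2 ** (1 / 6) and -1.0
```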


Result and discussion

1 Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences

Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C (Hbcag), Human Hepatitis B Viral Capsid                     206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                      192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp     27    6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                27    6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
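The composition percentages and charged-residue totals above are simple counts over the sequence. This is a minimal sketch of the same quantities; it is not ProtParam's own code. For Q8IVS8, Asp 21 + Glu 28 = 49 and Arg 33 + Lys 10 = 43, in line with the totals reported.

```python
from collections import Counter


def composition(sequence: str):
    """Residue counts, percentages and charged-residue totals for a protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    percent = {aa: 100.0 * n / len(seq) for aa, n in counts.items()}
    negative = counts["D"] + counts["E"]  # Asp + Glu
    positive = counts["R"] + counts["K"]  # Arg + Lys
    return counts, percent, negative, positive


# First ten residues of Q8IVS8 as a small example.
counts, percent, negative, positive = composition("MAAALQVLPR")
print(counts["A"], percent["A"], negative, positive)
```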

Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
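The two extinction coefficients can be reproduced from the Trp, Tyr and Cys counts in the composition table. The sketch uses the per-residue values documented for ProtParam: 5500 for Trp, 1490 for Tyr and 125 per cystine, with each cystine formed from two Cys residues. The function names are illustrative. Dividing by the molecular weight gives the absorbance of a 1 g/l solution.

```python
def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int, cystines: bool = True) -> int:
    """Molar extinction coefficient at 280 nm in water (M^-1 cm^-1)."""
    eps = 5500 * n_trp + 1490 * n_tyr
    if cystines:
        eps += 125 * (n_cys // 2)  # each cystine consumes two Cys residues
    return eps


def abs_01_percent(eps: float, mol_weight: float) -> float:
    """Absorbance of a 1 g/l solution (Abs 0.1%) = extinction coefficient / MW."""
    return eps / mol_weight


# Trp 4, Tyr 5, Cys 5 and MW 55252.6 Da, from the ProtParam output above.
eps = extinction_coefficient(4, 5, 5)
print(eps, round(abs_01_percent(eps, 55252.6), 3))  # 29700 0.538
```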

Secondary structure prediction

SOPMA result for UNK_158250

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
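The SOPMA percentages are the fraction of residues assigned to each state in the prediction string (h = helix, e = extended strand, t = turn, c = coil). The sketch below recomputes such percentages. For the demonstration, it builds a synthetic string with the same state totals as reported above, not the actual prediction order.

```python
from collections import Counter


def state_percentages(prediction: str) -> dict:
    """Percentage of residues in each predicted secondary-structure state."""
    if not prediction:
        raise ValueError("empty prediction string")
    counts = Counter(prediction.lower())
    return {state: 100.0 * n / len(prediction) for state, n in counts.items()}


# Synthetic string with the reported totals: 235 h, 80 e, 36 t, 172 c (523 residues).
demo = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
print({s: round(p, 2) for s, p in state_percentages(demo).items()})
```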

Multiple sequence alignment

ClustalW2 Results

1 Number of sequences: 10

2 Alignment score: 28565

3 Sequence format: Pearson

4 Sequence type: aa

5 Output file: clustalw2-20080510-09552541.output

6 Alignment file: clustalw2-20080510-09552541.aln

7 Guide tree file: clustalw2-20080510-09552541.dnd

8 Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV ------------------------------------------------------------
P03153|HBEAG_GSHV ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV -------------------------------------------
P03153|HBEAG_GSHV -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);

Phylogram

Tertiary structure prediction

PDB 1QGT chain C was selected as the template; it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

1) Open the template structure (PDB file) from File, then choose Swiss-Model > Load Raw Target Sequence.

2) Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.

3) Choose File > Save > Layer (PDB).

4) Choose File > Save > Project (PDB).

5) Choose Swiss-Model > Submit Modeling Request. A new browser window opens with the PDB file loaded; enter an e-mail ID to receive the modeled structure.

6) Open the new structure received by e-mail and remove the template by selecting the target.

7) Choose File > Save > Layer (PDB).

Open Swiss-Model and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences.

Save the file as the project.

Select "Submit Modeling Request" under Swiss-Model to submit it for modeling.

Homology modeling

Optimise Mode: request submission form

Your e-mail address: Gunjan300@gmail.com

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Fig Structure of the template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model protein structures is an important part of the comparative modeling process. A Ramachandran plot visualizes the dihedral angles φ against ψ of the amino acid residues in a protein structure, displaying these backbone conformational angles for each residue. From it, one can judge whether the backbone torsion angles about the N-Cα and Cα-C bonds of each residue fall within sterically allowed regions.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure: The target protein structure obtained after homology modeling with Deep View and modeler is given as input to SAVS.

SAVES results for proj_gunjanpdb

Procheck summary

RAMACHANDRAN PLOT


Result

Plot statistics                                          No. of residues      %
Residues in most favoured regions [A,B,L]                            990    85.6
Residues in additional allowed regions [a,b,l,p]                     104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]                  11     1.0
Residues in disallowed regions                                        51     4.4
                                                                    ----   -----
Number of non-glycine and non-proline residues                      1156   100.0
Number of end-residues (excl. Gly and Pro)                             8
Number of glycine residues (shown as triangles)                      127 (59)
Number of proline residues                                            60
                                                                    ----
Total number of residues                                            1351


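Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions. By this criterion, the 85.6% obtained for this model falls below the expected value.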
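The percentages in the PROCHECK table are ratios over the 1156 non-glycine, non-proline residues, and the quality criterion is a simple threshold on the first of them. A small Python sketch, assuming the 90% cut-off quoted in the PROCHECK note (the function name is hypothetical):

```python
def ramachandran_summary(core, allowed, generous, disallowed, threshold=90.0):
    """Region percentages over non-Gly, non-Pro residues, plus the PROCHECK-style check.

    threshold: minimum % in the most favoured regions (90% assumed, per PROCHECK's note).
    """
    total = core + allowed + generous + disallowed
    pct = {
        "most favoured": 100.0 * core / total,
        "additional allowed": 100.0 * allowed / total,
        "generously allowed": 100.0 * generous / total,
        "disallowed": 100.0 * disallowed / total,
    }
    passes = pct["most favoured"] > threshold
    return total, pct, passes

# Counts taken from the Procheck table for this model
total, pct, passes = ramachandran_summary(990, 104, 11, 51)
for region, value in pct.items():
    print(f"{region:20s} {value:5.1f}%")
print("total:", total, "meets criterion:", passes)
```

Applied to the counts in the table, it reproduces the 85.6 / 9.0 / 1.0 / 4.4 split over 1156 residues and reports that the 90% criterion is not met.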

Docking result (Hex software)

Fig.: Ligand & Receptor (2B8N)


Fig.: After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS


Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25


Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds


>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=0.00)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal    Eshape    Eforce    Eair      RMS              Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined
Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
   1    1 000000 00 00 00 00 00 00 -1 -100


Conclusion


After analyzing the protein sequences of hepatitis B virus (HBV), we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to look at this more closely at the gene level, and a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing HBV gene-sequencing project is finished, we hope it will be possible to draw firm conclusions about the true picture of its evolution in the near future and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we performed homology modelling; as a step forward, we present a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand to inhibit its activity, as a basis on which drugs can be developed.


Future prospects

The work presented in this report may be a stepping stone toward such discoveries; it is a small finding within a large problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B and may open new avenues for finding other possible drug targets in HBV.

The docking results can be used to design new lead compounds and can thus aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so that we can look forward to a better, disease-free world.

The work presented is only a small part of a big problem, and much work remains to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. (Human diseases caused by viruses.)


[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B., Sheinerman F.B. and Honig B. (2001). Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801. [PubMed]

Aloy P., Querol E., Aviles F.X. and Sternberg M.J. (2001). Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408. [PubMed]


Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W. and Lipman D.J. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402. [PubMed]


Apweiler R., Attwood T.K., Bairoch A., Bateman A., Birney E., Biswas M., Bucher P., Cerutti L., Corpet F., Croning M.D., et al. (2000). InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com


Abbreviations


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package



Using SOPMA for secondary structure analysis

The self-optimized prediction method (SOPM) was developed to improve the success rate in predicting the secondary structure of proteins. Its authors report further improvements obtained by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (α-helix, β-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural-network method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids. Predictions are available by e-mail to deleage@ibcp.fr or on a Web page (http://www.ibcp.fr/predict.html).
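The 69.5% figure quoted for SOPMA is a three-state (Q3) accuracy: the percentage of residues whose predicted state matches the observed one. A minimal Python sketch, assuming one-letter labels H (helix), E (extended/sheet) and C (coil); this alphabet is a convention chosen for the example, and SOPMA's own output uses further letters (for example t for turns) that would first have to be mapped onto these three states.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted three-state label matches the observed one.

    Both arguments are strings of equal length over an assumed alphabet:
    'H' (helix), 'E' (extended / beta-sheet) and 'C' (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)

# Toy example: 5 of 6 residues agree
print(round(q3("HHHECC", "HHHCCC"), 1))   # 83.3
```

Averaging this per-residue score over a whole benchmark set (such as the 126 chains mentioned above) gives the kind of overall accuracy reported for SOPMA.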

PROTOCOL FOLLOWED

Obtained the receptor (target protein) for the HBV strain from literature references, journals available online and PubMed.

Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Identified the template structure with a BLAST similarity search (PDB ID: 2B8N).

Loaded the target sequence into SWISS-MODEL as a raw sequence and modelled the receptor in PDB format.

Validated the modelled receptor using the Structure Analysis Validation Server (SAVS).

Verified the model through different parameters, such as the Ramachandran plot and others available in SAVS.

Selected the best ligand for HBV disease from the KEGG database.

Ran HEX and obtained the docked structure of the drug molecule.
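The protocol starts from a FASTA sequence retrieved from Swiss-Prot, which is then submitted to BLAST and SWISS-MODEL. FASTA is a plain-text format: a '>' header line followed by one or more sequence lines. A minimal reader, written in Python purely as an illustration (it is not a tool used in this work, and the record names below are invented):

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                          # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)               # sequences may be wrapped over many lines
    if header is not None:
        records[header] = "".join(chunks)
    return records

example = ">query_1 hypothetical test record\nACDEFGHIK\nLMNPQ\n>query_2\nGPTVASSHNV\n"
seqs = read_fasta(example)
for name, seq in seqs.items():
    print(name, len(seq))
```

Because records are stored in a dict keyed by header, a repeated header would overwrite the earlier record; a list of (header, sequence) pairs avoids that if duplicates are possible.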


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.
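The "alignment that maps residues in the query sequence to residues in the template sequence" can be made concrete as a lookup table from query residue numbers to template residue numbers, which is what lets template coordinates be copied onto the target. A minimal Python sketch; the function name is hypothetical, and it assumes '-' marks gaps, as in the TARGET / 2b8nA alignment earlier in this report.

```python
def residue_mapping(aligned_query, aligned_template, query_start=1, template_start=1, gap="-"):
    """Map query residue numbers to template residue numbers via a pairwise alignment.

    Columns where either sequence has a gap are skipped; each residue counter
    advances only over real residues, so numbering stays correct across indels.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    q, t = query_start, template_start
    for a, b in zip(aligned_query, aligned_template):
        if a != gap and b != gap:
            mapping[q] = t
        if a != gap:
            q += 1
        if b != gap:
            t += 1
    return mapping

# Usage with the alignment block that starts at TARGET 255 / 2b8nA 201
target = "GPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHTCGHV"
template = "gpawpdsstsedalkvlekygiets--esvkrailqetpkhls-----nv"
m = residue_mapping(target, template, query_start=255, template_start=201)
print(m[255], m[304], len(m))
```

Target residues that sit opposite template gaps (here, for example, 280 and 281) receive no mapping; those are the regions that must be built without a template, typically by loop modeling.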

The quality of the homology model is dependent on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity; a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
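The identity levels above (70%, 25%) depend on how identity is counted. One common convention, assumed in this sketch, divides identical positions by the number of aligned columns in which neither sequence has a gap; other conventions divide by the length of the shorter sequence or of the whole alignment and give different numbers. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Comparison is case-insensitive, since some alignment outputs print the
    template in lower case.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a.upper(), b.upper()) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)

# One 50-column block of the TARGET / 2b8nA alignment shown earlier in this report
target = "GPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHTCGHV"
template = "gpawpdsstsedalkvlekygiets--esvkrailqetpkhls-----nv"
print(round(percent_identity(target, template), 1))   # 27.9
```

For this block, 12 of the 43 ungapped columns are identical (about 27.9%), near the low-identity end of the range discussed above; a single block is not, of course, the identity of the whole alignment.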

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, which yields a set of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.
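The third step, satisfying the restraints, can be illustrated with a toy optimisation: treat each transferred Cα-Cα distance as a target and move the atoms to minimise the summed squared violations. This is a deliberately simplified sketch under stated assumptions (distance restraints only, plain fixed-step gradient descent, three invented points); real tools such as MODELLER use richer restraint types, objective functions and optimisers, and the function names here are hypothetical.

```python
import math

def satisfy_distance_restraints(coords, restraints, lr=0.01, steps=5000):
    """Move points to reduce the sum over restraints of (|xi - xj| - d_ij)^2.

    coords: list of (x, y, z) starting positions.
    restraints: list of (i, j, d_ij) target distances, e.g. transferred CA-CA distances.
    """
    x = [list(map(float, p)) for p in coords]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in x]
        for i, j, target in restraints:
            diff = [x[i][k] - x[j][k] for k in range(3)]
            r = math.sqrt(sum(d * d for d in diff))
            if r == 0.0:
                continue  # direction undefined for coincident points; skip this step
            # d/dxi of (r - target)^2 = 2 (r - target) (xi - xj) / r
            coeff = 2.0 * (r - target) / r
            for k in range(3):
                grad[i][k] += coeff * diff[k]
                grad[j][k] -= coeff * diff[k]
        for p, g in zip(x, grad):       # simultaneous update of all points
            for k in range(3):
                p[k] -= lr * g[k]
    return [tuple(p) for p in x]

def max_violation(coords, restraints):
    """Largest absolute deviation |r_ij - d_ij| over all restraints."""
    return max(abs(math.dist(coords[i], coords[j]) - d) for i, j, d in restraints)

# Usage: three invented "C-alpha" positions and a 3-4-5 triangle of target distances
start = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (0.3, 1.0, 0.1)]
targets = [(0, 1, 3.0), (0, 2, 4.0), (1, 2, 5.0)]
model = satisfy_distance_restraints(start, targets)
print(max_violation(start, targets), max_violation(model, targets))
```

With consistent restraints like these the violations shrink towards zero; with the conflicting restraints that arise from several templates, the same objective instead settles on a compromise, which is what "satisfying all the restraints as well as possible" means.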


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods.

When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
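As a minimal sketch of the E-value screen described above, the function below reads BLAST+ tabular output (the default 12 columns of `-outfmt 6`) and keeps hits at or below an E-value cut-off, best first. The cut-off of 1e-5 and the two example rows (`templateA`, `templateB`) are made-up assumptions for illustration, not values from this project.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_templates(tabular_text, max_evalue=1e-5):
    """Return hits with E-value <= max_evalue, lowest E-value first."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["pident"] = float(hit["pident"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])


# Two made-up hits: one strong, one that should be rejected.
example = (
    "query\ttemplateA\t85.0\t180\t27\t0\t1\t180\t1\t180\t6e-54\t206\n"
    "query\ttemplateB\t22.0\t60\t45\t3\t200\t259\t10\t69\t60\t27\n"
)
for hit in candidate_templates(example):
    print(hit["sseqid"], hit["evalue"])
```

The threshold is a judgment call; marginal hits near the cut-off are exactly the cases where the other factors mentioned above (function, operon context) should be weighed.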

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can affect the final accuracy of the structure significantly. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
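Coverage, one of the factors listed above, is the fraction of the query that the template alignment spans. The sketch below computes it and ranks candidates by coverage and then identity; that ordering is only one possible heuristic, not a standard rule. The first candidate uses the figures this report gives for template 2b8nA (query residues 83 to 514 of 523, 34% identity); the second candidate is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    identity: float  # percent identity over the aligned region
    qstart: int      # first aligned query residue (1-based)
    qend: int        # last aligned query residue (1-based, inclusive)

    def coverage(self, query_length):
        """Fraction of the query sequence spanned by the alignment."""
        return (self.qend - self.qstart + 1) / query_length


def rank_candidates(candidates, query_length):
    # Hypothetical heuristic: higher coverage first, then higher identity.
    return sorted(candidates,
                  key=lambda c: (c.coverage(query_length), c.identity),
                  reverse=True)


QUERY_LENGTH = 523
candidates = [
    Candidate("2b8nA", 34.0, 83, 514),
    Candidate("hypothetical_short_hit", 60.0, 300, 400),
]
for c in rank_candidates(candidates, QUERY_LENGTH):
    print(c.name, round(c.coverage(QUERY_LENGTH), 3), c.identity)
```

A high-identity hit that covers only a small part of the query can predict only that part, which is why coverage is weighed alongside identity.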

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interfaces between the two molecules tend to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and the interactions are thus said to be rigid.

Fig: Protein-protein docking

Protein Receptor-Ligand Docking

Protein receptor-ligand motifs fit together tightly and are often described by a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid-ligand, flexible-receptor complex often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap at the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty to the system will be minimized; this can be accomplished by allowing flexibility in the receptor. Flexible receptors therefore allow docking of a larger ligand than a rigid receptor would accommodate.
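The energy penalty for overlapping atoms mentioned above is usually modelled with a van der Waals term. A minimal sketch using the 12-6 Lennard-Jones form follows; the parameters ε = 0.2 kcal/mol and σ = 3.4 Å are illustrative assumptions, not values used in this project. The energy is zero at r = σ, reaches its minimum of -ε at r = 2^(1/6)σ, and rises steeply at shorter distances, which is the overlap penalty that receptor flexibility helps to relieve.

```python
def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """12-6 Lennard-Jones energy for one atom pair.

    r       -- interatomic distance (Angstrom)
    epsilon -- well depth (kcal/mol); illustrative value
    sigma   -- distance at which the energy is zero (Angstrom); illustrative value
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


for r in (3.0, 3.4, 3.8, 5.0):
    print(f"r = {r:.1f} A  E = {lennard_jones(r):+.3f} kcal/mol")
```

The r^-12 term dominates at short range, so shaving even a few tenths of an ångström off a contact distance, for example by letting a side chain move, removes most of the penalty.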

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: intrinsic movement allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
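The six rigid-body degrees of freedom mentioned above (three rotations and three translations) define a pose. The sketch below applies such a pose to a set of coordinates, rotating about their centroid with a Z-Y-X angle convention and then translating; the angle convention, the function names and the ligand coordinates are assumptions for illustration, and docking programs differ in how they parameterize orientation.

```python
import math


def rotation_matrix(alpha, beta, gamma):
    """Rotation R = Rz(alpha) @ Ry(beta) @ Rx(gamma); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]


def apply_pose(coords, angles, translation):
    """Rotate coords about their centroid, then translate (6 degrees of freedom)."""
    rot = rotation_matrix(*angles)
    n = len(coords)
    centre = [sum(p[k] for p in coords) / n for k in range(3)]
    moved = []
    for p in coords:
        v = [p[k] - centre[k] for k in range(3)]
        rv = [sum(rot[i][k] * v[k] for k in range(3)) for i in range(3)]
        moved.append(tuple(rv[i] + centre[i] + translation[i] for i in range(3)))
    return moved


ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]  # made-up coordinates
posed = apply_pose(ligand, (math.pi / 2, 0.3, -0.7), (5.0, -2.0, 1.0))
# A rigid-body move changes position and orientation but not internal distances.
print(round(math.dist(ligand[0], ligand[2]), 6), round(math.dist(posed[0], posed[2]), 6))
```

A rigid docking search samples these six numbers; flexible docking adds internal torsions on top of them.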

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.

Receptor

A receptor is a molecule on the surface of the cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. More generally, it is a molecule in a cell or on the cell surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through many weak noncovalent bonds formed with the binding site of a protein, tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
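The link between a lower activation free energy and faster catalysis can be made quantitative with transition-state theory. Assuming the catalysed and uncatalysed reactions share the same pre-exponential factor, the rate enhancement is exp(ΔΔG‡/RT), where ΔΔG‡ is how much the enzyme lowers the activation free energy. The sketch below evaluates that expression; the function name is hypothetical. At 25 °C, lowering ΔG‡ by about 5.7 kJ/mol speeds the reaction roughly tenfold.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def rate_enhancement(ddg_activation, temperature=298.15):
    """k_cat / k_uncat = exp(ddG_act / (R T)).

    ddg_activation -- reduction in activation free energy, J/mol
    temperature    -- absolute temperature, K
    Assumes equal pre-exponential factors for both reactions.
    """
    return math.exp(ddg_activation / (R * temperature))


for kj in (5.7, 11.4, 34.2):
    print(f"{kj:5.1f} kJ/mol lower -> {rate_enhancement(kj * 1000):.3g}-fold faster")
```

Because the dependence is exponential, each additional ~5.7 kJ/mol of transition-state stabilization multiplies the rate by another factor of about ten at room temperature.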

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not cover protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

Amino acid   Preference
His           0.360
Tyr          -0.040
Asp           0.045
Gly          -0.070
Trp          -0.140
Met           0.025
Val          -0.060
Asn           0.080
Leu          -0.180
Phe          -0.120
Gln           0.050
Cys           0.210
Ile          -0.005
Ala           0.025
Glu           0.050
Arg           0.055
Pro          -0.200
Lys           0.100
Thr           0.100
Ser           0.130
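The preference list above can be queried directly. This short sketch copies the listed values (read to three decimal places) into a dictionary, keeps the residues with positive preference (over-represented at functional sites) and sorts them; it adds no new data, and the function name is hypothetical.

```python
# Active-site preference values as listed above.
PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070, "Trp": -0.140,
    "Met": 0.025, "Val": -0.060, "Asn": 0.080, "Leu": -0.180, "Phe": -0.120,
    "Gln": 0.050, "Cys": 0.210, "Ile": -0.005, "Ala": 0.025, "Glu": 0.050,
    "Arg": 0.055, "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def enriched(preferences):
    """Residues with positive values, strongest preference first."""
    positive = [(aa, v) for aa, v in preferences.items() if v > 0]
    return sorted(positive, key=lambda item: item[1], reverse=True)


for aa, value in enriched(PREFERENCE):
    print(f"{aa}  {value:+.3f}")
```

Histidine and cysteine head the list, consistent with their frequent roles in catalysis and metal binding.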

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the corresponding torsion angles φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
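The two calculations behind the plot can be sketched as follows: a backbone torsion angle computed from four atom positions (φ uses C(i-1), N, Cα, C; ψ uses N, Cα, C, N(i+1)), and a hard-sphere test that flags atom pairs closer than the sum of their van der Waals radii. The radii (approximately Bondi values) and the function names are illustrative assumptions, and Ramachandran's own analysis was more detailed than this sketch.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180] for four points, IUPAC sign convention."""
    def sub(a, b):
        return [a[i] - b[i] for i in range(3)]

    def dot(a, b):
        return sum(a[i] * b[i] for i in range(3))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[i] - dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - dot(b2, b1) * b1[i] for i in range(3)]
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


# Approximate van der Waals radii in Angstrom (illustrative values).
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "H": 1.20}


def clashes(atoms):
    """Hard-sphere check on non-bonded atoms: return index pairs that overlap.

    atoms -- list of (element, (x, y, z)); bonded pairs should not be passed in.
    """
    bad = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (ea, pa), (eb, pb) = atoms[i], atoms[j]
            if math.dist(pa, pb) < VDW_RADIUS[ea] + VDW_RADIUS[eb]:
                bad.append((i, j))
    return bad


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # approximately +90 degrees
print(clashes([("O", (0.0, 0.0, 0.0)), ("N", (2.5, 0.0, 0.0)), ("C", (6.0, 0.0, 0.0))]))
```

Scanning φ and ψ over a grid, rebuilding the backbone for each pair, and running the clash test is the essence of how the allowed and disallowed regions were mapped.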

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to run, the server can take several minutes; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).
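A check of the kind described above, comparing observed geometry with 'ideal' values, can be sketched by measuring backbone bond lengths. The reference lengths below are the commonly quoted Engh and Huber (1991) backbone values; the 0.05 Å tolerance, the function name and the coordinates are illustrative assumptions, and PROCHECK's own tables and scoring are more detailed.

```python
import math

# Backbone bond lengths (Angstrom) as commonly quoted from Engh & Huber (1991).
IDEAL_BOND = {("N", "CA"): 1.458, ("CA", "C"): 1.525, ("C", "O"): 1.231}


def check_backbone(residue, tolerance=0.05):
    """Compare the N-CA, CA-C and C-O bonds of one residue with ideal lengths.

    residue -- dict mapping atom name to (x, y, z)
    Returns a list of (bond, observed, ideal, flagged) tuples.
    """
    report = []
    for (a, b), ideal in IDEAL_BOND.items():
        observed = math.dist(residue[a], residue[b])
        report.append((f"{a}-{b}", round(observed, 3), ideal,
                       abs(observed - ideal) > tolerance))
    return report


# Made-up coordinates with a deliberately stretched CA-C bond.
residue = {"N": (0.0, 0.0, 0.0), "CA": (1.458, 0.0, 0.0),
           "C": (1.458, 1.65, 0.0), "O": (2.689, 1.65, 0.0)}
for row in check_backbone(residue):
    print(row)
```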

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new. The new file has its atoms labelled in accordance with the IUPAC naming conventions.
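Since PROCHECK's input is a PDB-format coordinate file, it may help to see how atom records are laid out. The sketch below reads ATOM/HETATM lines using the fixed column positions of the PDB format (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54). It is a simplified reader for illustration, not the parser PROCHECK uses, and the example record is made up.

```python
def read_atoms(pdb_text):
    """Yield (atom, residue, chain, resseq, (x, y, z)) from ATOM/HETATM records."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            yield (
                line[12:16].strip(),   # atom name, columns 13-16
                line[17:20].strip(),   # residue name, columns 18-20
                line[21],              # chain identifier, column 22
                int(line[22:26]),      # residue sequence number, columns 23-26
                (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            )


record = ("ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "A"
          + "   1" + "    " + "  11.104" + "   6.134" + "  -6.504")
for atom in read_atoms(record):
    print(atom)
```

Because the format is column-based rather than whitespace-delimited, splitting on spaces breaks when fields run together (for example large negative coordinates), which is why slicing by column is used.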

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good at releasing local constraints for a residue, but it will not pass through high energy barriers and will stop in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
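Gradient-based minimization, and its tendency to stop in the nearest local minimum, can be seen on a one-dimensional toy energy surface. The double-well function E(x) = x^4 - 3x^2 + x below is an arbitrary illustration, not a molecular force field, and the step size and force tolerance are assumed values. Started at x = 2, the steepest-descent loop settles in the shallower well near x ≈ 1.13 instead of crossing the barrier to the deeper well near x ≈ -1.30.

```python
def energy(x):
    """Toy double-well energy surface (arbitrary units)."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """dE/dx; the 'force' on the coordinate is -gradient(x)."""
    return 4 * x**3 - 6 * x + 1


def steepest_descent(x, step=0.01, force_tol=1e-6, max_steps=100_000):
    """Move downhill along -dE/dx until the residual force is small."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < force_tol:
            break
        x -= step * g
    return x


for start in (2.0, -2.0):
    x_min = steepest_descent(start)
    print(f"start {start:+.1f} -> x = {x_min:+.4f}, E = {energy(x_min):+.4f}")
```

The two starting points end in different minima with different energies, which is the local-minimum limitation noted above; escaping it requires methods such as molecular dynamics or simulated annealing.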


Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences

Db   AC       Description                                                        Score   E-value
pdb  1QGT-C   Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206     6e-54
pdb  2QIJ-C   Chain C Hepatitis B Capsid Protein With An N-Termina               197     2e-51
pdb  2G33-C   CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                      192     6e-50
pdb  1TA3-B   XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp    27      60
pdb  1AW9-A   Chain A Structure Of Glutathione S-Transferase Iii I               27      60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
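The composition figures reported by ProtParam are straightforward counts. The sketch below reproduces that kind of calculation for any sequence string: residue counts, percentages, and the numbers of negatively (Asp + Glu) and positively (Arg + Lys) charged residues. The function name is hypothetical and this is not ProtParam's code; the usage line applies it to the N-terminal 31 residues of GLCTK_HUMAN as given in this report.

```python
from collections import Counter


def composition(sequence):
    """Return ({residue: (count, percent)}, n_negative, n_positive)."""
    seq = sequence.upper()
    counts = Counter(seq)
    table = {aa: (n, round(100.0 * n / len(seq), 1)) for aa, n in sorted(counts.items())}
    negative = counts["D"] + counts["E"]  # Asp + Glu
    positive = counts["R"] + counts["K"]  # Arg + Lys
    return table, negative, positive


table, neg, pos = composition("MAAALQVLPRLARAPLHPLLWRGSVARLASS")
print(table["A"], table["L"], neg, pos)
```

Applied to the reported counts, the charged totals check out: Asp 21 + Glu 28 = 49 and Arg 33 + Lys 10 = 43.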

Atomic composition:
Carbon     C   2435
Hydrogen   H   3967
Nitrogen   N    711
Oxygen     O    719
Sulfur     S     17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
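The two extinction coefficients follow from the residue-contribution method ProtParam documents (after Pace et al.), which sums contributions at 280 nm: 5500 M-1 cm-1 per Trp, 1490 per Tyr and 125 per cystine (disulfide-bonded Cys pair). With 4 Trp, 5 Tyr and 5 Cys (at most 2 cystines), the sketch below reproduces 29700 and 29450, and dividing by the molecular weight (55252.6) gives the Abs 0.1% values. Function names are hypothetical.

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired=True):
    """Molar extinction coefficient at 280 nm in water (M^-1 cm^-1).

    Uses the residue contributions 5500 (Trp), 1490 (Tyr) and 125 per cystine.
    If all_cys_paired is True, every two Cys residues count as one cystine.
    """
    cystines = n_cys // 2 if all_cys_paired else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * cystines


def abs_01_percent(ext_coeff, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution, 1 cm path length."""
    return ext_coeff / molecular_weight


MW = 55252.6  # molecular weight reported by ProtParam for GLCTK_HUMAN
for paired in (True, False):
    e = extinction_coefficient(4, 5, 5, all_cys_paired=paired)
    print(e, round(abs_01_percent(e, MW), 3))
```

With an odd number of Cys, one residue is necessarily unpaired, so only two cystines contribute in the "all paired" case.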

Secondary structure prediction

By SOPMA: result for UNK_158250

Predicted state under each residue (h = alpha helix, e = extended strand, t = beta turn, c = random coil), 70 residues per line:

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix        (Hh):  235 is 44.93%
3-10 helix         (Gg):    0 is  0.00%
Pi helix           (Ii):    0 is  0.00%
Beta bridge        (Bb):    0 is  0.00%
Extended strand    (Ee):   80 is 15.30%
Beta turn          (Tt):   36 is  6.88%
Bend region        (Ss):    0 is  0.00%
Random coil        (Cc):  172 is 32.89%
Ambiguous states   (?):     0 is  0.00%
Other states:               0 is  0.00%

Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
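The SOPMA percentages are the share of residues assigned to each state. The sketch below tallies a prediction string in SOPMA's one-letter code (h helix, e extended strand, t turn, c coil); for the full 523-residue prediction, 235 helix residues give 235/523 = 44.93%, matching the summary. The function name is hypothetical, and the usage line takes the prediction for the last 33 residues as printed in this report.

```python
from collections import Counter

STATES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_fractions(prediction):
    """Return {state name: (count, percent)} for a SOPMA-style prediction string."""
    counts = Counter(prediction.lower())
    total = len(prediction)
    return {name: (counts[code], round(100.0 * counts[code] / total, 2))
            for code, name in STATES.items()}


# Prediction for TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR (last 33 residues).
print(state_fractions("hhhhhhhttcheeeecccccccchheeeeecct"))
```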

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217       3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305       3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305       3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217       3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214       3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214       3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214       3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214       3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214       2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
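The scores in this table are pairwise similarity scores. As we understand ClustalW's documented convention, each is the percentage of identical residues among the positions compared in a pairwise alignment, with gap positions excluded. A minimal sketch of that calculation on two already-aligned strings follows; the function name is hypothetical, and the example uses a short stretch of the HBVA4 and ASHV rows of the multiple alignment in this report (ClustalW scores come from separate pairwise alignments, so the result is not one of the table entries).

```python
def percent_identity(aligned_a, aligned_b):
    """Identities / positions where neither sequence has a gap, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


print(percent_identity("MQLFHLCLIISCT-CPT", "MYLFHLCLVFACVSCPT"))
```

Here 16 positions are compared (the gap column is skipped) and 11 are identical, giving 68.75%.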

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
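The guide tree is written in Newick format, in which parentheses group clusters and the number after each colon is a branch length. A minimal recursive-descent reader is sketched below; it assumes no whitespace, quoted labels or comments inside the string, which holds for this tree but not for every Newick file, and the function names are hypothetical.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (name, branch_length, children) tuples."""
    text = text.strip()
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    tree = parse_node()
    if text[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return tree


def leaves(node):
    """Return {leaf name: branch length} for every leaf below node."""
    name, length, children = node
    if not children:
        return {name: length}
    found = {}
    for child in children:
        found.update(leaves(child))
    return found


GUIDE_TREE = ("((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
              "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
              "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
              ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
              ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);")

tips = leaves(parse_newick(GUIDE_TREE))
for name, length in sorted(tips.items(), key=lambda item: -item[1]):
    print(f"{name:20s} {length:.5f}")
```

Sorting the leaves by branch length shows at once that GLCTK_HUMAN sits on by far the longest branch (0.59519), consistent with its very low pairwise scores against the HBeAg sequences.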

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as the template, showing around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

1. Open the template structure from File (PDB file), then choose Swiss model - load the raw target sequence.

2. Choose Fit - fit raw sequence, then magic fit, then iterative fit.

3. Choose File - save - layer (PDB).

4. Choose File - save - project (PDB).

5. Choose Swiss model - submit modeling request (a new browser window opens, loading the PDB file; give the email ID for receiving the modelled structure).

6. Open the new structure (received by email) and remove the template by selecting the target.

7. Choose File - save - layer (PDB).

Open Swiss model and select the load raw sequence option to load the target molecule.

Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.

Save the file as the project.

Select "submit modeling request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields:

Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044 Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52


Fig: Structure of template after modeling

Alignment

TARGET   83  LV GFGKAVLGMA AAAEELLGQH
2b8nA     4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET  155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET  255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET  305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394  ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss

Model Validation

INTRODUCTION

Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure; it shows the possible conformations of the φ and ψ angles for a polypeptide. The Ramachandran plot displays the psi and phi backbone conformational angles for each residue in a protein. The distance between two successive alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in the desired protein can be determined using this plot.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling using DeepView and MODELLER is given as input to SAVS.

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result: plot statistics                                        Residues       %
Residues in most favoured regions [A,B,L]                         990     85.6%
Residues in additional allowed regions [a,b,l,p]                  104      9.0%
Residues in generously allowed regions [~a,~b,~l,~p]               11      1.0%
Residues in disallowed regions                                     51      4.4%
                                                                 ----    ------
Number of non-glycine and non-proline residues                   1156    100.0%
Number of end-residues (excl. Gly and Pro)                          8
Number of glycine residues (shown as triangles)                   127
Number of proline residues                                         60
                                                                 ----
Total number of residues                                         1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
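The percentages in the PROCHECK summary are residue counts divided by the number of non-glycine, non-proline residues (1156). The sketch below recomputes them from the counts in the table and applies the 90% guideline just quoted; the function name is hypothetical.

```python
def ramachandran_summary(core, allowed, generous, disallowed, threshold=90.0):
    """Percentages of non-Gly/non-Pro residues per region, plus a quality flag."""
    total = core + allowed + generous + disallowed
    pct = {
        "most favoured": round(100.0 * core / total, 1),
        "additional allowed": round(100.0 * allowed / total, 1),
        "generously allowed": round(100.0 * generous / total, 1),
        "disallowed": round(100.0 * disallowed / total, 1),
    }
    return total, pct, pct["most favoured"] > threshold


total, pct, good = ramachandran_summary(990, 104, 11, 51)
print(total, pct, "meets 90% guideline" if good else "below 90% guideline")
```

By this guideline the model, with 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, falls short of the 90% expected of a good-quality structure.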

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059

Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25

Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 46.4 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=0.00)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00
R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75
R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair       RMS  Bumps
 ----  --------- --------- --------- --------- -------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------- --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape  Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------- --- -----
   1    1 000:000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000:000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
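Two lines of the log are connected. The distance range of 0 to 19.5, sampled in steps of 0.75, produces the 27 R values printed during the 3D FFT search. The short Python sketch below enumerates those distances. It is our own illustration of the sampling grid, not Hex code.

```python
def distance_steps(r_min, r_max, step):
    """List the intermolecular distances R sampled from r_min to r_max inclusive."""
    n_steps = int(round((r_max - r_min) / step))
    # Round each value to two decimal places for display.
    return [round(r_min + i * step, 2) for i in range(n_steps + 1)]

steps = distance_steps(0.0, 19.5, 0.75)
print(len(steps))  # 27
print(steps[:4], steps[-1])
```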


Conclusion


After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We have noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope that in the near future it will be possible to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

In order to determine the unknown structure of the protein present in the different species, we performed homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein, which is coded by the gene, is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, on the basis of which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone toward such discoveries; it is a small finding within a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. Nevertheless, we hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.


[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.

Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.


Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.


Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com


Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

Phylip Phylogeny Inference Package


Retrieved the FASTA sequence of the protein HBeAg from the Swiss-Prot database.

Retrieved the PDB ID of the template structure (2B8N) through a BLAST similarity search.

Loaded the target sequence into SWISS-MODEL as a raw sequence and modeled the receptor in PDB format.

Validated the modeled receptor using the Structure Analysis Validation Server (SAVS).

Verified the model using the parameters available in SAVS, such as the Ramachandran plot.

Selected the best ligand for HBV disease from the KEGG database.

Ran Hex and obtained the structure of the drug molecule.


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2]. Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the fold, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
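Model accuracy tracks sequence identity (~2 Å at 70%, 4-5 Å at 25%), so it helps to see how identity is obtained from an alignment. Definitions vary. This minimal Python sketch divides identical positions by the number of gap-free aligned columns, which is one common convention. It is not necessarily the definition used by BLAST or SWISS-MODEL, and the sequences are toy examples.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(q == t for q, t in pairs)
    return 100.0 * identical / len(pairs)

# Toy alignment (hypothetical sequences, not from this study).
print(percent_identity("PESLKK-LAIE", "PEALKKALSIE"))  # 8 of 10 gap-free columns match
```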

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main chain and side chain dihedral angles are transferred from the templates to the target, giving a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.
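The caption's second step, transferring spatial features such as Cα-Cα distances from template to target, can be illustrated by collecting distances from template coordinates. This Python sketch simplifies the idea. The 8 Å cutoff and the coordinates are hypothetical. Restraint-based tools such as MODELLER express restraints as probability densities rather than fixed distances.

```python
import math

def ca_distance_restraints(ca_coords, max_dist=8.0):
    """Collect Cα-Cα distances (Å) for residue pairs no farther apart than max_dist.

    Indices refer to aligned positions, so each restraint can be mapped
    from the template onto the corresponding target residues.
    """
    restraints = {}
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d <= max_dist:
                restraints[(i, j)] = round(d, 2)
    return restraints

# Hypothetical template Cα coordinates (Å) for four aligned residues.
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(ca_distance_restraints(template_ca))
```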


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
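The E-value screening step can be sketched in a few lines of Python. The `Hit` record, the template IDs, the E-values and the 1e-5 cutoff are all assumptions made for illustration. They do not come from this study or from BLAST's own output format.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    template_id: str  # identifier of the candidate template structure (hypothetical)
    evalue: float     # BLAST E-value of the hit

def reliable_templates(hits, max_evalue=1e-5):
    """Keep hits at or below the E-value cutoff, sorted best (lowest E-value) first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)

# Hypothetical BLAST hits; IDs and E-values are invented for illustration.
hits = [Hit("tmplA", 3e-40), Hit("tmplB", 0.2), Hit("tmplC", 1e-8)]
print([h.template_id for h in reliable_templates(hits)])
```

A stricter or looser cutoff simply changes `max_evalue`. Hits that fail the cutoff are dropped rather than ranked last, matching the advice that a poor-E-value template should generally not be used.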

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.
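One simple way to weigh coverage against identity when ranking candidate templates is to sort by coverage first and identity second. The Python sketch below assumes that ranking rule and uses invented candidate data. As the text notes, a real choice also weighs function, secondary structure and model plausibility.

```python
def coverage(query_len, aligned_ranges):
    """Fraction of query residues covered by aligned regions (1-based, inclusive ranges)."""
    covered = set()
    for start, end in aligned_ranges:
        covered.update(range(start, end + 1))
    return len(covered) / query_len

# Hypothetical candidates: name -> (percent identity, aligned query ranges).
candidates = {
    "tmplA": (40.0, [(1, 120)]),
    "tmplB": (32.0, [(1, 150), (160, 200)]),
}
query_len = 200

# Rank by coverage first, then by identity (an assumed rule, for illustration only).
ranked = sorted(
    candidates,
    key=lambda name: (coverage(query_len, candidates[name][1]), candidates[name][0]),
    reverse=True,
)
print(ranked)
```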

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking

1)Protein-protein docking

2)Protein Receptor-Ligand

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interfaces are said to be rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand Docking

Protein receptor-ligand motifs fit together tightly and are often referred to as a lock-and-key mechanism. There is both high specificity and induced fit within these interfaces, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid ligand bound to a flexible receptor often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger than usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty in the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than a rigid receptor would allow.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though; intrinsic movement allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational, three translational), the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
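The six rigid-body degrees of freedom mentioned above (three rotations, three translations) can be illustrated with a short Python sketch. It rotates a set of atomic coordinates about their centroid and then translates them; rotating about the centroid and the x-then-y-then-z rotation order are assumptions of this example, not properties of any particular docking program.

```python
import math


def _matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(rx, ry, rz):
    """R = Rz @ Ry @ Rx: rotate about x, then y, then z (angles in radians)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot_x = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    rot_y = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rot_z = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return _matmul(rot_z, _matmul(rot_y, rot_x))


def rigid_body_move(coords, rx, ry, rz, tx, ty, tz):
    """Apply the six rigid-body degrees of freedom to a list of (x, y, z) points:
    three rotations about the centroid followed by three translations."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    rot = rotation_matrix(rx, ry, rz)
    shift = (tx, ty, tz)
    moved = []
    for p in coords:
        rel = [p[k] - centroid[k] for k in range(3)]
        rotated = [sum(rot[i][k] * rel[k] for k in range(3)) for i in range(3)]
        moved.append(tuple(rotated[i] + centroid[i] + shift[i] for i in range(3)))
    return moved
```

Because the move is rigid, all interatomic distances are preserved; modeling a flexible receptor would require additional internal degrees of freedom (for example side-chain torsions) on top of these six.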

Aim of docking

The aim of docking is to find new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the new drug discovery process.

Receptor

A receptor is a molecule on the surface of a cell, or within it, that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. A substance (such as a hormone or a drug) binds selectively to the receptor, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g., a receptor). Because a ligand binds through many weak noncovalent bonds formed with the binding site of a protein, tight binding of the ligand depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are the preferences of the 20 amino acids for lying within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The table does not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

Amino acid   Value
His   0.360
Cys   0.210
Ser   0.130
Lys   0.100
Thr   0.100
Asn   0.080
Arg   0.055
Gln   0.050
Glu   0.050
Asp   0.045
Ala   0.025
Met   0.025
Ile   -0.005
Tyr   -0.040
Val   -0.060
Gly   -0.070
Phe   -0.120
Trp   -0.140
Leu   -0.180
Pro   -0.200
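As a hypothetical illustration of how such a table might be used (this scoring is not a method described in this report), the Python sketch below averages the tabulated values over the residues lining a candidate binding site. The dictionary simply transcribes the table; a positive mean would suggest a site enriched in residues that commonly contact non-protein atoms. As noted above, the values do not apply to protein-protein or protein-peptide sites.

```python
# Values transcribed from the table above (preference for contact with bound
# non-protein atoms; positive = more contacts than expected by chance).
PROPENSITY = {
    "HIS": 0.360, "CYS": 0.210, "SER": 0.130, "LYS": 0.100, "THR": 0.100,
    "ASN": 0.080, "ARG": 0.055, "GLN": 0.050, "GLU": 0.050, "ASP": 0.045,
    "ALA": 0.025, "MET": 0.025, "ILE": -0.005, "TYR": -0.040, "VAL": -0.060,
    "GLY": -0.070, "PHE": -0.120, "TRP": -0.140, "LEU": -0.180, "PRO": -0.200,
}


def mean_propensity(residues):
    """Average tabulated value over three-letter residue names (case-insensitive).

    Hypothetical scoring helper, for illustration only.
    """
    if not residues:
        raise ValueError("need at least one residue")
    return sum(PROPENSITY[name.upper()] for name in residues) / len(residues)


if __name__ == "__main__":
    # A made-up set of pocket-lining residues.
    print(round(mean_propensity(["His", "Cys", "Ser", "Leu"]), 3))
```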

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate; φ is the torsion angle about the N-Cα bond and ψ the torsion angle about the Cα-C bond, and the plot is drawn between these two torsion angles. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
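The φ and ψ values plotted in a Ramachandran plot are ordinary torsion angles, so they can be computed from backbone coordinates with the standard four-point dihedral formula. The Python sketch below does that and adds a crude hard-sphere clash test in the spirit of Ramachandran's analysis. The function names, the `exclude` set for bonded atom pairs and the `scale` factor are illustrative assumptions; validation tools such as PROCHECK use their own empirically derived regions rather than this test.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond (IUPAC sign convention).

    phi(i) = dihedral(C(i-1), N(i), CA(i), C(i))
    psi(i) = dihedral(N(i), CA(i), C(i), N(i+1))
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def clashes(coords, radii, exclude=frozenset(), scale=1.0):
    """Hard-sphere check: index pairs (i, j) closer than scale * (r_i + r_j).

    Pairs listed in `exclude` (e.g. covalently bonded atoms) are skipped;
    scale < 1.0 gives a more permissive contact limit.
    """
    pairs = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if (i, j) in exclude:
                continue
            if math.dist(coords[i], coords[j]) < scale * (radii[i] + radii[j]):
                pairs.append((i, j))
    return pairs
```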

SAVS (Structure Analysis and Validation Server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).
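A much simplified version of one such check, comparing backbone bond lengths with ideal values, can be sketched in Python. The ideal lengths below are the widely used Engh and Huber (1991) values, which were derived from small-molecule structures in the Cambridge Structural Database; the flat 0.05 Å tolerance and the function name are assumptions for illustration and are not PROCHECK's own criterion.

```python
import math

# Engh & Huber (1991) ideal backbone bond lengths, in angstroms.
IDEAL_LENGTHS = {
    ("N", "CA"): 1.458,
    ("CA", "C"): 1.525,
    ("C", "O"): 1.231,
}


def bond_length_outliers(residue, tolerance=0.05):
    """Return (bond, observed, ideal) for backbone bonds deviating by more than `tolerance`.

    `residue` maps atom names ("N", "CA", "C", "O") to (x, y, z) coordinates in angstroms.
    """
    outliers = []
    for (a, b), ideal in IDEAL_LENGTHS.items():
        if a in residue and b in residue:
            observed = math.dist(residue[a], residue[b])
            if abs(observed - ideal) > tolerance:
                outliers.append((f"{a}-{b}", round(observed, 3), ideal))
    return outliers
```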

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the extension .new, in which the atoms are labelled in accordance with the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties by residue in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e., bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good for releasing local constraints on a residue, but it will not pass through high energy barriers and stops in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of a conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, e.g., two atoms too near each other in space and having a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
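Gradient-based minimization can be illustrated with plain steepest descent on a toy system. The Python sketch below uses only a Lennard-Jones pair term in reduced units (epsilon = sigma = 1) as a stand-in for a full force field with bond, angle and dihedral terms; the fixed step size and force tolerance are assumptions. Each step moves every atom a small distance along the net force on it, which is the idea described above, and like any such method it only finds the nearest local minimum.

```python
import math


def lj_energy_and_forces(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy and per-atom force vectors for 3D points."""
    n = len(coords)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # -dE/dr: positive means the pair repels.
            f_mag = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
            for k in range(3):
                f_k = f_mag * d[k] / r
                forces[i][k] += f_k
                forces[j][k] -= f_k
    return energy, forces


def steepest_descent(coords, step=0.01, max_steps=10000, force_tol=1e-6):
    """Move each atom along its net force until the largest force is below force_tol."""
    coords = [list(p) for p in coords]
    for _ in range(max_steps):
        _, forces = lj_energy_and_forces(coords)
        largest = max(math.sqrt(sum(f_k * f_k for f_k in f)) for f in forces)
        if largest < force_tol:
            break
        for p, f in zip(coords, forces):
            for k in range(3):
                p[k] += step * f[k]
    energy, _ = lj_energy_and_forces(coords)
    return coords, energy


if __name__ == "__main__":
    # Two atoms placed too close together (r = 1.0 < 2**(1/6)): a "bad contact".
    final, e_min = steepest_descent([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
    print(math.dist(final[0], final[1]), e_min)
```

For this pair potential the minimum lies at r = 2^(1/6) sigma with energy -epsilon, which gives a simple check that the descent converged.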


Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences

Db    AC       Description                                                        Score   E-value
pdb   1QGT-C   Chain C (Hbcag), Human Hepatitis B Viral Capsid >gi|5              206     6e-54
pdb   2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina              197     2e-51
pdb   2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192     6e-50
pdb   1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27      6.0
pdb   1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I              27      6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33   6.3%
Asn (N)   11   2.1%
Asp (D)   21   4.0%
Cys (C)   5    1.0%
Gln (Q)   32   6.1%
Glu (E)   28   5.4%
Gly (G)   51   9.8%
His (H)   16   3.1%
Ile (I)   15   2.9%
Leu (L)   81   15.5%
Lys (K)   10   1.9%
Met (M)   12   2.3%
Phe (F)   10   1.9%
Pro (P)   28   5.4%
Ser (S)   27   5.2%
Thr (T)   22   4.2%
Trp (W)   4    0.8%
Tyr (Y)   5    1.0%
Val (V)   38   7.3%
Pyl (O)   0    0.0%
Sec (U)   0    0.0%
(B)       0    0.0%
(Z)       0    0.0%
(X)       0    0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43

Atomic composition:
Carbon     C   2435
Hydrogen   H   3967
Nitrogen   N   711
Oxygen     O   719
Sulfur     S   17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700; Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450; Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
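The two extinction coefficients above can be reproduced from the residue counts with the method documented for ProtParam (after Pace et al.): 5500 M-1 cm-1 per Trp, 1490 per Tyr and 125 per cystine (a disulfide-bonded Cys pair), with Abs 0.1% = extinction coefficient / molecular weight. The Python sketch below assumes that, in the "all Cys paired" case, the number of cystines is floor(n_Cys / 2), and it reuses the molecular weight reported above. It also recomputes the charged-residue totals from the composition table.

```python
def extinction_coefficients(n_trp, n_tyr, n_cys):
    """Molar extinction coefficients at 280 nm (M-1 cm-1) in water.

    Returns (all Cys paired as cystines, no cystines).
    """
    base = 5500 * n_trp + 1490 * n_tyr
    n_cystines = n_cys // 2          # an unpaired Cys cannot form a cystine
    return base + 125 * n_cystines, base


def abs_01_percent(epsilon, molecular_weight):
    """Absorbance of a 1 g/l (0.1%) solution: epsilon / molecular weight."""
    return epsilon / molecular_weight


# Residue counts and molecular weight for GLCTK_HUMAN (Q8IVS8), copied from the output above.
COUNTS = {"W": 4, "Y": 5, "C": 5, "D": 21, "E": 28, "R": 33, "K": 10}
MW = 55252.6

if __name__ == "__main__":
    with_cystines, without = extinction_coefficients(COUNTS["W"], COUNTS["Y"], COUNTS["C"])
    print(with_cystines, round(abs_01_percent(with_cystines, MW), 3))
    print(without, round(abs_01_percent(without, MW), 3))
    print("negative:", COUNTS["D"] + COUNTS["E"], "positive:", COUNTS["R"] + COUNTS["K"])
```

With 4 Trp, 5 Tyr and 5 Cys this gives 29700 and 29450 M-1 cm-1 (Abs 0.1% of 0.538 and 0.533), matching the ProtParam values.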

Secondary structure prediction

By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
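The percentages in the SOPMA summary are simply the count of each state letter divided by the sequence length. A minimal Python sketch, assuming the prediction is available as a string of the lower-case state codes used in the output above (h, e, t, c); the function and dictionary names are illustrative:

```python
from collections import Counter

STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_percentages(prediction):
    """Map each state name to (count, percentage rounded to 2 decimals)."""
    counts = Counter(prediction)
    total = len(prediction)
    return {
        name: (counts[code], round(100.0 * counts[code] / total, 2))
        for code, name in STATE_NAMES.items()
    }


if __name__ == "__main__":
    # Short made-up prediction string; the real one is 523 characters long.
    for name, (n, pct) in state_percentages("hhhheecct").items():
        print(f"{name}: {n} is {pct:.2f}%")
```

Applied to the counts reported above, 235/523, 80/523, 36/523 and 172/523 give 44.93%, 15.30%, 6.88% and 32.89%.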

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
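The scores in the table are pairwise similarity percentages. A common way to compute such a value from two aligned sequences is sketched below in Python: count identical residues over the columns where neither sequence has a gap. ClustalW derives its scores from its own initial pairwise alignments and its normalization may differ in detail, so this is an illustration of the idea rather than a reimplementation.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Short aligned segments copied from the HBEAG_HBVA4 and HBEAG_ASHV rows of the alignment.
    hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
    ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
    print(f"{percent_identity(hbva4, ashv):.1f}%")
```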

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure (PDB file) from the File menu, then choose Swiss Model > Load Raw Target Sequence.

Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss Model > Submit Modeling Request (a new browser window opens with the PDB file loaded; give an e-mail ID for receiving the modeled structure).

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).

Open Swiss Model and select the "load raw sequence" option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, in order to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under Swiss Model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill in these fields:

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Fig. Structure of template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure, one point per residue, and so shows which φ and ψ conformations the polypeptide adopts. Because φ and ψ are the rotations about the N-Cα and Cα-C bonds on either side of each alpha carbon, the plot summarizes the backbone geometry around every alpha carbon atom of the protein.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Plot statistics

Residues in most favoured regions [A,B,L]: 990 (85.6%)
Residues in additional allowed regions [a,b,l,p]: 104 (9.0%)
Residues in generously allowed regions [~a,~b,~l,~p]: 11 (1.0%)
Residues in disallowed regions: 51 (4.4%)
Number of non-glycine and non-proline residues: 1156 (100.0%)

Number of end-residues (excl. Gly and Pro): 8
Number of glycine residues (shown as triangles): 127
Number of proline residues: 60
Total number of residues: 1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
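The percentages in the PROCHECK table, and the comparison with the 90% guideline, follow from simple arithmetic over the non-glycine, non-proline residues. A small Python sketch (the function name and return layout are illustrative):

```python
def ramachandran_summary(core, allowed, generous, disallowed, guideline=90.0):
    """Percentages of non-Gly, non-Pro residues per region, plus whether the
    most-favoured fraction exceeds the guideline for a good-quality model."""
    total = core + allowed + generous + disallowed
    counts = {
        "most favoured": core,
        "additional allowed": allowed,
        "generously allowed": generous,
        "disallowed": disallowed,
    }
    percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}
    return percentages, 100.0 * core / total > guideline


if __name__ == "__main__":
    pct, passes = ramachandran_summary(990, 104, 11, 51)
    print(pct, "meets 90% guideline:", passes)
```

With the counts from the table, the most-favoured fraction is 85.6%, so the comparison with the 90% guideline returns False.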

Docking result (Hex software)

Fig. Ligand & Receptor (2B8N)

Fig. After docking

DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059

Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS

63

Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025

64

Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050


MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050


LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI


VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052


MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005


LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025


ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds


gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A


Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds

Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)


R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1

3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040


R = 075Estart = 14350136 -gt rank 5

Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100


Conclusion


After analysing the protein sequences of Hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We observed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible to draw conclusive inferences about the true picture of its evolution in the near future, and that the genes responsible for pathogenesis can also be identified.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be a stepping stone toward future discoveries. The present work is a small finding within a much larger problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Prescott, L.M., Harley, J.P. and Klein, D.A. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.


[2] Chisari, F.V. and Ferrari, C. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] Seeger, C. and Mason, W.S. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B. and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.

Aloy, P., Querol, E., Aviles, F.X. and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics.

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com


Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and the template structure. The approach can be complicated by the presence of alignment gaps (commonly called indels) that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2].

Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the fold, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
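Because model accuracy is discussed above in terms of percent sequence identity, the following Python sketch shows one common way to compute it from a pairwise alignment written as two equal-length gapped strings. It assumes identity is counted only over columns where neither sequence has a gap; other conventions (for example, dividing by the length of the shorter sequence) give different numbers. The function name `percent_identity` is hypothetical and not any tool's API.

```python
def percent_identity(query_aln: str, template_aln: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both strings must have the same length, with '-' marking gaps.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue
    pairs = [(q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for q, t in pairs if q.upper() == t.upper())
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy alignment: 7 identical residues in 9 gap-free columns
    print(round(percent_identity("ACDEFG-HIK", "ACDEYGWHIR"), 1))
```

Run as a script, this prints 77.8. Excluding gap columns keeps the value comparable to the identity figures quoted above, which refer to matched residues.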

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target, yielding a number of spatial restraints on its structure. Third, the 3D model is obtained by satisfying all the restraints as well as possible.
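The restraint-transfer step in the figure caption can be illustrated for the simplest restraint type, Cα-Cα distances. This Python sketch assumes template Cα coordinates are already available as a dictionary and that the target-to-template residue mapping comes from the alignment. The names `ca_distance_restraints` and `max_dist` (a hypothetical 10 Å cutoff) are illustrative. Real restraint-based modelling programs express restraints as probability densities rather than single fixed distances, so this is a simplification.

```python
import math
from itertools import combinations


def ca_distance_restraints(template_ca, alignment, max_dist=10.0):
    """Derive Calpha-Calpha distance restraints for a target from an aligned template.

    template_ca: dict mapping template residue number -> (x, y, z) of its Calpha atom
    alignment:   dict mapping target residue number -> aligned template residue number
    Returns {(target_i, target_j): distance} for aligned pairs at most max_dist apart.
    """
    restraints = {}
    for ti, tj in combinations(sorted(alignment), 2):
        pi, pj = alignment[ti], alignment[tj]
        if pi not in template_ca or pj not in template_ca:
            continue  # template residue has no coordinates (e.g. a structure gap)
        d = math.dist(template_ca[pi], template_ca[pj])
        if d <= max_dist:
            restraints[(ti, tj)] = d  # the target pair inherits the template distance
    return restraints


if __name__ == "__main__":
    # Hypothetical template: four Calpha atoms on a line, 3.8 A apart except the last
    template_ca = {10: (0.0, 0.0, 0.0), 11: (3.8, 0.0, 0.0),
                   12: (7.6, 0.0, 0.0), 20: (20.0, 0.0, 0.0)}
    alignment = {1: 10, 2: 11, 3: 12, 4: 20}
    for pair, dist in ca_distance_restraints(template_ca, alignment).items():
        print(pair, round(dist, 2))
```

Pairs involving target residue 4 are dropped because its template partner lies beyond the cutoff. A full modelling program would then search for coordinates that satisfy all such restraints together.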


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
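A minimal Python sketch of the E-value screen described above, assuming BLAST results have already been parsed into simple records. The `Hit` record, the template names and the 1e-5 cutoff are hypothetical choices for illustration. No particular BLAST output format or parsing library is implied, and a suitable cutoff depends on the database size and the search.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit against a structure database (fields are illustrative)."""
    name: str        # template identifier
    evalue: float    # BLAST expectation value: lower means more significant
    identity: float  # percent identity over the aligned region


def candidate_templates(hits, max_evalue=1e-5):
    """Drop hits above the E-value cutoff and order the rest, most significant first.

    Ties in E-value are broken by higher percent identity.
    """
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: (h.evalue, -h.identity))


if __name__ == "__main__":
    hits = [
        Hit("tmpl_B", 3e-12, 35.5),
        Hit("tmpl_C", 0.8, 22.0),   # poor E-value: rejected even if it were the only hit
        Hit("tmpl_A", 1e-40, 62.0),
    ]
    print([h.name for h in candidate_templates(hits)])
```

Run as a script, this prints `['tmpl_A', 'tmpl_B']`. If the filtered list comes back empty, the text's advice applies: turn to fold-recognition or consensus servers rather than accepting a poor hit.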

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, sometimes several homology models are produced for a single query sequence, with the most likely candidate chosen only in the final step.
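Coverage, named above as one of the most important criteria, can be computed directly from an alignment. The sketch below is a deliberately naive illustration. It assumes coverage is measured against the full query length (since a database-search alignment may be local) and ranks candidates by coverage first and identity second, whereas the text describes weighing several factors together. The function names and template names are hypothetical.

```python
def query_coverage(query_aln: str, template_aln: str, query_length: int) -> float:
    """Fraction of the full query sequence that is aligned to template residues.

    The alignment may be local, so coverage is measured against the full query length.
    """
    covered = sum(1 for q, t in zip(query_aln, template_aln) if q != "-" and t != "-")
    return covered / query_length


def pick_template(candidates):
    """Return the name of the candidate with the highest coverage, ties broken by identity.

    `candidates` is a list of (name, coverage, percent_identity) tuples.
    """
    return max(candidates, key=lambda c: (c[1], c[2]))[0]


if __name__ == "__main__":
    # Hypothetical local alignment covering part of a 10-residue query
    print(query_coverage("MKT-LLV", "MKTALIV", query_length=10))
    print(pick_template([("tmpl_A", 0.95, 40.0), ("tmpl_B", 0.60, 70.0)]))
```

Run as a script, this prints 0.6 and then tmpl_A. The example shows why coverage can outweigh identity: tmpl_B is more similar but models far less of the query.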

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than those in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions cannot alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and thus these interfaces are said to be rigid.

Fig: Protein-Protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described by the lock-and-key mechanism. These interfaces show both high specificity and induced fit, with specificity increasing with rigidity. Protein receptor-ligand docking can use either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig: Protein Ligand-Receptor Docking

Rigid Ligand with a Flexible Receptor

In the rigid-ligand, flexible-receptor case, the native structure often maximizes the interface area between the molecules. The two molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, when the ligand overlaps the receptor at the docking interface, energy penalties are incurred. If these repulsive van der Waals contacts can be reduced, the energy penalty to the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than a rigid receptor would.
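The energy penalty for ligand overlap can be illustrated with a simple steric-clash count, similar in spirit to the "Bumps" column in docking output. This Python sketch assumes atoms are given as (x, y, z) tuples in ångströms and uses a hypothetical 3.0 Å cutoff. It is not how any particular docking program scores clashes. Moving one receptor atom away mimics the receptor flexibility described above.

```python
import math


def count_bumps(receptor, ligand, cutoff=3.0):
    """Count receptor-ligand atom pairs closer than `cutoff` (same units as the coordinates)."""
    return sum(1 for r in receptor for lig in ligand if math.dist(r, lig) < cutoff)


if __name__ == "__main__":
    ligand = [(1.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
    rigid_receptor = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    # Same receptor after a hypothetical side-chain movement of its first atom
    flexed_receptor = [(-3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    print(count_bumps(rigid_receptor, ligand))   # one pair is only 1.0 apart
    print(count_bumps(flexed_receptor, ligand))  # the clash is relieved
```

Run as a script, this prints 1 and then 0. A pair at exactly 3.0 Å does not count, because the test is strictly "closer than". The brute-force double loop is fine for illustration but scales with the product of the atom counts.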

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: intrinsic movement allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
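The six rigid-body degrees of freedom mentioned above (three rotational, three translational) can be made concrete with a short Python sketch that places a ligand pose. It assumes the rotation is given as an axis and an angle (Rodrigues' rotation formula) applied about the coordinate origin, followed by a translation. Docking programs use their own parameterizations, such as Euler angles or quaternions, and usually rotate about a molecular centre, so the function names here are illustrative only.

```python
import math


def rotation_matrix(axis, angle):
    """3x3 rotation matrix for `angle` radians about `axis` (Rodrigues' formula)."""
    x, y, z = axis
    n = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n  # normalize the axis to unit length
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]


def apply_pose(coords, axis, angle, translation):
    """Rotate coordinates about the origin, then translate them.

    Axis direction (2 DOF) + angle (1 DOF) = 3 rotational DOF; translation = 3 DOF.
    """
    R = rotation_matrix(axis, angle)
    tx, ty, tz = translation
    return [
        (R[0][0] * px + R[0][1] * py + R[0][2] * pz + tx,
         R[1][0] * px + R[1][1] * py + R[1][2] * pz + ty,
         R[2][0] * px + R[2][1] * py + R[2][2] * pz + tz)
        for px, py, pz in coords
    ]


if __name__ == "__main__":
    # Rotate a point 90 degrees about z, then shift it 5 units along z
    print(apply_pose([(1.0, 0.0, 0.0)], (0.0, 0.0, 1.0), math.pi / 2, (0.0, 0.0, 5.0)))
```

Because the transformation is rigid, all intra-ligand distances are preserved. Only the relative placement of ligand and receptor changes, which is exactly what a rigid docking search explores.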

Aim of docking

The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A molecular structure on the surface of a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. More generally, it is a molecule within a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through many weak noncovalent interactions with the binding site of a protein, tight binding depends upon a precise fit to the surface-exposed amino acid residues on the protein.

Active Site

The active site of a protein or enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds, which are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the free energy of activation (ΔG‡) of the reaction. This produces the rate enhancement characteristic of enzyme action.
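Here is a short worked example of why lowering the activation free energy matters. It assumes simple transition-state theory, in which the rate constant is proportional to exp(-ΔG‡/RT) and the enzyme leaves the pre-exponential factor unchanged. The function names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def rate_enhancement(ddg_act_kj, temperature_k=298.15):
    """Fold increase in rate when the activation free energy drops by ddg_act_kj.

    Assumes k is proportional to exp(-dG_act / RT) with an unchanged prefactor.
    """
    return math.exp(ddg_act_kj / (R_KJ * temperature_k))


def barrier_reduction_needed(factor, temperature_k=298.15):
    """Reduction in activation free energy (kJ/mol) for a given fold enhancement."""
    return R_KJ * temperature_k * math.log(factor)


if __name__ == "__main__":
    print(f"10 kJ/mol lower barrier -> {rate_enhancement(10.0):.1f}-fold faster")
    print(f"1e6-fold enhancement needs {barrier_reduction_needed(1e6):.1f} kJ/mol")
```

Under these assumptions, lowering ΔG‡ by 10 kJ/mol at 25 °C speeds the reaction up roughly 56-fold. A million-fold enhancement needs a reduction of about 34 kJ/mol.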

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active or functional site, as this greatly depends on the type of function. With that in mind, the preferences of the 20 amino acids for functional regions of proteins are listed below. They were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance, and negative values mean that it makes fewer. The values do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
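The source does not state the exact formula behind these preferences. One common way to express "more contacts than expected by chance" is a log-odds score, log10(observed/expected), where the expected count comes from each residue's background frequency. The sketch below uses made-up counts purely for illustration. It is not a reconstruction of the published values, and the function name is our own.

```python
import math


def contact_propensities(contact_counts, background_freq):
    """log10(observed / expected) contact counts for each residue type.

    contact_counts: residue -> contacts with bound non-protein atoms (observed)
    background_freq: residue -> fraction of that residue in the whole data set
    Positive scores mean more contacts than chance; negative scores mean fewer.
    A zero observed count would need a pseudocount before taking the log.
    """
    total = sum(contact_counts.values())
    scores = {}
    for residue, observed in contact_counts.items():
        expected = total * background_freq[residue]
        scores[residue] = math.log10(observed / expected)
    return scores


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only
    counts = {"His": 40, "Leu": 10, "Ser": 50}
    background = {"His": 0.1, "Leu": 0.5, "Ser": 0.4}
    for residue, score in contact_propensities(counts, background).items():
        print(f"{residue} {score:+.3f}")
```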

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate. φ is the torsion angle about the N-Cα bond, ψ is the torsion angle about the Cα-C bond, and the plot is drawn between these two angles. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, so the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
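The φ and ψ values plotted on a Ramachandran map are ordinary torsion angles computed from backbone atom coordinates. φ comes from C(i-1), N(i), Cα(i), C(i), and ψ comes from N(i), Cα(i), C(i), N(i+1). Below is a minimal pure-Python sketch of the standard torsion formula. The helper names are our own, and validation tools such as PROCHECK read the coordinates from the PDB file themselves.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond.

    phi(i) = dihedral(C[i-1], N[i], CA[i], C[i])
    psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

For example, call `phi = dihedral(c_prev, n, ca, c)` and `psi = dihedral(n, ca, c, n_next)`, passing each atom as an (x, y, z) tuple.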

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run. The run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of ‘ideal’ bond lengths and bond angles as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the first of its programs "cleans up" the coordinate file. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the file extension .new. The atoms in the new file are labelled in accordance with the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory. These have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension. It is a printable ASCII text file that lists all the computed stereochemical properties, residue by residue.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bond lengths, bond angles and dihedral angles). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints for a residue, but it will not pass through high energy barriers and stops in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because a few bad interactions can dominate it. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms that are too close to each other in space and have a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization, in which atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
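The gradient optimization described above can be illustrated with a toy system: two atoms interacting through a Lennard-Jones potential in reduced units (ε = σ = 1), minimized by steepest descent. This sketch rests on those assumptions and is not the force field or minimizer used in this project. The step size, the per-step move cap and the force tolerance are illustrative choices.

```python
import math


def lj_energy_and_forces(coords, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy and per-atom forces (reduced units) for 3D points."""
    n = len(coords)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # (-dE/dr) / r: positive means repulsive (pushes i away from j)
            f_over_r = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / (r * r)
            for k in range(3):
                forces[i][k] += f_over_r * d[k]
                forces[j][k] -= f_over_r * d[k]
    return energy, forces


def steepest_descent(coords, step=0.01, max_move=0.1, f_tol=1e-6, max_iter=10000):
    """Move every atom along its force until the largest force is below f_tol."""
    coords = [list(c) for c in coords]
    energy, forces = lj_energy_and_forces(coords)
    for _ in range(max_iter):
        f_max = max(math.sqrt(sum(f * f for f in fi)) for fi in forces)
        if f_max < f_tol:
            break
        for atom, fi in zip(coords, forces):
            move = [step * f for f in fi]
            length = math.sqrt(sum(m * m for m in move))
            if length > max_move:  # cap the step so a steep clash cannot blow up
                move = [m * max_move / length for m in move]
            for k in range(3):
                atom[k] += move[k]
        energy, forces = lj_energy_and_forces(coords)
    return coords, energy


if __name__ == "__main__":
    final, e = steepest_descent([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
    print(f"r = {math.dist(final[0], final[1]):.4f}, E = {e:.4f}")
```

Starting from a clash at r = 1.0σ, the atoms move apart and settle at the potential minimum r = 2^(1/6)σ ≈ 1.12σ with E ≈ -ε. Because steepest descent only ever moves downhill, on a rugged energy surface it stops in the nearest local minimum. The same energy function shows how one bad contact dominates a total: a single pair at 0.8σ contributes roughly +43ε, whereas a well-packed pair contributes about -1ε.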


Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (evidence at transcript level)

BLAST result

List of potentially matching sequences (query sequence included):

Db   AC       Description                                                     Score  E-value
pdb  1QGT-C   Chain C (HBcAg), Human Hepatitis B Viral Capsid                 206    6e-54
pdb  2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina           197    2e-51
pdb  2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                  192    6e-50
pdb  1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp  27   6.0
pdb  1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I           27     6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
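The percentages and charged-residue totals above follow directly from the residue counts. The small Python check below uses the counts reported by ProtParam for Q8IVS8. The function name is our own.

```python
# Residue counts reported by ProtParam for GLCTK_HUMAN (Q8IVS8)
COUNTS = {
    "Ala": 74, "Arg": 33, "Asn": 11, "Asp": 21, "Cys": 5,
    "Gln": 32, "Glu": 28, "Gly": 51, "His": 16, "Ile": 15,
    "Leu": 81, "Lys": 10, "Met": 12, "Phe": 10, "Pro": 28,
    "Ser": 27, "Thr": 22, "Trp": 4, "Tyr": 5, "Val": 38,
}


def composition_summary(counts):
    """Sequence length, percent composition and charged-residue totals."""
    length = sum(counts.values())
    percent = {aa: round(100.0 * n / length, 1) for aa, n in counts.items()}
    negative = counts["Asp"] + counts["Glu"]  # acidic side chains
    positive = counts["Arg"] + counts["Lys"]  # basic side chains (His not counted)
    return length, percent, negative, positive


if __name__ == "__main__":
    length, percent, neg, pos = composition_summary(COUNTS)
    print(length, percent["Leu"], neg, pos)
```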

Atomic composition:
Carbon     C  2435
Hydrogen   H  3967
Nitrogen   N   711
Oxygen     O   719
Sulfur     S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849
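The atomic composition can be cross-checked in three ways. The atom counts add up to the reported total. The 17 sulfur atoms match 5 Cys plus 12 Met. Summing average atomic masses over the formula lands close to the reported molecular weight. The atomic masses below are standard average values assumed for this sketch. ProtParam itself works from average residue masses, so expect agreement to within about one dalton rather than exact equality.

```python
# Standard average atomic masses in g/mol (assumed values for this check)
ATOMIC_MASS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}

# Atomic composition reported by ProtParam: C2435 H3967 N711 O719 S17
FORMULA = {"C": 2435, "H": 3967, "N": 711, "O": 719, "S": 17}


def formula_mass(formula):
    """Average molecular mass (Da) from an elemental formula."""
    return sum(ATOMIC_MASS[element] * n for element, n in formula.items())


if __name__ == "__main__":
    print("total atoms:", sum(FORMULA.values()))
    print(f"formula mass: {formula_mass(FORMULA):.1f} Da")
    # Every sulfur comes from Cys or Met: 5 Cys + 12 Met
    print("S from Cys + Met:", 5 + 12)
```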

Extinction coefficients

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
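ProtParam estimates the extinction coefficient from the numbers of Trp, Tyr and cystine (disulfide-bonded Cys pair) residues, using 5500, 1490 and 125 M-1 cm-1 respectively. It then divides by the molecular weight to get Abs 0.1%. The sketch below reproduces both reported values from the composition of Q8IVS8 (4 Trp, 5 Tyr, 5 Cys, MW 55252.6). The function names are hypothetical.

```python
# Molar extinction coefficients at 280 nm (M^-1 cm^-1) used by ProtParam
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125  # per disulfide-bonded Cys pair


def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired=True):
    """Estimated extinction coefficient of a protein in water at 280 nm."""
    cystines = n_cys // 2 if all_cys_paired else 0  # an odd Cys cannot pair
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + cystines * EXT_CYSTINE


def abs_0_1_percent(ext_coeff, mol_weight):
    """Absorbance of a 1 g/l (0.1%) solution in a 1 cm cell."""
    return ext_coeff / mol_weight


if __name__ == "__main__":
    for paired in (True, False):
        e = extinction_coefficient(4, 5, 5, all_cys_paired=paired)
        print(e, round(abs_0_1_percent(e, 55252.6), 3))
```

With 5 Cys, at most two disulfides can form, which is why the two estimates differ by 2 × 125 = 250.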

Secondary structure prediction

By SOPMA (result for UNK_158250)

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix        (Hh): 235 is 44.93%
3₁₀ helix          (Gg):   0 is  0.00%
Pi helix           (Ii):   0 is  0.00%
Beta bridge        (Bb):   0 is  0.00%
Extended strand    (Ee):  80 is 15.30%
Beta turn          (Tt):  36 is  6.88%
Bend region        (Ss):   0 is  0.00%
Random coil        (Cc): 172 is 32.89%
Ambiguous states   (?):    0 is  0.00%
Other states:              0 is  0.00%

Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
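The SOPMA summary is a tally of the per-residue state string (h, e, t, c). The minimal sketch below recomputes counts and percentages from such a string. The function name is an assumption.

```python
from collections import Counter

STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand",
               "t": "Beta turn", "c": "Random coil"}


def summarize_states(states):
    """Count each predicted state and its percentage of the sequence length."""
    counts = Counter(states)
    n = len(states)
    return {STATE_NAMES.get(s, s): (k, round(100.0 * k / n, 2))
            for s, k in counts.items()}


if __name__ == "__main__":
    print(summarize_states("hhhcce"))
```

Applied to the full 523-character state string, it should reproduce the 235 helix, 80 extended, 36 turn and 172 coil residues and their percentages.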

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                  Len(aa)  SeqB  Name                  Len(aa)  Score
==============================================================================
1     Q8IVS8|GLCTK_HUMAN    523      2     Q64896|HBEAG_ASHV     217      3
1     Q8IVS8|GLCTK_HUMAN    523      3     P03154|HBEAG_DHBV1    305      3
1     Q8IVS8|GLCTK_HUMAN    523      4     P0C6J9|HBEAG_DHBV3    305      3
1     Q8IVS8|GLCTK_HUMAN    523      5     P03153|HBEAG_GSHV     217      3
1     Q8IVS8|GLCTK_HUMAN    523      6     P0C692|HBEAG_HBVA2    214      3
1     Q8IVS8|GLCTK_HUMAN    523      7     P0C625|HBEAG_HBVA3    214      3
1     Q8IVS8|GLCTK_HUMAN    523      8     P17099|HBEAG_HBVA4    214      3
1     Q8IVS8|GLCTK_HUMAN    523      9     Q81105|HBEAG_HBVA5    214      3
1     Q8IVS8|GLCTK_HUMAN    523      10    Q91C37|HBEAG_HBVA6    214      2
2     Q64896|HBEAG_ASHV     217      3     P03154|HBEAG_DHBV1    305      21
2     Q64896|HBEAG_ASHV     217      4     P0C6J9|HBEAG_DHBV3    305      21
2     Q64896|HBEAG_ASHV     217      5     P03153|HBEAG_GSHV     217      91
2     Q64896|HBEAG_ASHV     217      6     P0C692|HBEAG_HBVA2    214      66
2     Q64896|HBEAG_ASHV     217      7     P0C625|HBEAG_HBVA3    214      65
2     Q64896|HBEAG_ASHV     217      8     P17099|HBEAG_HBVA4    214      65
2     Q64896|HBEAG_ASHV     217      9     Q81105|HBEAG_HBVA5    214      65
2     Q64896|HBEAG_ASHV     217      10    Q91C37|HBEAG_HBVA6    214      65
3     P03154|HBEAG_DHBV1    305      4     P0C6J9|HBEAG_DHBV3    305      97
3     P03154|HBEAG_DHBV1    305      5     P03153|HBEAG_GSHV     217      24
3     P03154|HBEAG_DHBV1    305      6     P0C692|HBEAG_HBVA2    214      26
3     P03154|HBEAG_DHBV1    305      7     P0C625|HBEAG_HBVA3    214      27
3     P03154|HBEAG_DHBV1    305      8     P17099|HBEAG_HBVA4    214      25
3     P03154|HBEAG_DHBV1    305      9     Q81105|HBEAG_HBVA5    214      26
3     P03154|HBEAG_DHBV1    305      10    Q91C37|HBEAG_HBVA6    214      26
4     P0C6J9|HBEAG_DHBV3    305      5     P03153|HBEAG_GSHV     217      25
4     P0C6J9|HBEAG_DHBV3    305      6     P0C692|HBEAG_HBVA2    214      26
4     P0C6J9|HBEAG_DHBV3    305      7     P0C625|HBEAG_HBVA3    214      27
4     P0C6J9|HBEAG_DHBV3    305      8     P17099|HBEAG_HBVA4    214      25
4     P0C6J9|HBEAG_DHBV3    305      9     Q81105|HBEAG_HBVA5    214      26
4     P0C6J9|HBEAG_DHBV3    305      10    Q91C37|HBEAG_HBVA6    214      26
5     P03153|HBEAG_GSHV     217      6     P0C692|HBEAG_HBVA2    214      70
5     P03153|HBEAG_GSHV     217      7     P0C625|HBEAG_HBVA3    214      69
5     P03153|HBEAG_GSHV     217      8     P17099|HBEAG_HBVA4    214      69
5     P03153|HBEAG_GSHV     217      9     Q81105|HBEAG_HBVA5    214      69
5     P03153|HBEAG_GSHV     217      10    Q91C37|HBEAG_HBVA6    214      69
6     P0C692|HBEAG_HBVA2    214      7     P0C625|HBEAG_HBVA3    214      98
6     P0C692|HBEAG_HBVA2    214      8     P17099|HBEAG_HBVA4    214      98
6     P0C692|HBEAG_HBVA2    214      9     Q81105|HBEAG_HBVA5    214      98
6     P0C692|HBEAG_HBVA2    214      10    Q91C37|HBEAG_HBVA6    214      98
7     P0C625|HBEAG_HBVA3    214      8     P17099|HBEAG_HBVA4    214      98
7     P0C625|HBEAG_HBVA3    214      9     Q81105|HBEAG_HBVA5    214      97
7     P0C625|HBEAG_HBVA3    214      10    Q91C37|HBEAG_HBVA6    214      98
8     P17099|HBEAG_HBVA4    214      9     Q81105|HBEAG_HBVA5    214      97
8     P17099|HBEAG_HBVA4    214      10    Q91C37|HBEAG_HBVA6    214      98
9     Q81105|HBEAG_HBVA5    214      10    Q91C37|HBEAG_HBVA6    214      97
==============================================================================
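The ClustalW2 pairwise scores are percent identities from pairwise alignments. The sketch below assumes, as the ClustalW documentation describes, that gap positions are excluded from the comparison. It shows the calculation for two sequences that are already aligned. ClustalW computes the pairwise alignments itself first, and the function name here is our own.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gap positions are not compared
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Segments of HBEAG_HBVA4 and HBEAG_HBVA5 from the multiple alignment
    a = "--LPSDFFPSVRDLLDTASALYREALES"
    b = "--LPSDFFPSVRDLXDTASALYREALES"
    print(round(percent_identity(a, b), 2))
```

In this segment, the two HBeAg sequences differ at one of 26 compared positions, which gives about 96% identity.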

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
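The guide tree is in Newick format. Nested parentheses group sequences, and the number after each colon is a branch length. The small recursive-descent parser below recovers the leaves and their branch lengths. It is our own sketch and not part of ClustalW. The `GUIDE_TREE` string is the guide tree above written with standard Newick punctuation.

```python
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054)"
    ":0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446)"
    ":0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text):
    """Parse a Newick string into nested dicts with name, length and children."""
    text = "".join(text.split())  # drop all whitespace
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                children.append(parse_node())
            pos += 1  # skip the closing ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return parse_node()


def leaves(node):
    """(name, branch length) for every terminal node."""
    if not node["children"]:
        return [(node["name"], node["length"])]
    return [leaf for child in node["children"] for leaf in leaves(child)]


if __name__ == "__main__":
    for name, length in leaves(parse_newick(GUIDE_TREE)):
        print(f"{name:20s} {length:.5f}")
```

The longest terminal branch belongs to Q8IVS8|GLCTK_HUMAN (0.59519). This fits its low pairwise scores against all the HBeAg sequences.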

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template. It showed about 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV:

1. Open the template structure (PDB file) from File.
2. Choose Swiss Model > Load Raw Target Sequence.
3. Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.
4. Choose File > Save > Layer (PDB).
5. Choose File > Save > Project (PDB).
6. Choose Swiss Model > Submit Modeling Request. A new browser window opens and loads the PDB file; give the e-mail ID for receiving the modeled structure.
7. Open the new structure (received by e-mail) and remove the template by selecting the target.
8. Choose File > Save > Layer (PDB).

Open Swiss Model and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, in order to fit the two sequences.

Save the file as the project.

Select "Submit Modeling Request" under Swiss Model to submit it for modeling.

Homology modeling

Optimise Mode request submission form

Please fill in these fields:

Your e-mail address: Gunjan300@gmail.com (MUST be correct)
Your name: Gunjan
Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information:
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52

Fig: Structure of the template after modeling

Alignment

TARGET   83                                LV GFGKAVLGMA AAAEELLGQH
2b8nA     4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET  155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET  255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET  305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394  ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. The Ramachandran plot visualizes the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, and so shows which backbone conformations the polypeptide adopts. These are the torsion angles about the two backbone bonds on either side of each alpha-carbon atom: N-Cα for φ and Cα-C for ψ.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling with DeepView and Modeller is given as input to SAVS.

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result

Plot statistics                                                Residues       %
Residues in most favoured regions      [A,B,L]                      990   85.6%
Residues in additional allowed regions [a,b,l,p]                    104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]                 11    1.0%
Residues in disallowed regions                                       51    4.4%
                                                                   ----  ------
Number of non-glycine and non-proline residues                     1156  100.0%
Number of end-residues (excl. Gly and Pro)                            8
Number of glycine residues (shown as triangles)                     127
Number of proline residues                                           60
                                                                   ----
Total number of residues                                           1351

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions.
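The percentages in the PROCHECK table are taken over the non-glycine, non-proline residues only. The sketch below recomputes them from the four region counts and applies the 90% rule of thumb quoted above. The function name is hypothetical.

```python
def ramachandran_summary(core, allowed, generous, disallowed, threshold=90.0):
    """Percent of non-Gly, non-Pro residues in each Ramachandran region.

    Returns the residue total, the four percentages (1 decimal place) and
    whether the most-favoured percentage exceeds the quality threshold.
    """
    total = core + allowed + generous + disallowed
    pct = [round(100.0 * n / total, 1) for n in (core, allowed, generous, disallowed)]
    return total, pct, pct[0] > threshold


if __name__ == "__main__":
    total, pct, good = ramachandran_summary(990, 104, 11, 51)
    print(total, pct, "meets 90% guideline" if good else "below 90% guideline")
```

For this model, 85.6% of the non-glycine, non-proline residues fall in the most favoured regions. That is below the 90% expected of a good-quality model.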

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS

Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
    MSE N   Radius = 1.40 Charge = -0.52
    MSE CA  Radius = 1.50 Charge = 0.14
    MSE C   Radius = 1.40 Charge = 0.53
    MSE O   Radius = 1.50 Charge = -0.50
    MSE CB  Radius = 1.70 Charge = 0.04
    MSE CG  Radius = 1.70 Charge = 0.09
    MSE SE  Radius = 1.90 Charge = 0.32
    MSE CE  Radius = 1.90 Charge = 0.01
    MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
    ASP N   Radius = 1.40 Charge = -0.52
    ASP CA  Radius = 1.50 Charge = 0.25
    ASP C   Radius = 1.40 Charge = 0.53
    ASP O   Radius = 1.50 Charge = -0.50
    ASP CB  Radius = 1.70 Charge = -0.21
    ASP CG  Radius = 1.40 Charge = 0.62
    ASP H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
    MSE N   Radius = 1.40 Charge = -0.52
    MSE CA  Radius = 1.50 Charge = 0.14
    MSE C   Radius = 1.40 Charge = 0.53
    MSE O   Radius = 1.50 Charge = -0.50
    MSE CB  Radius = 1.70 Charge = 0.04
    MSE CG  Radius = 1.70 Charge = 0.09
    MSE SE  Radius = 1.90 Charge = 0.32
    MSE CE  Radius = 1.90 Charge = 0.01
    MSE H   Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
    LYS N   Radius = 1.40 Charge = -0.52
    LYS CA  Radius = 1.50 Charge = 0.23
    LYS C   Radius = 1.40 Charge = 0.53
    LYS O   Radius = 1.50 Charge = -0.50
    LYS CB  Radius = 1.70 Charge = 0.04
    LYS CG  Radius = 1.70 Charge = 0.05
    LYS CD  Radius = 1.70 Charge = 0.05
    LYS CE  Radius = 1.70 Charge = 0.22
    LYS H   Radius = 0.00 Charge = 0.25

64

Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050

65

MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52, THR CA Radius = 1.50 Charge = 0.27, THR C Radius = 1.40 Charge = 0.53, THR O Radius = 1.50 Charge = -0.50, THR CB Radius = 1.50 Charge = 0.21, THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP CG Radius = 1.40 Charge = 0.62, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52, THR CA Radius = 1.50 Charge = 0.27, THR C Radius = 1.40 Charge = 0.53, THR O Radius = 1.50 Charge = -0.50, THR CB Radius = 1.50 Charge = 0.21, THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds


>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal    Eshape    Eforce      Eair   RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00

Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to look more closely at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution, and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we performed homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

Our study indicates that the glycerate kinase protein (HBeAg-binding protein 4) is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, on the basis of which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
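A phylogenetic tree can be built from pairwise distances between aligned sequences. The sketch below illustrates this with UPGMA, one of the simplest distance-based clustering methods. The report does not say which tree-building method its tools used. The function name, its input format (a dict of pairwise distances) and the taxa and distances in the example are all hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a rooted UPGMA tree and return it as a Newick string with branch lengths.

    names: list of taxon labels.
    dist:  dict mapping (a, b) tuples to the distance between taxa a and b.
    """
    # Each cluster label maps to (newick text, number of taxa, node height)
    clusters = {n: (n, 1, 0.0) for n in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(clusters) > 1:
        # Merge the two closest clusters
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2.0
        newick_a, size_a, height_a = clusters.pop(a)
        newick_b, size_b, height_b = clusters.pop(b)
        label = f"({newick_a}:{height - height_a:.4g},{newick_b}:{height - height_b:.4g})"
        # UPGMA rule: distance to the new cluster is the size-weighted average
        for c in clusters:
            d[frozenset((label, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[label] = (label, size_a + size_b, height)
    newick = next(iter(clusters.values()))[0]
    return newick + ";"


if __name__ == "__main__":
    # Hypothetical pairwise distances between four aligned sequences
    dist = {("A", "B"): 0.1, ("A", "C"): 0.4, ("A", "D"): 0.4,
            ("B", "C"): 0.4, ("B", "D"): 0.4, ("C", "D"): 0.2}
    print(upgma(["A", "B", "C", "D"], dist))
    # ((A:0.05,B:0.05):0.15,(C:0.1,D:0.1):0.1);
```

UPGMA assumes a constant rate of evolution across lineages, so it produces an ultrametric tree. Neighbour-joining is often preferred when rates differ between lineages.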

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in the search for a cure for hepatitis B and may open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human diseases caused by viruses.

[2] F. V. Chisari, C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger, W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed.

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani B, Sheinerman FB, Honig B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc Natl Acad Sci 98: 14796–14801. [PubMed]

Aloy P, Querol E, Aviles FX, Sternberg MJ. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 311: 395–408. [PubMed]

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402. [PubMed]

Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, et al. 2000. InterPro - An integrated documentation resource for protein families, domains and functional sites. Bioinformatics.

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal Investigació Medica and Universitat Pompeu Fabra, Passeig Maritim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com


Abbreviation


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

Phylip Phylogeny Inference Package


Homology modeling

In protein structure prediction, homology modeling, also known as comparative modeling, is a class of methods for constructing an atomic-resolution model of a protein from its amino acid sequence (the query sequence or target). Almost all homology modeling techniques rely on two things: the identification of one or more known protein structures (known as templates or parent structures) likely to resemble the structure of the query sequence, and the production of an alignment that maps residues in the query sequence to residues in the template sequence. The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than protein sequences, detectable levels of sequence similarity usually imply significant structural similarity.

The quality of the homology model depends on the quality of the sequence alignment and the template structure. The approach can be complicated by two kinds of gap:

- alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template;
- structure gaps in the template, which arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure.

Model quality declines with decreasing sequence identity. A typical model has ~2 Å agreement between the matched Cα atoms at 70% sequence identity, but only 4-5 Å agreement at 25% sequence identity. Regions of the model that were constructed without a template, usually by loop modeling, are generally much less accurate than the rest of the model, particularly if the loop is long. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity [2].

Taken together, these atomic-position errors are significant. They impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction prediction; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.
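Model quality is tied closely to sequence identity, so it helps to be precise about how identity is computed from an alignment. The sketch below assumes the alignment is already given as two equal-length strings with '-' for gaps. The function name is hypothetical, and alignment tools differ in which denominator they use.

```python
def percent_identity(query_aln: str, template_aln: str) -> float:
    """Percent identity between two aligned (gapped, equal-length) sequences.

    Counted over columns where neither sequence has a gap ('-'). Other conventions
    exist (e.g. dividing by the length of the shorter ungapped sequence), so values
    from different tools are not always directly comparable.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue
    pairs = [(q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for q, t in pairs if q.upper() == t.upper())
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy alignment: 5 identities over 6 gap-free columns
    print(round(percent_identity("ACDE-FG", "ACDQHFG"), 1))  # 83.3
```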

Figure: Comparative modelling by satisfaction of spatial restraints, in three steps.

1. The known template 3D structures are aligned with the target sequence to be modelled.
2. Spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target. This yields a number of spatial restraints on its structure.
3. The 3D model is obtained by satisfying all the restraints as well as possible.
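The third step, satisfying spatial restraints, can be shown with a deliberately tiny toy. It uses harmonic distance restraints between a few points, minimized by plain gradient descent. The toy rests on simplifying assumptions: only distance restraints, a fixed step size, and hypothetical coordinates and function names. Real modelling programs optimize much richer objective functions that combine many kinds of restraints, including the dihedral-angle restraints mentioned in the caption.

```python
import math


def restraint_energy(coords, restraints):
    """Sum of harmonic penalties (d - d0)**2 over (i, j, d0) distance restraints."""
    return sum((math.dist(coords[i], coords[j]) - d0) ** 2 for i, j, d0 in restraints)


def satisfy_restraints(coords, restraints, step=0.05, n_iter=2000):
    """Toy restraint satisfaction: plain gradient descent on restraint_energy."""
    coords = [list(p) for p in coords]
    for _ in range(n_iter):
        grad = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d0 in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points; skip this term
            # dE/dx_i = 2 (d - d0) (x_i - x_j) / d, and dE/dx_j is its negative
            f = 2.0 * (d - d0) / d
            for k in range(3):
                g = f * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
        for p, g in zip(coords, grad):
            for k in range(3):
                p[k] -= step * g[k]
    return coords


if __name__ == "__main__":
    # Hypothetical chain of four "C-alpha" points with 3.8 A restraints between neighbours
    chain = [(0, 1, 3.8), (1, 2, 3.8), (2, 3, 3.8)]
    start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 1.0)]
    model = satisfy_restraints(start, chain)
    print([round(math.dist(model[i], model[j]), 3) for i, j, _ in chain])  # [3.8, 3.8, 3.8]
```

The example uses only neighbour-to-neighbour restraints, so they can all be satisfied exactly. With conflicting restraints, the same procedure would settle on a compromise, which is what "as well as possible" means in the caption.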


Homology modeling can produce high-quality structural models when the target and template are closely related. This has inspired the formation of a structural genomics consortium dedicated to producing representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, come from errors in the initial sequence alignment and from improper template selection. As with other methods of structure prediction, current practice in homology modeling is assessed in a biennial (every two years) large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is to identify the best template structure, if indeed any are available. Several search approaches exist:

- The simplest method of template identification relies on serial pairwise sequence alignments, aided by database search techniques such as FASTA and BLAST.
- More sensitive methods are based on multiple sequence alignment, of which PSI-BLAST is the most common example. They iteratively update their position-specific scoring matrix to identify successively more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates, and to identify better templates for sequences that have only distant relationships to any solved structure.
- Protein threading, also known as fold recognition or 3D-1D alignment, can also be used to identify templates for traditional homology modeling methods.

When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value; these are considered close enough in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases. For example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available. It may well have a wrong structure, which would lead to a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers. These improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
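In practice the first filter is often an E-value cutoff on database search hits. The sketch below reads BLAST+ tabular output (`-outfmt 6`, with its default twelve columns) and keeps hits below a cutoff. Several details are illustrative assumptions rather than values from this report: the 1e-5 threshold, the column names used as dictionary keys, and the example hit identifiers.

```python
import csv
import io

# The twelve default columns of BLAST+ tabular output (-outfmt 6), in order
FIELDS = ["query", "subject", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_templates(tabular_text: str, max_evalue: float = 1e-5):
    """Return hits with E-value <= max_evalue, lowest E-value (then highest bit score) first."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: (h["evalue"], -h["bitscore"]))


if __name__ == "__main__":
    # Hypothetical BLAST hits of a query against PDB sequences (identifiers invented)
    demo = (
        "query\t1ABC_A\t45.2\t300\t160\t2\t1\t300\t2\t301\t1e-80\t250\n"
        "query\t2XYZ_B\t22.0\t120\t90\t3\t40\t160\t10\t130\t0.5\t30.1\n"
    )
    for hit in candidate_templates(demo):
        print(hit["subject"], hit["pident"], hit["evalue"])
```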

These approaches often identify several candidate template structures. Although some methods can generate hybrid models from multiple templates, most rely on a single template. Choosing the best template from among the candidates is therefore a key step, and it can significantly affect the final accuracy of the structure. Several factors guide this choice:

- the similarity of the query and template sequences;
- the similarity of their functions;
- the similarity of the predicted query and observed template secondary structures;
- perhaps most importantly, the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template);
- the plausibility of the resulting model.

Thus, sometimes several homology models are produced for a single query sequence, and the most likely candidate is chosen only in the final step.
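The coverage criterion can be made concrete. The sketch below assumes each hit is summarized by its aligned query range and its percent identity. The class name, the 0.6 coverage cutoff and the choice to rank by coverage before identity are hypothetical design choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    template: str    # template identifier (hypothetical)
    qstart: int      # first aligned query residue (1-based)
    qend: int        # last aligned query residue (1-based, inclusive)
    identity: float  # percent identity over the aligned region


def coverage(hit: TemplateHit, query_length: int) -> float:
    """Fraction of the query sequence covered by the template alignment."""
    return (hit.qend - hit.qstart + 1) / query_length


def rank_templates(hits, query_length: int, min_coverage: float = 0.6):
    """Drop low-coverage hits, then rank by coverage and, on ties, by identity."""
    kept = [h for h in hits if coverage(h, query_length) >= min_coverage]
    return sorted(kept, key=lambda h: (coverage(h, query_length), h.identity), reverse=True)


if __name__ == "__main__":
    hits = [TemplateHit("tmplA", 1, 380, 35.0),
            TemplateHit("tmplB", 20, 390, 48.0),
            TemplateHit("tmplC", 150, 300, 70.0)]
    for h in rank_templates(hits, query_length=400):
        print(h.template, round(coverage(h, 400), 3), h.identity)
```

In the example, tmplC is discarded despite its 70% identity because it covers only about 38% of the query. A different weighting of coverage against identity would be equally defensible.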

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are also usually more rigid: their interfaces cannot readily alter their conformation to improve binding and ease movement. Conformational changes are limited by steric constraints, so these interactions are said to be rigid.

Figure: Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. These interfaces show both high specificity and induced fit, with specificity increasing with rigidity. Receptor-ligand docking can involve either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Figure: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

In the rigid-ligand, flexible-receptor case, the native structure often maximizes the interface area between the molecules. The molecules move relative to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, when the ligand overlaps the receptor at the docking interface, energy penalties are incurred. If the van der Waals repulsion can be reduced, this energy penalty is minimized. Allowing flexibility in the receptor achieves this: a flexible receptor can dock a larger ligand than a rigid receptor could.
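The steep energy penalty for overlapping atoms can be seen in the 12-6 Lennard-Jones function, a common model of van der Waals interactions. The document does not specify which energy function its docking used. This sketch works in reduced units, assuming epsilon and sigma are both 1.

```python
def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """12-6 Lennard-Jones pair energy 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    if r <= 0.0:
        raise ValueError("distance must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


if __name__ == "__main__":
    r_min = 2 ** (1 / 6)  # the minimum lies at 2**(1/6) * sigma, where E = -epsilon
    for r in (0.8, 0.9, 1.0, r_min, 1.5, 2.5):
        print(f"r = {r:.3f} sigma   E = {lennard_jones(r):8.3f} epsilon")
```

Below r = sigma the r^-12 term dominates and the energy rises very steeply. This is why even a small overlap at a rigid interface is so costly.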

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: intrinsic movement allows small conformational adaptations for ligand binding. Taking into account the six degrees of freedom for protein movement (three rotational, three translational) gives the receptor even more inherent flexibility. This further offsets any energy penalty between the receptor and ligand, making binding between the two easier and more energetically favorable.
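A rigid-body pose has exactly the six degrees of freedom mentioned above. The sketch below writes a pose as three rotation angles plus a translation vector. The Euler-angle convention is one chosen for illustration; docking programs use a variety of rotation parameterizations. The function names and coordinates are hypothetical.

```python
import math


def rotation_matrix(rx: float, ry: float, rz: float):
    """R = Rz(rz) @ Ry(ry) @ Rx(rx), angles in radians (one of several Euler conventions)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(coords, pose):
    """Apply a 6-DOF rigid-body pose (rx, ry, rz, tx, ty, tz) to a list of (x, y, z) points.

    The rotation is taken about the centroid of the points, then the translation is
    added, so internal distances (the molecule's shape) are unchanged.
    """
    rx, ry, rz, *shift = pose
    rot = rotation_matrix(rx, ry, rz)
    n = len(coords)
    centre = [sum(p[k] for p in coords) / n for k in range(3)]
    moved = []
    for p in coords:
        v = [p[k] - centre[k] for k in range(3)]
        rv = [sum(rot[i][k] * v[k] for k in range(3)) for i in range(3)]
        moved.append(tuple(rv[i] + centre[i] + shift[i] for i in range(3)))
    return moved


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]  # hypothetical atoms
    pose = (math.radians(30), 0.0, math.radians(90), 5.0, -2.0, 1.0)
    for atom in apply_pose(ligand, pose):
        print(tuple(round(c, 3) for c in atom))
```

Because the pose is rigid, all interatomic distances within the moved molecule are unchanged. Only its orientation and position relative to the receptor change.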

Aim of docking

The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking may be useful in the search for a cure for hepatitis B and may open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A receptor is a molecule within a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell. Receptors on the cell surface serve as recognition or binding sites for antigens, antibodies, or other cellular or immunological components.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g., a receptor). A ligand binds through many weak noncovalent interactions formed with the binding site of a protein, so tight binding depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein or enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction. This lowering produces the rate enhancement characteristic of enzyme action.
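Transition-state theory can quantify the link between a lower activation barrier and faster catalysis. The estimate below rests on one simplifying assumption: the catalysed and uncatalysed reactions share the same pre-exponential factor. The function is a hypothetical helper for that estimate.

```python
import math

R_KJ = 8.314462618e-3  # molar gas constant, kJ/(mol*K)


def rate_enhancement(ddg_act_kj: float, temperature_k: float = 298.15) -> float:
    """k_cat / k_uncat when the enzyme lowers the activation free energy by ddg_act_kj.

    Transition-state theory gives k proportional to exp(-dG_act / RT); if the
    pre-exponential factors are equal, the ratio is exp(ddG_act / RT).
    """
    return math.exp(ddg_act_kj / (R_KJ * temperature_k))


if __name__ == "__main__":
    for ddg in (5.7, 10.0, 50.0):
        print(f"{ddg:5.1f} kJ/mol lower barrier -> {rate_enhancement(ddg):.3g}-fold faster")
```

At 25 °C, every 5.7 kJ/mol (RT ln 10) of barrier lowering gives roughly a ten-fold rate increase. A 10 kJ/mol reduction therefore corresponds to about a 56-fold enhancement.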

Amino acids in protein active sites

It is difficult to generalize about which amino acids are likely to be in a protein active or functional site, because this depends greatly on the type of function. With that in mind, the values below give the preference of each of the 20 amino acids for lying within functional regions of proteins. They were worked out by counting how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values do not cover protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g., tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070
Trp -0.140    Met  0.025    Val -0.060    Asn  0.080
Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210
Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055
Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate; these rotations are described by the torsion angles φ and ψ, which are plotted against each other.

Ramachandran used computer models of small polypeptides to vary φ and ψ systematically, with the objective of finding stable conformations. Each conformation was examined for close contacts between atoms, which were treated as hard spheres with dimensions corresponding to their van der Waals radii. Angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
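The φ and ψ angles on a Ramachandran plot are dihedral (torsion) angles, each defined by four consecutive backbone atoms. The sketch below shows the standard computation from Cartesian coordinates. The helper names are our own. The sign follows the usual convention: viewed along the central bond, a clockwise rotation is positive.

```python
import math


def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3) -> float:
    """Signed torsion angle in degrees for four points (p1-p2 is the rotating bond).

    phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Components of b0 and b2 perpendicular to the central bond
    v = [b0[k] - _dot(b0, b1) * b1[k] for k in range(3)]
    w = [b2[k] - _dot(b2, b1) * b1[k] for k in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))  # 90.0
```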

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. The server can take several minutes to run; the run time depends on how many programs one selects and on how many residues the submitted protein contains.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is. The comparison is made against stereochemical parameters derived from well-refined, high-resolution structures. The checks also use 'ideal' bond lengths and bond angles derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the first of its programs "cleans up" the coordinate file. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the extension .new. The atoms in this new file are labelled according to the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory; they have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints on a residue, but it will not pass through high energy barriers, so it stops in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but on its own it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
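The sketch below makes "move atoms to reduce the net force" concrete with steepest descent on a single interatomic distance. It assumes a Lennard-Jones (van der Waals) pair potential in reduced units (ε = σ = 1). The potential, step size and starting distance are illustrative assumptions. Real force fields sum many bonded and non-bonded terms over all 3N atomic coordinates.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy E(r) = 4*eps*[(sigma/r)**12 - (sigma/r)**6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr of the Lennard-Jones energy; the force on the coordinate is -dE/dr."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r

def steepest_descent(r, step=0.01, tol=1e-8, max_iter=10_000):
    """Move the coordinate downhill until the net force is (nearly) zero."""
    for i in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:        # force ~ 0: a (possibly local) minimum
            return r, i
        r -= step * g           # step along the force, i.e. against the gradient
    return r, max_iter

r_min, n_steps = steepest_descent(1.2)
print(f"minimum near r = {r_min:.4f} sigma after {n_steps} steps")
# A single too-close contact dominates the energy, as described above:
print(f"E(0.8 sigma) = {lj_energy(0.8):.1f}  vs  E(r_min) = {lj_energy(r_min):.3f}")
```

The minimizer converges to r = 2^(1/6) σ ≈ 1.1225 σ, where the force vanishes. Like any gradient method, it only finds the minimum downhill from its starting point.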


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C (HBcAg), Human Hepatitis B Viral Capsid >gi|5               206  6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina               197  2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                      192  6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp     27  6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                27  6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
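The composition table and the charged-residue totals are simple counts over the one-letter sequence. The minimal sketch below assumes the sequence is available as a plain string. The function names are illustrative; this is not ProtParam's own code. Percentages are rounded to one decimal place, as in the table above.

```python
from collections import Counter

def composition(seq):
    """Per-residue (count, percent) pairs, percent rounded to one decimal."""
    seq = seq.upper()
    n = len(seq)
    counts = Counter(seq)
    return {aa: (c, round(100.0 * c / n, 1)) for aa, c in sorted(counts.items())}

def charged_residues(seq):
    """Return (negative, positive) = (Asp + Glu, Arg + Lys)."""
    seq = seq.upper()
    return (seq.count("D") + seq.count("E"), seq.count("R") + seq.count("K"))

# First ten residues of GLCTK_HUMAN, as a small usage example.
print(composition("MAAALQVLPR"))
print(charged_residues("MAAALQVLPR"))
```

Applied to the full 523-residue sequence, the same counts give the 49 negatively and 43 positively charged residues listed above.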

Atomic composition:
Carbon     C  2435
Hydrogen   H  3967
Nitrogen   N   711
Oxygen     O   719
Sulfur     S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849
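The total atom count is the sum of the element counts in the formula. The small sketch below parses a formula string with a regular expression and reproduces the 7849 reported above. The function name is illustrative.

```python
import re

def atom_counts(formula):
    """Parse a formula such as 'C2435H3967N711O719S17' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

counts = atom_counts("C2435H3967N711O719S17")
print(counts)               # {'C': 2435, 'H': 3967, 'N': 711, 'O': 719, 'S': 17}
print(sum(counts.values())) # 7849
```

As a cross-check, the 17 sulfur atoms match the 12 Met plus 5 Cys residues in the composition table, since each of these residues carries one sulfur.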

Extinction coefficients:

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
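Both extinction coefficients follow from the Trp, Tyr and Cys counts in the composition table (W = 4, Y = 5, C = 5) and the molecular weight. The sketch below assumes the per-residue values ProtParam documents: 5500 for Trp, 1490 for Tyr and 125 per cystine, where a cystine is a disulfide-bonded pair of Cys. With these values it reproduces both numbers above. The function name and dictionary keys are illustrative.

```python
def extinction_coefficients(n_trp, n_tyr, n_cys, mol_weight):
    """Estimate epsilon at 280 nm (M^-1 cm^-1) and Abs 0.1% (= 1 g/l).

    Returns values for two assumptions: every Cys paired into a cystine,
    or no cystines at all.
    """
    base = n_trp * 5500 + n_tyr * 1490
    all_cystine = base + (n_cys // 2) * 125  # floor: an odd Cys has no partner
    no_cystine = base
    return {
        "all_cys_as_cystines": (all_cystine, round(all_cystine / mol_weight, 3)),
        "no_cystines": (no_cystine, round(no_cystine / mol_weight, 3)),
    }

print(extinction_coefficients(n_trp=4, n_tyr=5, n_cys=5, mol_weight=55252.6))
```

With five Cys, at most two cystines can form, which is why the two estimates differ by 2 × 125 = 250.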

Secondary structure prediction

By SOPMA (result for UNK_158250)

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
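Each summary percentage is that state's count divided by the sequence length; for example, 235 / 523 = 44.93% helix. The small sketch below assumes the predicted-state string is available as text and uses the one-letter codes from the output above. The function name and the code-to-name mapping are illustrative.

```python
from collections import Counter

# One-letter SOPMA state codes seen in the prediction above.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand",
               "t": "Beta turn", "c": "Random coil"}

def state_summary(states):
    """Count each predicted state and its share of the sequence (two decimals)."""
    n = len(states)
    counts = Counter(states)
    return {STATE_NAMES.get(s, s): (c, round(100.0 * c / n, 2))
            for s, c in counts.items()}

print(state_summary("hhhheeec"))
```

Passing the concatenated 523-character state string should reproduce the helix, strand, turn and coil lines of the summary.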

Multiple sequence alignment

ClustalW2 Results

1 Number of sequences: 10
2 Alignment score: 28565
3 Sequence format: Pearson
4 Sequence type: aa
5 Output file: clustalw2-20080510-09552541.output
6 Alignment file: clustalw2-20080510-09552541.aln
7 Guide tree file: clustalw2-20080510-09552541.dnd
8 Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
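As we understand it, ClustalW reports these pairwise scores as percent identities. The exact normalisation is tool-specific. The sketch below assumes identities are counted only over aligned columns where neither sequence has a gap. The function name is illustrative, and the sketch may not match ClustalW's internal calculation exactly.

```python
def percent_identity(aligned_a, aligned_b):
    """Identical residues over gap-free aligned columns, as a percentage.

    Assumption: columns where either sequence has '-' are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)

# Aligned N-terminal fragments of P17099 (HBVA4) and Q64896 (ASHV).
print(percent_identity("MQLFHLCLIISCT-CPTVQASKLCLGWLWG",
                       "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"))
```

On this short fragment the two sequences share 23 of 29 gap-free columns. The full-length score in the table (65) is lower because the rest of the alignment diverges more.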

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420


P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,
(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,
(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,
Q81105|HBEAG_HBVA5:0.01168):0.00175,
P0C625|HBEAG_HBVA3:0.00818):0.00110,
P0C692|HBEAG_HBVA2:0.00445):0.00022,
P17099|HBEAG_HBVA4:0.00942,
Q91C37|HBEAG_HBVA6:0.00927);
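The guide tree is written in Newick format: nested parentheses group sequences, and the number after each colon is a branch length. Below is a minimal recursive-descent sketch that sums branch lengths from the root to each sequence. It is not ClustalW code, and it ignores Newick features such as quoted labels and comments. The function names are illustrative.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = "".join(text.split())  # drop spaces and line breaks
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # consume '('
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # consume ','
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                      # consume ')'
        name = read_until(":,();")
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1                      # consume ':'
            length = float(read_until(",();"))
        return (name, length, children)

    return node()

def root_to_tip(tree, depth=0.0):
    """Yield (leaf name, summed branch length from the root to that leaf)."""
    name, length, children = tree
    depth += length
    if not children:
        yield name, depth
    for child in children:
        yield from root_to_tip(child, depth)

print(dict(root_to_tip(parse_newick("((A:0.1,B:0.2):0.3,C:0.4);"))))
```

Pasting the guide tree above into `parse_newick` would show the long branch leading to Q8IVS8|GLCTK_HUMAN compared with the short branches among the HBV HBeAg sequences.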

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as the template, as it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure from File (PDB file), then choose Swiss model > Load raw target sequence.

Choose Fit > Fit raw sequence, then Magic fit, then Iterative fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss model > Submit modeling request (a new browser window opens with the PDB file loaded; give the e-mail ID for receiving the modeled structure).

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).

Open Swiss model and select the "load raw sequence" option to load the target molecule.

Perform magic fit and iterative fit (provided under FIT) in order to fit the two sequences.

Save the file as the project.

Select "submit modeling request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Your e-mail address: Gunjan300@gmail.com

Your name: Gunjan

Request title: Gunjan project

Your SWISS-MODEL project file can be found in:
C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Fig. Structure of the template after modeling

Alignment

TARGET 83    LV GFGKAVLGMA AAAEELLGQH
2b8nA  4     peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105   LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA  53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET 155   ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205   RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET 255   GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET 305   LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355   ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405   LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455   DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394   ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of the amino acid residues in a protein structure. It displays the φ and ψ conformational angles for each residue and shows which combinations are possible for a polypeptide. The backbone conformation around each alpha carbon, that is, the rotation about its two backbone bonds, can therefore be assessed for every residue of the desired protein using this plot.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling using Deep View and Modeler is given as input to SAVS.

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Plot statistics                                          Residues      %
Residues in most favoured regions [A,B,L]                990        85.6
Residues in additional allowed regions [a,b,l,p]         104         9.0
Residues in generously allowed regions [~a,~b,~l,~p]     11          1.0
Residues in disallowed regions                           51          4.4
                                                         ----      -----
Number of non-glycine and non-proline residues           1156      100.0

Number of end-residues (excl. Gly and Pro)               8
Number of glycine residues (shown as triangles)          127
Number of proline residues                               60
                                                         ----
Total number of residues                                 1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
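Each percentage in the plot statistics is that category's count divided by the 1156 non-glycine, non-proline residues. The short sketch below reproduces the table and applies the "over 90% in the most favoured regions" guideline quoted above. The function name and the dictionary keys are illustrative.

```python
def ramachandran_summary(most_favoured, additional, generous, disallowed,
                         threshold=90.0):
    """Region percentages over non-Gly, non-Pro residues plus the >90% check."""
    total = most_favoured + additional + generous + disallowed
    pct = {name: round(100.0 * n / total, 1) for name, n in [
        ("most favoured", most_favoured),
        ("additional allowed", additional),
        ("generously allowed", generous),
        ("disallowed", disallowed),
    ]}
    meets_guideline = 100.0 * most_favoured / total > threshold
    return total, pct, meets_guideline

total, pct, ok = ramachandran_summary(990, 104, 11, 51)
print(total, pct, ok)
```

With the counts above, 85.6% of residues fall in the most favoured regions, which is below the 90% guideline.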

Docking result (by Hex software)

Fig. Ligand & Receptor (2B8N)

Fig. After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25

Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052

68

MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005

LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025

ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A

Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds

Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)

R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600 R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275 R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040

R = 075
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined
Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100

Conclusion

After analyzing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival across different species. It would be interesting to take a closer look at this by studying them at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We observed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing HBV gene sequencing project is complete, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we used homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

As studied in this report, the HBeAg-binding protein 4 (glycerate kinase) is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could provide a basis for developing drugs.

Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a much larger problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as interpreting those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in the search for a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and can hence aid the drug discovery process.

Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and we can look forward to a better, disease-free world.

The work presented is just a small part of a big problem, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We nevertheless hope that these findings will go a long way and prove fruitful to anyone working in a similar area.

Abbreviation

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package

conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the fold, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.

Figure: First, the known template 3D structures are aligned with the target sequence to be modelled. Second, spatial features such as Cα-Cα distances, hydrogen bonds, and main-chain and side-chain dihedral angles are transferred from the templates to the target; thus, a number of spatial restraints on its structure are obtained. Third, the 3D model is obtained by satisfying all the restraints as well as possible.

Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
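The E-value screening described above can be illustrated with a short Python sketch. It is our own illustration, not the procedure used by BLAST or by any modelling server: the `Hit` record, the `select_templates` helper and the cut-off of 1e-5 are assumptions chosen for the example, and the hit named `hypothetical_weak_hit` is invented. The other three scores and E-values are the top BLAST hits reported in the Results section.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template_id: str  # PDB entry and chain of the candidate template
    bit_score: float
    e_value: float


def select_templates(hits, max_e_value=1e-5):
    """Keep hits at or below the (assumed) E-value cut-off, most significant first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    # Lower E-value = more significant; ties are broken by the higher bit score.
    return sorted(kept, key=lambda h: (h.e_value, -h.bit_score))


HITS = [
    Hit("2G33-C", 192, 6e-50),
    Hit("1QGT-C", 206, 6e-54),
    Hit("hypothetical_weak_hit", 27, 2.0),  # illustrative poor match
    Hit("2QIJ-C", 197, 2e-51),
]

for hit in select_templates(HITS):
    print(hit.template_id, hit.e_value)
```

The weak hit is discarded however few alternatives there are, which mirrors the advice not to model on a template with a poor E-value.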

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.
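Coverage, one of the selection criteria above, is the fraction of query residues that fall inside the aligned regions. The sketch below assumes aligned regions are given as 1-based, inclusive (start, end) ranges on the query; the helper names and the two templates are hypothetical.

```python
def query_coverage(query_length, aligned_ranges):
    """Fraction of query residues covered by 1-based, inclusive aligned ranges."""
    covered = set()
    for start, end in aligned_ranges:
        covered.update(range(start, end + 1))  # overlapping ranges count once
    return len(covered) / query_length


def rank_by_coverage(query_length, candidates):
    """Order candidate templates from highest to lowest query coverage."""
    return sorted(
        candidates,
        key=lambda name: query_coverage(query_length, candidates[name]),
        reverse=True,
    )


# Hypothetical alignments of a 523-residue query against two templates.
CANDIDATES = {
    "template_A": [(1, 200)],
    "template_B": [(1, 150), (140, 400)],
}
print(rank_by_coverage(523, CANDIDATES))
```

In practice coverage would be weighed together with sequence identity and model plausibility rather than used on its own.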

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies that aim to find a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions cannot readily alter their conformation to improve binding and ease movement. Conformational changes are limited by steric constraints, and such interfaces are therefore said to be rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand docking

Protein receptor-ligand motifs fit together tightly and are often described by the lock-and-key mechanism. These interfaces show both high specificity and induced fit, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

In the rigid-ligand, flexible-receptor case, the native structure often maximizes the interface area between the molecules. The molecules move relative to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, when the ligand overlaps the receptor at the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty on the system is minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than a rigid receptor would.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: there is intrinsic movement that allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
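The six degrees of freedom mentioned above (three rotations and three translations) define a rigid-body pose of the ligand relative to the receptor; a rigid docking search, such as the "6D rotation + translation" search reported in the Hex output, explores these poses. As a minimal sketch, assuming a z-y-x Euler-angle convention (our choice; docking programs use their own parameterisations), one pose can be applied to ligand coordinates as follows. The function names are hypothetical.

```python
import math


def _matmul(a, b):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def rotation(alpha, beta, gamma):
    """R = Rz(alpha) . Ry(beta) . Rx(gamma), angles in radians (assumed convention)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    rz = ((ca, -sa, 0.0), (sa, ca, 0.0), (0.0, 0.0, 1.0))
    ry = ((cb, 0.0, sb), (0.0, 1.0, 0.0), (-sb, 0.0, cb))
    rx = ((1.0, 0.0, 0.0), (0.0, cg, -sg), (0.0, sg, cg))
    return _matmul(_matmul(rz, ry), rx)


def apply_pose(coords, angles, shift):
    """Rotate ligand coordinates (3 DOF), then translate them (3 DOF)."""
    r = rotation(*angles)
    return [
        tuple(sum(r[i][k] * p[k] for k in range(3)) + shift[i] for i in range(3))
        for p in coords
    ]


ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
moved = apply_pose(ligand, (math.pi / 2, 0.0, 0.0), (0.0, 0.0, 5.0))
print(moved)
```

Because the transform is rigid, distances within the ligand are unchanged; only its position and orientation relative to the receptor change.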

Aim of docking

The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking may be useful in the search for a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and can hence aid the drug discovery process.

Receptor

A structure on the surface of a cell that serves as a recognition or binding site for antigens, antibodies, or other cellular or immunological components. More generally, it is a molecule in a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through many weak noncovalent interactions formed with the binding site of a protein, tight binding depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein or enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
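The link between a lowered activation free energy and rate enhancement can be made quantitative with transition-state theory. Assuming the catalysed and uncatalysed reactions share the same pre-exponential factor, the ratio of rate constants is k_enzyme / k_uncatalysed = exp(ΔΔG‡ / RT), where ΔΔG‡ is the reduction in activation free energy. A small Python sketch (the 10 kJ/mol input is only an example value):

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def rate_enhancement(ddg_kj_per_mol, temperature_k=298.15):
    """k_enzyme / k_uncatalysed = exp(ddG / RT), assuming equal pre-exponential factors."""
    return math.exp(ddg_kj_per_mol * 1000.0 / (GAS_CONSTANT * temperature_k))


print(round(rate_enhancement(10.0), 1))
```

At about 298 K, lowering ΔG‡ by 10 kJ/mol speeds the reaction up roughly 56-fold, and each further 5.7 kJ/mol adds about another factor of ten.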

Amino acids in protein active sites

It is difficult to generalize about which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are the preferences of the 20 amino acids for lying within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
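The preference values can be used directly as a lookup table. The sketch below reads the tabulated numbers as decimals (e.g. His 0.360, Pro -0.200) and ranks amino acids by how strongly they favour functional sites. The `mean_preference` helper, which averages the values over a set of residues, is a hypothetical heuristic for illustration only, not a validated active-site predictor.

```python
# Preference of each amino acid for functional sites (positive = more contacts
# with bound non-protein atoms than expected by chance), from the table above.
FUNCTIONAL_SITE_PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def most_favoured(n=3):
    """Return the n amino acids most enriched in functional sites."""
    return sorted(FUNCTIONAL_SITE_PREFERENCE,
                  key=FUNCTIONAL_SITE_PREFERENCE.get, reverse=True)[:n]


def mean_preference(residues):
    """Hypothetical heuristic: average preference over a list of residues."""
    return sum(FUNCTIONAL_SITE_PREFERENCE[r] for r in residues) / len(residues)


print(most_favoured())
print(mean_preference(["His", "Cys"]))
```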

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the torsion angles φ and ψ about these bonds. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
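Each point on a Ramachandran plot is a pair of torsion angles computed from backbone atom coordinates: φ from C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). A minimal pure-Python sketch of the standard dihedral-angle calculation follows; the function names are our own.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# phi for residue i would be dihedral(C[i-1], N[i], CA[i], C[i]);
# psi would be dihedral(N[i], CA[i], C[i], N[i+1]).
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
```

Applied to φ and ψ for every residue of a structure, these angles give the points of the plot.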

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues the submitted protein contains.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new, in which the atoms are labelled in accordance with the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, with the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints within a residue, but it will not pass through high energy barriers; instead it stops in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms that are too near each other in space and have a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
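Gradient optimization can be illustrated on the simplest possible case: two atoms interacting through a Lennard-Jones potential, in reduced units (ε = σ = 1). Starting with the atoms too close together, steepest descent moves them along the negative gradient until the force vanishes at r = 2^(1/6) σ. This is a toy sketch: real minimizers work on full force fields in 3N dimensions, and the step size and displacement cap used here are arbitrary choices.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr; the force along the pair separation is its negative."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def steepest_descent(r, step=0.01, max_move=0.1, tol=1e-10, max_iter=10_000):
    """Move r downhill along -dE/dr until the gradient (net force) is ~0."""
    for _ in range(max_iter):
        g = lj_gradient(r)
        if abs(g) < tol:
            break
        # Cap the displacement so a huge repulsive force cannot fling the atoms apart.
        r += max(-max_move, min(max_move, -step * g))
    return r


r_min = steepest_descent(1.0)  # start with the atoms too close together
print(r_min, lj_energy(r_min))
```

Like any local minimizer, this only finds the nearest minimum; it cannot cross energy barriers, which is the limitation described above.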

Result and discussion

1 Swiss-Prot entry: Glycerate kinase (HBeAg-binding protein 4) protein sequence

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result:

List of potentially matching sequences:

Db   AC      Description                                                      Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp    27     60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I              27     60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
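The composition percentages and charged-residue totals above follow directly from the residue counts. A short check in Python, using the counts reported for GLCTK_HUMAN (Q8IVS8); the helper names are our own, and rounding to one decimal matches the table above.

```python
# Residue counts reported by ProtParam for GLCTK_HUMAN (Q8IVS8).
COUNTS = {
    "Ala": 74, "Arg": 33, "Asn": 11, "Asp": 21, "Cys": 5,
    "Gln": 32, "Glu": 28, "Gly": 51, "His": 16, "Ile": 15,
    "Leu": 81, "Lys": 10, "Met": 12, "Phe": 10, "Pro": 28,
    "Ser": 27, "Thr": 22, "Trp": 4, "Tyr": 5, "Val": 38,
}


def composition_percent(counts):
    """Percentage of each residue type, rounded to one decimal."""
    total = sum(counts.values())
    return {aa: round(100.0 * n / total, 1) for aa, n in counts.items()}


def charged_totals(counts):
    """(negatively charged Asp + Glu, positively charged Arg + Lys)."""
    return counts["Asp"] + counts["Glu"], counts["Arg"] + counts["Lys"]


percent = composition_percent(COUNTS)
print(sum(COUNTS.values()), percent["Ala"], percent["Leu"], charged_totals(COUNTS))
```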

Atomic composition:
Carbon      C  2435
Hydrogen    H  3967
Nitrogen    N   711
Oxygen      O   719
Sulfur      S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
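ProtParam's extinction coefficients follow the method of Pace et al.: each Trp contributes 5500, each Tyr 1490 and each cystine 125 M⁻¹ cm⁻¹ at 280 nm, and Abs 0.1% is the coefficient divided by the molecular weight. The sketch below reproduces the two values reported for this protein (4 Trp, 5 Tyr, 5 Cys), taking the molecular weight as 55252.6 Da; with 5 cysteines, at most two cystines can form. The function names are our own.

```python
# Molar extinction coefficients at 280 nm (M^-1 cm^-1) used by ProtParam.
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def extinction_coefficient(n_trp, n_tyr, n_cys, cys_as_cystines):
    """Sum of chromophore contributions; a cystine is a disulfide-bonded Cys pair."""
    cystines = n_cys // 2 if cys_as_cystines else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + cystines * EXT_CYSTINE


def abs_0_1_percent(ext_coeff, mol_weight):
    """Absorbance of a 1 g/l solution: extinction coefficient / molecular weight."""
    return ext_coeff / mol_weight


MW = 55252.6  # Da, molecular weight reported above
for paired in (True, False):
    e = extinction_coefficient(4, 5, 5, paired)
    print(e, round(abs_0_1_percent(e, MW), 3))
```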

Secondary structure prediction

By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
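The SOPMA percentages are simply the state counts divided by the sequence length. The sketch below assumes the one-letter state codes used in the prediction string (h, e, t, c); `state_summary` and `percent` are our own helpers, not part of SOPMA.

```python
from collections import Counter


def percent(count, length):
    """Share of residues in one state, as a percentage with two decimals."""
    return round(100.0 * count / length, 2)


def state_summary(states):
    """Map each state code (h, e, t, c) to (count, percentage)."""
    counts = Counter(states)  # missing states count as 0
    return {s: (counts[s], percent(counts[s], len(states))) for s in "hetc"}


print(state_summary("hhhcce"))
print([percent(n, 523) for n in (235, 80, 36, 172)])
```

Applied to the reported counts of 235, 80, 36 and 172 residues out of 523, this reproduces the 44.93%, 15.30%, 6.88% and 32.89% listed above.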

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
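The Score column is generally read as a percent identity between each pair of sequences. The sketch below computes one common definition for two rows of an alignment: identical columns divided by the columns where neither row has a gap. ClustalW2's own denominator may differ, so this function is not expected to reproduce the table exactly. The example rows are the HBEAG_HBVA4 and HBEAG_GSHV segments from the alignment block that ends at position 117. The function name is illustrative.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of gap-free alignment columns in which the two residues match."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = identical = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # columns with a gap in either row are not compared
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Residues 86-117 of HBEAG_HBVA4 and 87-117 of HBEAG_GSHV, from one alignment block.
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
gshv = "QALVCWEELTRLITWMSENT-TEEVRRIIVDH"
print(round(percent_identity(hbva4, gshv), 2))  # 38.71
```

Skipping gapped columns means an insertion in one sequence does not count as a mismatch. Other tools divide by the shorter sequence length or by the full alignment length instead.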

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
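The guide tree is written in Newick format, as in the .dnd file listed above. Each leaf label is followed by ":" and the length of its terminal branch. The sketch below pulls those terminal branch lengths out with a regular expression, assuming the colon and comma separators shown in the tree. It is not a full Newick parser. In the output, GLCTK_HUMAN sits on by far the longest terminal branch, which is consistent with its scores of 2 and 3 against every HBeAg sequence in the table.

```python
import re

# ClustalW2 guide tree from this report (Newick format).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a label followed by ':' and a branch length. Internal branch
# lengths follow ')' rather than a label, so this pattern skips them.
LEAF = re.compile(r"([A-Za-z0-9_|]+):(\d+\.\d+)")


def leaf_branch_lengths(newick: str) -> dict[str, float]:
    """Return {leaf label: length of the branch leading to that leaf}."""
    return {label: float(length) for label, length in LEAF.findall(newick)}


lengths = leaf_branch_lengths(GUIDE_TREE)
for label, length in sorted(lengths.items(), key=lambda item: item[1], reverse=True):
    print(f"{label:<20} {length:.5f}")
```

For anything beyond reading leaf branch lengths, such as rerooting or drawing the phylogram, a proper Newick parser from a phylogenetics library is the safer choice.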

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template, as it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

1. Open the template structure from File (PDB file), then choose Swiss Model > Load raw target sequence.

2. Choose Fit > Fit raw sequence, then Magic Fit, then Iterative Fit.

3. Choose File > Save > Layer (PDB).

4. Choose File > Save > Project (PDB).

5. Choose Swiss Model > Submit Modeling Request. A new browser window opens and loads the PDB file; give the e-mail ID at which the modeled structure should be received.

6. Open the new structure (received by e-mail) and remove the template by selecting the target.

7. Choose File > Save > Layer (PDB).


Open Swiss Model and select the "Load raw sequence" option to load the target molecule.

Perform Magic Fit and Iterative Fit (under the Fit menu) to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under Swiss Model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields


Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044, Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52


Fig: Structure of template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Verification Server (SAVES) greatly simplifies computational analysis of the structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative modeling process. The Ramachandran plot shows the backbone dihedral angle φ (phi) against ψ (psi) for every amino acid residue in a protein structure. It therefore shows which combinations of φ and ψ are possible for a polypeptide. Backbone bond lengths and bond angles are nearly fixed, so these two torsion angles largely determine the local backbone conformation. Residues that fall outside the allowed regions of the plot point to strained or incorrectly modeled geometry.

Software


SAVS: http://nihserver.mbi.ucla.edu/SAVS

Procedure

The target protein structure obtained after homology modeling using Deep View and modeler is given as input for SAVS.
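The Ramachandran analysis run by the server starts from the backbone φ and ψ torsion angles of each residue:

- φ is the dihedral C(i-1)-N(i)-CA(i)-C(i).
- ψ is the dihedral N(i)-CA(i)-C(i)-N(i+1).

The sketch below computes these angles in plain Python using the standard atan2 form of the torsion angle with the IUPAC sign convention. It assumes the backbone N, CA and C coordinates have already been read from the model's PDB file; parsing is not shown. The function names are illustrative and are not taken from SAVES or PROCHECK.

```python
import math

Vector = tuple[float, float, float]


def _sub(a: Vector, b: Vector) -> Vector:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vector, b: Vector) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vector, b: Vector) -> Vector:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0: Vector, p1: Vector, p2: Vector, p3: Vector) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c: Vector, n: Vector, ca: Vector, c: Vector, next_n: Vector) -> tuple[float, float]:
    """Backbone (phi, psi) of one residue from its own and neighbouring atoms."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


# Toy geometry: a trans (anti) arrangement of four points gives 180 degrees.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

The atan2 form is used instead of an arccosine of the normals because it keeps the sign of the angle. The sign is what separates, for example, right-handed helix (negative φ) from left-handed conformations on the plot.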

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT


Result
------
Plot statistics                                          SCORE     %age
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%

Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351


Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
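The percentages in the PROCHECK table are taken over the 1156 non-glycine, non-proline residues only. That is why end residues, glycines and prolines are listed separately and added back only in the total. The short sketch below redoes that arithmetic and applies the 90% guideline quoted above. The counts are copied from the table and the variable names are illustrative.

```python
# Ramachandran-plot counts from the PROCHECK summary table.
REGION_COUNTS = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}
END_RESIDUES, GLYCINES, PROLINES = 8, 127, 60
GOOD_MODEL_CUTOFF = 90.0  # % of residues expected in most favoured regions


def region_percentages(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Percentages are taken over non-glycine, non-proline residues only."""
    total = sum(counts.values())
    return total, {region: round(100 * n / total, 1) for region, n in counts.items()}


total, percentages = region_percentages(REGION_COUNTS)
all_residues = total + END_RESIDUES + GLYCINES + PROLINES
favoured = 100 * REGION_COUNTS["most favoured [A,B,L]"] / total
meets_cutoff = favoured > GOOD_MODEL_CUTOFF

for region, pct in percentages.items():
    print(f"{region:<34} {REGION_COUNTS[region]:>5} {pct:>6.1f}%")
print("non-Gly/non-Pro residues:", total)                  # 1156
print("total residues:", all_residues)                     # 1351
print("over 90% in most favoured regions:", meets_cutoff)  # False
```

With 85.6% of residues in the most favoured regions, this model falls short of the 90% guideline quoted above.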

Docking result by Hex software

Fig: Ligand & Receptor (2B8N)


Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 2635096, Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A

72

Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 46.4 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)
R = 0.00, R = 0.75, R = 1.50, R = 2.25, R = 3.00, R = 3.75, R = 4.50, R = 5.25, R = 6.00, R = 6.75, R = 7.50, R = 8.25, R = 9.00, R = 9.75, R = 10.50, R = 11.25, R = 12.00, R = 12.75, R = 13.50, R = 14.25, R = 15.00, R = 15.75, R = 16.50, R = 17.25, R = 18.00, R = 18.75, R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 0.00, R = 0.40, R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100


Conclusion


After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at this by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structures of the proteins present in the different species, we performed homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

Our study indicates that the glycerate kinase protein (HBeAg-binding protein 4), encoded by the GLYCTK gene, is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone toward such discoveries. The present work is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in finding a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. Still, we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.



Abbreviations


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to the production of representative experimental structures for all classes of protein folds. The chief inaccuracies in homology modeling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial (every two years) large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

Template selection and sequence alignment

The critical first step in homology modeling is the identification of the best template structure, if indeed any are available. The simplest method of template identification relies on serial pairwise sequence alignments aided by database search techniques such as FASTA and BLAST. More sensitive methods based on multiple sequence alignment, of which PSI-BLAST is the most common example, iteratively update their position-specific scoring matrix to successively identify more distantly related homologs. This family of methods has been shown to produce a larger number of potential templates and to identify better templates for sequences that have only distant relationships to any solved structure. Protein threading, also known as fold recognition or 3D-1D alignment, can also be used as a search technique for identifying templates to be used in traditional homology modeling methods. When performing a BLAST search, a reliable first approach is to identify hits with a sufficiently low E-value, which are considered sufficiently close in evolution to make a reliable homology model. Other factors may tip the balance in marginal cases; for example, the template may have a function similar to that of the query sequence, or it may belong to a homologous operon. However, a template with a poor E-value should generally not be chosen, even if it is the only one available, since it may well have a wrong structure, leading to the production of a misguided model. A better approach is to submit the primary sequence to fold-recognition servers or, better still, to consensus meta-servers, which improve upon individual fold-recognition servers by identifying similarities (consensus) among independent predictions.
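The E-value screen described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the procedure used in this project: the `Hit` record, the `select_templates` helper and the default cut-off of 1e-5 are hypothetical choices, and real hits would first have to be parsed from BLAST output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One database hit (hypothetical record mirroring a BLAST hit table)."""
    template_id: str
    bit_score: float
    evalue: float


def select_templates(hits, max_evalue=1e-5):
    """Keep hits whose E-value is at or below the cut-off, best first.

    The 1e-5 default is an illustrative assumption, not a standard value.
    Ties on E-value are broken by the higher bit score.
    """
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: (h.evalue, -h.bit_score))


if __name__ == "__main__":
    # Toy hits, invented for illustration only.
    hits = [
        Hit("templateA", 206.0, 6e-54),
        Hit("templateB", 27.0, 6.0),
        Hit("templateC", 192.0, 6e-50),
    ]
    for hit in select_templates(hits):
        print(hit.template_id, hit.evalue)
```

Hits that pass such a cut-off still need the judgement described in the text (similar function, operon context, support from fold-recognition servers).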

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.
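To make the coverage criterion concrete, the sketch below computes, from a gapped pairwise alignment, the fraction of the query that a template covers and the sequence identity over the aligned part. It is an illustration under our own assumptions (gaps written as '-', a hypothetical function name); it does not rank templates by itself, and the plausibility of the final model still has to be judged separately.

```python
def coverage_and_identity(query_aln: str, template_aln: str) -> tuple[float, float]:
    """Return (coverage, identity) for a gapped pairwise alignment.

    coverage: fraction of query residues aligned to a template residue.
    identity: fraction of those aligned pairs that are identical.
    Gaps are written as '-'.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    query_length = sum(1 for q in query_aln if q != "-")
    if query_length == 0:
        raise ValueError("query contains no residues")
    pairs = [(q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"]
    coverage = len(pairs) / query_length
    identity = sum(1 for q, t in pairs if q == t) / len(pairs) if pairs else 0.0
    return coverage, identity


if __name__ == "__main__":
    # Toy alignment (invented): 6 of 8 query residues are aligned, 5 of them identical.
    print(coverage_and_identity("MKTAYIAK", "MKTA--AR"))
```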

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7 Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking interactions

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: the interfaces of these interactions do not have the ability to alter their conformation in order to improve binding and ease movement. Conformational changes are limited by steric constraints, and such interactions are therefore said to be rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand Docking

Protein receptor-ligand motifs fit together tightly and are often described as a lock-and-key mechanism. These interfaces show both high specificity and induced fit, with specificity increasing with rigidity. In protein receptor-ligand docking, either the ligand is rigid and the receptor flexible, or the ligand is flexible and the receptor rigid.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

The native structure of a rigid-ligand, flexible-receptor complex often maximizes the interface area between the molecules. The molecules move with respect to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger-than-usual ligand. Normally, when there is ligand overlap in the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty to the system will be minimized, and this can be accomplished by allowing flexibility in the receptor. Flexible receptors allow docking of a larger ligand than a rigid receptor would allow.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: intrinsic movement allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational, three translational) are taken into consideration, the inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
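The six rigid-body degrees of freedom mentioned above (three rotations and three translations) can be illustrated by applying a rotation and a shift to a set of ligand coordinates. This is a minimal sketch for illustration only: the z-y-x Euler-angle convention, the choice to rotate about the ligand centroid and the function names are our assumptions, and docking programs such as Hex explore this space with their own, more efficient parameterisations rather than with code like this.

```python
import math

Matrix = list[list[float]]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation(alpha: float, beta: float, gamma: float) -> Matrix:
    """Rotation matrix Rz(alpha) @ Ry(beta) @ Rx(gamma); angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    rz = [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]
    ry = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    rx = [[1.0, 0.0, 0.0], [0.0, cg, -sg], [0.0, sg, cg]]
    return _matmul(rz, _matmul(ry, rx))


def place_ligand(coords, angles, shift):
    """Rotate ligand atoms about their centroid (3 DOF), then translate them (3 DOF)."""
    r = rotation(*angles)
    n = len(coords)
    centre = [sum(c[i] for c in coords) / n for i in range(3)]
    placed = []
    for c in coords:
        d = [c[i] - centre[i] for i in range(3)]
        rotated = [sum(r[i][k] * d[k] for k in range(3)) for i in range(3)]
        placed.append(tuple(centre[i] + rotated[i] + shift[i] for i in range(3)))
    return placed


if __name__ == "__main__":
    # Two-atom toy "ligand": rotate 90 degrees about z, then move 5 units along z.
    print(place_ligand([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], (math.pi / 2, 0.0, 0.0), (0.0, 0.0, 5.0)))
```

Because the transform is rigid, interatomic distances inside the ligand are unchanged; only its orientation and position relative to the receptor vary.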

Aim of docking

The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking may be useful in finding a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A structure on the surface of a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. It is a molecule within a cell or on its surface to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds to the binding site of a protein through the interaction of many weak noncovalent bonds, the tight binding of a ligand depends upon a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize about which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

His 0.360    Tyr -0.040    Asp 0.045    Gly -0.070
Trp -0.140   Met 0.025     Val -0.060   Asn 0.080
Leu -0.180   Phe -0.120    Gln 0.050    Cys 0.210
Ile -0.005   Ala 0.025     Glu 0.050    Arg 0.055
Pro -0.200   Lys 0.100     Thr 0.100    Ser 0.130
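As an illustration of how such a table might be used, the sketch below averages the tabulated preferences over the residues of a candidate site. The averaging score and the function name are our own hypothetical choices, not a method from the study that produced these values; a high average only hints that a surface patch resembles typical ligand-contacting regions.

```python
# Functional-site preferences from the table above, keyed by one-letter code.
PROPENSITY = {
    "H": 0.360, "Y": -0.040, "D": 0.045, "G": -0.070, "W": -0.140,
    "M": 0.025, "V": -0.060, "N": 0.080, "L": -0.180, "F": -0.120,
    "Q": 0.050, "C": 0.210, "I": -0.005, "A": 0.025, "E": 0.050,
    "R": 0.055, "P": -0.200, "K": 0.100, "T": 0.100, "S": 0.130,
}


def mean_propensity(site: str) -> float:
    """Average preference over the residues of a candidate site (hypothetical score)."""
    if not site:
        raise ValueError("empty site")
    values = [PROPENSITY[aa] for aa in site.upper()]  # KeyError for non-standard codes
    return sum(values) / len(values)


if __name__ == "__main__":
    print(mean_propensity("HCS"), mean_propensity("LP"))
```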

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible conformations of the phi and psi angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the corresponding torsion angles, phi and psi. Ramachandran used computer models of small polypeptides to systematically vary phi and psi with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
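The phi and psi angles plotted on a Ramachandran plot are ordinary dihedral angles defined by four consecutive backbone atoms: phi by C(i-1), N(i), Cα(i), C(i), and psi by N(i), Cα(i), C(i), N(i+1). The sketch below computes a dihedral angle from four points in pure Python; the function names are ours, and in practice the coordinates would be read from a PDB file.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180.

    Positive when, looking along p1 -> p2, the front bond must turn
    clockwise to eclipse the back bond (the usual IUPAC sign convention).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = [b0[i] - _dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - _dot(b2, b1) * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy geometries: cis (0 degrees) and trans (180 degrees).
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```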

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on the number of residues in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new; the new file has its atoms labelled in accordance with the IUPAC naming convention.

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. Energy minimization is good at releasing local constraints on a residue, but it will not pass through high energy barriers and stops in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
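Gradient-based minimization can be illustrated on the smallest possible case: two atoms interacting through a Lennard-Jones potential, starting closer together than is comfortable. This is a toy sketch in reduced units (epsilon = sigma = 1) with a fixed-step steepest-descent update; the step size, tolerance and function names are arbitrary assumptions, and real minimizers work on full force fields in three dimensions, usually with better algorithms than plain steepest descent.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy E(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Force along r, F = -dE/dr (positive means the atoms push apart)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def steepest_descent(r: float, step: float = 0.01, tol: float = 1e-8, max_iter: int = 10_000) -> float:
    """Move the separation along the force until the net force is below tol."""
    for _ in range(max_iter):
        force = lj_force(r)
        if abs(force) < tol:
            break
        r += step * force  # a step along the force is a step downhill in energy
    return r


if __name__ == "__main__":
    r_min = steepest_descent(1.0)  # start at a repulsive, too-close contact
    print(r_min, lj_energy(r_min))  # approaches 2**(1/6) and -1.0
```

The result is the nearest local minimum, which is exactly the limitation noted in the text: the update never climbs over an energy barrier.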


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result:

List of potentially matching sequences
Db  AC  Description  Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5  206  6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina  197  2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad  192  6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp  27  60
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I  27  60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A) 74 14.1%
Arg (R) 33 6.3%
Asn (N) 11 2.1%
Asp (D) 21 4.0%
Cys (C) 5 1.0%
Gln (Q) 32 6.1%
Glu (E) 28 5.4%
Gly (G) 51 9.8%
His (H) 16 3.1%
Ile (I) 15 2.9%
Leu (L) 81 15.5%
Lys (K) 10 1.9%
Met (M) 12 2.3%
Phe (F) 10 1.9%
Pro (P) 28 5.4%
Ser (S) 27 5.2%
Thr (T) 22 4.2%
Trp (W) 4 0.8%
Tyr (Y) 5 1.0%
Val (V) 38 7.3%
Pyl (O) 0 0.0%
Sec (U) 0 0.0%
(B) 0 0.0%
(Z) 0 0.0%
(X) 0 0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
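The percentages and charged-residue totals in this ProtParam output follow directly from the residue counts. The sketch below recomputes them from the counts reported for GLCTK_HUMAN; the dictionary and function names are ours, and ProtParam's own implementation is not reproduced here.

```python
# Residue counts reported by ProtParam for Q8IVS8 (one-letter codes).
COUNTS = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28,
    "G": 51, "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10,
    "P": 28, "S": 27, "T": 22, "W": 4, "Y": 5, "V": 38,
}


def percent_composition(counts):
    """Percentage of each residue type, rounded to one decimal as in the output."""
    total = sum(counts.values())
    return {aa: round(100 * n / total, 1) for aa, n in counts.items()}


def charged_residues(counts):
    """(negative, positive) = (Asp + Glu, Arg + Lys)."""
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


if __name__ == "__main__":
    print(sum(COUNTS.values()))              # 523
    print(percent_composition(COUNTS)["L"])  # 15.5
    print(charged_residues(COUNTS))          # (49, 43)
```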

Atomic composition:
Carbon C 2435
Hydrogen H 3967
Nitrogen N 711
Oxygen O 719
Sulfur S 17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
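The two extinction coefficients above can be reproduced from the Trp, Tyr and Cys counts. This sketch assumes the per-residue molar absorptivities at 280 nm given in the ProtParam documentation (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); the function names are ours.

```python
EXT_TRP = 5500      # M-1 cm-1 at 280 nm, per tryptophan
EXT_TYR = 1490      # per tyrosine
EXT_CYSTINE = 125   # per disulfide-bonded Cys pair (cystine)


def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int, all_cystines: bool = True) -> int:
    """Molar extinction coefficient; free cysteines contribute nothing."""
    n_cystine = n_cys // 2 if all_cystines else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def abs_0_1_percent(ext_coeff: float, mol_weight: float) -> float:
    """Absorbance of a 1 g/l solution over a 1 cm path."""
    return ext_coeff / mol_weight


if __name__ == "__main__":
    mw = 55252.6
    for paired in (True, False):
        eps = extinction_coefficient(4, 5, 5, all_cystines=paired)
        print(eps, round(abs_0_1_percent(eps, mw), 3))
```

With 4 Trp, 5 Tyr and 5 Cys this gives 29700 and 29450, and dividing by the molecular weight of 55252.6 gives 0.538 and 0.533, in line with the ProtParam values reported above.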

Secondary structure prediction

By SOPMA result for UNK_158250


MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhtcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
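The SOPMA percentages are simple tallies of the predicted state string (h = helix, e = extended strand, t = turn, c = coil). The sketch below recomputes such a summary; it does not reproduce SOPMA's prediction itself (window width, similarity threshold and so on), and the names are ours.

```python
from collections import Counter

STATES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_summary(states: str) -> dict[str, tuple[int, float]]:
    """Count each predicted state and its percentage of the sequence length."""
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states)
    n = len(states)
    return {name: (counts[code], round(100 * counts[code] / n, 2)) for code, name in STATES.items()}


if __name__ == "__main__":
    # A string with the same totals as the prediction above (235 h, 80 e, 36 t, 172 c).
    demo = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
    for name, (count, pct) in state_summary(demo).items():
        print(f"{name}: {count} is {pct:.2f}%")
```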

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name Len(aa) SeqB Name Len(aa) Score===========================================================================1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3 1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3 1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3 1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3 1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3 1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3 1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3 1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2 2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21 2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21 2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91 2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66 2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65

(45)

2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65 2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65 2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65 3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97 3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24 3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26 3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27 3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25 3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26 3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26 4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25 4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26 4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27 4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25 4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26 4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26 5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70 5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69 5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69 5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69 5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69 6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98 6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98 6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98

49

6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98 7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98 7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97

7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98 8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97 8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98 9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97 ===========================================================================
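The ClustalW scores in this table are percent identities. As we understand the ClustalW documentation, each score is the number of identities in the pairwise alignment divided by the number of residues compared, with gap positions excluded. The sketch below applies that rule to every pair of sequences in a small gapped alignment. It is only an approximation, since ClustalW scores each pair from its own pairwise alignment rather than from the final multiple alignment, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Identities / compared positions (columns where neither sequence has a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return 100.0 * sum(x == y for x, y in compared) / len(compared)


def identity_table(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Rounded percent identity for every unordered pair of named sequences."""
    return {
        (p, q): round(percent_identity(alignment[p], alignment[q]))
        for p, q in combinations(alignment, 2)
    }


if __name__ == "__main__":
    toy = {"s1": "MQLF-HL", "s2": "MQLFAHL", "s3": "MYLF-HV"}
    for pair, score in identity_table(toy).items():
        print(pair, score)
```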

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV ------------------------------------------------------------
P03153|HBEAG_GSHV ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

52

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
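The guide tree is written in Newick format: nested parentheses group sequences into clades, and the number after each colon is a branch length. The sketch below shows one way such a string can be read. It is a minimal, stdlib-only Python parser; the names `Node`, `parse_newick` and `leaf_distances` are hypothetical helpers written for this illustration, not part of ClustalW or any library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None              # branch length to the parent
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a Newick string such as '((A:0.1,B:0.2):0.3,C:0.4);'."""
    s = text.strip()
    if s.endswith(";"):
        s = s[:-1]
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(s) and s[pos] == "(":
            pos += 1                                # consume '('
            node.children.append(parse_node())
            while pos < len(s) and s[pos] == ",":
                pos += 1                            # consume ','
                node.children.append(parse_node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                                # consume ')'
        start = pos                                 # optional label
        while pos < len(s) and s[pos] not in ":,)":
            pos += 1
        node.name = s[start:pos].strip()
        if pos < len(s) and s[pos] == ":":          # optional branch length
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",)":
                pos += 1
            node.length = float(s[start:pos])
        return node

    root = parse_node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return root


def leaf_distances(node: Node, acc: float = 0.0) -> Dict[str, float]:
    """Map each leaf name to the summed branch length from the root."""
    here = acc + (node.length or 0.0)
    if not node.children:
        return {node.name: here}
    out: Dict[str, float] = {}
    for child in node.children:
        out.update(leaf_distances(child, here))
    return out


tree = parse_newick("((A:0.1,B:0.2):0.3,C:0.4);")
print(sorted(leaf_distances(tree)))  # ['A', 'B', 'C']
```

ClustalW builds its guide tree mainly to decide the order of the progressive alignment, so its branch lengths are best read as a rough guide rather than a carefully inferred phylogeny. For real analyses a maintained library such as Biopython's `Bio.Phylo` would normally be used instead of a hand-written parser.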

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

Open the template structure (PDB file) from File, then choose SwissModel - Load Raw Target Sequence.

Choose Fit - Fit Raw Sequence, then Magic Fit, then Iterative Fit.

Choose File - Save - Layer (PDB).

Choose File - Save - Project (PDB).

Choose SwissModel - Submit Modeling Request. (A new browser window opens with the project file loaded; enter an e-mail ID for receiving the modeled structure.)

Open the new structure (received by e-mail) and remove the template by selecting only the target.

Choose File - Save - Layer (PDB).

Open SwissModel and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under the Fit menu, to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044, Title: Q8IVS8

SWISS-MODEL Workspace

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52


Fig. Structure of the template after modeling

Alignment

TARGET  83  LV GFGKAVLGMA AAAEELLGQH
2b8nA    4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET 155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET 255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET 305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394  ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss
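Template choice above is driven by sequence identity (34% for 2b8nA). As a minimal Python sketch of how such a figure can be computed from two already-aligned sequences, the function below counts identical residues over gap-free columns. The name `percent_identity` is hypothetical, and this is only one of several identity conventions; SWISS-MODEL may normalise differently, so the sketch will not necessarily reproduce its number.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two equal-length, already-aligned sequences.

    Identity is taken over columns where neither sequence has a gap. Other
    conventions divide by the shorter sequence length or by the full
    alignment length, so figures from different tools need not agree.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue                    # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Gap-free stretch of the TARGET / 2b8nA alignment
# (target residues 165-184, template residues 111-130).
target = "LTADDLLLVL" "ISGGGSALLP"
template = "lnendtvlfl" "lsgggsslfe"
print(percent_identity(target, template))  # 50.0
```

Comparing upper-cased letters lets the lower-case template line from the SWISS-MODEL output be used directly. Local stretches such as this one (50%) can be well above the overall identity, which is why the whole alignment, not a single block, should be used when judging a template.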

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ (phi) against ψ (psi) for each amino acid residue in a protein structure, showing which combinations of φ and ψ are possible for a polypeptide. Because steric clashes make most combinations unfavourable, residues that fall outside the favoured and allowed regions point to strained or possibly incorrectly modelled parts of the backbone.

Software


SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure

The target protein structure obtained after homology modeling with Deep View and Modeller was given as input to SAVS.
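The Ramachandran plot described above rests on two backbone torsion angles per residue: φ is the dihedral C(i-1)-N(i)-CA(i)-C(i) and ψ is N(i)-CA(i)-C(i)-N(i+1). The following minimal Python sketch shows how a signed dihedral angle can be computed from four atom coordinates. The function names `dihedral`, `phi` and `psi` are illustrative only; this is not the PROCHECK or SAVS implementation.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)                # normal of plane (p0, p1, p2)
    n2 = _cross(b2, b3)                # normal of plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev: Vec3, n: Vec3, ca: Vec3, c: Vec3) -> float:
    """Backbone phi of residue i: C(i-1)-N(i)-CA(i)-C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n: Vec3, ca: Vec3, c: Vec3, n_next: Vec3) -> float:
    """Backbone psi of residue i: N(i)-CA(i)-C(i)-N(i+1)."""
    return dihedral(n, ca, c, n_next)


# Idealised test geometry: the last bond is rotated a quarter turn.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

The atan2 form avoids the loss of precision that an arccos of a normalised dot product suffers near 0° and 180°, and its sign follows the IUPAC convention (positive for clockwise rotation viewed along the central bond) used on Ramachandran plots.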

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT


Result

Plot statistics                                          No. of residues   Percentage
Residues in most favoured regions [A,B,L]                      990           85.6%
Residues in additional allowed regions [a,b,l,p]               104            9.0%
Residues in generously allowed regions [~a,~b,~l,~p]            11            1.0%
Residues in disallowed regions                                  51            4.4%
                                                              ----          ------
Number of non-glycine and non-proline residues                1156          100.0%
Number of end-residues (excl. Gly and Pro)                       8
Number of glycine residues (shown as triangles)                127 (59)
Number of proline residues                                      60
                                                              ----
Total number of residues                                      1351


Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
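The percentages in the PROCHECK table are taken over the 1156 non-glycine, non-proline residues only, since glycine and proline have their own distinct φ/ψ preferences. A short Python check of that arithmetic, using the counts from the table, is sketched below; the names `REGION_COUNTS`, `region_percentages` and `meets_core_criterion` are hypothetical helpers, and the 90% threshold is the rule of thumb quoted above rather than a hard pass/fail test.

```python
from typing import Dict

# Residue counts from the PROCHECK table (non-Gly, non-Pro residues only).
REGION_COUNTS: Dict[str, int] = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of residues in each Ramachandran region, rounded to 0.1."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_core_criterion(percentages: Dict[str, float],
                         threshold: float = 90.0) -> bool:
    """Rule of thumb: a good model has > threshold % in the most favoured regions."""
    return percentages["most favoured"] > threshold


pct = region_percentages(REGION_COUNTS)
print(pct)
print("over 90% in most favoured regions:", meets_core_criterion(pct))
```

With these counts the sketch reports 85.6% in the most favoured regions and 4.4% in disallowed regions, so by the guideline above the model falls short of the expected quality and the residues in disallowed regions are the first candidates for inspection.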

Docking result (Hex software)

Fig. Ligand & Receptor (2B8N)

Fig. After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP CG Radius = 1.40 Charge = 0.62, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52, THR CA Radius = 1.50 Charge = 0.27, THR C Radius = 1.40 Charge = 0.53, THR O Radius = 1.50 Charge = -0.50, THR CB Radius = 1.50 Charge = 0.21, THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file


>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)

R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00  R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair               RMS  Bumps
 ----  --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce    Eair  Vshape  Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000      00      00      00      00      00      00  -1  -100
---------------------------------------------------------------------------
   1    1 000000      00      00      00      00      00      00  -1  -100

Conclusion

After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at this by studying the gene level, and a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing projects on HBV, we hope it will be possible in the near future to draw conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structure of the protein present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

As our study suggests that the HBeAg (Glycerate kinase) protein is one of the secondary causes of HBV pathogenicity, we tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be a stepping stone towards such discoveries; it is a small finding within a much bigger issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data as well as interpreting those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in finding a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens, so that possible therapeutic sites can be identified in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. But we hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


fold-recognition servers by identifying similarities (consensus) among independent predictions.

Often several candidate template structures are identified by these approaches. Although some methods can generate hybrid models from multiple templates, most methods rely on a single template. Therefore, choosing the best template from among the candidates is a key step and can significantly affect the final accuracy of the structure. This choice is guided by several factors, such as the similarity of the query and template sequences, of their functions, and of the predicted query and observed template secondary structures. Perhaps most important are the coverage of the aligned regions (the fraction of the query sequence structure that can be predicted from the template) and the plausibility of the resulting model. Thus, several homology models are sometimes produced for a single query sequence, with the most likely candidate chosen only in the final step.
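As an illustration only, the template choice described above can be reduced to ranking candidates by a combined score. The sketch below assumes a simple weighted sum of sequence identity and query coverage; the `Template` record, the weights and the two candidates `TPL_A` and `TPL_B` are hypothetical and are not the criteria or templates used in this project.

```python
from dataclasses import dataclass


@dataclass
class Template:
    """Hypothetical record describing one candidate template."""
    pdb_id: str
    identity: float  # fraction of identical residues in the aligned region (0-1)
    coverage: float  # fraction of the query covered by the alignment (0-1)


def template_score(t: Template, w_identity: float = 0.5, w_coverage: float = 0.5) -> float:
    """Weighted sum of identity and coverage; the weights are assumptions."""
    return w_identity * t.identity + w_coverage * t.coverage


def rank_templates(candidates: list[Template]) -> list[Template]:
    """Sort candidates from best to worst combined score."""
    return sorted(candidates, key=template_score, reverse=True)


candidates = [
    Template("TPL_A", identity=0.60, coverage=0.30),  # high identity, short alignment
    Template("TPL_B", identity=0.35, coverage=0.85),  # lower identity, long alignment
]
print([t.pdb_id for t in rank_templates(candidates)])
```

With equal weights the long-alignment candidate ranks first despite its lower identity, in line with the point above that coverage is often the deciding factor; real pipelines weigh many more criteria.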

It is possible to use the sequence alignment generated by the database search technique as the basis for the subsequent model production; however, more sophisticated approaches have also been explored.

7. Molecular Docking

Introduction to Docking

Docking studies are molecular modeling studies aimed at finding a proper fit between a ligand and its binding site.

There are two classes of protein docking:

1) Protein-protein docking

2) Protein receptor-ligand docking

Protein-Protein Docking

Protein-protein interactions occur between two proteins that are similar in size. The interface between the two molecules tends to be flatter and smoother than in protein-ligand interactions. Protein-protein interactions are usually more rigid: their interfaces cannot readily alter their conformation to improve binding and ease movement. Conformational changes are limited by steric constraints, so these interactions are described as rigid.

Fig. Protein-protein docking

Protein Receptor-Ligand Docking

Protein receptor-ligand motifs fit together tightly and are often described by the lock-and-key mechanism. These interfaces show both high specificity and induced fit, with specificity increasing with rigidity. Protein receptor-ligand docking can involve either a rigid ligand with a flexible receptor, or a flexible ligand with a rigid receptor.

Fig. Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

In the rigid-ligand, flexible-receptor case, the native structure often maximizes the interface area between the two molecules. The molecules move relative to one another in a direction perpendicular to the interface, which allows a receptor to bind a larger than usual ligand. Normally, when the ligand overlaps the receptor at the docking interface, energy penalties are incurred. If the van der Waals repulsion can be decreased, the energy penalty for the system will be minimized. This can be accomplished by allowing flexibility in the receptor: flexible receptors allow docking of a larger ligand than a rigid receptor would allow.
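To make the energy-penalty argument concrete, the van der Waals interaction between two atoms is commonly modelled with a Lennard-Jones 12-6 potential, E(r) = 4ε[(σ/r)^12 - (σ/r)^6]. The sketch below uses arbitrary illustrative values of ε and σ (assumptions, not parameters of any force field used in this project) to show how sharply the energy rises when two atoms overlap.

```python
def lennard_jones(r: float, epsilon: float = 0.2, sigma: float = 3.4) -> float:
    """Lennard-Jones 12-6 pair energy at separation r (illustrative units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


r_min = 2 ** (1 / 6) * 3.4  # separation of the energy minimum for sigma = 3.4
for r in (2.5, 3.0, 3.4, r_min, 5.0):
    print(f"r = {r:4.2f}   E = {lennard_jones(r):8.3f}")
```

With these values the pair energy is zero at r = σ, reaches its minimum of -ε at r = 2^(1/6)σ, and climbs steeply as the atoms are pushed closer, which is why even a small overlap at a rigid interface is costly.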

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, though: intrinsic movement allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational and three translational) are taken into consideration, the inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
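The six rigid-body degrees of freedom mentioned above (three rotations and three translations) are what a docking search samples when it moves one molecule relative to the other. A minimal sketch, assuming rotations are given as angles about the x, y and z axes applied in that order (one of several possible conventions) and taken about the molecule's centroid; the two-atom "ligand" is a made-up example:

```python
import math

Vec = tuple[float, float, float]


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(ax: float, ay: float, az: float) -> list[list[float]]:
    """Rotate by ax about x, then ay about y, then az about z (radians): R = Rz Ry Rx."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return matmul(rz, matmul(ry, rx))


def rigid_move(coords: list[Vec], angles: Vec, shift: Vec) -> list[Vec]:
    """Apply the six rigid-body degrees of freedom: rotate about the centroid, then translate."""
    n = len(coords)
    centre = [sum(c[i] for c in coords) / n for i in range(3)]
    rot = rotation_matrix(*angles)
    moved = []
    for c in coords:
        d = [c[i] - centre[i] for i in range(3)]
        r = [sum(rot[i][k] * d[k] for k in range(3)) for i in range(3)]
        moved.append((r[0] + centre[0] + shift[0],
                      r[1] + centre[1] + shift[1],
                      r[2] + centre[2] + shift[2]))
    return moved


ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]  # hypothetical two-atom "ligand"
pose = rigid_move(ligand, (0.0, 0.0, math.pi / 2), (0.0, 0.0, 2.0))
print(pose)
```

Because the move is rigid, internal distances (here the 1.5 bond length) are unchanged; only the position and orientation relative to the receptor vary.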

Aim of docking

The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for the infectious disease bird flu, and may also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and can hence aid the drug discovery process.

Receptor

A receptor is a molecule on the surface of (or within) a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. A substance such as a hormone or a drug binds selectively to it, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through many weak noncovalent interactions formed with the binding site of a protein, tight binding depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein or enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in making and breaking bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
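The link between a lower activation free energy and rate enhancement can be made quantitative with transition-state theory: if the enzyme lowers ΔG‡ by ΔΔG‡, the rate rises by roughly exp(ΔΔG‡/RT), assuming the pre-exponential factor is unchanged. A small worked sketch (the ΔΔG‡ values are arbitrary examples):

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def rate_enhancement(ddg_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """Fold rate increase when the activation free energy drops by ddg (kJ/mol)."""
    return math.exp(ddg_kj_per_mol * 1000.0 / (R * temperature_k))


for ddg in (5.7, 11.4, 40.0):
    print(f"lowering dG_act by {ddg:4.1f} kJ/mol -> about {rate_enhancement(ddg):.3g}-fold faster")
```

At 25 °C each additional ~5.7 kJ/mol (about 1.4 kcal/mol) of transition-state stabilization buys roughly another factor of ten in rate.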

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active or functional site, as this depends greatly on the type of function. With that in mind, below are the preferences of the 20 amino acids for lying within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not cover protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

His  0.360    Tyr -0.040    Asp  0.045    Gly -0.070

Trp -0.140    Met  0.025    Val -0.060    Asn  0.080

Leu -0.180    Phe -0.120    Gln  0.050    Cys  0.210

Ile -0.005    Ala  0.025    Glu  0.050    Arg  0.055

Pro -0.200    Lys  0.100    Thr  0.100    Ser  0.130
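The preference values listed above can be handled programmatically, for example to rank residues or to average the preference over residues lining a putative pocket. The values are transcribed from the list above; `mean_preference` is a hypothetical illustration, not a validated site predictor.

```python
# Active-site contact preferences, transcribed from the list above.
PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070, "Trp": -0.140,
    "Met": 0.025, "Val": -0.060, "Asn": 0.080, "Leu": -0.180, "Phe": -0.120,
    "Gln": 0.050, "Cys": 0.210, "Ile": -0.005, "Ala": 0.025, "Glu": 0.050,
    "Arg": 0.055, "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def enriched_residues() -> list[str]:
    """Residues with a positive preference, most over-represented first."""
    positive = [aa for aa, value in PREFERENCE.items() if value > 0]
    return sorted(positive, key=PREFERENCE.get, reverse=True)


def mean_preference(residues: list[str]) -> float:
    """Average preference over three-letter residue names (hypothetical site score)."""
    return sum(PREFERENCE[r] for r in residues) / len(residues)


print(enriched_residues()[:3])
print(round(mean_preference(["His", "Cys", "Ser", "Leu"]), 3))
```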

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate, and the plot is drawn between the torsion angles about these bonds, φ and ψ. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, so angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
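φ and ψ are ordinary dihedral angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). A minimal sketch of the standard dihedral calculation from Cartesian coordinates; the toy coordinates are made up for checking, not taken from the model.

```python
import math


def sub(a, b):
    """Component-wise difference of two 3-vectors."""
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees (-180 to 180)."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    length = math.sqrt(dot(b1, b1))
    b1 = [c / length for c in b1]  # unit vector along the central bond
    # Remove the components along the central bond, leaving two vectors in the
    # plane perpendicular to it; the angle between them is the dihedral.
    v = sub(b0, [dot(b0, b1) * c for c in b1])
    w = sub(b2, [dot(b2, b1) * c for c in b1])
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# For phi pass C(i-1), N, CA, C; for psi pass N, CA, C, N(i+1).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))   # cis arrangement, 0 degrees
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # trans arrangement, 180 degrees
```

Computing (φ, ψ) this way for every residue and plotting the pairs gives exactly the points shown on a Ramachandran plot.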

SAVS (Structure Analysis and Validation Server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues the submitted protein contains.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The clean-up corrects any mislabelled atoms and creates a new coordinate file with the extension .new, in which the atoms are labelled in accordance with the IUPAC naming convention.
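Since PROCHECK reads a standard PDB coordinate file, it may help to recall how coordinates are laid out: ATOM/HETATM records use fixed columns (atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-38, 39-46 and 47-54). A minimal parser sketch; it ignores alternate locations, insertion codes and all other record types, and the example record is synthetic.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_record(rec: str) -> Atom:
    """Parse one ATOM/HETATM record using fixed PDB columns (1-based columns in comments)."""
    return Atom(
        name=rec[12:16].strip(),      # columns 13-16
        res_name=rec[17:20].strip(),  # columns 18-20
        chain=rec[21],                # column 22
        res_seq=int(rec[22:26]),      # columns 23-26
        x=float(rec[30:38]),          # columns 31-38
        y=float(rec[38:46]),          # columns 39-46
        z=float(rec[46:54]),          # columns 47-54
    )


def read_atoms(path: str) -> list[Atom]:
    """Return all ATOM and HETATM records in a PDB file (other records are skipped)."""
    with open(path) as fh:
        return [parse_atom_record(rec) for rec in fh if rec.startswith(("ATOM  ", "HETATM"))]


# A synthetic record built with the right column widths (coordinates are made up).
record = f"{'ATOM':<6}{1:>5} {'N':<4} MET A{1:>4}    {11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}"
print(parse_atom_record(record))
```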

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints on a residue, but it will not pass through high energy barriers and will stop in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of that conformation because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
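Gradient-based minimization, and why it stops in the nearest local minimum, can be illustrated on a toy one-dimensional energy surface with two wells. The sketch below uses plain steepest descent with a fixed step; the energy function, step size and tolerance are illustrative assumptions, and real minimizers in molecular modelling packages are more sophisticated.

```python
def energy(x: float) -> float:
    """Toy one-dimensional energy surface with two minima (hypothetical)."""
    return x**4 - 3 * x**2 + x


def gradient(x: float) -> float:
    """Analytic derivative dE/dx; the 'force' on the coordinate is its negative."""
    return 4 * x**3 - 6 * x + 1


def steepest_descent(x: float, step: float = 0.01, tol: float = 1e-8, max_iter: int = 100_000) -> float:
    """Move downhill along -dE/dx until the gradient is (almost) zero."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


for start in (2.0, -2.0):
    x_min = steepest_descent(start)
    print(f"start {start:+.1f} -> minimum at x = {x_min:+.4f}, E = {energy(x_min):+.4f}")
```

Started at x = 2 the search settles in the shallower well near x ≈ 1.13, while a start at x = -2 reaches the deeper well near x ≈ -1.30: the minimizer cannot climb the barrier between them, as noted above.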


Results and Discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences

Db    AC        Description                                                      Score   E-value
pdb   1QGT-C    Chain C (Hbcag), Human Hepatitis B Viral Capsid >gi|5             206     6e-54
pdb   2QIJ-C    Chain C, Hepatitis B Capsid Protein With An N-Termina             197     2e-51
pdb   2G33-C    CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                    192     6e-50
pdb   1TA3-B    XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     60
pdb   1AW9-A    Chain A, Structure Of Glutathione S-Transferase Iii I              27     60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49

Total number of positively charged residues (Arg + Lys): 43
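The composition figures above follow directly from the residue counts: each percentage is count / 523 × 100, and the charged-residue totals are simple sums (Asp + Glu = 21 + 28; Arg + Lys = 33 + 10). A short sketch that recomputes them from the counts in the table; it does not re-derive the counts from the sequence itself.

```python
# Residue counts from the ProtParam composition table above.
COUNTS = {
    "Ala": 74, "Arg": 33, "Asn": 11, "Asp": 21, "Cys": 5, "Gln": 32, "Glu": 28,
    "Gly": 51, "His": 16, "Ile": 15, "Leu": 81, "Lys": 10, "Met": 12, "Phe": 10,
    "Pro": 28, "Ser": 27, "Thr": 22, "Trp": 4, "Tyr": 5, "Val": 38,
}

length = sum(COUNTS.values())
percent = {aa: round(100 * n / length, 1) for aa, n in COUNTS.items()}
negative = COUNTS["Asp"] + COUNTS["Glu"]  # negatively charged residues
positive = COUNTS["Arg"] + COUNTS["Lys"]  # positively charged residues
print(length, percent["Ala"], percent["Leu"], negative, positive)
```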

Atomic composition

Carbon      C   2435
Hydrogen    H   3967
Nitrogen    N    711
Oxygen      O    719
Sulfur      S     17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming all Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming no Cys residues appear as half cystines
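Both values can be reproduced with the additive estimate that ProtParam documents: ε280 = 5500·nTrp + 1490·nTyr + 125·nCystine, where every pair of Cys counts as one cystine in the first case and cystines are ignored in the second; Abs 0.1% is ε divided by the molecular weight. With Trp = 4, Tyr = 5 and Cys = 5 from the composition table:

```python
def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int, cystines: bool) -> int:
    """Molar extinction coefficient at 280 nm in water (M-1 cm-1), ProtParam-style."""
    n_cystine = n_cys // 2 if cystines else 0  # each cystine is a pair of Cys residues
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


MW = 55252.6  # molecular weight (Da) reported above
for all_paired in (True, False):
    eps = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, cystines=all_paired)
    print(eps, round(eps / MW, 3))  # second value: Abs 0.1% (= 1 g/l)
```

With five Cys residues only two cystines can form, so the two estimates differ by 2 × 125 = 250.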

Secondary structure prediction

SOPMA result for UNK_158250

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix          (Hh): 235 is 44.93%
3-10 helix           (Gg):   0 is  0.00%
Pi helix             (Ii):   0 is  0.00%
Beta bridge          (Bb):   0 is  0.00%
Extended strand      (Ee):  80 is 15.30%
Beta turn            (Tt):  36 is  6.88%
Bend region          (Ss):   0 is  0.00%
Random coil          (Cc): 172 is 32.89%
Ambiguous states     (?):    0 is  0.00%
Other states:                0 is  0.00%

Parameters: window width 17; similarity threshold 8; number of states 4
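SOPMA's summary percentages are simply state counts divided by the sequence length (e.g. 235 / 523 = 44.93% helix). The sketch below tallies a SOPMA-style state string using the single-letter codes h, e, t and c seen in the output above; it rebuilds a string from the reported counts rather than re-running any prediction.

```python
from collections import Counter

STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_percentages(states: str) -> dict[str, float]:
    """Percentage of residues assigned to each SOPMA state, rounded to 2 decimals."""
    counts = Counter(states)
    return {name: round(100 * counts[code] / len(states), 2) for code, name in STATE_NAMES.items()}


# Rebuild a string with the state counts reported above (235 h, 80 e, 36 t, 172 c).
summary = state_percentages("h" * 235 + "e" * 80 + "t" * 36 + "c" * 172)
print(summary)
```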

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

Pairwise alignment scores (lower triangle; column numbers refer to the sequence numbers in the first column)

No. Name                  Len(aa) 1   2   3   4   5   6   7   8   9
 1  Q8IVS8|GLCTK_HUMAN    523
 2  Q64896|HBEAG_ASHV     217     3
 3  P03154|HBEAG_DHBV1    305     3  21
 4  P0C6J9|HBEAG_DHBV3    305     3  21  97
 5  P03153|HBEAG_GSHV     217     3  91  24  25
 6  P0C692|HBEAG_HBVA2    214     3  66  26  26  70
 7  P0C625|HBEAG_HBVA3    214     3  65  27  27  69  98
 8  P17099|HBEAG_HBVA4    214     3  65  25  25  69  98  98
 9  Q81105|HBEAG_HBVA5    214     3  65  26  26  69  98  97  97
10  Q91C37|HBEAG_HBVA6    214     2  65  26  26  69  98  98  98  97
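The ClustalW scores in this table behave like percent identities of the pairwise alignments (near 100 between HBV genotypes, near 3 between human glycerate kinase and any HBeAg). The sketch below computes percent identity over aligned columns where neither sequence has a gap; ClustalW's exact normalization may differ, so treat this as an approximation rather than a reimplementation. The two segments are copied from the multiple alignment in this report.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Gap-free segments of HBVA4 and HBVA6 from the alignment in this report.
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
print(round(percent_identity(hbva4, hbva6), 2))
```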

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4    ------------------------------------------------------------
Q91C37|HBEAG_HBVA6    ------------------------------------------------------------
P0C692|HBEAG_HBVA2    ------------------------------------------------------------
P0C625|HBEAG_HBVA3    ------------------------------------------------------------
Q81105|HBEAG_HBVA5    ------------------------------------------------------------
Q64896|HBEAG_ASHV     ------------------------------------------------------------
P03153|HBEAG_GSHV     ------------------------------------------------------------
P03154|HBEAG_DHBV1    ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3    ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN    MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3    ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5    ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV     ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV     ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3    ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN    LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4    DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2    DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3    DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5    DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV     DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1    DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3    DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN    ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3    --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5    --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV     --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV     --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1    --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3    --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN    ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6    ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2    QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3    QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5    QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV     QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV     QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1    QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3    QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN    LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4    -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3    -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5    -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV     -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV     -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3    TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN    CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5    NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV     NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV     NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1    DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3    DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN    PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
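The guide tree is written in Newick format: nested parentheses give the branching order and each ':' is followed by a branch length. A small recursive-descent parser sketch, handling only the features this tree uses (leaf names, branch lengths and nesting, no whitespace or quoted labels), lists each leaf with its summed branch length from the root. The tree string is the guide tree above, with its decimal points and separators restored as an assumption about the original ClustalW output.

```python
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text: str):
    """Parse a whitespace-free Newick string into nested (name, length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # consume '('
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # consume ','
                children.append(node())
            pos += 1                      # consume ')'
        start = pos
        while text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return node()


def leaf_depths(tree, depth: float = 0.0):
    """Yield (leaf name, summed branch length from the root) for every leaf."""
    name, length, children = tree
    depth += length
    if not children:
        yield name, depth
    for child in children:
        yield from leaf_depths(child, depth)


for leaf, dist in sorted(leaf_depths(parse_newick(GUIDE_TREE)), key=lambda p: -p[1]):
    print(f"{leaf:20s} {dist:.5f}")
```

In this tree the human glycerate kinase leaf has the longest root-to-leaf path, consistent with its very low pairwise scores against every HBeAg sequence.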

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as the template, showing around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

1. Open the template structure from File (PDB file), then choose Swiss model > Load raw target sequence.

2. Choose Fit > Fit raw sequence, then Magic fit, then Iterative fit.

3. Choose File > Save > Layer (PDB).

4. Choose File > Save > Project (PDB).

5. Choose Swiss model > Submit modeling request (a new browser window opens with the PDB file loaded; give the e-mail ID for receiving the modeled structure).

6. Open the new structure (received by e-mail) and remove the template by selecting the target.

7. Choose File > Save > Layer (PDB).

Open Swiss model and select the Load raw sequence option to load the target molecule.

Perform Magic fit and Iterative fit, provided under Fit, to fit the two sequences.

Save the file as the project.

Select "Submit modeling request" under Swiss model to submit it for modeling.

Homologous modeling

Fig. SWISS-MODEL Optimise Mode request submission form (e-mail address, name, and request title "Gunjan project", which is added to the results header)

Your SWISS-MODEL project file can be found in C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044; Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Fig. Structure of template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process Ramachandran plot is a way to visualize dihedral angles φ against ψ of amino acid residues in protein structure It shows the possible conformations of φ and ψ angles for a polypeptide The Ramachandran plot displays the psi and phi backbone conformational angles for each residue in a protein The distance between two succession alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in desired protein can be determined using this plot Software

59

SAVS httpnihservermbiuclaeduSAVSProcedureThe target protein structure obtained after homology modeling using deep view and modeler is given as input for SAVS

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT


Result

Plot statistics                                        SCORE    %-age
Residues in most favoured regions [A,B,L]                990    85.6%
Residues in additional allowed regions [a,b,l,p]         104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]      11     1.0%
Residues in disallowed regions                            51     4.4%
                                                        ----   ------
Number of non-glycine and non-proline residues          1156   100.0%

Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127 (59)
Number of proline residues                                60
                                                        ----
Total number of residues                                1351


Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good-quality model would be expected to have over 90% of its residues in the most favoured regions.
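The percentage column of the PROCHECK table is each region's residue count divided by the number of non-glycine, non-proline residues (1156). The short Python sketch below recomputes those percentages from the counts reported above and compares the most-favoured fraction with PROCHECK's 90% benchmark. The variable and function names are illustrative only.

```python
from typing import Dict

# Residue counts copied from the PROCHECK "Plot statistics" table above
# (non-glycine, non-proline residues only).
region_counts: Dict[str, int] = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}

GOOD_MODEL_THRESHOLD = 90.0  # PROCHECK: good models have >90% in most favoured regions


def region_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of the residues that fall in each Ramachandran region."""
    total = sum(counts.values())  # 1156 for this model
    return {region: 100.0 * n / total for region, n in counts.items()}


percentages = region_percentages(region_counts)
for region, pct in percentages.items():
    print(f"{region:34s} {pct:5.1f}%")

meets_benchmark = percentages["most favoured [A,B,L]"] > GOOD_MODEL_THRESHOLD
print("Over 90% in most favoured regions:", meets_benchmark)
```

With these counts the script reproduces 85.6%, 9.0%, 1.0% and 4.4%. By PROCHECK's criterion, the model therefore falls short of the 90% expected of a good-quality structure.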

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS

Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N  Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C  Radius = 1.40 Charge = 0.53
  ASP O  Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N  Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C  Radius = 1.40 Charge = 0.53
  ASP O  Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N  Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C  Radius = 1.40 Charge = 0.53
  THR O  Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H  Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N  Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C  Radius = 1.40 Charge = 0.53
  ASP O  Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N  Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C  Radius = 1.40 Charge = 0.53
  ASP O  Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N  Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C  Radius = 1.40 Charge = 0.53
  LYS O  Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N  Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C  Radius = 1.40 Charge = 0.53
  MSE O  Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H  Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N  Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C  Radius = 1.40 Charge = 0.53
  THR O  Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H  Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27*812*1] x FFT[64*24*48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair       RMS  Bumps
 ----  --------- --------- --------- --------- ------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00

Conclusion

After analysing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to look more closely at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we performed homology modelling; as a further step, we present a theoretical model built with available online modelling tools.

Since our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes of the pathogenicity of HBV, we tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.

Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; the present work is a small finding within a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA     Catalytic Site Atlas
EMBOSS  European Molecular Biology Open Software Suite
NCBI    National Center for Biotechnology Information
NDB     Nucleic Acid Database
ORF     Open Reading Frame
OTU     Operational Taxonomic Unit
PDB     Protein Data Bank
PHYLIP  Phylogeny Inference Package


binding and ease movement. Conformational changes are limited by steric constraints, and such interfaces are therefore said to be rigid.

Fig: Protein-protein docking

Protein Receptor-Ligand Docking

Protein receptor-ligand motifs fit together tightly and are often described by the lock-and-key mechanism. These interfaces show both high specificity and induced fit, with specificity increasing with rigidity. Receptor-ligand docking can treat either a rigid ligand with a flexible receptor or a flexible ligand with a rigid receptor.

Fig: Protein ligand-receptor docking

Rigid Ligand with a Flexible Receptor

In a complex of a rigid ligand with a flexible receptor, the native structure often maximizes the interface area between the two molecules, which move with respect to one another in a direction perpendicular to the interface. This allows a receptor to bind a larger than usual ligand. Normally, overlap of the ligand with the receptor at the docking interface incurs energy penalties; if the van der Waals repulsion can be reduced, the energy penalty to the system is minimized. This can be accomplished by allowing flexibility in the receptor: a flexible receptor can accommodate a larger ligand than a rigid receptor would.
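The energy penalty for ligand-receptor overlap comes mainly from the steep repulsive part of the van der Waals interaction. A common way to model it is the 12-6 Lennard-Jones potential, E(r) = 4ε[(σ/r)^12 - (σ/r)^6]. The sketch below uses ε = 0.2 kcal/mol and σ = 3.4 Å, which are illustrative assumptions rather than parameters from this study. It shows how sharply the energy rises once two atoms are pushed closer than their contact distance, which is the penalty that receptor flexibility helps to relieve.

```python
def lennard_jones(r: float, epsilon: float = 0.2, sigma: float = 3.4) -> float:
    """12-6 Lennard-Jones pair energy.

    r       -- interatomic distance (Angstrom)
    epsilon -- well depth (kcal/mol); 0.2 is an illustrative value only
    sigma   -- distance at which the energy is zero (Angstrom); illustrative
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)  # repulsive r^-12 minus attractive r^-6


r_min = 2 ** (1 / 6) * 3.4  # distance of the energy minimum for sigma = 3.4
for r in (2.8, 3.0, 3.4, r_min, 5.0):
    print(f"r = {r:4.2f} A   E = {lennard_jones(r):8.3f} kcal/mol")
```

Below σ the energy becomes strongly positive within a fraction of an ångström, while at the minimum it is only -ε. A small movement of the receptor that relieves an overlap can therefore remove a large penalty.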

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and receptor does not need to be induced, the receptor can retain its rigidity while maintaining the free energy of the system. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, however: intrinsic movement allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement are taken into consideration (three rotational and three translational), the amount of inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and ligand, allowing easier, more energetically favorable binding between the two.
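The six degrees of freedom mentioned above are three rotation angles plus a three-component translation applied to the whole ligand as a rigid body. The Python sketch below applies such a transformation to a list of atom coordinates. Rotating about the centroid and composing the rotation as x, then y, then z are assumptions made for illustration, since docking programs such as Hex parameterize rotations in their own ways.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def rotation_matrix(ax: float, ay: float, az: float) -> List[List[float]]:
    """R = Rz(az) . Ry(ay) . Rx(ax): rotate about x, then y, then z (radians)."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def rigid_transform(coords: Sequence[Vec], angles: Vec, shift: Vec) -> List[Vec]:
    """Apply 3 rotations (about the centroid) and 3 translations to rigid coordinates."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    r = rotation_matrix(*angles)
    moved: List[Vec] = []
    for p in coords:
        d = [p[k] - centroid[k] for k in range(3)]                       # centre on centroid
        rot = [sum(r[i][j] * d[j] for j in range(3)) for i in range(3)]  # rotate (3 DOF)
        moved.append((rot[0] + centroid[0] + shift[0],                   # translate (3 DOF)
                      rot[1] + centroid[1] + shift[1],
                      rot[2] + centroid[2] + shift[2]))
    return moved
```

Because the transformation is rigid, every interatomic distance inside the ligand is preserved. A docking search only has to explore these six numbers for each candidate pose.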

Aim of docking

The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and will also open new avenues for finding other possible drug targets in influenza A virus. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Receptor

A molecule or site on the surface of a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. More generally, a receptor is a molecule in or on a cell to which a substance (such as a hormone or a drug) selectively binds, causing a change in the activity of the cell.

Ligand

A molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through many weak non-covalent interactions formed with the binding site of the protein, tight binding depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize about which amino acids are likely to be found in a protein's active or functional site, as this depends greatly on the type of function. With that in mind, the values below give the preference of each of the 20 amino acids for lying within functional regions of proteins. They were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than would be expected by chance; negative values mean that it makes fewer. The values do not cover protein-protein or protein-peptide interactions, in which many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

His  0.360   Tyr -0.040   Asp  0.045   Gly -0.070
Trp -0.140   Met  0.025   Val -0.060   Asn  0.080
Leu -0.180   Phe -0.120   Gln  0.050   Cys  0.210
Ile -0.005   Ala  0.025   Glu  0.050   Arg  0.055
Pro -0.200   Lys  0.100   Thr  0.100   Ser  0.130
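Read as data, the table can be ranked to show which residues are over- or under-represented at sites that bind non-protein atoms. The Python snippet below only sorts the values listed above. It does not predict active sites, and the dictionary name is illustrative.

```python
# Active-site contact preferences copied from the table above
# (positive = more contacts with bound non-protein atoms than expected by chance).
SITE_PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070,
    "Trp": -0.140, "Met": 0.025, "Val": -0.060, "Asn": 0.080,
    "Leu": -0.180, "Phe": -0.120, "Gln": 0.050, "Cys": 0.210,
    "Ile": -0.005, "Ala": 0.025, "Glu": 0.050, "Arg": 0.055,
    "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}

# Sort from most to least enriched; ties keep their table order (stable sort).
ranked = sorted(SITE_PREFERENCE.items(), key=lambda item: item[1], reverse=True)
enriched = [aa for aa, score in ranked if score > 0]
depleted = [aa for aa, score in ranked if score < 0]

print("Most enriched in functional sites:", ranked[:3])
print("Most depleted:", ranked[-3:])
print(f"{len(enriched)} enriched, {len(depleted)} depleted")
```

Histidine, cysteine and serine, all common catalytic or metal-binding residues, head the list. The hydrophobic and conformationally restricted residues sit at the bottom.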

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or a Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate; φ and ψ are the torsion angles about these two bonds, and the plot is drawn between them. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
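The φ and ψ values plotted on a Ramachandran map are ordinary torsion angles, so they can be computed directly from backbone coordinates. A minimal sketch in plain Python, assuming the coordinates of C(i-1), N, Cα, C and N(i+1) have already been read from a structure file; the helper names are hypothetical. φ is the torsion C(i-1)-N-Cα-C and ψ is N-Cα-C-N(i+1), and the sign follows the usual convention (viewed along the central bond, a clockwise rotation of the front bond onto the rear bond is positive).

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)                      # from p1 back to p0
    b1 = _sub(p2, p1)                      # central bond
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec3, n: Vec3, ca: Vec3, c: Vec3, n_next: Vec3) -> tuple[float, float]:
    """Backbone phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Each (φ, ψ) pair returned by `phi_psi` is one point on the plot; the disallowed regions are those where the hard-sphere model above predicts steric clashes.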

SAVS (Structure Analysis and Validation Server)

SAVS is a server for analyzing protein structures and assessing how valid, or how correct, they are. The running time depends on how many of its programs one selects and on how many residues the submitted protein has; a run can take several minutes.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).
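The comparison against 'ideal' geometry can be illustrated with a simple bond-length check. This is only a sketch of the idea, not PROCHECK's algorithm: the target values are approximate backbone bond lengths of the kind derived from CSD small-molecule data, and the 0.05 Å tolerance and the name `check_backbone` are assumptions made for the example.

```python
import math

# Approximate target lengths (angstroms) for three backbone bonds.
# Illustrative only: real validation should use PROCHECK's own parameter set.
IDEAL_BOND = {
    ("N", "CA"): 1.458,
    ("CA", "C"): 1.525,
    ("C", "O"): 1.231,
}

# Hypothetical cut-off chosen for this example, not a PROCHECK setting.
TOLERANCE = 0.05


def check_backbone(residue: dict[str, tuple[float, float, float]]) -> list[str]:
    """Return one message per backbone bond deviating from its target by more than TOLERANCE."""
    problems = []
    for (a1, a2), ideal in IDEAL_BOND.items():
        if a1 in residue and a2 in residue:
            observed = math.dist(residue[a1], residue[a2])
            if abs(observed - ideal) > TOLERANCE:
                problems.append(f"{a1}-{a2}: {observed:.3f} A (ideal {ideal:.3f} A)")
    return problems
```

A residue whose carbonyl oxygen sits 1.40 Å from its carbon, for example, would be reported for its C-O bond while the other two bonds pass.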

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is “cleaned up” by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinates file with the file extension .new. The new file has its atoms labelled in accordance with the IUPAC naming convention.
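Because the input is a plain coordinate file, reading it is straightforward. The sketch below, assuming standard fixed-column PDB ATOM/HETATM records, extracts atom names, residue names and coordinates; it ignores alternate locations and multiple models and does not perform PROCHECK's atom relabelling. `Atom` and `parse_atoms` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Atom:
    record: str      # "ATOM" or "HETATM"
    serial: int
    name: str        # atom name, e.g. "CA"
    res_name: str    # residue name, e.g. "MET"
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> Iterator[Atom]:
    """Yield atoms from ATOM/HETATM records using the fixed PDB column layout."""
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END and other records
        yield Atom(
            record=record,
            serial=int(line[6:11]),
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        )
```

Typical use is `with open("model.pdb") as fh: atoms = list(parse_atoms(fh))`, where `model.pdb` is a placeholder file name.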

OUTPUT


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at relieving local strain within a residue, but it will not pass through high energy barriers and tends to stop in a local minimum. The potential energy, calculated by summing the energies of the various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space with a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
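The gradient-optimization idea can be shown on a toy system. This sketch assumes a single Lennard-Jones term per atom pair (ε = σ = 1, reduced units) in place of a full force field, and uses plain steepest descent with a fixed step; both are simplifications for illustration, and the function names are hypothetical. Atoms are moved against the energy gradient (that is, along the net force) until the largest force component falls below a tolerance, so the procedure stops at the nearest local minimum.

```python
import math


def lj_energy_and_gradient(coords, eps=1.0, sigma=1.0):
    """Total Lennard-Jones energy E = sum 4*eps*((s/r)^12 - (s/r)^6) and its gradient."""
    n = len(coords)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            energy += 4 * eps * (sr6 * sr6 - sr6)
            # dE/dr = 4*eps*(-12*(s/r)^12 + 6*(s/r)^6) / r
            de_dr = 4 * eps * (-12 * sr6 * sr6 + 6 * sr6) / r
            for k in range(3):
                g = de_dr * d[k] / r      # chain rule: dr/dx_i = d/r
                grad[i][k] += g
                grad[j][k] -= g
    return energy, grad


def steepest_descent(coords, step=0.005, tol=1e-8, max_iter=10_000):
    """Move every atom against its gradient until all force components are below tol."""
    coords = [list(c) for c in coords]
    for _ in range(max_iter):
        _, grad = lj_energy_and_gradient(coords)
        if max(abs(g) for row in grad for g in row) < tol:
            break
        for atom, g in zip(coords, grad):
            for k in range(3):
                atom[k] -= step * g[k]
    energy, _ = lj_energy_and_gradient(coords)
    return coords, energy
```

For two atoms started 1.5σ apart, the separation settles at the Lennard-Jones minimum, 2^(1/6)σ ≈ 1.122σ, with energy -ε. The fixed step has to be small: a large step makes the atoms overshoot into the steep repulsive wall.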


Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences:

Db    AC       Description                                                        Score   E-value
pdb   1QGT-C   Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5                 206     6e-54
pdb   2QIJ-C   Chain C Hepatitis B Capsid Protein With An N-Termina                 197     2e-51
pdb   2G33-C   CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad                        192     6e-50
pdb   1TA3-B   XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp      27      6.0
pdb   1AW9-A   Chain A Structure Of Glutathione S-Transferase Iii I                 27      6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4). The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)   74   14.1%
Arg (R)   33    6.3%
Asn (N)   11    2.1%
Asp (D)   21    4.0%
Cys (C)    5    1.0%
Gln (Q)   32    6.1%
Glu (E)   28    5.4%
Gly (G)   51    9.8%
His (H)   16    3.1%
Ile (I)   15    2.9%
Leu (L)   81   15.5%
Lys (K)   10    1.9%
Met (M)   12    2.3%
Phe (F)   10    1.9%
Pro (P)   28    5.4%
Ser (S)   27    5.2%
Thr (T)   22    4.2%
Trp (W)    4    0.8%
Tyr (Y)    5    1.0%
Val (V)   38    7.3%
Pyl (O)    0    0.0%
Sec (U)    0    0.0%
(B)        0    0.0%
(Z)        0    0.0%
(X)        0    0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
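The composition and charge counts above are simple tallies over the sequence. A minimal sketch, assuming the input is a plain one-letter sequence with no FASTA header; `composition` is a hypothetical helper, and percentages are rounded to one decimal place as in the ProtParam output.

```python
from collections import Counter


def composition(seq: str) -> dict:
    """Residue counts, percentages and charged-residue totals for a one-letter sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    percent = {aa: round(100 * c / n, 1) for aa, c in sorted(counts.items())}
    return {
        "length": n,
        "counts": dict(counts),
        "percent": percent,
        "negative": counts["D"] + counts["E"],   # Asp + Glu
        "positive": counts["R"] + counts["K"],   # Arg + Lys
    }
```

A `Counter` returns 0 for residues that never occur, so the charged totals work even for short fragments lacking some residue types.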

Atomic composition:

Carbon     C   2435
Hydrogen   H   3967
Nitrogen   N    711
Oxygen     O    719
Sulfur     S     17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients

Extinction coefficients are in units of M^-1 cm^-1, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
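The two extinction coefficients follow from the Trp, Tyr and Cys counts in the composition table (W = 4, Y = 5, C = 5) using the weighting ProtParam documents: ε(280) = 5500·nTrp + 1490·nTyr + 125·nCystine, where five cysteines can pair into at most two cystines. Abs 0.1% is ε divided by the molecular weight. A short sketch with hypothetical function names:

```python
def extinction_coefficients(n_trp: int, n_tyr: int, n_cys: int) -> tuple[int, int]:
    """Molar extinction coefficients at 280 nm (M^-1 cm^-1).

    Returns (all Cys paired as cystines, no cystines).
    """
    base = 5500 * n_trp + 1490 * n_tyr
    cystines = n_cys // 2          # two half-cystines form one disulfide
    return base + 125 * cystines, base


def abs_01_percent(ext_coeff: float, mol_weight: float) -> float:
    """Absorbance of a 1 g/l (0.1%) solution: epsilon divided by molecular weight."""
    return ext_coeff / mol_weight
```

With the counts above, `extinction_coefficients(4, 5, 5)` returns (29700, 29450), and dividing each by the molecular weight 55252.6 gives 0.538 and 0.533, matching the reported values.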

Secondary structure prediction

By SOPMA (result for UNK_158250)


SOPMA prediction (h = alpha helix, e = extended strand, t = beta turn, c = random coil):

Residues 1-70
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

Residues 71-140
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

Residues 141-210
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

Residues 211-280
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

Residues 281-350
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

Residues 351-420
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

Residues 421-490
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

Residues 491-523
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
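The SOPMA percentages are the count of each state letter divided by the sequence length. A sketch, assuming the four-state output (h, e, t, c) shown above; `state_summary` is a hypothetical name. For example, 235 helical residues out of 523 gives 44.93%.

```python
from collections import Counter

# One-letter SOPMA state codes used in the four-state prediction string.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_summary(states: str) -> dict[str, tuple[int, float]]:
    """Count each predicted state and express it as a percentage of the sequence length."""
    counts = Counter(states)
    total = len(states)
    return {
        name: (counts[code], round(100 * counts[code] / total, 2))
        for code, name in STATE_NAMES.items()
    }
```

Passing the eight prediction lines above joined into one 523-character string would give the per-state counts and percentages in the same form as the SOPMA summary.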

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
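ClustalW builds its guide tree from these pairwise scores, which are percent identities. A common conversion is distance = 1 - score/100; the sketch below assumes that conversion and uses a few pairs copied from the table. ClustalW itself joins sequences by neighbour-joining, which uses a corrected criterion rather than the raw smallest distance, so `closest_pair` only illustrates the first merge of a simpler UPGMA-style clustering. All names here are hypothetical.

```python
# A subset of pairwise percent-identity scores from the ClustalW2 table above.
SCORES = {
    ("HBEAG_HBVA2", "HBEAG_HBVA3"): 98,
    ("HBEAG_ASHV", "HBEAG_GSHV"): 91,
    ("HBEAG_DHBV1", "HBEAG_DHBV3"): 97,
    ("GLCTK_HUMAN", "HBEAG_ASHV"): 3,
}


def to_distances(scores: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Convert percent-identity scores to distances d = 1 - score/100."""
    return {pair: 1 - s / 100 for pair, s in scores.items()}


def closest_pair(distances: dict[tuple[str, str], float]) -> tuple[str, str]:
    """Pair with the smallest distance (the first merge in UPGMA-style clustering)."""
    return min(distances, key=distances.get)
```

The near-zero distances among the HBV genotype A sequences and the distance of about 0.97 between the human glycerate kinase and every viral sequence are consistent with the guide tree, where GLCTK_HUMAN sits on a long branch.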

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420


P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree (Newick format)

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template, as it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

1. Open the template structure (PDB file) from File, then choose Swiss model > load the raw target sequence.
2. Choose Fit > fit raw sequence, then magic fit, then iterative fit.
3. Choose File > Save > Layer (PDB).
4. Choose File > Save > Project (PDB).
5. Choose Swiss model > submit modeling request. A new browser window opens with the file loaded; give the e-mail ID at which the modelled structure is to be received.
6. Open the new structure (received by e-mail) and remove the template by selecting the target.
7. Choose File > Save > Layer (PDB).


Open Swiss model and select the load raw sequence option to load the target molecule.

Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.

Save the file as a project.

Select “submit modeling request” under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form


Your e-mail address: Gunjan300@gmail.com

Your name: Gunjan

Request title: Gunjan project

Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044; Title: Q8IVS8

SWISS-MODEL Workspace

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52
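Sequence identity figures such as the 34% above depend on how gaps are counted. One common definition, assumed here, counts identical residues over the aligned columns where neither sequence has a gap; SWISS-MODEL may use a different denominator. The inputs are gapped sequences of equal length without the 10-residue block spacing used in the alignment display; `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical positions as a percentage of aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Compare case-insensitively: the template is shown in lower case.
    pairs = [(a.upper(), b.upper()) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    same = sum(a == b for a, b in pairs)
    return 100.0 * same / len(pairs)
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which is why different tools can report different identities for the same pair.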


Fig: Structure of template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ against ψ of amino acid residues in a protein structure: it displays the φ and ψ conformational angles of each residue and so shows which backbone conformations the polypeptide adopts. These angles describe rotation about the N-Cα and Cα-C bonds that link successive α-carbon atoms along the backbone chain.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling using Deep View and Modeller is given as input to SAVS.

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics                                          No. of residues     %
Residues in most favoured regions [A,B,L]                           990    85.6%
Residues in additional allowed regions [a,b,l,p]                    104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]                 11     1.0%
Residues in disallowed regions                                       51     4.4%
                                                                   ----   ------
Number of non-glycine and non-proline residues                     1156   100.0%
Number of end-residues (excl. Gly and Pro)                            8
Number of glycine residues (shown as triangles)                     127
Number of proline residues                                           60
                                                                   ----
Total number of residues                                           1351

Based on an analysis of 118 structures with an R-factor no greater than 20%, a good-quality model would be expected to have over 90% of its residues in the most favoured regions.
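The percentages in the PROCHECK table are each region's count divided by the 1156 non-glycine, non-proline residues, and the quality criterion is a simple threshold on the most-favoured percentage. A sketch with hypothetical names, assuming the 90% threshold quoted above:

```python
def ramachandran_summary(core: int, allowed: int, generous: int, disallowed: int,
                         threshold: float = 90.0) -> dict:
    """Region percentages over non-Gly, non-Pro residues plus a simple quality flag."""
    total = core + allowed + generous + disallowed

    def pct(n: int) -> float:
        return round(100 * n / total, 1)

    return {
        "total": total,
        "core_pct": pct(core),
        "allowed_pct": pct(allowed),
        "generous_pct": pct(generous),
        "disallowed_pct": pct(disallowed),
        "meets_threshold": 100 * core / total > threshold,
    }
```

With the counts above, 990/1156 gives 85.6% in the most favoured regions, below the 90% expected of a good-quality model, and 4.4% of residues lie in disallowed regions.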

Docking result (by Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file: C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory: C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models).
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP: ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP: ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS: LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE: MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR: THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge: -10
Warning: Using PDB CONECT records to define non-standard bonds


>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory; setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60 Å

Contouring surface for molecule 2B8N: Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096, Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N: Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096, Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 46.4 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 kJ/mol (Eshape=8621255, Eforce=0.00)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 5366 s, GF 7301 s, FFT 27787 s, Scan 1545 s, FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 kJ/mol (Eshape=14350136, Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
   1    1 000000 00 00 00 00 00 00 -1 -100

Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene-sequencing project on HBV is finished, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution, and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

Our study indicates that glycerate kinase (HBeAg-binding protein 4) is one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a large problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analysing data, as well as for interpreting those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking should be useful in the search for a cure for hepatitis B, and will also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a large problem, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package


will be minimized. This can be accomplished by allowing flexibility in the receptor.

Flexible receptors allow a larger ligand to be docked than a rigid receptor would accommodate.

Flexible Ligand with a Rigid Receptor

When the fit between the ligand and the receptor does not need to be induced, the receptor can retain its rigidity while the free energy of the system is maintained. For successful docking, the parameters of the ligand need to be maintained, and the ligand must be slightly smaller than the receptor interface. No docking is completely rigid, however: intrinsic movement allows small conformational adaptations for ligand binding. When the six degrees of freedom for protein movement (three rotational and three translational) are taken into consideration, the inherent flexibility allowed to the receptor is even greater. This further offsets any energy penalty between the receptor and the ligand, allowing easier, more energetically favorable binding between the two.
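The six rigid-body degrees of freedom mentioned above can be made concrete with a short sketch. A ligand pose is defined by three rotation angles and a three-component translation, and applying them moves every atom without changing any interatomic distance. The Euler-angle convention (rotations about z, y, then x, applied about the ligand centroid) and the toy coordinates are assumptions chosen for illustration. They are not the parameterisation used by any particular docking program.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotation_matrix(alpha: float, beta: float, gamma: float) -> List[List[float]]:
    """Rotation matrix Rz(alpha) * Ry(beta) * Rx(gamma); angles in radians (assumed convention)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]


def rigid_transform(coords: Sequence[Vec3], angles: Vec3, shift: Vec3) -> List[Vec3]:
    """Rotate all atoms about their centroid (3 rotational DOF), then translate them (3 translational DOF)."""
    n = len(coords)
    centroid = [sum(atom[k] for atom in coords) / n for k in range(3)]
    rot = rotation_matrix(*angles)
    moved = []
    for atom in coords:
        d = [atom[k] - centroid[k] for k in range(3)]
        r = [sum(rot[i][j] * d[j] for j in range(3)) for i in range(3)]
        moved.append(tuple(r[i] + centroid[i] + shift[i] for i in range(3)))
    return moved


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]  # toy coordinates in angstroms
    pose = rigid_transform(ligand, (0.3, -1.1, 2.0), (5.0, 0.0, -2.0))
    # A rigid move changes the pose but leaves every interatomic distance unchanged
    print(math.dist(ligand[0], ligand[2]), math.dist(pose[0], pose[2]))
```

A docking search explores combinations of these six parameters. Flexible docking adds internal (torsional) degrees of freedom on top of them.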

Aim of docking

The aim of docking is to identify new drug targets, which will open new vistas for further drug development. The findings of our docking should be useful in the search for a cure for hepatitis B, and will also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Receptor

A molecule on the surface of (or within) a cell that serves as a recognition or binding site for antigens, antibodies or other cellular or immunological components. Substances such as hormones or drugs bind selectively to it, causing a change in the activity of the cell.

Ligand

A ligand is a molecule that binds to a protein molecule (e.g. a receptor). Because a ligand binds through many weak noncovalent interactions with the binding site of a protein, tight binding depends on a precise fit to the surface-exposed amino acid residues of the protein.

Active Site

The active site of a protein or enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the activation free energy (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.
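Transition-state theory gives a simple way to relate a lowered activation free energy to rate enhancement. If an enzyme lowers ΔG‡ by ΔΔG‡ at a fixed temperature, the rate constant increases by a factor of exp(ΔΔG‡/RT). The sketch below assumes this Eyring-type relation with identical pre-exponential factors for the catalysed and uncatalysed reactions. The barrier reductions used are illustrative numbers, not measured values for any enzyme in this report.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def rate_enhancement(ddg_kj_per_mol: float, temperature_k: float = 298.15) -> float:
    """k_enzyme / k_uncatalysed when the activation free energy is lowered by ddg (kJ/mol).

    Assumes both reactions share the same pre-exponential factor (transition-state theory).
    """
    return math.exp(ddg_kj_per_mol / (R_KJ * temperature_k))


if __name__ == "__main__":
    for ddg in (5.7, 20.0, 40.0):  # illustrative barrier reductions, not measured values
        print(f"barrier lowered by {ddg:4.1f} kJ/mol -> {rate_enhancement(ddg):.3g}-fold faster")
```

At 25 °C, RT ln 10 is about 5.71 kJ/mol. Each such decrement in the barrier therefore gives roughly another tenfold increase in rate.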

Amino acids in protein active sites

It is difficult to generalize about which amino acids are likely to be in a protein active or functional site, as this depends greatly on the type of function. With that in mind, below are the preferences of the 20 amino acids for lying within functional regions of proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in three-dimensional protein structures. Positive values mean that the amino acid makes more contacts than would be expected by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

Amino acid   Preference
His           0.360
Tyr          -0.040
Asp           0.045
Gly          -0.070
Trp          -0.140
Met           0.025
Val          -0.060
Asn           0.080
Leu          -0.180
Phe          -0.120
Gln           0.050
Cys           0.210
Ile          -0.005
Ala           0.025
Glu           0.050
Arg           0.055
Pro          -0.200
Lys           0.100
Thr           0.100
Ser           0.130
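One simple, hypothetical way to use these values is to average them over the residues lining a candidate pocket. This gives a rough indication of whether the pocket's composition resembles that of a functional site. The averaging scheme and the example residue list below are illustrative assumptions; only the preference values come from the table above.

```python
# Functional-site preference values from the table above (three-letter residue codes)
PREFERENCE = {
    "His": 0.360, "Tyr": -0.040, "Asp": 0.045, "Gly": -0.070, "Trp": -0.140,
    "Met": 0.025, "Val": -0.060, "Asn": 0.080, "Leu": -0.180, "Phe": -0.120,
    "Gln": 0.050, "Cys": 0.210, "Ile": -0.005, "Ala": 0.025, "Glu": 0.050,
    "Arg": 0.055, "Pro": -0.200, "Lys": 0.100, "Thr": 0.100, "Ser": 0.130,
}


def mean_preference(residues):
    """Average preference over a candidate site; > 0 means enriched relative to chance."""
    if not residues:
        raise ValueError("residue list is empty")
    return sum(PREFERENCE[res] for res in residues) / len(residues)


def favoured(residues):
    """Residues whose preference value is positive."""
    return [res for res in residues if PREFERENCE[res] > 0]


if __name__ == "__main__":
    candidate = ["His", "Cys", "Ser", "Leu"]  # hypothetical pocket residues
    print(round(mean_preference(candidate), 3), favoured(candidate))
```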

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) of the amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate; φ and ψ are the torsion angles about these two bonds, and the plot is drawn between them. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
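The φ and ψ angles on a Ramachandran plot are ordinary dihedral (torsion) angles, each defined by four consecutive backbone atoms. φ is defined by C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). Below is a minimal sketch of the standard calculation. It assumes Cartesian coordinates such as those read from a PDB file; the toy coordinates are for illustration only.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180 (IUPAC sign convention)."""
    b0 = _sub(p0, p1)  # bond p1 -> p0
    b1 = _sub(p2, p1)  # central bond p1 -> p2
    b2 = _sub(p3, p2)  # bond p2 -> p3
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project the two outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates: the last atom sits 90 degrees round the central bond from the first
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For consecutive residues, `dihedral(C_prev, N, CA, C)` gives φ and `dihedral(N, CA, C, N_next)` gives ψ. Plotting these pairs for every residue produces the Ramachandran plot.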

SAVS (Structure Analysis and Verification Server)

SAVS is a server for checking the validity of protein structures and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on the number of residues in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new. The new file has its atoms labelled in accordance with the IUPAC naming conventions.

OUTPUT


The output comprises the plots, together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory; these have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints for a residue, but it will not pass through high energy barriers, and it stops in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but on its own it may not be a useful measure, because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms that are too close in space and have a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
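Gradient-based minimization can be illustrated with the simplest possible case: two atoms that interact through a Lennard-Jones (van der Waals) potential and start too close together. The sketch below uses plain steepest descent in reduced units (ε = σ = 1) with a capped step. The potential, step size and units are illustrative assumptions, not the force field or minimizer used in this work. The atoms settle at the potential minimum, r = 2^(1/6) σ ≈ 1.122 σ, where the force vanishes.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones (van der Waals) pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Force along the separation, F = -dE/dr (positive pushes the atoms apart)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def steepest_descent(r: float, step: float = 0.01, max_move: float = 0.1,
                     tolerance: float = 1e-8, max_iter: int = 10_000):
    """Move the separation along the force until the force is (nearly) zero."""
    for iteration in range(max_iter):
        force = lj_force(r)
        if abs(force) < tolerance:
            return r, iteration
        # Cap the displacement so a huge repulsive force cannot throw the atoms far apart
        r += max(-max_move, min(max_move, step * force))
    return r, max_iter


if __name__ == "__main__":
    start = 0.95  # atoms too close: strong repulsive force
    r_min, steps = steepest_descent(start)
    print(f"E({start}) = {lj_energy(start):.3f} -> E({r_min:.5f}) = {lj_energy(r_min):.5f} after {steps} steps")
```

Because each step only moves downhill, a real multi-atom system minimized this way ends in the nearest local minimum, as noted above.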


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db   AC       Description                                                  Score  E-value
pdb  1QGT-C   Chain C, (HBcAg) Human Hepatitis B Viral Capsid >gi|5        206    6e-54
pdb  2QIJ-C   Chain C, Hepatitis B Capsid Protein With An N-Termina        197    2e-51
pdb  2G33-C   CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad               192    6e-50
pdb  1TA3-B   XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27   60
pdb  1AW9-A   Chain A, Structure Of Glutathione S-Transferase Iii I        27     60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).


Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
    (B)   0   0.0%
    (Z)   0   0.0%
    (X)   0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
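The composition figures above are simple residue counts. The minimal sketch below produces the same kind of tally (counts, percentages, and the Asp + Glu and Arg + Lys totals) for any one-letter sequence. The short input sequence is made up for illustration. For the full GLCTK_HUMAN sequence, the counts above imply totals of 21 + 28 = 49 negatively charged and 33 + 10 = 43 positively charged residues, as reported.

```python
from collections import Counter


def composition(sequence: str):
    """ProtParam-style tallies for a one-letter protein sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    total = len(seq)
    percent = {aa: 100.0 * n / total for aa, n in sorted(counts.items())}
    negative = counts["D"] + counts["E"]  # Asp + Glu
    positive = counts["R"] + counts["K"]  # Arg + Lys
    return counts, percent, negative, positive


if __name__ == "__main__":
    counts, percent, neg, pos = composition("MKDEERRAK")  # made-up example sequence
    for aa, pct in percent.items():
        print(f"{aa}: {counts[aa]} ({pct:.1f}%)")
    print("Asp + Glu:", neg, " Arg + Lys:", pos)
```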

Atomic composition:
Carbon     C  2435
Hydrogen   H  3967
Nitrogen   N   711
Oxygen     O   719
Sulfur     S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient: 29700
Abs 0.1% (=1 g/l): 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient: 29450
Abs 0.1% (=1 g/l): 0.533, assuming NO Cys residues appear as half cystines
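Both extinction coefficients follow from the Trp, Tyr and Cys counts listed above. ProtParam documents the calculation as E = n(Trp)·5500 + n(Tyr)·1490 + n(cystine)·125 (M-1 cm-1 at 280 nm), and Abs 0.1% is E divided by the molecular weight. The sketch below reuses the counts and the molecular weight reported for GLCTK_HUMAN.

```python
EXT_TRP = 5500     # M^-1 cm^-1 at 280 nm, per Trp
EXT_TYR = 1490     # per Tyr
EXT_CYSTINE = 125  # per disulfide-bonded Cys pair (cystine)


def extinction(n_trp: int, n_tyr: int, n_cys: int, mol_weight: float):
    """Return {case: (extinction coefficient, Abs 0.1%)} for the two ProtParam cases."""
    no_cystines = n_trp * EXT_TRP + n_tyr * EXT_TYR
    all_paired = no_cystines + (n_cys // 2) * EXT_CYSTINE
    return {
        "all Cys as half cystines": (all_paired, all_paired / mol_weight),
        "no Cys as half cystines": (no_cystines, no_cystines / mol_weight),
    }


if __name__ == "__main__":
    # Trp, Tyr, Cys counts and molecular weight reported above for GLCTK_HUMAN
    for case, (eps, abs01) in extinction(4, 5, 5, 55252.6).items():
        print(f"{case}: {eps} M-1 cm-1, Abs 0.1% = {abs01:.3f}")
```

With 4 Trp, 5 Tyr and 5 Cys (at most two cystines), this reproduces the reported values of 29700 / 0.538 and 29450 / 0.533.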

Secondary structure prediction

By SOPMA result for UNK_158250


        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix        (Hh):  235 is 44.93%
3₁₀ helix          (Gg):    0 is  0.00%
Pi helix           (Ii):    0 is  0.00%
Beta bridge        (Bb):    0 is  0.00%
Extended strand    (Ee):   80 is 15.30%
Beta turn          (Tt):   36 is  6.88%
Bend region        (Ss):    0 is  0.00%
Random coil        (Cc):  172 is 32.89%
Ambiguous states   (?):     0 is  0.00%
Other states:               0 is  0.00%


Parameters: window width 17, similarity threshold 8, number of states 4
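The SOPMA summary percentages are simply the fraction of residues assigned to each state. The sketch below recomputes them from a predicted state string. It assumes one lowercase state letter per residue, as in the prediction above; the short input string is made up for illustration.

```python
from collections import Counter

# SOPMA state letters (lowercase, one per residue) and their names
STATE_NAMES = {
    "h": "Alpha helix", "g": "3-10 helix", "i": "Pi helix", "b": "Beta bridge",
    "e": "Extended strand", "t": "Beta turn", "s": "Bend region", "c": "Random coil",
}


def state_summary(states: str):
    """Count and percentage of residues in each predicted secondary-structure state."""
    counts = Counter(states.lower())
    total = len(states)
    return {STATE_NAMES.get(s, s): (n, round(100.0 * n / total, 2)) for s, n in counts.items()}


if __name__ == "__main__":
    print(state_summary("hhhheect"))  # short made-up state string
```

For the 523-residue prediction above, the reported counts (235 h, 80 e, 36 t, 172 c) correspond to 44.93, 15.30, 6.88 and 32.89 per cent respectively.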

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
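ClustalW's pairwise scores are percentage identities, so 100 minus the score gives a crude distance between two sequences. The sketch below turns a subset of the table into a tree using UPGMA (average linkage). UPGMA is used here only because it is short. ClustalW itself builds its guide tree by neighbour-joining, so this illustrates distance-based clustering rather than reproducing ClustalW's tree. Converting identity to distance this way is likewise an assumption.

```python
from itertools import combinations


def upgma(dist):
    """UPGMA clustering. dist maps frozenset({a, b}) to a distance; returns a Newick topology string."""
    size = {name: 1 for pair in dist for name in pair}
    d = dict(dist)
    while len(size) > 1:
        # Merge the closest pair of current clusters
        a, b = min(combinations(sorted(size), 2), key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):
                # Size-weighted average of the members' distances to c
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"


if __name__ == "__main__":
    # Percent-identity scores for five of the sequences in the table above
    scores = {
        ("HBVA2", "HBVA3"): 98, ("ASHV", "GSHV"): 91,
        ("ASHV", "HBVA2"): 66, ("ASHV", "HBVA3"): 65,
        ("GSHV", "HBVA2"): 70, ("GSHV", "HBVA3"): 69,
        ("DHBV1", "ASHV"): 21, ("DHBV1", "GSHV"): 24,
        ("DHBV1", "HBVA2"): 26, ("DHBV1", "HBVA3"): 27,
    }
    distances = {frozenset(pair): 100 - s for pair, s in scores.items()}
    print(upgma(distances))
```

On these five sequences it prints `(((ASHV,GSHV),(HBVA2,HBVA3)),DHBV1);`. The two HBV entries and the two squirrel hepadnaviruses pair up first, and the duck virus joins last.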

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31

50

P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420


P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
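To make the alignment above easier to read quantitatively, the short Python sketch below computes the percent identity between two aligned rows. It assumes one common convention (identical residues divided by the columns where neither row has a gap); ClustalW and SWISS-MODEL may use other denominators, so the figures need not match their reports exactly. The example rows are the HBVA4 and HBVA6 segments from the fifth alignment block.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned row has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip any column containing a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0  # nothing to compare (e.g. all-gap segment)
    return 100.0 * identical / compared


# Rows copied from the alignment block ending at position 117.
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY" + "-" * 28
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY" + "-" * 28
print(f"HBVA4 vs HBVA6: {percent_identity(hbva4, hbva6):.2f}% identity")
```

Here the two rows differ only at the first two positions, so 30 of the 32 compared columns are identical.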

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
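The guide tree is written in Newick format. As a minimal sketch, the Python code below parses it and sums the branch lengths from the root to each sequence. It assumes the colons, commas and decimal points lost during extraction are restored as in the string shown (the punctuation is our reconstruction, not verbatim ClustalW output). These are guide-tree lengths used to order the progressive alignment, so they should be read as rough distances rather than a fully inferred phylogeny.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse one Newick tree (e.g. a ClustalW .dnd guide tree) into Nodes."""
    text = "".join(text.split())  # drop whitespace and line breaks
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":  # internal node: parse comma-separated children
            pos += 1
            node.children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        node.name = read_until(",():;")
        if pos < len(text) and text[pos] == ":":  # optional branch length
            pos += 1
            node.length = float(read_until(",();"))
        return node

    root = parse_node()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("tree must end with ';'")
    return root


def leaf_depths(node: Node, depth: float = 0.0) -> dict[str, float]:
    """Map each leaf name to the summed branch length from the root."""
    depth += node.length
    if not node.children:
        return {node.name: depth}
    out: dict[str, float] = {}
    for child in node.children:
        out.update(leaf_depths(child, depth))
    return out


GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

depths = leaf_depths(parse_newick(GUIDE_TREE))
for name, d in sorted(depths.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {d:.5f}")
```

Sorting by root-to-leaf length places the human glycerate kinase sequence (GLCTK_HUMAN) far from the HBeAg sequences, consistent with it being the outlier in the alignment.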

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template, as it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer (Deep View) was launched and the following procedure was carried out.

Steps involved in SPDBV:

Open the template structure (PDB file) from File, then choose Swiss model > load the raw target sequence.

Choose Fit > fit raw sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss model > Submit modeling request (a new browser window opens with the PDB file loaded; enter the e-mail address at which the modeled structure should be received).

Open the new structure received by e-mail and remove the template by selecting the target.

Choose File > Save > Layer (PDB).


Open the Swiss model menu and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences.

Save the file as a project.

Select "submit modeling request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode request submission form

Please fill in these fields:

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044, Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information:

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Click on the model bars.

Fig: Structure of the template after modeling

Alignment:

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of protein structure and sequence. Stereochemical validation of model structures is an important part of the comparative (homology) modeling process. A Ramachandran plot is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) for each amino acid residue in a protein structure, and it shows which combinations of φ and ψ are sterically possible for a polypeptide. φ is the torsion angle about the N–Cα bond and ψ the torsion angle about the Cα–C bond of each residue, so the plot summarizes the backbone conformation of the whole chain. Residues whose (φ, ψ) pairs fall outside the favoured and allowed regions point to possible errors in the model.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling with Deep View and the modeler is given as input to SAVS.
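The φ and ψ values on a Ramachandran plot are ordinary signed torsion angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The minimal pure-Python sketch below computes such an angle in degrees. It assumes the atom coordinates have already been read from the PDB file; the example points are hypothetical and are not taken from our model.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical points: rotating the last atom about the p1-p2 (z) axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # 90 degrees
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1)))  # -90 degrees
```

For a real residue one would call `dihedral(C_prev, N, CA, C)` for φ and `dihedral(N, CA, C, N_next)` for ψ, where these names stand for coordinates read from the structure file.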

SAVES results for proj_gunjan.pdb

PROCHECK summary

Ramachandran plot


Plot statistics

Result                                                 Score   %-age
Residues in most favoured regions [A,B,L]                990    85.6
Residues in additional allowed regions [a,b,l,p]         104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]      11     1.0
Residues in disallowed regions                            51     4.4
                                                        ----   -----
Number of non-glycine and non-proline residues          1156   100.0
Number of end-residues (excl. Gly and Pro)                 8
Number of glycine residues (shown as triangles)          127    (59)
Number of proline residues                                60
                                                        ----
Total number of residues                                1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% of residues in the most favoured regions.
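The percentages in the PROCHECK table are each region's count divided by the number of non-glycine, non-proline residues. The short Python sketch below reproduces that arithmetic from the counts reported above and applies the 90% rule of thumb quoted in the PROCHECK summary; the helper names are our own and are not part of PROCHECK.

```python
# Counts copied from the PROCHECK plot statistics (non-Gly, non-Pro residues).
REGION_COUNTS = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}


def region_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Express each Ramachandran region count as a percentage of their sum."""
    total = sum(counts.values())  # 1156 non-glycine, non-proline residues here
    return {region: 100.0 * n / total for region, n in counts.items()}


def meets_core_guideline(percentages: dict[str, float], threshold: float = 90.0) -> bool:
    """Rule of thumb: a good model has more than `threshold` % in the most favoured regions."""
    return percentages["most favoured [A,B,L]"] > threshold


pct = region_percentages(REGION_COUNTS)
for region, value in pct.items():
    print(f"{region:34s} {value:5.1f}%")
print("meets 90% guideline:", meets_core_guideline(pct))
```

With 85.6% of residues in the most favoured regions, the model falls somewhat short of the 90% guideline, which suggests that some regions of the structure may need further refinement.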

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS


Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025


Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050


MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050


LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI


VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052


MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005


LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025


ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds


gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A


Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds

Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)


R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 kJ/mol (Eshape=14350136 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
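The Hex log above reports "Counted 104 +ve and 114 -ve formal charged residues, Net formal charge -10". As a rough, hypothetical cross-check, the Python sketch below counts charged residues in one-letter sequences. It assumes Lys (K) and Arg (R) carry +1 and Asp (D) and Glu (E) carry -1 at neutral pH, and it ignores His, chain termini and modified residues such as MSE; Hex applies its own rules, so the counts are not expected to match its output exactly.

```python
# Hypothetical helper (not part of Hex): approximate formal charge counts.
POSITIVE = frozenset("KR")  # assumed +1 side chains
NEGATIVE = frozenset("DE")  # assumed -1 side chains


def formal_charge_counts(sequence: str) -> tuple[int, int, int]:
    """Return (positive, negative, net) residue counts for one sequence."""
    seq = sequence.upper()
    pos = sum(1 for aa in seq if aa in POSITIVE)
    neg = sum(1 for aa in seq if aa in NEGATIVE)
    return pos, neg, pos - neg


def charge_over_chains(chains: list[str]) -> tuple[int, int, int]:
    """Sum the counts over several chains, e.g. chains A and B of 2B8N."""
    totals = [formal_charge_counts(chain) for chain in chains]
    pos = sum(t[0] for t in totals)
    neg = sum(t[1] for t in totals)
    return pos, neg, pos - neg


# First 20 residues of chain A as printed in the log above.
print(formal_charge_counts("APESLKKLAIEIVKKSIEAV"))
```

Passing the full chain A and chain B sequences from the log to `charge_over_chains` gives a comparable, approximate figure for the whole 2B8N structure.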


Conclusion


After analyzing the protein sequences of Hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at this by studying them at the gene level, since a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing projects on HBV, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution, and also to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we performed homology modelling, and took a step forward by presenting a theoretical model built with available online modelling tools.

As our study indicates that the HBeAg protein is one of the secondary causes of HBV pathogenicity, we tried to dock the modelled protein (glycerate kinase, Q8IVS8) with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report might be just a stepping stone towards such discoveries; it may be a small finding within a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as interpreting those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can hence aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. But we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.



Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


residues on the protein

Active Site

The active site of a protein/enzyme is the region that binds the substrates (and the cofactor, if any). It also contains the residues that directly participate in the making and breaking of bonds; these residues are called the catalytic groups. In essence, the interaction of the enzyme and substrate at the active site promotes the formation of the transition state. The active site is the region of the enzyme that most directly lowers the free energy of activation (ΔG‡) of the reaction, which results in the rate enhancement characteristic of enzyme action.

Amino acids in protein active sites

It is difficult to generalize which amino acids are likely to be in a protein active/functional site, as this greatly depends on the type of function. With that in mind, below are preferences for the 20 amino acids to lie within functional regions on proteins. These were worked out by considering how often particular amino acids were in contact with bound non-protein atoms in protein three-dimensional structures. Positive values mean that the amino acid makes more contacts than one would expect by chance; negative values mean that it makes fewer. The values below do not include protein-protein or protein-peptide interactions, where many of the amino acids with negative values (e.g. tryptophan or proline) can play critical roles.

His 0.360    Tyr -0.040    Asp 0.045    Gly -0.070
Trp -0.140   Met 0.025     Val -0.060   Asn 0.080
Leu -0.180   Phe -0.120    Gln 0.050    Cys 0.210
Ile -0.005   Ala 0.025     Glu 0.050    Arg 0.055
Pro -0.200   Lys 0.100     Thr 0.100    Ser 0.130

RAMACHANDRAN PLOT

A Ramachandran plot (also known as a Ramachandran map or Ramachandran diagram), developed by Gopalasamudram Narayana Ramachandran, is a way to visualize the dihedral angles phi (φ) against psi (ψ) of amino acid residues in a protein structure. It shows the possible conformations of the φ and ψ angles for a polypeptide. In a polypeptide, the main-chain N-Cα and Cα-C bonds are relatively free to rotate; the rotations about these bonds are the torsion angles φ and ψ, which form the two axes of the plot. Ramachandran used computer models of small polypeptides to systematically vary φ and ψ with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, and the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
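To make the geometry behind the plot concrete, here is a minimal Python sketch (not one of the tools used in this work) of how φ and ψ can be computed from backbone atom coordinates. It assumes the coordinates of C(i-1), N, Cα, C and N(i+1) are already available as (x, y, z) tuples, for example parsed from a PDB file; the function names are our own. φ is taken as the torsion C(i-1)-N-Cα-C and ψ as N-Cα-C-N(i+1), returned in degrees between -180 and 180.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees (IUPAC sign convention)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec, n: Vec, ca: Vec, c: Vec, n_next: Vec) -> Tuple[float, float]:
    """phi = C(i-1)-N-CA-C, psi = N-CA-C-N(i+1)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)), 1))  # 0.0 (cis)
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))  # -90.0
```

Collecting the (φ, ψ) pair of every residue in a model and plotting ψ against φ gives the scatter that validation tools compare against the allowed regions.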

SAVS (Structure Analysis and Validation Server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects to use, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of 'ideal' bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).
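As a rough illustration of the kind of comparison PROCHECK makes, the Python sketch below measures a bond length from coordinates and expresses its deviation from an ideal value in standard deviations. The ideal N-Cα, Cα-C and C=O values are the commonly quoted Engh & Huber (1991) numbers and are assumptions here, as is the 4σ cut-off; PROCHECK's own parameter set and reporting are more detailed.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]

# Ideal backbone bond lengths and standard deviations in angstroms (assumed values,
# as commonly quoted from Engh & Huber 1991; not read from PROCHECK itself).
IDEAL_BONDS = {
    ("N", "CA"): (1.458, 0.019),
    ("CA", "C"): (1.525, 0.021),
    ("C", "O"): (1.231, 0.020),
}


def bond_zscore(a: Point, b: Point, ideal: float, sigma: float) -> float:
    """Number of standard deviations the observed bond length lies from the ideal one."""
    return (math.dist(a, b) - ideal) / sigma


def unusual_bonds(bonds: Iterable[Tuple[str, str, str, Point, Point]],
                  threshold: float = 4.0) -> List[Tuple[str, float]]:
    """Return (label, z) for bonds deviating by more than `threshold` sigma (arbitrary cut-off)."""
    flagged = []
    for label, atom_a, atom_b, xyz_a, xyz_b in bonds:
        ideal, sigma = IDEAL_BONDS[(atom_a, atom_b)]
        z = bond_zscore(xyz_a, xyz_b, ideal, sigma)
        if abs(z) > threshold:
            flagged.append((label, round(z, 1)))
    return flagged


bonds = [
    ("A1 N-CA", "N", "CA", (0.0, 0.0, 0.0), (1.46, 0.0, 0.0)),
    ("A2 CA-C", "CA", "C", (0.0, 0.0, 0.0), (1.70, 0.0, 0.0)),
]
print(unusual_bonds(bonds))  # [('A2 CA-C', 8.3)]
```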

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One of the by-products of running PROCHECK is that the coordinate file is "cleaned up" by the first of the programs. The cleaning-up process corrects any mislabelled atoms and creates a new coordinate file with the file extension .new. The new file has the atoms labelled in accordance with the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints on a residue, but it will not pass through high energy barriers and can stop in a local minimum.

The potential energy, calculated by summing the energies of various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure of the conformation's quality because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too near each other in space, giving a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
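The idea of gradient optimization can be sketched in a few lines of Python. This is a toy example under stated assumptions: atoms move along a single axis, they interact only through a Lennard-Jones pair potential in reduced units (ε = σ = 1), and plain steepest descent with a fixed step size is used. Real minimizers use full force fields and more sophisticated algorithms.

```python
from typing import List

EPS, SIG = 1.0, 1.0  # reduced Lennard-Jones units (toy assumption)


def lj_energy(r: float) -> float:
    sr6 = (SIG / r) ** 6
    return 4.0 * EPS * (sr6 * sr6 - sr6)


def lj_dv_dr(r: float) -> float:
    """Derivative of the pair energy with respect to the distance r."""
    sr6 = (SIG / r) ** 6
    return 4.0 * EPS * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def total_energy(xs: List[float]) -> float:
    return sum(lj_energy(abs(xs[j] - xs[i]))
               for i in range(len(xs)) for j in range(i + 1, len(xs)))


def forces(xs: List[float]) -> List[float]:
    """Force on each atom = minus the gradient of the total energy (1-D positions)."""
    f = [0.0] * len(xs)
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            d = xs[j] - xs[i]
            direction = 1.0 if d > 0 else -1.0
            g = lj_dv_dr(abs(d))
            f[j] -= g * direction  # dE/dx_j = g * direction
            f[i] += g * direction  # dE/dx_i = -g * direction
    return f


def steepest_descent(xs: List[float], step: float = 0.005,
                     tol: float = 1e-6, max_iter: int = 10000) -> List[float]:
    xs = list(xs)
    for _ in range(max_iter):
        f = forces(xs)
        if max(abs(v) for v in f) < tol:  # net forces are small: converged
            break
        xs = [x + step * fx for x, fx in zip(xs, f)]  # move each atom along its force
    return xs


start = [0.0, 1.5]
end = steepest_descent(start)
print(round(end[1] - end[0], 4), round(total_energy(end), 4))  # 1.1225 -1.0
```

Starting from a separation of 1.5σ, the pair relaxes to the Lennard-Jones minimum at 2^(1/6)σ ≈ 1.1225σ, where the net force vanishes and the energy equals -ε.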


Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information: Entry name GLCTK_HUMAN; Primary accession number Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase; Synonyms: EC 2.7.1.31, HBeAg-binding protein 4

Gene name: GLYCTK; Synonyms: HBEBP4; ORF names: LP5910

From: Homo sapiens (Human) [TaxID: 9606]

Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2: Evidence at transcript level

BLAST result:

List of potentially matching sequences

Db AC Description Score E-value
pdb 1QGT-C Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5 206 6e-54
pdb 2QIJ-C Chain C, Hepatitis B Capsid Protein With An N-Termina 197 2e-51
pdb 2G33-C CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad 192 6e-50
pdb 1TA3-B XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp 27 60
pdb 1AW9-A Chain A, Structure Of Glutathione S-Transferase Iii I 27 60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A) 74 14.1%
Arg (R) 33 6.3%
Asn (N) 11 2.1%
Asp (D) 21 4.0%
Cys (C) 5 1.0%
Gln (Q) 32 6.1%
Glu (E) 28 5.4%
Gly (G) 51 9.8%
His (H) 16 3.1%
Ile (I) 15 2.9%
Leu (L) 81 15.5%
Lys (K) 10 1.9%
Met (M) 12 2.3%
Phe (F) 10 1.9%
Pro (P) 28 5.4%
Ser (S) 27 5.2%
Thr (T) 22 4.2%
Trp (W) 4 0.8%
Tyr (Y) 5 1.0%
Val (V) 38 7.3%
Pyl (O) 0 0.0%
Sec (U) 0 0.0%
(B) 0 0.0%
(Z) 0 0.0%
(X) 0 0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
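The composition percentages and charged-residue totals above follow directly from the residue counts. A short Python sketch, using the counts exactly as reported by ProtParam (the function names are illustrative):

```python
from typing import Dict, Tuple

# Residue counts for GLCTK_HUMAN (Q8IVS8), copied from the ProtParam output.
COUNTS: Dict[str, int] = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28,
    "G": 51, "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10,
    "P": 28, "S": 27, "T": 22, "W": 4, "Y": 5, "V": 38,
}


def composition(counts: Dict[str, int]) -> Tuple[int, Dict[str, float]]:
    """Return the sequence length and the percentage of each residue (1 decimal)."""
    total = sum(counts.values())
    percent = {aa: round(100.0 * n / total, 1) for aa, n in counts.items()}
    return total, percent


def charged_residues(counts: Dict[str, int]) -> Tuple[int, int]:
    """Negatively charged (Asp + Glu) and positively charged (Arg + Lys) totals."""
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


total, percent = composition(COUNTS)
print(total, percent["A"], percent["L"])  # 523 14.1 15.5
print(charged_residues(COUNTS))           # (49, 43)
```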

Atomic composition:
Carbon C 2435
Hydrogen H 3967
Nitrogen N 711
Oxygen O 719
Sulfur S 17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
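The two extinction coefficients can be reproduced from the numbers of Trp, Tyr and Cys residues using the per-residue values given in the ProtParam documentation (Trp 5500, Tyr 1490, cystine 125 M-1 cm-1); Abs 0.1% is the coefficient divided by the molecular weight. In this Python sketch the molecular weight 55252.6 is our reading of the value printed by ProtParam, with its decimal point restored.

```python
# Molar extinction coefficients at 280 nm (M^-1 cm^-1), as documented for ProtParam.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int, all_cys_paired: bool) -> int:
    """Sum of residue contributions; each cystine is one disulfide-bonded pair of Cys."""
    cystines = n_cys // 2 if all_cys_paired else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + cystines * EXT_CYSTINE


def abs_0_1_percent(ext: float, mol_weight: float) -> float:
    """Absorbance of a 1 g/l solution: coefficient divided by molecular weight."""
    return ext / mol_weight


MW = 55252.6  # molecular weight from the ProtParam output (decimal point assumed)
for paired in (True, False):
    e = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, all_cys_paired=paired)
    print(e, round(abs_0_1_percent(e, MW), 3))
# 29700 0.538
# 29450 0.533
```

With five Cys residues at most two cystines can form, which is why the two coefficients differ by 2 × 125.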

Secondary structure prediction

By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%


Parameters: Window width: 17; Similarity threshold: 8; Number of states: 4
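The SOPMA summary percentages are simply the number of residues in each state divided by the sequence length (523). The Python sketch below counts states in a prediction string and checks the reported figures; the state-to-name mapping is our own and covers only the four states that occur in this prediction.

```python
from collections import Counter
from typing import Dict, Tuple

# Names for the four SOPMA states that occur in this prediction.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_summary(states: str) -> Dict[str, Tuple[int, float]]:
    """Count each state and give it as a percentage of the sequence length (2 decimals)."""
    counts = Counter(states)
    n = len(states)
    return {name: (counts[code], round(100.0 * counts[code] / n, 2))
            for code, name in STATE_NAMES.items()}


# Toy prediction string, only to show the calculation.
print(state_summary("hhhheect"))
# {'Alpha helix': (4, 50.0), 'Extended strand': (2, 25.0), 'Beta turn': (1, 12.5), 'Random coil': (1, 12.5)}

# Counts reported by SOPMA for the 523-residue sequence.
REPORTED = {"Alpha helix": 235, "Extended strand": 80, "Beta turn": 36, "Random coil": 172}
print({name: round(100.0 * n / 523, 2) for name, n in REPORTED.items()})
# {'Alpha helix': 44.93, 'Extended strand': 15.3, 'Beta turn': 6.88, 'Random coil': 32.89}
```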

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name Len(aa) SeqB Name Len(aa) Score
===========================================================================
1 Q8IVS8|GLCTK_HUMAN 523 2 Q64896|HBEAG_ASHV 217 3
1 Q8IVS8|GLCTK_HUMAN 523 3 P03154|HBEAG_DHBV1 305 3
1 Q8IVS8|GLCTK_HUMAN 523 4 P0C6J9|HBEAG_DHBV3 305 3
1 Q8IVS8|GLCTK_HUMAN 523 5 P03153|HBEAG_GSHV 217 3
1 Q8IVS8|GLCTK_HUMAN 523 6 P0C692|HBEAG_HBVA2 214 3
1 Q8IVS8|GLCTK_HUMAN 523 7 P0C625|HBEAG_HBVA3 214 3
1 Q8IVS8|GLCTK_HUMAN 523 8 P17099|HBEAG_HBVA4 214 3
1 Q8IVS8|GLCTK_HUMAN 523 9 Q81105|HBEAG_HBVA5 214 3
1 Q8IVS8|GLCTK_HUMAN 523 10 Q91C37|HBEAG_HBVA6 214 2
2 Q64896|HBEAG_ASHV 217 3 P03154|HBEAG_DHBV1 305 21
2 Q64896|HBEAG_ASHV 217 4 P0C6J9|HBEAG_DHBV3 305 21
2 Q64896|HBEAG_ASHV 217 5 P03153|HBEAG_GSHV 217 91
2 Q64896|HBEAG_ASHV 217 6 P0C692|HBEAG_HBVA2 214 66
2 Q64896|HBEAG_ASHV 217 7 P0C625|HBEAG_HBVA3 214 65
2 Q64896|HBEAG_ASHV 217 8 P17099|HBEAG_HBVA4 214 65
2 Q64896|HBEAG_ASHV 217 9 Q81105|HBEAG_HBVA5 214 65
2 Q64896|HBEAG_ASHV 217 10 Q91C37|HBEAG_HBVA6 214 65
3 P03154|HBEAG_DHBV1 305 4 P0C6J9|HBEAG_DHBV3 305 97
3 P03154|HBEAG_DHBV1 305 5 P03153|HBEAG_GSHV 217 24
3 P03154|HBEAG_DHBV1 305 6 P0C692|HBEAG_HBVA2 214 26
3 P03154|HBEAG_DHBV1 305 7 P0C625|HBEAG_HBVA3 214 27
3 P03154|HBEAG_DHBV1 305 8 P17099|HBEAG_HBVA4 214 25
3 P03154|HBEAG_DHBV1 305 9 Q81105|HBEAG_HBVA5 214 26
3 P03154|HBEAG_DHBV1 305 10 Q91C37|HBEAG_HBVA6 214 26
4 P0C6J9|HBEAG_DHBV3 305 5 P03153|HBEAG_GSHV 217 25
4 P0C6J9|HBEAG_DHBV3 305 6 P0C692|HBEAG_HBVA2 214 26
4 P0C6J9|HBEAG_DHBV3 305 7 P0C625|HBEAG_HBVA3 214 27
4 P0C6J9|HBEAG_DHBV3 305 8 P17099|HBEAG_HBVA4 214 25
4 P0C6J9|HBEAG_DHBV3 305 9 Q81105|HBEAG_HBVA5 214 26
4 P0C6J9|HBEAG_DHBV3 305 10 Q91C37|HBEAG_HBVA6 214 26
5 P03153|HBEAG_GSHV 217 6 P0C692|HBEAG_HBVA2 214 70
5 P03153|HBEAG_GSHV 217 7 P0C625|HBEAG_HBVA3 214 69
5 P03153|HBEAG_GSHV 217 8 P17099|HBEAG_HBVA4 214 69
5 P03153|HBEAG_GSHV 217 9 Q81105|HBEAG_HBVA5 214 69
5 P03153|HBEAG_GSHV 217 10 Q91C37|HBEAG_HBVA6 214 69
6 P0C692|HBEAG_HBVA2 214 7 P0C625|HBEAG_HBVA3 214 98
6 P0C692|HBEAG_HBVA2 214 8 P17099|HBEAG_HBVA4 214 98
6 P0C692|HBEAG_HBVA2 214 9 Q81105|HBEAG_HBVA5 214 98
6 P0C692|HBEAG_HBVA2 214 10 Q91C37|HBEAG_HBVA6 214 98
7 P0C625|HBEAG_HBVA3 214 8 P17099|HBEAG_HBVA4 214 98
7 P0C625|HBEAG_HBVA3 214 9 Q81105|HBEAG_HBVA5 214 97
7 P0C625|HBEAG_HBVA3 214 10 Q91C37|HBEAG_HBVA6 214 98
8 P17099|HBEAG_HBVA4 214 9 Q81105|HBEAG_HBVA5 214 97
8 P17099|HBEAG_HBVA4 214 10 Q91C37|HBEAG_HBVA6 214 98
9 Q81105|HBEAG_HBVA5 214 10 Q91C37|HBEAG_HBVA6 214 97
===========================================================================
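As described in the ClustalW documentation, the pairwise scores are percent identities: identities in the pairwise alignment divided by the number of residues compared, with gap positions excluded. The Python sketch below assumes that definition and an already-aligned pair of sequences. ClustalW computes each score from its own pairwise alignment, so feeding this function rows of the multiple alignment will not necessarily reproduce the table exactly.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two aligned sequences, ignoring columns with a gap in either one."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap positions are excluded
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("ACDE-G", "ACDFHG"))  # 80.0
```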

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
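The guide tree is written in Newick format, where each leaf name is followed by a colon and its branch length. The Python sketch below extracts the leaf branch lengths with a regular expression. It embeds the tree with decimal points, colons and commas restored by us, and it only handles trees whose internal nodes are unlabelled, as here.

```python
import re
from typing import Dict

# Guide tree from the ClustalW2 .dnd file (punctuation restored; our assumption).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# A leaf is a label followed by ':' and a branch length; unlabelled internal
# nodes (")" followed by ":length") never match because the label cannot be empty.
LEAF = re.compile(r"([^(),:;]+):([0-9.]+)")


def leaf_branch_lengths(newick: str) -> Dict[str, float]:
    return {name: float(length) for name, length in LEAF.findall(newick)}


lengths = leaf_branch_lengths(GUIDE_TREE)
longest = max(lengths, key=lengths.get)
print(len(lengths), longest, lengths[longest])  # 10 Q8IVS8|GLCTK_HUMAN 0.59519
```

The longest terminal branch belongs to the human glycerate kinase sequence, consistent with its very low pairwise scores (2 to 3) against all the HBeAg sequences in the scores table.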

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as the template, which showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure from File (PDB file), then choose SwissModel > Load Raw Target Sequence.

Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose SwissModel > Submit Modeling Request (a new browser window opens with the PDB file loaded; enter the e-mail ID at which the modelled structure is to be received).

Open the new structure received by e-mail and remove the template by selecting the target.

Choose File > Save > Layer (PDB).

Open SwissModel and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, in order to fit the two sequences.

Save the file as the project.

Select "Submit Modeling Request" under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Your Email address: Gunjan300@gmail.com

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044; Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Fig: Structure of template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process. A Ramachandran plot is a way to visualize the dihedral angles φ against ψ of amino acid residues in a protein structure: it displays the φ and ψ backbone conformational angles for each residue and shows which combinations are possible for a polypeptide. Because backbone bond lengths and bond angles are nearly fixed, these two angles largely determine the local conformation of the backbone around each Cα atom.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure: The target protein structure obtained after homology modeling using Deep View and modeler is given as input to SAVS.

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics                                          SCORE   %-age
Residues in most favoured regions [A,B,L]                  990    85.6
Residues in additional allowed regions [a,b,l,p]           104     9.0
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0
Residues in disallowed regions                              51     4.4
Number of non-glycine and non-proline residues            1156   100.0
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
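The percentages in the PROCHECK summary are each region's residue count divided by the number of non-glycine, non-proline residues (1156). A short Python sketch that reproduces them and applies the 90% benchmark quoted above (the function name is illustrative):

```python
from typing import Dict, Tuple


def ramachandran_summary(counts: Dict[str, int],
                         benchmark: float = 90.0) -> Tuple[int, Dict[str, float], bool]:
    """Percent of non-Gly, non-Pro residues per region, and whether 'most favoured' exceeds the benchmark."""
    total = sum(counts.values())
    percent = {region: round(100.0 * n / total, 1) for region, n in counts.items()}
    return total, percent, percent["most favoured"] > benchmark


counts = {"most favoured": 990, "additional allowed": 104,
          "generously allowed": 11, "disallowed": 51}
total, percent, meets_benchmark = ramachandran_summary(counts)
print(total, percent, meets_benchmark)
# 1156 {'most favoured': 85.6, 'additional allowed': 9.0, 'generously allowed': 1.0, 'disallowed': 4.4} False
```

By this benchmark the model, with 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, falls short of what PROCHECK expects of a good-quality structure.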

Docking result by Hex software

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS

Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI

67

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
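The fractional-charge warnings in the Hex log can be cross-checked by adding up the per-atom charges Hex printed for each residue. The sketch below assumes (our assumption, not documented Hex behaviour) that the reported value is simply the sum over the atoms present in the residue; the dictionaries hold the printed two-decimal charges, so agreement to within about 0.01 is the best that can be expected.

```python
"""Cross-check Hex 'fractional charge' warnings by summing per-atom charges.

Assumption: the fractional charge Hex reports for a residue is the sum of the
partial charges of the atoms present. The charges are the two-decimal values
printed in the log, so sums can differ from the log by rounding (about 0.01).
"""

# Per-atom partial charges copied from the Hex warnings for 2B8N.
MSE = {"N": -0.52, "CA": 0.14, "C": 0.53, "O": -0.50, "CB": 0.04,
       "CG": 0.09, "SE": 0.32, "CE": 0.01, "H": 0.25}
LYS = {"N": -0.52, "CA": 0.23, "C": 0.53, "O": -0.50, "CB": 0.04,
       "CG": 0.05, "CD": 0.05, "CE": 0.22, "H": 0.25}
ASP = {"N": -0.52, "CA": 0.25, "C": 0.53, "O": -0.50, "CB": -0.21,
       "CG": 0.62, "H": 0.25}
THR = {"N": -0.52, "CA": 0.27, "C": 0.53, "O": -0.50, "CB": 0.21,
       "H": 0.25}


def residue_charge(charges, missing=()):
    """Sum the charges of the atoms present, skipping any missing atoms."""
    return round(sum(q for atom, q in charges.items() if atom not in missing), 2)


# (residue label, atom parameters, atoms absent from the model, value in the log)
CASES = [
    ("A 52 MSE", MSE, (), 0.35),
    ("A 82 ASP", ASP, (), 0.41),
    ("A 318 LYS", LYS, (), 0.34),
    ("B 8 LYS", LYS, ("CE",), 0.12),
    ("B 81 ASP", ASP, ("CG",), -0.21),
    ("B 404 THR", THR, (), 0.23),
]

if __name__ == "__main__":
    for label, params, missing, logged in CASES:
        total = residue_charge(params, missing)
        print(f"{label}: summed {total:+.2f}, logged {logged:+.2f}")
```

Each summed value comes out 0.01 above the logged one, presumably because the printed charges are themselves rounded.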

Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
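The skin volumes in the log (43420.10 and 46839.17 cubic angstroms) appear to be nothing more than the number of skin grid cells multiplied by the volume of one cubic cell of edge 0.60 Å. The sketch below checks that reading; treating each cell as a cube with the reported grid spacing is our assumption rather than something the Hex log states.

```python
"""Check that Hex skin volumes match (number of grid cells) x (cell volume).

Assumption: each skin grid cell is a cube whose edge is the surface-skin grid
spacing reported in the log ("Grid = 0.60A").
"""

GRID_SPACING = 0.60  # angstrom


def skin_volume(n_cells, spacing=GRID_SPACING):
    """Volume in cubic angstroms covered by n_cells cubic grid cells."""
    return n_cells * spacing ** 3


EXTERIOR_CELLS = 201019
INTERIOR_CELLS = 216848

if __name__ == "__main__":
    print(f"exterior skin volume = {skin_volume(EXTERIOR_CELLS):.2f}")
    print(f"interior skin volume = {skin_volume(INTERIOR_CELLS):.2f}")
    print("cells integrated =", EXTERIOR_CELLS + INTERIOR_CELLS)
```

The last line also reproduces the 417867 cells that the skin integration step reports, which is simply the exterior plus interior cell count.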

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00
R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N 2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS  Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined: Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
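The size of the 6D search reported by Hex can be reconstructed from the other numbers in the log: 27 sampled distances (0.00 to 19.50 Å in steps of 0.75 Å) times 812 receptor rotations times 1 ligand rotation gives the 21924 3D FFTs the log reports. The sketch below does that arithmetic; the factorisation is our inference from the log, and the orientations-per-FFT figure is derived from the totals rather than read from Hex.

```python
"""Reconstruct the size of Hex's 6D docking search from the log.

Assumptions: the distance grid includes both end points, and the number of 3D
FFTs is (distance steps) x (receptor rotations) x (ligand rotations); the
orientation count per FFT is inferred from the totals, not read from Hex.
"""


def distance_steps(r_min, r_max, step):
    """Number of sampled intermolecular distances, end points included."""
    return round((r_max - r_min) / step) + 1


R_MIN, R_MAX, R_STEP = 0.00, 19.50, 0.75   # "Setting distance range"
RECEPTOR_ROTATIONS = 812                   # "Initial rotational increments"
LIGAND_ROTATIONS = 1
TOTAL_ORIENTATIONS = 1616412672            # "Total 6D space ... ="

steps = distance_steps(R_MIN, R_MAX, R_STEP)
n_ffts = steps * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS
per_fft, remainder = divmod(TOTAL_ORIENTATIONS, n_ffts)

if __name__ == "__main__":
    print("distance steps:", steps)
    print("3D FFTs:", n_ffts)
    print("orientations per FFT:", per_fft, "remainder:", remainder)
```

A zero remainder supports the reading that every 3D FFT scores the same number of orientations (73728 here).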


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which indicates that they evolved together.

With the completion of the ongoing gene sequencing projects on HBV, we hope it will be possible in the near future to draw conclusive inferences about the true picture of HBV evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structures of the proteins present in the different species, we performed homology modelling. As a step forward, we present a theoretical model built with freely available online modelling tools.

As studied in this report, glycerate kinase (HBeAg-binding protein 4), encoded by the GLYCTK gene, is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.

Future prospects

The work presented in this report may be just a stepping stone towards future discoveries; it is a small finding on a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, new vistas will open for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


angles phi (φ) and psi (ψ). Ramachandran used computer models of small polypeptides to systematically vary φ and ψ, with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii, so the angles that cause the spheres to collide correspond to sterically disallowed conformations of the polypeptide backbone.
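The hard-sphere test described above can be sketched in a few lines: two non-bonded atoms clash when their distance is smaller than the sum of their van der Waals radii. This is a minimal illustration, not Ramachandran's actual procedure, which used tabulated contact distances; the Bondi-style radii, the `bonded` exclusion set and the `tolerance` parameter are assumptions made for the example.

```python
"""Hard-sphere clash test in the spirit of Ramachandran's analysis.

Assumptions: Bondi-style van der Waals radii (angstrom); covalently bonded
pairs are excluded only if listed explicitly. Ramachandran's own work used
tabulated minimum contact distances rather than these radii.
"""
import math
from itertools import combinations

VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def clashes(atoms, bonded=frozenset(), tolerance=0.0):
    """Return (i, j) index pairs whose hard spheres overlap.

    atoms: list of (element, (x, y, z)) tuples.
    bonded: set of frozenset({i, j}) pairs to ignore (covalently bonded atoms).
    tolerance: how far (angstrom) spheres may interpenetrate before counting.
    """
    found = []
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in combinations(enumerate(atoms), 2):
        if frozenset((i, j)) in bonded:
            continue
        contact = VDW_RADII[el_i] + VDW_RADII[el_j] - tolerance
        if math.dist(xyz_i, xyz_j) < contact:
            found.append((i, j))
    return found


if __name__ == "__main__":
    toy = [("O", (0.0, 0.0, 0.0)), ("N", (2.5, 0.0, 0.0)), ("C", (6.0, 0.0, 0.0))]
    # O...N at 2.5 A is closer than 1.52 + 1.55 = 3.07 A, so only (0, 1) clashes.
    print(clashes(toy))
```

Repeating this check over a grid of (φ, ψ) values for a model dipeptide is what separates allowed from disallowed regions on a Ramachandran map.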

SAVS (Structure analysis and validation server)

SAVS is a server for analyzing protein structures for validity and assessing how correct they are. Depending on how many programs one selects, the server can take several minutes to run; the run time also depends on how many residues there are in the submitted protein.

PROCHECK

The aim of PROCHECK is to assess how normal, or conversely how unusual, the geometry of the residues in a given protein structure is, as compared with stereochemical parameters derived from well-refined, high-resolution structures. The checks also make use of ‘ideal’ bond lengths and bond angles, as derived from a recent and comprehensive analysis of small-molecule structures in the Cambridge Structural Database (CSD).
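The comparison with ‘ideal’ bond lengths can be illustrated with a small sketch that measures backbone bonds and reports their deviation from reference values. This is not PROCHECK's implementation: the ideal lengths are the commonly quoted Engh and Huber backbone values, and the 0.05 Å flag threshold and the toy coordinates are hypothetical.

```python
"""Compare observed backbone bond lengths with assumed 'ideal' values.

Assumptions: ideal lengths (angstrom) are the commonly quoted Engh & Huber
backbone values; the 0.05 A flag threshold is illustrative only and is not
PROCHECK's criterion.
"""
import math

IDEAL_BONDS = {("N", "CA"): 1.458, ("CA", "C"): 1.525, ("C", "O"): 1.231}
FLAG_THRESHOLD = 0.05  # angstrom (hypothetical)


def bond_deviations(residue):
    """Return {(a, b): observed - ideal} for backbone bonds present in residue."""
    out = {}
    for (a, b), ideal in IDEAL_BONDS.items():
        if a in residue and b in residue:
            out[(a, b)] = math.dist(residue[a], residue[b]) - ideal
    return out


def flagged(residue, threshold=FLAG_THRESHOLD):
    """Bonds whose length deviates from the ideal by more than threshold."""
    return [bond for bond, dev in bond_deviations(residue).items() if abs(dev) > threshold]


if __name__ == "__main__":
    # Toy residue with a stretched CA-C bond (1.600 A instead of 1.525 A).
    res = {"N": (0.0, 0.0, 0.0), "CA": (1.458, 0.0, 0.0),
           "C": (3.058, 0.0, 0.0), "O": (3.058, 1.231, 0.0)}
    for bond, dev in bond_deviations(res).items():
        print(bond, f"{dev:+.3f}")
    print("flagged:", flagged(res))
```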

INPUT

The input to PROCHECK is a single file containing the coordinates of the protein structure. One by-product of running PROCHECK is that the coordinate file is “cleaned up” by the first of the programs. The clean-up process corrects any mislabelled atoms and creates a new coordinate file with the extension .new; the new file has its atoms labelled in accordance with the IUPAC naming convention.

OUTPUT

The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory, which have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good at releasing local constraints on a residue, but it will not pass through high energy barriers and therefore stops in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a numerical value for a single conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms that are too near each other in space and have a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
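Gradient optimization can be illustrated on the smallest possible system: two atoms interacting through a Lennard-Jones potential. The sketch below is a toy model in reduced units (ε = σ = 1, an assumption for illustration), using plain steepest descent: the interatomic distance is moved along the force until the force is negligible, which lands on the analytic minimum r = 2^(1/6) σ with energy -ε.

```python
"""Steepest-descent minimisation of a two-atom Lennard-Jones system.

A toy model in reduced units (epsilon = sigma = 1, assumed): one interatomic
distance r is moved along the force -dE/dr until the force is negligible.
"""


def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along r, i.e. -dE/dr; positive values push the atoms apart."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon / r * (2.0 * sr6 * sr6 - sr6)


def steepest_descent(r, step=0.01, tol=1e-10, max_iter=10_000):
    """Move r along the force until |force| < tol; return (r, iterations)."""
    for i in range(max_iter):
        f = lj_force(r)
        if abs(f) < tol:
            return r, i
        r += step * f  # a step along the force lowers the energy
    return r, max_iter


if __name__ == "__main__":
    r_min, n_steps = steepest_descent(1.5)
    print(f"r = {r_min:.6f} after {n_steps} steps, E = {lj_energy(r_min):.6f}")
    print(f"analytic minimum r = 2**(1/6) = {2 ** (1 / 6):.6f}")
```

Starting from a compressed contact such as r = 1.0 also converges, because the large repulsive force first pushes the atoms apart; in a real protein the same idea is applied to thousands of coordinates at once, and, as noted above, the search only finds the nearest local minimum.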


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information: Entry name GLCTK_HUMAN; Primary accession number Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Gene synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo

Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences:

Db   AC      Description                                                        Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5               206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina               197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                       192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp     27     6.0
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I                27     6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
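The composition percentages and charged-residue totals above follow directly from the residue counts. The sketch below recomputes them from the counts reported for GLCTK_HUMAN; it mirrors the arithmetic only and is not ProtParam's own code (histidine, for example, is simply left out of the positive count, as in the line above).

```python
"""Recompute ProtParam-style composition percentages and charge counts.

Residue counts are those reported for GLCTK_HUMAN (Q8IVS8); percentages are
rounded to one decimal place, as in the output above.
"""

COUNTS = {
    "Ala": 74, "Arg": 33, "Asn": 11, "Asp": 21, "Cys": 5,
    "Gln": 32, "Glu": 28, "Gly": 51, "His": 16, "Ile": 15,
    "Leu": 81, "Lys": 10, "Met": 12, "Phe": 10, "Pro": 28,
    "Ser": 27, "Thr": 22, "Trp": 4, "Tyr": 5, "Val": 38,
}


def composition(counts):
    """Return (total length, {residue: percentage rounded to 0.1})."""
    total = sum(counts.values())
    return total, {aa: round(100.0 * n / total, 1) for aa, n in counts.items()}


total, percent = composition(COUNTS)
negative = COUNTS["Asp"] + COUNTS["Glu"]   # acidic residues
positive = COUNTS["Arg"] + COUNTS["Lys"]   # basic residues (His not counted)

if __name__ == "__main__":
    print("length:", total)
    print("Leu %:", percent["Leu"], " Ala %:", percent["Ala"])
    print("negative:", negative, " positive:", positive)
```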

Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
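Both extinction coefficients can be reproduced from the Trp, Tyr and Cys counts in the composition table, using the per-residue coefficients that the ProtParam documentation gives (Trp 5500, Tyr 1490 and cystine 125 M⁻¹ cm⁻¹). The sketch below is a minimal recomputation under those values; the function names are our own.

```python
"""Extinction coefficient at 280 nm from Trp, Tyr and cystine counts.

Uses the residue coefficients given in the ProtParam documentation:
Trp 5500, Tyr 1490, cystine 125 M^-1 cm^-1. Counts and molecular weight are
those reported for GLCTK_HUMAN.
"""

EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def extinction(n_trp, n_tyr, n_cys, cys_as_cystines=True):
    """Molar extinction coefficient; each cystine is a pair of Cys residues."""
    cystines = n_cys // 2 if cys_as_cystines else 0
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + cystines * EXT_CYSTINE


def abs_01_percent(ext, molecular_weight):
    """Absorbance of a 0.1 % (1 g/l) solution over a 1 cm path."""
    return ext / molecular_weight


MW = 55252.6
N_TRP, N_TYR, N_CYS = 4, 5, 5

if __name__ == "__main__":
    for paired in (True, False):
        e = extinction(N_TRP, N_TYR, N_CYS, paired)
        print(f"all Cys paired={paired}: {e}, Abs 0.1% = {abs_01_percent(e, MW):.3f}")
```

With five cysteines only two cystines can form, which is why the two coefficients differ by 2 x 125 = 250.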

Secondary structure prediction

By SOPMA (result for UNK_158250)

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix       (Hh): 235 is 44.93%
3₁₀ helix         (Gg):   0 is  0.00%
Pi helix          (Ii):   0 is  0.00%
Beta bridge       (Bb):   0 is  0.00%
Extended strand   (Ee):  80 is 15.30%
Beta turn         (Tt):  36 is  6.88%
Bend region       (Ss):   0 is  0.00%
Random coil       (Cc): 172 is 32.89%
Ambiguous states  (?):    0 is  0.00%
Other states:             0 is  0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
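Pairwise scores like these are the raw material for a distance tree. The sketch below builds a UPGMA tree for five of the sequences, treating each ClustalW score as a percent identity and converting it to a distance d = 1 - score/100. Both are assumptions made for illustration: UPGMA is a simpler method swapped in here, whereas ClustalW itself builds its guide tree by neighbour-joining.

```python
"""UPGMA tree from ClustalW pairwise scores (a simplified stand-in).

Assumptions: the ClustalW 'Score' column is treated as percent identity and
converted to a distance d = 1 - score/100; UPGMA replaces the
neighbour-joining method ClustalW uses for its guide tree.
"""
from itertools import combinations

NAMES = ["HBEAG_ASHV", "HBEAG_GSHV", "HBEAG_DHBV1", "HBEAG_DHBV3", "HBEAG_HBVA2"]

# (sequence A, sequence B, ClustalW score) for every pair in NAMES.
SCORES = [
    ("HBEAG_ASHV", "HBEAG_GSHV", 91), ("HBEAG_ASHV", "HBEAG_DHBV1", 21),
    ("HBEAG_ASHV", "HBEAG_DHBV3", 21), ("HBEAG_ASHV", "HBEAG_HBVA2", 66),
    ("HBEAG_GSHV", "HBEAG_DHBV1", 24), ("HBEAG_GSHV", "HBEAG_DHBV3", 25),
    ("HBEAG_GSHV", "HBEAG_HBVA2", 70), ("HBEAG_DHBV1", "HBEAG_DHBV3", 97),
    ("HBEAG_DHBV1", "HBEAG_HBVA2", 26), ("HBEAG_DHBV3", "HBEAG_HBVA2", 26),
]


def upgma(names, scores):
    """Return the UPGMA tree as nested tuples of sequence names."""
    pair_dist = {frozenset((a, b)): 1.0 - s / 100.0 for a, b, s in scores}
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    d = {frozenset((i, j)): pair_dist[frozenset((names[i], names[j]))]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        i, j = min(combinations(sorted(clusters), 2), key=lambda p: d[frozenset(p)])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            # UPGMA rule: size-weighted mean of the merged clusters' distances.
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    print(upgma(NAMES, SCORES))
```

On this subset the two DHBV entries form one branch, while ASHV and GSHV join first and are then joined by HBVA2.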

Alignment

CLUSTAL 205 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420


P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
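A CLUSTAL-format alignment is printed in blocks of 60 columns. To recover each full aligned sequence, its pieces must be concatenated across all blocks. The Python sketch below does this under some assumptions: each sequence line has the form `name segment [count]`, separated by whitespace, and any conservation line (`*`, `:`, `.`) begins with whitespace. `parse_clustal` is a hypothetical helper; Biopython's `AlignIO` with the `"clustal"` format is the usual library route.

```python
def parse_clustal(text: str) -> dict:
    """Join the blocks of a CLUSTAL-format alignment into one row per sequence."""
    rows = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("CLUSTAL"):
            continue  # skip blank lines and the header line
        if line[0].isspace():
            continue  # conservation line under each block
        fields = line.split()
        name, segment = fields[0], fields[1]  # an optional residue count may follow
        rows.setdefault(name, []).append(segment)
    return {name: "".join(parts) for name, parts in rows.items()}


if __name__ == "__main__":
    sample = (
        "CLUSTAL W multiple sequence alignment\n\n"
        "seqA      MQLF-HL 6\n"
        "seqB      MYLFAHL 7\n"
        "          * ** **\n\n"
        "seqA      CLIS 10\n"
        "seqB      C-VS 10\n"
    )
    for name, row in parse_clustal(sample).items():
        print(name, row)
```

Every row returned should have the same length (the alignment width). That is a quick sanity check that no block was lost when the alignment was copied.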

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
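The guide tree is written in Newick format. Nested parentheses group sequences, and the number after each colon is a branch length. The Python sketch below parses such a string and sums the branch lengths from the root to each leaf. It assumes a ClustalW-style tree with no internal node labels. `GUIDE_TREE` is our reconstruction of the tree above. It assumes that colons, commas, decimal points and the final semicolon were lost in extraction and restores them. The helper names are hypothetical.

```python
# Assumed reconstruction of the ClustalW guide tree (punctuation restored).
GUIDE_TREE = ("((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
              "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
              "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
              ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
              ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);")


def parse_newick(text):
    """Parse Newick text into nested (name_or_children, branch_length) tuples."""
    s = "".join(text.split()).rstrip(";")  # drop whitespace/newlines and the final ';'
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume '('
            children = [node()]
            while s[pos] == ",":
                pos += 1  # consume ','
                children.append(node())
            pos += 1  # consume ')'
            label = children
        else:
            start = pos
            while pos < len(s) and s[pos] not in ",():":
                pos += 1
            label = s[start:pos]
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1  # consume ':'
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return label, length

    return node()


def leaf_depths(tree, depth=0.0):
    """Yield (leaf name, summed branch length from the root) for every leaf."""
    label, length = tree
    depth += length
    if isinstance(label, list):
        for child in label:
            yield from leaf_depths(child, depth)
    else:
        yield label, depth


if __name__ == "__main__":
    for name, d in sorted(leaf_depths(parse_newick(GUIDE_TREE)), key=lambda x: x[1]):
        print(f"{name:20s} {d:.5f}")
```

On this tree the human glycerate kinase (Q8IVS8) lies much farther from the root than any HBeAg sequence. This matches its role as the outgroup in the phylogram.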

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template because it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.
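Before modelling, it helps to check what the downloaded template file actually contains. A PDB-format file stores each atom on a fixed-column `ATOM` or `HETATM` record, so a few lines of Python are enough to count residues and atoms per chain. This is a minimal sketch. `summarize_pdb` is a hypothetical helper, and libraries such as Biopython's `PDBParser` handle the format more robustly. Note that `HETATM` records (for example waters or selenomethionine) are counted as well.

```python
from collections import defaultdict


def summarize_pdb(lines):
    """Return {chain_id: (number_of_residues, number_of_atoms)} from PDB records.

    Uses the fixed PDB columns (1-based): residue name 18-20, chain ID 22,
    residue number 23-26 and insertion code 27.
    """
    atoms = defaultdict(int)
    residues = defaultdict(set)
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # ignore HEADER, TER, END, etc.
        chain = line[21]
        residue_id = (line[17:20].strip(), line[22:26].strip(), line[26])
        atoms[chain] += 1
        residues[chain].add(residue_id)
    return {chain: (len(residues[chain]), atoms[chain]) for chain in atoms}
```

For example, `with open("1qgt.pdb") as fh: print(summarize_pdb(fh))` would report each chain of the template, assuming the file was saved under that (illustrative) name.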

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV:

Open the template structure (PDB file) from File, then choose Swiss-Model > Load Raw Target Sequence.

Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss-Model > Submit Modeling Request (a new browser window opens with the PDB file loaded; enter the e-mail ID at which the modelled structure should be received).

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).


Open SwissModel and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode request submission form

Please fill in these fields:

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044 Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information:

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52
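The model information gives the modelled residue range (83 to 514). Taking the full target length as 523 residues, the position at which the Q8IVS8|GLCTK_HUMAN row of the CLUSTAL alignment ends, we can check what fraction of the target was actually modelled. The 523-residue length is our assumption from that alignment, and `model_coverage` is a hypothetical helper.

```python
def model_coverage(first: int, last: int, target_length: int) -> float:
    """Percentage of the target sequence covered by an inclusive residue range."""
    if not 1 <= first <= last <= target_length:
        raise ValueError("residue range must lie within the target sequence")
    return 100.0 * (last - first + 1) / target_length


if __name__ == "__main__":
    # Range from the SWISS-MODEL model information; 523 assumed as the full target length.
    print(f"{model_coverage(83, 514, 523):.1f}%")  # 82.6%
```

About 17% of the target (mainly the N-terminal 82 residues) is therefore not represented in the model and should not be interpreted from it.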


Click on the model bars.

Fig: Structure of the template after modeling

Alignment

TARGET 83    LV GFGKAVLGMA AAAEELLGQH
2b8nA  4     peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105   LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA  53    xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET 155   ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101   trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205   RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151   sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss


TARGET 255   GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201   gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET 305   LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244   eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355   ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294   vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405   LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344   ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455   DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394   ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Verification Server (SAVS) greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative (homology) modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ (phi) against ψ (psi) for every amino acid residue in a protein structure. It shows which combinations of φ and ψ are sterically possible for a polypeptide chain. Residues that fall outside the allowed regions indicate strained or possibly incorrect parts of a model, which is why the plot is used to judge model quality.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller was given as input to SAVS.
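φ and ψ are torsion (dihedral) angles, each defined by four consecutive backbone atoms. φ uses C(i-1), N, Cα and C; ψ uses N, Cα, C and N(i+1). The pure-Python sketch below shows the calculation using the standard projection-and-atan2 formula. It is an illustration only, not the code that SAVS or PROCHECK runs internally, and the points in the usage are made-up test coordinates.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (between -180 and 180) defined by four points.

    For phi pass C(i-1), N, CA, C; for psi pass N, CA, C, N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))   # -90.0
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # 180.0
```

Applying this to every residue of the model and plotting ψ against φ gives the Ramachandran plot that PROCHECK summarises below.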

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT


Result

Plot statistics                                         Score        %
Residues in most favoured regions [A,B,L]                 990    85.6%
Residues in additional allowed regions [a,b,l,p]          104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11     1.0%
Residues in disallowed regions                             51     4.4%
                                                         ----   ------
Number of non-glycine and non-proline residues           1156   100.0%

Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351


Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions.
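The percentages in the PROCHECK summary follow directly from the region counts, and the 90% benchmark quoted above can be checked the same way. The sketch below is a minimal Python illustration. The counts are copied from the plot statistics of this model, while the region keys and the helper name `ramachandran_summary` are hypothetical.

```python
def ramachandran_summary(counts, benchmark=90.0):
    """Summarise Ramachandran region counts for non-glycine, non-proline residues.

    Returns (total, percentages rounded to one decimal, meets_benchmark), where
    the benchmark is applied to the "most favoured" region.
    """
    total = sum(counts.values())
    percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}
    favoured = 100.0 * counts["most favoured"] / total
    return total, percentages, favoured >= benchmark


if __name__ == "__main__":
    # Counts copied from the PROCHECK plot statistics of this model.
    counts = {
        "most favoured": 990,
        "additional allowed": 104,
        "generously allowed": 11,
        "disallowed": 51,
    }
    total, pct, ok = ramachandran_summary(counts)
    print(total, pct, ok)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls somewhat short of the 90% benchmark. The disallowed residues are the natural place to look for modelling errors.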

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52  ASP CA Radius = 1.50 Charge = 0.25  ASP C Radius = 1.40 Charge = 0.53  ASP O Radius = 1.50 Charge = -0.50  ASP CB Radius = 1.70 Charge = -0.21  ASP CG Radius = 1.40 Charge = 0.62  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS CE Radius = 1.70 Charge = 0.22  LYS H Radius = 0.00 Charge = 0.25

Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS CE Radius = 1.70 Charge = 0.22  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS CE Radius = 1.70 Charge = 0.22  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52  ASP CA Radius = 1.50 Charge = 0.25  ASP C Radius = 1.40 Charge = 0.53  ASP O Radius = 1.50 Charge = -0.50  ASP CB Radius = 1.70 Charge = -0.21  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS CE Radius = 1.70 Charge = 0.22  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52  THR CA Radius = 1.50 Charge = 0.27  THR C Radius = 1.40 Charge = 0.53  THR O Radius = 1.50 Charge = -0.50  THR CB Radius = 1.50 Charge = 0.21  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52  ASP CA Radius = 1.50 Charge = 0.25  ASP C Radius = 1.40 Charge = 0.53  ASP O Radius = 1.50 Charge = -0.50  ASP CB Radius = 1.70 Charge = -0.21  ASP CG Radius = 1.40 Charge = 0.62  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS CE Radius = 1.70 Charge = 0.22  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS CE Radius = 1.70 Charge = 0.22  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS CE Radius = 1.70 Charge = 0.22  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52  ASP CA Radius = 1.50 Charge = 0.25  ASP C Radius = 1.40 Charge = 0.53  ASP O Radius = 1.50 Charge = -0.50  ASP CB Radius = 1.70 Charge = -0.21  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS CE Radius = 1.70 Charge = 0.22  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52  THR CA Radius = 1.50 Charge = 0.27  THR C Radius = 1.40 Charge = 0.53  THR O Radius = 1.50 Charge = -0.50  THR CB Radius = 1.50 Charge = 0.21  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 46.4 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00 R = 0.40

R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N 2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal     Eshape     Eforce     Eair       RMS              Bumps
----  ---------  ---------  ---------  ---------  ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined
Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
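Hex scores a very large number of relative receptor/ligand orientations (over 1.6 billion in the log above) and uses FFTs to make this search affordable. The idea behind FFT-based docking can be illustrated with the simpler Cartesian grid-correlation approach. In that approach, the overlap score for every translation of the ligand is obtained in a single pass using the correlation theorem. The sketch below is a one-dimensional toy in pure Python with a naive DFT. Hex itself uses spherical polar Fourier correlations over rotations and translations, which this sketch does not reproduce. All function names and grid values here are hypothetical.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (the inverse includes the 1/n factor)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def correlation_scores(receptor, ligand):
    """scores[t] = sum_x receptor[(x + t) % n] * ligand[x], computed in Fourier space."""
    if len(receptor) != len(ligand):
        raise ValueError("grids must have the same size")
    r_hat = dft(receptor)
    l_hat = dft(ligand)
    product = [r * l.conjugate() for r, l in zip(r_hat, l_hat)]
    return [v.real for v in dft(product, inverse=True)]


def direct_scores(receptor, ligand):
    """The same scores by brute force, for checking the Fourier route."""
    n = len(receptor)
    return [sum(receptor[(x + t) % n] * ligand[x] for x in range(n)) for t in range(n)]


def best_shift(receptor, ligand):
    """Translation (in grid steps) with the highest overlap score."""
    scores = correlation_scores(receptor, ligand)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    ligand = [0, 1, 3, 1, 0, 0, 0, 0]
    receptor = ligand[-3:] + ligand[:-3]  # the ligand pattern shifted by 3 grid steps
    print(best_shift(receptor, ligand))  # 3
```

A real FFT (rather than this naive DFT) reduces the cost of scoring all n translations from O(n^2) to O(n log n). That saving is what makes exhaustive 6D searches like the one in the log practical.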


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at this by studying them at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein in the different species, we used homology modelling and present a theoretical model built with available online modelling tools.

Our study indicates that glycerate kinase (HBeAg-binding protein 4), encoded by the GLYCTK gene, is one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for drug development.


Future prospects

The work presented in this report may be just a stepping stone toward such discoveries; it is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as interpreting those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.



Abbreviations

CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package


The output comprises the plots together with a detailed residue-by-residue listing. PROCHECK generates a number of output files in the default directory; they have the same name as the original PDB file but different extensions.

The residue-by-residue listing has a .out extension and lists all the computed stereochemical properties, residue by residue, in a printable ASCII text file.

ENERGY MINIMIZATION

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles and dihedrals). Energy minimization can repair distorted geometries by moving atoms to release internal constraints. It is good for releasing local constraints on a residue, but it will not pass through high energy barriers and may stop in a local minimum.

The potential energy, calculated by summing the energies of the various interactions, is a single numerical value for one conformation. This number can be used to evaluate a particular conformation, but it may not be a useful measure because it can be dominated by a few bad interactions. For instance, a large molecule with an excellent conformation for nearly all atoms can have a large overall energy because of a single bad interaction, such as two atoms too close to each other in space with a huge van der Waals repulsion energy. It is therefore often preferable to carry out energy minimization on a conformation to find the best nearby conformation. Energy minimization is usually performed by gradient optimization: atoms are moved so as to reduce the net forces on them. The minimized structure has small forces on each atom and therefore serves as an excellent starting point for molecular dynamics simulations.
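The gradient optimization described above can be illustrated with a toy calculation. The sketch below is a minimal example, assuming a hypothetical two-atom system with a Lennard-Jones (van der Waals) potential in reduced units; EPSILON, SIGMA, the step size and the function names are illustrative choices, not parameters used in this project. Starting from a bad contact (atoms too close), steepest descent moves each atom along its force until the forces vanish, ending at the pair minimum r = 2^(1/6)·SIGMA with energy -EPSILON.

```python
import math

# Hypothetical toy parameters (reduced units), chosen only for illustration.
EPSILON = 1.0   # depth of the van der Waals well
SIGMA = 1.0     # distance at which the pair energy is zero


def lj_energy(coords):
    """Total Lennard-Jones energy, summed over every atom pair."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            sr6 = (SIGMA / math.dist(coords[i], coords[j])) ** 6
            energy += 4.0 * EPSILON * (sr6 * sr6 - sr6)
    return energy


def lj_forces(coords):
    """Force on each atom, F = -dE/dx, from the pairwise Lennard-Jones term."""
    forces = [[0.0, 0.0, 0.0] for _ in coords]
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(x * x for x in d))
            sr6 = (SIGMA / r) ** 6
            de_dr = 4.0 * EPSILON * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            for k in range(3):
                f = -de_dr * d[k] / r   # pushes apart when repulsive, pulls together when attractive
                forces[i][k] += f
                forces[j][k] -= f       # equal and opposite force on the partner atom
    return forces


def steepest_descent(coords, step=0.001, max_iter=10000, f_tol=1e-6):
    """Move every atom a small step along its force until all forces are tiny."""
    coords = [list(c) for c in coords]
    for _ in range(max_iter):
        forces = lj_forces(coords)
        if max(abs(f) for atom in forces for f in atom) < f_tol:
            break
        for atom, force in zip(coords, forces):
            for k in range(3):
                atom[k] += step * force[k]
    return coords


start = [[0.0, 0.0, 0.0], [0.95, 0.0, 0.0]]   # a "bad contact": atoms too close
final = steepest_descent(start)
print(f"E start = {lj_energy(start):.3f}, E final = {lj_energy(final):.3f}, "
      f"r final = {math.dist(final[0], final[1]):.4f}")
```

Like any steepest-descent scheme, this only reaches the nearest minimum; it cannot climb over an energy barrier, which is the limitation of energy minimization noted above.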


Result and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information: Entry name GLCTK_HUMAN; Primary accession number Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4; ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences:
Db   AC      Description                                                       Score  E-value
pdb  1QGT-C  Chain C, (Hbcag) Human Hepatitis B Viral Capsid >gi|5              206    6e-54
pdb  2QIJ-C  Chain C, Hepatitis B Capsid Protein With An N-Termina              197    2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad                     192    6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp   27     60
pdb  1AW9-A  Chain A, Structure Of Glutathione S-Transferase Iii I              27     60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)
The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
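The composition table and charged-residue totals are simple tallies over the sequence: each count divided by the sequence length gives the percentage (for example 74/523 ≈ 14.1% Ala). A minimal sketch of that bookkeeping, with hypothetical function names (this is not ProtParam's own code), applied to the first ten residues of Q8IVS8 as they appear in the alignment:

```python
from collections import Counter


def composition(seq):
    """Per-residue counts and percentages, in the style of a ProtParam report."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    return {aa: (c, 100.0 * c / n) for aa, c in sorted(counts.items())}


def charged_residues(seq):
    """Return (negative, positive) counts: Asp + Glu and Arg + Lys."""
    seq = seq.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


peptide = "MAAALQVLPR"   # first 10 residues of GLCTK_HUMAN, used only as a short demo
for aa, (count, pct) in composition(peptide).items():
    print(f"{aa} {count:3d} {pct:5.1f}%")
print("negative, positive:", charged_residues(peptide))
```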

Atomic composition:
Carbon    C  2435
Hydrogen  H  3967
Nitrogen  N   711
Oxygen    O   719
Sulfur    S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849

Extinction coefficients:

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
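Both extinction coefficients follow from residue counts: ProtParam documents the estimate ε(280) = 5500·nTrp + 1490·nTyr + 125·nCystine, and Abs 0.1% = ε / MW. A minimal sketch, assuming floor(nCys/2) cystines when all cysteines are paired (function names are hypothetical); with 4 Trp, 5 Tyr, 5 Cys and MW 55252.6 it gives 29700 / 0.538 and 29450 / 0.533:

```python
def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired=True):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from residue counts."""
    n_cystine = n_cys // 2 if all_cys_paired else 0   # two Cys form one cystine
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


def abs_01_percent(ext_coeff, mol_weight):
    """Absorbance of a 1 g/l (0.1%) solution: epsilon divided by molecular weight."""
    return ext_coeff / mol_weight


MW = 55252.6   # molecular weight of Q8IVS8 from the ProtParam output
for paired in (True, False):
    eps = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, all_cys_paired=paired)
    print(eps, round(abs_01_percent(eps, MW), 3))
```

The only difference between the two rows is the 2 × 125 contributed by the two possible cystines.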

Secondary structure prediction

SOPMA result for UNK_158250

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: Window width 17; Similarity threshold 8; Number of states 4
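Each SOPMA percentage is simply the number of residues assigned to a state divided by the sequence length. A minimal sketch, assuming the four one-letter state codes (h, e, t, c) used in the prediction string; the function name is hypothetical. A synthetic string with the reported counts reproduces the summary percentages:

```python
from collections import Counter

# Four-state SOPMA codes as they appear in the prediction string.
STATES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def state_summary(prediction):
    """Count each predicted state and express it as a percentage of the length."""
    counts = Counter(prediction)
    n = len(prediction)
    return {name: (counts[code], round(100.0 * counts[code] / n, 2))
            for code, name in STATES.items()}


# Synthetic string with the reported counts: 235 h + 80 e + 36 t + 172 c = 523.
demo = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
for state, (count, pct) in state_summary(demo).items():
    print(f"{state:16s} {count:4d} {pct:6.2f}%")
```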

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA Name                Len(aa) SeqB Name                Len(aa) Score
===========================================================================
1    Q8IVS8|GLCTK_HUMAN  523     2    Q64896|HBEAG_ASHV   217     3
1    Q8IVS8|GLCTK_HUMAN  523     3    P03154|HBEAG_DHBV1  305     3
1    Q8IVS8|GLCTK_HUMAN  523     4    P0C6J9|HBEAG_DHBV3  305     3
1    Q8IVS8|GLCTK_HUMAN  523     5    P03153|HBEAG_GSHV   217     3
1    Q8IVS8|GLCTK_HUMAN  523     6    P0C692|HBEAG_HBVA2  214     3
1    Q8IVS8|GLCTK_HUMAN  523     7    P0C625|HBEAG_HBVA3  214     3
1    Q8IVS8|GLCTK_HUMAN  523     8    P17099|HBEAG_HBVA4  214     3
1    Q8IVS8|GLCTK_HUMAN  523     9    Q81105|HBEAG_HBVA5  214     3
1    Q8IVS8|GLCTK_HUMAN  523     10   Q91C37|HBEAG_HBVA6  214     2
2    Q64896|HBEAG_ASHV   217     3    P03154|HBEAG_DHBV1  305     21
2    Q64896|HBEAG_ASHV   217     4    P0C6J9|HBEAG_DHBV3  305     21
2    Q64896|HBEAG_ASHV   217     5    P03153|HBEAG_GSHV   217     91
2    Q64896|HBEAG_ASHV   217     6    P0C692|HBEAG_HBVA2  214     66
2    Q64896|HBEAG_ASHV   217     7    P0C625|HBEAG_HBVA3  214     65
2    Q64896|HBEAG_ASHV   217     8    P17099|HBEAG_HBVA4  214     65
2    Q64896|HBEAG_ASHV   217     9    Q81105|HBEAG_HBVA5  214     65
2    Q64896|HBEAG_ASHV   217     10   Q91C37|HBEAG_HBVA6  214     65
3    P03154|HBEAG_DHBV1  305     4    P0C6J9|HBEAG_DHBV3  305     97
3    P03154|HBEAG_DHBV1  305     5    P03153|HBEAG_GSHV   217     24
3    P03154|HBEAG_DHBV1  305     6    P0C692|HBEAG_HBVA2  214     26
3    P03154|HBEAG_DHBV1  305     7    P0C625|HBEAG_HBVA3  214     27
3    P03154|HBEAG_DHBV1  305     8    P17099|HBEAG_HBVA4  214     25
3    P03154|HBEAG_DHBV1  305     9    Q81105|HBEAG_HBVA5  214     26
3    P03154|HBEAG_DHBV1  305     10   Q91C37|HBEAG_HBVA6  214     26
4    P0C6J9|HBEAG_DHBV3  305     5    P03153|HBEAG_GSHV   217     25
4    P0C6J9|HBEAG_DHBV3  305     6    P0C692|HBEAG_HBVA2  214     26
4    P0C6J9|HBEAG_DHBV3  305     7    P0C625|HBEAG_HBVA3  214     27
4    P0C6J9|HBEAG_DHBV3  305     8    P17099|HBEAG_HBVA4  214     25
4    P0C6J9|HBEAG_DHBV3  305     9    Q81105|HBEAG_HBVA5  214     26
4    P0C6J9|HBEAG_DHBV3  305     10   Q91C37|HBEAG_HBVA6  214     26
5    P03153|HBEAG_GSHV   217     6    P0C692|HBEAG_HBVA2  214     70
5    P03153|HBEAG_GSHV   217     7    P0C625|HBEAG_HBVA3  214     69
5    P03153|HBEAG_GSHV   217     8    P17099|HBEAG_HBVA4  214     69
5    P03153|HBEAG_GSHV   217     9    Q81105|HBEAG_HBVA5  214     69
5    P03153|HBEAG_GSHV   217     10   Q91C37|HBEAG_HBVA6  214     69
6    P0C692|HBEAG_HBVA2  214     7    P0C625|HBEAG_HBVA3  214     98
6    P0C692|HBEAG_HBVA2  214     8    P17099|HBEAG_HBVA4  214     98
6    P0C692|HBEAG_HBVA2  214     9    Q81105|HBEAG_HBVA5  214     98
6    P0C692|HBEAG_HBVA2  214     10   Q91C37|HBEAG_HBVA6  214     98
7    P0C625|HBEAG_HBVA3  214     8    P17099|HBEAG_HBVA4  214     98
7    P0C625|HBEAG_HBVA3  214     9    Q81105|HBEAG_HBVA5  214     97
7    P0C625|HBEAG_HBVA3  214     10   Q91C37|HBEAG_HBVA6  214     98
8    P17099|HBEAG_HBVA4  214     9    Q81105|HBEAG_HBVA5  214     97
8    P17099|HBEAG_HBVA4  214     10   Q91C37|HBEAG_HBVA6  214     98
9    Q81105|HBEAG_HBVA5  214     10   Q91C37|HBEAG_HBVA6  214     97
===========================================================================
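The pairwise scores are percent identities. As far as we understand, ClustalW computes them from its own pairwise alignments, so its exact normalisation may differ from this sketch. A minimal illustration of percent identity over the gap-free columns of two already-aligned sequences, using a 32-column segment taken from the HBVA4 and HBVA6 rows of the multiple alignment (the function name is hypothetical):

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"   # segment from P17099|HBEAG_HBVA4
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"   # same columns from Q91C37|HBEAG_HBVA6
print(round(percent_identity(hbva4, hbva6), 2))
```

Two mismatches in 32 columns give 93.75% for this segment; the table's score of 98 for the same pair reflects the full-length comparison.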

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
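A guide tree in Newick format is a nested data structure: each parenthesised group is an internal node, and the number after each colon is a branch length. The sketch below is a minimal, hypothetical parser (not part of ClustalW) that reads such a string and sums branch lengths from the root to each leaf. Its usage writes the guide tree in standard Newick form, assuming the extracted text lost the colons, commas and decimal points that ClustalW writes with five decimal places. On that tree GLCTK_HUMAN lies far from all HBeAg sequences, consistent with its pairwise scores of 2 to 3.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = "".join(text.split())   # drop whitespace and line breaks
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        name = read_until(",():;")
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_until(",();"))
        return (name, length, children)

    return parse_node()


def root_to_tip(node, depth=0.0, out=None):
    """Map every leaf name to the sum of branch lengths from the root."""
    if out is None:
        out = {}
    name, length, children = node
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        root_to_tip(child, depth, out)
    return out


guide_tree = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)
for name, d in sorted(root_to_tip(parse_newick(guide_tree)).items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {d:.5f}")
```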

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV:

Open the template structure (PDB file) from File, then choose Swiss model > load the raw target sequence.
Choose Fit > fit raw sequence, then magic fit, then iterative fit.
Choose File > Save > Layer (PDB).
Choose File > Save > Project (PDB).
Choose Swiss model > submit modeling request. (A new browser window opens, loading the PDB file; give the e-mail ID for receiving the modelled structure.)
Open the new structure (received by e-mail) and remove the template by selecting the target.
Choose File > Save > Layer (PDB).


Open Swiss model and select the load raw sequence option to load the target molecule.

Perform magic fit and iterative fit, provided under FIT, in order to fit the two sequences.

Save the file as the project.

Select "submit modeling request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode request submission form

Your e-mail address: Gunjan300@gmail.com (must be correct)
Your name: Gunjan
Request title: Gunjan project (will be added to the results header)
Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044; Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information
Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52

Click on the model bars.

Fig: Structure of template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative modelling process. A Ramachandran plot visualizes the backbone dihedral angles φ against ψ for the amino acid residues in a protein structure: each residue contributes one (φ, ψ) point, and the plot shows which combinations of φ and ψ are sterically possible for a polypeptide. Residues that fall outside the allowed regions flag possible errors in the model.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling using Deep View and Modeller is given as input to SAVS.
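Each point on a Ramachandran plot is a pair of backbone torsion angles: φ is the dihedral C(i-1)-N(i)-Cα(i)-C(i) and ψ is N(i)-Cα(i)-C(i)-N(i+1). A minimal sketch of the dihedral calculation, assuming atom coordinates have already been read from the model's PDB file; the helper names are hypothetical and this is not the code used by SAVS or PROCHECK.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond (IUPAC sign convention)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = [x / length for x in b1]
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = [b0[i] - d0 * b1[i] for i in range(3)]
    w = [b2[i] - d2 * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# phi(i) = dihedral(C[i-1], N[i], CA[i], C[i]); psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)))
```

Calling it on the four backbone atoms around each residue's N-Cα and Cα-C bonds gives that residue's (φ, ψ) pair for the plot.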

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result
-----

Plot statistics                                          SCORE   %-tage
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%

Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
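Each percentage in the PROCHECK table is a region's count divided by the 1156 non-glycine, non-proline residues (end residues, glycines and prolines are left out). A short sketch of that arithmetic and of the 90% guideline check; the function name and dictionary keys are labels chosen here, not PROCHECK output fields:

```python
def region_percentages(counts):
    """Percentage of residues in each Ramachandran region, to one decimal place."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


procheck_counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
pct = region_percentages(procheck_counts)
print(pct)
print("meets the >90% most-favoured guideline:", pct["most favoured"] > 90.0)
```

With 85.6% of residues in the most favoured regions, the model falls below the 90% expected of a good-quality structure.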

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059

Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS

63

Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025

64

Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.14, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.09, SE 1.90/0.32, CE 1.90/0.01, H 0.00/0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, H 0.00/0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, H 0.00/0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, CE 1.70/0.22, H 0.00/0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.14, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.09, SE 1.90/0.32, CE 1.90/0.01, H 0.00/0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, H 0.00/0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, CE 1.70/0.22, H 0.00/0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.25, C 1.40/0.53, O 1.50/-0.50, CB 1.70/-0.21, H 0.00/0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.14, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.09, SE 1.90/0.32, CE 1.90/0.01, H 0.00/0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, CE 1.70/0.22, H 0.00/0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.14, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.09, SE 1.90/0.32, CE 1.90/0.01, H 0.00/0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.27, C 1.40/0.53, O 1.50/-0.50, CB 1.50/0.21, H 0.00/0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.14, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.09, SE 1.90/0.32, CE 1.90/0.01, H 0.00/0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.25, C 1.40/0.53, O 1.50/-0.50, CB 1.70/-0.21, CG 1.40/0.62, H 0.00/0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.14, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.09, SE 1.90/0.32, CE 1.90/0.01, H 0.00/0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, CE 1.70/0.22, H 0.00/0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.14, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.09, SE 1.90/0.32, CE 1.90/0.01, H 0.00/0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, H 0.00/0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, H 0.00/0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, CE 1.70/0.22, H 0.00/0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.14, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.09, SE 1.90/0.32, CE 1.90/0.01, H 0.00/0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, H 0.00/0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, CE 1.70/0.22, H 0.00/0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.25, C 1.40/0.53, O 1.50/-0.50, CB 1.70/-0.21, H 0.00/0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.14, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.09, SE 1.90/0.32, CE 1.90/0.01, H 0.00/0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.23, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.05, CD 1.70/0.05, CE 1.70/0.22, H 0.00/0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.14, C 1.40/0.53, O 1.50/-0.50, CB 1.70/0.04, CG 1.70/0.09, SE 1.90/0.32, CE 1.90/0.01, H 0.00/0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  Radius/Charge: N 1.40/-0.52, CA 1.50/0.27, C 1.40/0.53, O 1.50/-0.50, CB 1.50/0.21, H 0.00/0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 2635096, Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 2635096, Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 46.4 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N: Tag = 2B8N

Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 8621255 kJ/mol (Eshape=8621255, Eforce=0.00)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 5366 s, GF 7301 s, FFT 27787 s, Scan 1545 s, FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 kJ/mol (Eshape=14350136, Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N 2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 0.0 0.0 0.0 0.0 0.0 0.0 -1 -1.00
---------------------------------------------------------------------------
1 1 000000 0.0 0.0 0.0 0.0 0.0 0.0 -1 -1.00

Conclusion

After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to look at this more closely at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We observed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis. A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we performed homology modelling, taking a step forward by presenting a theoretical model built with the available online modelling tools.

Our study indicates that HBeAg-binding protein 4 (glycerate kinase, encoded by the GLYCTK gene) is one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.

Future prospects

The work presented in this report may be just a stepping stone toward further discoveries; it is only a small finding within a large problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structures.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study could be useful in finding a cure for hepatitis B, and they may also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and we can therefore look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a reliable phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and prove useful to anyone working in a similar area.


Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein

Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2: Evidence at transcript level

BLAST result

List of potentially matching sequences:

Db  AC  Description  Score  E-value
pdb  1QGT-C  Chain C (Hbcag) Human Hepatitis B Viral Capsid >gi|5  206  6e-54
pdb  2QIJ-C  Chain C Hepatitis B Capsid Protein With An N-Termina  197  2e-51
pdb  2G33-C  CAPSD_HBVD1 Chain C Human T4 Capsid Strain Ad  192  6e-50
pdb  1TA3-B  XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp  27  6.0
pdb  1AW9-A  Chain A Structure Of Glutathione S-Transferase Iii I  27  6.0

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:
Ala (A) 74 14.1%
Arg (R) 33 6.3%
Asn (N) 11 2.1%
Asp (D) 21 4.0%
Cys (C) 5 1.0%
Gln (Q) 32 6.1%
Glu (E) 28 5.4%
Gly (G) 51 9.8%
His (H) 16 3.1%
Ile (I) 15 2.9%
Leu (L) 81 15.5%
Lys (K) 10 1.9%
Met (M) 12 2.3%
Phe (F) 10 1.9%
Pro (P) 28 5.4%
Ser (S) 27 5.2%
Thr (T) 22 4.2%
Trp (W) 4 0.8%
Tyr (Y) 5 1.0%
Val (V) 38 7.3%
Pyl (O) 0 0.0%
Sec (U) 0 0.0%
(B) 0 0.0%
(Z) 0 0.0%
(X) 0 0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
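The composition percentages and charged-residue totals above follow directly from the residue counts. As a minimal sketch, assuming only the counts reported by ProtParam (the helpers `composition` and `charged_residues` are our own illustrative names, not part of ProtParam), the figures can be reproduced like this:

```python
# Residue counts for GLCTK_HUMAN (Q8IVS8) as reported by ProtParam.
COUNTS = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28,
    "G": 51, "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10,
    "P": 28, "S": 27, "T": 22, "W": 4, "Y": 5, "V": 38,
}


def composition(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of each residue, rounded to one decimal as in the ProtParam output."""
    total = sum(counts.values())
    return {aa: round(100.0 * n / total, 1) for aa, n in counts.items()}


def charged_residues(counts: dict[str, int]) -> tuple[int, int]:
    """Return (negative, positive) = (Asp + Glu, Arg + Lys)."""
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


if __name__ == "__main__":
    percent = composition(COUNTS)
    print("length:", sum(COUNTS.values()))
    print("Ala %:", percent["A"], "Leu %:", percent["L"])
    print("(negative, positive):", charged_residues(COUNTS))
```

The counts sum to the 523 residues of the complete sequence, which is a quick sanity check that no residue type was dropped from the table.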

Atomic composition:
Carbon C 2435
Hydrogen H 3967
Nitrogen N 711
Oxygen O 719
Sulfur S 17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849
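As a consistency check on the atomic composition, the formula can be parsed and its element counts summed; the sulfur count should also equal Cys + Met (5 + 12 = 17). A small sketch using Python's standard `re` module, where `atom_counts` is an illustrative helper of our own:

```python
import re


def atom_counts(formula: str) -> dict[str, int]:
    """Parse a simple molecular formula such as 'C2435H3967N711O719S17' into element counts."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # An element written without a number (e.g. 'S') counts once.
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


if __name__ == "__main__":
    atoms = atom_counts("C2435H3967N711O719S17")
    print(atoms)
    print("total atoms:", sum(atoms.values()))
```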

Extinction coefficients:

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
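ProtParam's documentation describes the extinction coefficient at 280 nm as a weighted sum of the Trp, Tyr and cystine counts (5500, 1490 and 125 M-1 cm-1 respectively), and Abs 0.1% as that coefficient divided by the molecular weight. The sketch below applies this to the values reported above (Trp 4, Tyr 5, Cys 5, molecular weight 55252.6). Treating five Cys as two cystines (integer division, the odd Cys contributing nothing) is our assumption; with it, both reported pairs of values are reproduced.

```python
# Molar extinction coefficients at 280 nm in water (M-1 cm-1), as given in the ProtParam documentation.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int, all_cystines: bool) -> int:
    """Estimate a protein's molar extinction coefficient at 280 nm."""
    eps = n_trp * EXT_TRP + n_tyr * EXT_TYR
    if all_cystines:
        # Assumption: every two Cys form one cystine; an unpaired Cys adds nothing.
        eps += (n_cys // 2) * EXT_CYSTINE
    return eps


def abs_01_percent(eps: int, molecular_weight: float) -> float:
    """Absorbance of a 1 g/l (0.1%) solution: extinction coefficient divided by molecular weight."""
    return eps / molecular_weight


if __name__ == "__main__":
    mw = 55252.6  # molecular weight reported by ProtParam for Q8IVS8
    for cystines in (True, False):
        eps = extinction_coefficient(n_trp=4, n_tyr=5, n_cys=5, all_cystines=cystines)
        print(eps, round(abs_01_percent(eps, mw), 3))
```

Note that Trp dominates the estimate here (22000 of the 29450), so the cystine assumption only shifts the result by 250 M-1 cm-1.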

Secondary structure prediction

By SOPMA: result for UNK_158250

        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
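Each SOPMA percentage is simply the count of a predicted state divided by the sequence length. A minimal sketch, assuming the four-state alphabet used in the prediction string (h, e, t, c); the function `sopma_summary` is our own illustration and not part of SOPMA itself:

```python
from collections import Counter

# Four-state alphabet used in the SOPMA prediction string.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "t": "Beta turn", "c": "Random coil"}


def sopma_summary(states: str) -> dict[str, tuple[int, float]]:
    """Count each predicted state and express it as a percentage of the sequence length."""
    counts = Counter(states)
    length = len(states)
    return {
        name: (counts[code], round(100.0 * counts[code] / length, 2))
        for code, name in STATE_NAMES.items()
    }


if __name__ == "__main__":
    # A string with the reported state counts (235 h, 80 e, 36 t, 172 c; 523 residues in total).
    demo = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
    for name, (count, percent) in sopma_summary(demo).items():
        print(f"{name}: {count} is {percent:.2f}%")
```

Passing the full 523-character state string from the prediction would give the same summary, since only the counts matter.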

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name  Len(aa)  SeqB  Name  Len(aa)  Score
===========================================================================
1  Q8IVS8|GLCTK_HUMAN  523  2  Q64896|HBEAG_ASHV  217  3
1  Q8IVS8|GLCTK_HUMAN  523  3  P03154|HBEAG_DHBV1  305  3
1  Q8IVS8|GLCTK_HUMAN  523  4  P0C6J9|HBEAG_DHBV3  305  3
1  Q8IVS8|GLCTK_HUMAN  523  5  P03153|HBEAG_GSHV  217  3
1  Q8IVS8|GLCTK_HUMAN  523  6  P0C692|HBEAG_HBVA2  214  3
1  Q8IVS8|GLCTK_HUMAN  523  7  P0C625|HBEAG_HBVA3  214  3
1  Q8IVS8|GLCTK_HUMAN  523  8  P17099|HBEAG_HBVA4  214  3
1  Q8IVS8|GLCTK_HUMAN  523  9  Q81105|HBEAG_HBVA5  214  3
1  Q8IVS8|GLCTK_HUMAN  523  10  Q91C37|HBEAG_HBVA6  214  2
2  Q64896|HBEAG_ASHV  217  3  P03154|HBEAG_DHBV1  305  21
2  Q64896|HBEAG_ASHV  217  4  P0C6J9|HBEAG_DHBV3  305  21
2  Q64896|HBEAG_ASHV  217  5  P03153|HBEAG_GSHV  217  91
2  Q64896|HBEAG_ASHV  217  6  P0C692|HBEAG_HBVA2  214  66
2  Q64896|HBEAG_ASHV  217  7  P0C625|HBEAG_HBVA3  214  65
2  Q64896|HBEAG_ASHV  217  8  P17099|HBEAG_HBVA4  214  65
2  Q64896|HBEAG_ASHV  217  9  Q81105|HBEAG_HBVA5  214  65
2  Q64896|HBEAG_ASHV  217  10  Q91C37|HBEAG_HBVA6  214  65
3  P03154|HBEAG_DHBV1  305  4  P0C6J9|HBEAG_DHBV3  305  97
3  P03154|HBEAG_DHBV1  305  5  P03153|HBEAG_GSHV  217  24
3  P03154|HBEAG_DHBV1  305  6  P0C692|HBEAG_HBVA2  214  26
3  P03154|HBEAG_DHBV1  305  7  P0C625|HBEAG_HBVA3  214  27
3  P03154|HBEAG_DHBV1  305  8  P17099|HBEAG_HBVA4  214  25
3  P03154|HBEAG_DHBV1  305  9  Q81105|HBEAG_HBVA5  214  26
3  P03154|HBEAG_DHBV1  305  10  Q91C37|HBEAG_HBVA6  214  26
4  P0C6J9|HBEAG_DHBV3  305  5  P03153|HBEAG_GSHV  217  25
4  P0C6J9|HBEAG_DHBV3  305  6  P0C692|HBEAG_HBVA2  214  26
4  P0C6J9|HBEAG_DHBV3  305  7  P0C625|HBEAG_HBVA3  214  27
4  P0C6J9|HBEAG_DHBV3  305  8  P17099|HBEAG_HBVA4  214  25
4  P0C6J9|HBEAG_DHBV3  305  9  Q81105|HBEAG_HBVA5  214  26
4  P0C6J9|HBEAG_DHBV3  305  10  Q91C37|HBEAG_HBVA6  214  26
5  P03153|HBEAG_GSHV  217  6  P0C692|HBEAG_HBVA2  214  70
5  P03153|HBEAG_GSHV  217  7  P0C625|HBEAG_HBVA3  214  69
5  P03153|HBEAG_GSHV  217  8  P17099|HBEAG_HBVA4  214  69
5  P03153|HBEAG_GSHV  217  9  Q81105|HBEAG_HBVA5  214  69
5  P03153|HBEAG_GSHV  217  10  Q91C37|HBEAG_HBVA6  214  69
6  P0C692|HBEAG_HBVA2  214  7  P0C625|HBEAG_HBVA3  214  98
6  P0C692|HBEAG_HBVA2  214  8  P17099|HBEAG_HBVA4  214  98
6  P0C692|HBEAG_HBVA2  214  9  Q81105|HBEAG_HBVA5  214  98
6  P0C692|HBEAG_HBVA2  214  10  Q91C37|HBEAG_HBVA6  214  98
7  P0C625|HBEAG_HBVA3  214  8  P17099|HBEAG_HBVA4  214  98
7  P0C625|HBEAG_HBVA3  214  9  Q81105|HBEAG_HBVA5  214  97
7  P0C625|HBEAG_HBVA3  214  10  Q91C37|HBEAG_HBVA6  214  98
8  P17099|HBEAG_HBVA4  214  9  Q81105|HBEAG_HBVA5  214  97
8  P17099|HBEAG_HBVA4  214  10  Q91C37|HBEAG_HBVA6  214  98
9  Q81105|HBEAG_HBVA5  214  10  Q91C37|HBEAG_HBVA6  214  97
===========================================================================
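In this table a higher score means closer similarity: the HBV genotype A sequences score 97 to 98 against one another, whereas the human glycerate kinase scores only 2 to 3 against every HBeAg. As a rough illustration of how an identity-style score can be computed, the sketch below takes two rows of an alignment and reports the percentage of identical residues over the columns where neither row has a gap. This is a simplified definition of our own; ClustalW2 derives its scores from its own pairwise alignments, and its exact normalisation may differ.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of identical residues among alignment columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # First 28 columns of one alignment block for HBVA4 and HBVA5 (they differ only at L/X).
    hbva4 = "--LPSDFFPSVRDLLDTASALYREALES"
    hbva5 = "--LPSDFFPSVRDLXDTASALYREALES"
    print(round(percent_identity(hbva4, hbva5), 1))
```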

Alignment

CLUSTAL 205 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------Q91C37|HBEAG_HBVA6 ------------------------------------------------------------P0C692|HBEAG_HBVA2 ------------------------------------------------------------P0C625|HBEAG_HBVA3 ------------------------------------------------------------Q81105|HBEAG_HBVA5 ------------------------------------------------------------Q64896|HBEAG_ASHV ------------------------------------------------------------P03153|HBEAG_GSHV ------------------------------------------------------------P03154|HBEAG_DHBV1 ------------------------------------------------------------P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30Q64896|HBEAG_ASHV ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31P03153|HBEAG_GSHV ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31

50

P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47Q64896|HBEAG_ASHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03153|HBEAG_GSHV DIDP------------------------YKEFGSSYQLLNF------------------- 48P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85Q64896|HBEAG_ASHV --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86P03153|HBEAG_GSHV --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117

51

P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
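Percent identity between two aligned rows can be read straight off the alignment. The following is a minimal Python sketch, assuming identity is counted only over columns where neither row has a gap. The two 60-column rows are copied from the HBVA4 and HBVA6 lines of one alignment block rather than the full-length sequences. `percent_identity` is a hypothetical helper, not part of ClustalW.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identical residues over alignment columns gap-free in both rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # ignore columns with a gap in either row
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Rows copied from one 60-column block of the alignment (HBVA4 vs HBVA6).
hbva4 = "-------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP"
hbva6 = "-------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP"

# 45 of the 47 gap-free columns match (R/H and Y/C differ).
print(f"HBVA4 vs HBVA6: {percent_identity(hbva4, hbva6):.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, and these give lower figures for gapped pairs. For that reason, an identity value is only comparable when its denominator is stated.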

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
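The guide tree is in Newick format: nested parentheses group clades, commas separate children, and `:` introduces a branch length. The string in the sketch below is an assumption. It is the extracted tree above with its colons, commas, decimal points and final semicolon restored by hand. The sketch parses the string with a small, hypothetical recursive-descent parser that supports only plain labels and branch lengths, and then lists each leaf's terminal branch length.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Guide tree with punctuation restored (assumed reconstruction of the extracted text).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054)"
    ":0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446)"
    ":0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


@dataclass
class Node:
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a Newick string with plain labels and ':length' suffixes."""
    s = text.strip().rstrip(";")
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def node() -> Node:
        nonlocal pos
        n = Node()
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume '('
            n.children.append(node())
            while pos < len(s) and s[pos] == ",":
                pos += 1  # consume ','
                n.children.append(node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
        n.name = read_until(",():")
        if pos < len(s) and s[pos] == ":":
            pos += 1  # consume ':'
            n.length = float(read_until(",()"))
        return n

    root = node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return root


def leaves(n: Node):
    """Yield (label, terminal branch length) for each leaf, left to right."""
    if not n.children:
        yield n.name, n.length
    for child in n.children:
        yield from leaves(child)


tree = parse_newick(GUIDE_TREE)
for label, length in sorted(leaves(tree), key=lambda item: -item[1]):
    print(f"{label:20s} {length:.5f}")
```

Under this reconstruction, the human GLCTK_HUMAN sequence has by far the longest terminal branch (0.59519). The root has three children, which is how ClustalW writes its unrooted guide trees.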

Phylogram

Tertiary structure prediction

PDB entry 1QGT-C was selected as the template. It showed about 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure (PDB file) from File, then choose Swiss Model > Load Raw Target Sequence.

Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss Model > Submit Modeling Request. A new browser window opens with the PDB file loaded; enter the e-mail address at which the modeled structure should be received.

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).

Open Swiss Model and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit (under the Fit menu) to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under Swiss Model to submit it for modeling.

Homology modeling

Optimise Mode Request Submission Form

Please fill these fields

Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52
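The model information can also be expressed as sequence coverage. Here is a quick arithmetic sketch. It assumes that the full-length target is the 523-residue GLCTK_HUMAN (Q8IVS8) sequence, the length ProtParam reports elsewhere in this report, and that the modelled range 83 to 514 is inclusive. `model_coverage` is a hypothetical helper.

```python
def model_coverage(first: int, last: int, seq_len: int) -> float:
    """Percent of a sequence covered by an inclusive residue range."""
    if not 1 <= first <= last <= seq_len:
        raise ValueError("range must lie within the sequence")
    return 100.0 * (last - first + 1) / seq_len


modelled = 514 - 83 + 1  # number of residues in the model
print(modelled, f"{model_coverage(83, 514, 523):.1f}%")  # 432 82.6%
```

On these numbers, the first 82 and the last 9 residues fall outside the template-based model.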


click on model bars

Fig: Structure of the template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server (SAVS) greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative modeling process. The Ramachandran plot is a way to visualize the backbone dihedral angles φ (phi) against ψ (psi) for the amino acid residues in a protein structure: φ is the torsion angle about the N-Cα bond and ψ the torsion angle about the Cα-C bond of each residue. The plot shows which combinations of φ and ψ are sterically possible for a polypeptide, so residues that fall outside the favoured regions point to strained or incorrectly modelled backbone geometry.

Software

SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure

The target protein structure obtained by homology modeling with DeepView and Modeller was given as input to SAVS.
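Each point on a Ramachandran plot is a pair of backbone dihedral angles. φ is measured over the atoms C(i-1), N, Cα, C, and ψ over the atoms N, Cα, C, N(i+1). Below is a minimal sketch of the standard dihedral-angle calculation from four atom positions. It uses plain Python with no structure library assumed, and the coordinates in the checks are synthetic, not taken from the model.

```python
from __future__ import annotations

import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> list[float]:
    return [a[i] - b[i] for i in range(3)]


def _dot(a: Vec, b: Vec) -> float:
    return sum(a[i] * b[i] for i in range(3))


def _cross(a: Vec, b: Vec) -> list[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)  # bond vector pointing back from p1 to p0
    b1 = _sub(p2, p1)  # central bond, the rotation axis
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]  # unit vector along the axis
    # Project the two outer bonds onto the plane perpendicular to the axis.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = [b0[i] - d0 * b1[i] for i in range(3)]
    w = [b2[i] - d2 * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Synthetic check: rotating the last atom a quarter turn about the z axis.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 6))
```

For a real structure, φ would be `dihedral(C_prev, N, CA, C)` and ψ would be `dihedral(N, CA, C, N_next)`, using coordinates read from the PDB file. The sign follows the IUPAC convention, under which a clockwise rotation of the front bond is positive.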

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics
                                                        No. of residues   Percentage
Residues in most favoured regions [A,B,L]                          990        85.6%
Residues in additional allowed regions [a,b,l,p]                   104         9.0%
Residues in generously allowed regions [~a,~b,~l,~p]                11         1.0%
Residues in disallowed regions                                      51         4.4%
                                                                  ----       ------
Number of non-glycine and non-proline residues                    1156       100.0%
Number of end-residues (excl. Gly and Pro)                           8
Number of glycine residues (shown as triangles)                    127 (59)
Number of proline residues                                          60
                                                                  ----
Total number of residues                                          1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
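The percentages in the PROCHECK table follow directly from the residue counts, because only non-glycine, non-proline residues are classified into regions. The short sketch below recomputes the percentages from the reported counts and then applies the 90% guideline. `region_percentages` is a hypothetical helper, not part of PROCHECK.

```python
from __future__ import annotations

# Residue counts per Ramachandran region, taken from the PROCHECK summary.
COUNTS = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}


def region_percentages(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Total classified residues and the percentage falling in each region."""
    total = sum(counts.values())  # non-Gly, non-Pro residues only
    return total, {region: 100.0 * n / total for region, n in counts.items()}


total, pct = region_percentages(COUNTS)
print(f"classified residues: {total}")  # 1156
for region, value in pct.items():
    print(f"{region:34s} {value:5.1f}%")
good = pct["most favoured [A,B,L]"] > 90.0
print("over 90% in most favoured regions:", good)  # False (85.6%)
```

With 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, the model falls short of the 90% guideline quoted above. This suggests that parts of the model may need further refinement.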

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS

Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25

Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins
Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 kJ/mol (Eshape=86212.55 Eforce=0.00)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 kJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln Etotal Eshape Eforce Eair RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined
Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
   1    1 000000 00 00 00 00 00 00 -1 -100

Conclusion

After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to look more closely at this by studying it at the gene level, and a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We observed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene-sequencing projects on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis. A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of a protein present in the different species, we used homology modelling, and as a further step we present a theoretical model built with available online modelling tools.

Our study indicates that glycerate kinase (HBeAg-binding protein 4), encoded by the GLYCTK gene, may be one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand with a view to inhibiting its activity, as a possible basis for drug development.

Future prospects

The work presented in this report may be a stepping stone towards such discoveries, although it is only a small finding within a large problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so we can look forward to a better, disease-free world.

The work presented is just a small part of a big problem, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


Results and discussion

1. Swiss-Prot entry: protein sequence of glycerate kinase (HBeAg-binding protein 4)

Entry information
Entry name: GLCTK_HUMAN
Primary accession number: Q8IVS8

Name and origin of the protein
Protein name: Glycerate kinase
Synonyms: EC 2.7.1.31; HBeAg-binding protein 4
Gene name: GLYCTK
Synonyms: HBEBP4
ORF names: LP5910
From: Homo sapiens (Human) [TaxID: 9606]
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo
Protein existence: 2 (Evidence at transcript level)

BLAST result

List of potentially matching sequences:

Db   AC      Score  E-value  Description
pdb  1QGT-C  206    6e-54    Chain C, (HBcAg) Human Hepatitis B Viral Capsid
pdb  2QIJ-C  197    2e-51    Chain C, Hepatitis B Capsid Protein With An N-Termina
pdb  2G33-C  192    6e-50    CAPSD_HBVD1 Chain C, Human T4 Capsid Strain Ad
pdb  1TA3-B  27     6.0      XIP1_WHEAT Chain B, Crystal Structure Of Xylanase (Gh10) In Comp
pdb  1AW9-A  27     6.0      Chain A, Structure Of Glutathione S-Transferase Iii I

Graphical overview of the alignments
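Choosing a template from the BLAST list above comes down to keeping hits below an E-value cutoff and taking the best one. Below is a minimal Python sketch. The hits are transcribed from the table. The cutoff of 1e-3 and the helper name `significant_hits` are illustrative assumptions, not part of the original protocol.

```python
# BLAST hits from the table above: (PDB chain, bit score, E-value).
HITS = [
    ("1QGT-C", 206, 6e-54),
    ("2QIJ-C", 197, 2e-51),
    ("2G33-C", 192, 6e-50),
    ("1TA3-B", 27, 6.0),
    ("1AW9-A", 27, 6.0),
]


def significant_hits(hits, max_evalue=1e-3):
    """Return hits at or below an E-value cutoff, lowest E-value first."""
    kept = [hit for hit in hits if hit[2] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[2])


best = significant_hits(HITS)
print([pdb_chain for pdb_chain, _, _ in best])
print("template:", best[0][0])
```

The two score-27 hits fail any conventional cutoff, which leaves 1QGT-C as the top-ranked candidate template.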

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:

Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
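The composition percentages and charge totals follow directly from the residue counts. Below is a minimal Python sketch that uses only the counts ProtParam reports for Q8IVS8. The function name `composition_summary` is hypothetical, and the one-decimal rounding is chosen to match the table.

```python
# Residue counts for GLCTK_HUMAN (Q8IVS8) as reported by ProtParam above.
COUNTS = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28,
    "G": 51, "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10,
    "P": 28, "S": 27, "T": 22, "W": 4, "Y": 5, "V": 38,
}


def composition_summary(counts):
    """Return (length, percent per residue, negative count, positive count)."""
    length = sum(counts.values())
    # Share of each residue type, rounded to one decimal place.
    percent = {aa: round(100.0 * n / length, 1) for aa, n in counts.items()}
    negative = counts.get("D", 0) + counts.get("E", 0)  # Asp + Glu
    positive = counts.get("R", 0) + counts.get("K", 0)  # Arg + Lys
    return length, percent, negative, positive


length, percent, negative, positive = composition_summary(COUNTS)
print(length, percent["L"], percent["A"], negative, positive)
```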

Atomic composition:

Carbon      C  2435
Hydrogen    H  3967
Nitrogen    N   711
Oxygen      O   719
Sulfur      S    17

Formula: C2435H3967N711O719S17

Total number of atoms: 7849
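The total atom count is the sum of the element counts in the formula. Below is a minimal Python sketch. The helper `parse_formula` is hypothetical and assumes a simple formula of element symbols followed by optional counts, with no parentheses or charges.

```python
import re


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C2435H3967N711O719S17'.

    Each element symbol (capital letter plus optional lowercase letter) may be
    followed by a count; parentheses and charges are not supported.
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


atoms = parse_formula("C2435H3967N711O719S17")
print(atoms, sum(atoms.values()))
```

The 17 sulfur atoms agree with the composition table. Among the residues present, only Cys (5) and Met (12) carry sulfur.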

Extinction coefficients:

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
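ProtParam's documentation describes the extinction coefficient at 280 nm as a sum of per-residue contributions. Those contributions are 5500 per Trp, 1490 per Tyr and 125 per cystine, the values it attributes to Pace et al. Abs 0.1% is that coefficient divided by the molecular weight. Below is a minimal Python sketch under that assumption, using the Trp, Tyr and Cys counts and the molecular weight listed above. The function name `extinction_280` is hypothetical.

```python
def extinction_280(n_trp, n_tyr, n_cys, mol_weight, cys_paired=True):
    """Return (molar extinction coefficient at 280 nm, Abs 0.1%).

    epsilon = 5500*nTrp + 1490*nTyr + 125*nCystine, where
    nCystine = nCys // 2 when every Cys is assumed to be in a disulfide pair.
    """
    n_cystine = n_cys // 2 if cys_paired else 0
    epsilon = 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine
    # A 1 g/l solution holds 1/mol_weight mol/l, so Abs 0.1% = epsilon / MW.
    return epsilon, round(epsilon / mol_weight, 3)


MW = 55252.6  # molecular weight of Q8IVS8 reported by ProtParam (Da)
print(extinction_280(4, 5, 5, MW))                    # all Cys as cystines
print(extinction_280(4, 5, 5, MW, cys_paired=False))  # no cystines
```

With 5 Cys, only two complete cystines can form, which accounts for the 250 M⁻¹ cm⁻¹ difference between the two reported coefficients.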

Secondary structure prediction

SOPMA result for UNK_158250

Predicted states: h = alpha helix, e = extended strand, t = beta turn, c = random coil. Each sequence line is followed by its prediction line.

MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3-10 helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: window width 17, similarity threshold 8, number of states 4
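The SOPMA summary is a tally of the one-letter states in the prediction line, each divided by the sequence length. Below is a minimal Python sketch. The state-name mapping covers only the four states used above. The function `tally_states` and its input string are hypothetical illustrations, not SOPMA's own code.

```python
from collections import Counter

# SOPMA one-letter states that appear in the prediction above.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand",
               "t": "Beta turn", "c": "Random coil"}


def tally_states(prediction):
    """Map each predicted state to (residue count, percent of length)."""
    length = len(prediction)
    return {STATE_NAMES.get(state, state): (n, round(100.0 * n / length, 2))
            for state, n in Counter(prediction).items()}


# Short hypothetical prediction string, only to show the arithmetic.
print(tally_states("hhhhcceett"))
```

The same arithmetic applied to the full 523-character prediction gives 235/523 = 44.93% alpha helix and 80/523 = 15.30% extended strand.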

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
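ClustalW2 reports these pairwise scores as percent identities. Below is a minimal Python sketch of percent identity between two already-aligned sequences. It assumes that identity is counted over columns where neither sequence has a gap. ClustalW2's exact denominator may differ, so this sketch is not guaranteed to reproduce the table. The function name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are skipped; identity is the
    share of the remaining columns that carry the same residue.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0


# A 30-column stretch of the multiple alignment:
# P17099|HBEAG_HBVA4 against Q64896|HBEAG_ASHV.
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
ashv = "MYLFHLCLVFACVSCPTVQASKLCLGWLWD"
print(round(percent_identity(hbva4, ashv), 1))
```

This fragment scores higher than the full-length value of 65 in the table because it covers only a short, well-conserved region.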

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4 ------------------------------------------------------------
Q91C37|HBEAG_HBVA6 ------------------------------------------------------------
P0C692|HBEAG_HBVA2 ------------------------------------------------------------
P0C625|HBEAG_HBVA3 ------------------------------------------------------------
Q81105|HBEAG_HBVA5 ------------------------------------------------------------
Q64896|HBEAG_ASHV  ------------------------------------------------------------
P03153|HBEAG_GSHV  ------------------------------------------------------------
P03154|HBEAG_DHBV1 ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3 ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3 ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5 ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV  ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV  ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3 ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4 DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2 DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3 DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5 DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV  DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1 DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3 DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3 --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5 --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV  --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV  --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1 --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3 --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV  QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV  QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV  -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV  -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV  NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV  NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV  RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV  RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------
Q91C37|HBEAG_HBVA6 -------------------------------------------
P0C692|HBEAG_HBVA2 -------------------------------------------
P0C625|HBEAG_HBVA3 -------------------------------------------
Q81105|HBEAG_HBVA5 -------------------------------------------
Q64896|HBEAG_ASHV  -------------------------------------------
P03153|HBEAG_GSHV  -------------------------------------------
P03154|HBEAG_DHBV1 -------------------------------------------
P0C6J9|HBEAG_DHBV3 -------------------------------------------
Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
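The guide tree is in Newick format. Nested parentheses group clusters, and each `label:length` pair gives a branch length. Below is a small recursive-descent reader. The helpers `parse_newick` and `leaf_depths` are hypothetical, and they handle only the subset of Newick used here: unquoted labels and branch lengths, with no comments. The reader sums branch lengths from the root down to each leaf.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (name, length, children) tuples."""
    text = "".join(text.split())  # drop whitespace and line breaks
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at {pos}")
        name = read_until(":,();")
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_until(",();"))
        return name, length, children

    tree = parse_node()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return tree


def leaf_depths(node, depth=0.0):
    """Yield (leaf name, summed branch length from the root) for every leaf."""
    name, length, children = node
    depth += length
    if not children:
        yield name, depth
    for child in children:
        yield from leaf_depths(child, depth)


GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,"
    "P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

depths = dict(leaf_depths(parse_newick(GUIDE_TREE)))
for name, depth in sorted(depths.items(), key=lambda item: -item[1]):
    print(f"{name:20s} {depth:.5f}")
```

The glycerate kinase sequence lies at the end of the longest root-to-leaf path. This placement is consistent with its near-zero pairwise scores (2 to 3) against every HBeAg sequence in the scores table.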

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV:

1. Open the template structure (PDB file) from File, then choose Swiss-Model > Load Raw Target Sequence.
2. Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.
3. Choose File > Save > Layer (PDB).
4. Choose File > Save > Project (PDB).
5. Choose Swiss-Model > Submit Modeling Request. A new browser window opens with the PDB file loaded; enter the e-mail address at which the modeled structure is to be received.
6. Open the new structure (received by e-mail) and remove the template by selecting the target.
7. Choose File > Save > Layer (PDB).

Open Swiss-Model and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit (provided under Fit) to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under Swiss-Model to submit it for modeling.

Homology modeling

Optimise Mode request submission form:

Your e-mail address: Gunjan300@gmail.com
Your name: Gunjan
Request title: Gunjan project (added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044, Title: Q8IVS8

SWISS-MODEL Workspace

Model information:

Modelled residue range: 83 to 514
Based on template: 2b8nA (2.53 Å)
Sequence identity [%]: 34
E-value: 2.70e-52

Click on the model bars.

Fig: Structure of the template after modeling

Alignment

TARGET  83 LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss

Model Validation

INTRODUCTION

Structure Analysis and Validation Server (SAVS) greatly simplifies the computational analysis of protein structure and sequence. Stereochemical validation of model structures is an important part of the comparative modeling process.

The Ramachandran plot shows the backbone dihedral angles φ (phi) against ψ (psi) for each amino acid residue in a protein structure. It indicates which combinations of φ and ψ are sterically possible for a polypeptide. Backbone bond lengths and bond angles are nearly fixed, and the peptide bond is planar. The backbone conformation of each residue is therefore described mainly by two torsion angles:

- φ, the rotation about the N–Cα bond
- ψ, the rotation about the Cα–C bond

Residues whose (φ, ψ) pairs fall in disallowed regions of the plot point to possible errors in the model.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure: The target protein structure obtained after homology modeling using DeepView and Modeler was given as input to SAVS.
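The φ and ψ angles on a Ramachandran plot are dihedral angles, each defined by four consecutive backbone atoms. φ uses C(i-1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1). Below is a minimal pure-Python sketch of the signed dihedral computation. The helper names are hypothetical, and the coordinates in the example are toy values, not atoms from the model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Remove the components along the central bond, leaving vectors in the
    # plane perpendicular to it.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: a quarter turn about the z axis.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 6))
```

For residue i, `phi = dihedral(C_prev, N, CA, C)` and `psi = dihedral(N, CA, C, N_next)`. Here `C_prev`, `N`, `CA`, `C` and `N_next` are hypothetical coordinate tuples read from the PDB file.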

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result: plot statistics

Residues in most favoured regions [A,B,L]: 990 (85.6%)
Residues in additional allowed regions [a,b,l,p]: 104 (9.0%)
Residues in generously allowed regions [~a,~b,~l,~p]: 11 (1.0%)
Residues in disallowed regions: 51 (4.4%)
Number of non-glycine and non-proline residues: 1156 (100.0%)

Number of end-residues (excl. Gly and Pro): 8
Number of glycine residues (shown as triangles): 127
Number of proline residues: 60
Total number of residues: 1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions. The model's 85.6% in the most favoured regions therefore falls below this benchmark.
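The PROCHECK percentages are region counts divided by the number of non-glycine, non-proline residues. Below is a minimal Python sketch that recomputes them from the counts above and checks the favoured-region share against the 90% benchmark quoted in the PROCHECK note. The function name `ramachandran_summary` is hypothetical.

```python
def ramachandran_summary(favoured, additional, generous, disallowed,
                         benchmark=90.0):
    """Percent of non-Gly, non-Pro residues per region, and whether the
    favoured-region share is above the quality benchmark."""
    total = favoured + additional + generous + disallowed
    percents = [round(100.0 * n / total, 1)
                for n in (favoured, additional, generous, disallowed)]
    passes = 100.0 * favoured / total > benchmark
    return total, percents, passes


print(ramachandran_summary(990, 104, 11, 51))
```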

Docking result (Hex software)

Fig: Ligand and receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS


Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025


Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050


MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050


LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI


VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052


MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005


LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025

70

ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds

71

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS  Bumps
----  --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  00      00      00      00      00      00     -1  -100
---------------------------------------------------------------------------
   1    1 000000  00      00      00      00      00      00     -1  -100

Conclusion


After analyzing the protein sequences of Hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to look at the matter more closely by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

Our study suggests that glycerate kinase (HBeAg-binding protein 4), a protein associated with HBeAg, is one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for developing drugs.


Future prospects

The work presented in this report may be just a stepping stone toward such discoveries; it is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and they may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a great deal of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.



Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


pdb 1TA3-B XIP1_WHEAT Chain B Crystal Structure Of Xylanase (Gh10) In Comp 27 60

pdb 1AW9-A Chain A Structure Of Glutathione S-Transferase Iii I 27 60

Graphical overview of the alignments

Primary structure prediction

By ProtParam

GLCTK_HUMAN (Q8IVS8)

DE Glycerate kinase (EC 2.7.1.31) (HBeAg-binding protein 4)

The computation has been carried out on the complete sequence (523 amino acids).

Number of amino acids 523

Molecular weight 55252.6

Theoretical pI 6.25

Amino acid composition

Ala (A) 74 14.1%
Arg (R) 33 6.3%
Asn (N) 11 2.1%
Asp (D) 21 4.0%
Cys (C) 5 1.0%
Gln (Q) 32 6.1%
Glu (E) 28 5.4%
Gly (G) 51 9.8%
His (H) 16 3.1%
Ile (I) 15 2.9%
Leu (L) 81 15.5%
Lys (K) 10 1.9%
Met (M) 12 2.3%
Phe (F) 10 1.9%
Pro (P) 28 5.4%
Ser (S) 27 5.2%
Thr (T) 22 4.2%
Trp (W) 4 0.8%
Tyr (Y) 5 1.0%
Val (V) 38 7.3%
Pyl (O) 0 0.0%
Sec (U) 0 0.0%
(B) 0 0.0%
(Z) 0 0.0%
(X) 0 0.0%

Total number of negatively charged residues (Asp + Glu): 49
Total number of positively charged residues (Arg + Lys): 43
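The percentages and charged-residue totals above follow directly from the residue counts. The minimal Python sketch below recomputes them, assuming only the counts ProtParam reported for Q8IVS8; the `COUNTS` dictionary and the function names are illustrative and are not part of ProtParam.

```python
from collections import Counter

# Residue counts for Q8IVS8 as reported by ProtParam above (one-letter codes).
COUNTS = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28,
    "G": 51, "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10,
    "P": 28, "S": 27, "T": 22, "W": 4, "Y": 5, "V": 38,
}

def counts_from_sequence(seq):
    """Count residues in a raw one-letter sequence (hypothetical helper)."""
    return dict(Counter(seq.upper()))

def percent_composition(counts):
    """Percentage of each residue type, rounded to one decimal like ProtParam."""
    total = sum(counts.values())
    return {aa: round(100.0 * n / total, 1) for aa, n in counts.items()}

def charged_residues(counts):
    """Return (negative, positive) = (Asp + Glu, Arg + Lys)."""
    negative = counts.get("D", 0) + counts.get("E", 0)
    positive = counts.get("R", 0) + counts.get("K", 0)
    return negative, positive

if __name__ == "__main__":
    pct = percent_composition(COUNTS)
    print(sum(COUNTS.values()), pct["L"], pct["A"])
    print(charged_residues(COUNTS))
```

Feeding the full sequence through `counts_from_sequence` instead of the hard-coded dictionary should give the same figures.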

Atomic composition

Carbon C 2435
Hydrogen H 3967
Nitrogen N 711
Oxygen O 719
Sulfur S 17

Formula C2435H3967N711O719S17

Total number of atoms 7849
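The total atom count is the sum of the element counts in the molecular formula. The sketch below parses a simple formula string such as the one above. `parse_formula` is a hypothetical helper that handles only element symbols followed by optional counts, with no parentheses or charges.

```python
import re

def parse_formula(formula):
    """Return {element: count} for a flat formula such as 'C2H6O'."""
    counts = {}
    # An element symbol is one capital letter plus an optional lowercase letter,
    # followed by an optional integer count (absent means 1).
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

if __name__ == "__main__":
    atoms = parse_formula("C2435H3967N711O719S17")
    print(atoms, sum(atoms.values()))
```

As a consistency check, the 17 sulfur atoms match the 5 Cys plus 12 Met residues in the composition table.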

Extinction coefficients

Extinction coefficients are in units of M⁻¹ cm⁻¹, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
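ProtParam documents its estimate as a sum over Trp, Tyr and cystine residues (5500, 1490 and 125 M⁻¹ cm⁻¹ respectively), with the Abs 0.1% value equal to that coefficient divided by the molecular weight. The sketch below reproduces both reported values from the composition (4 Trp, 5 Tyr, 5 Cys) and the molecular weight 55252.6; the function names are illustrative. With 5 Cys, at most two disulfide-bonded pairs (cystines) can form.

```python
# Molar absorptivities at 280 nm (M^-1 cm^-1) as documented for ProtParam.
EPS_TRP, EPS_TYR, EPS_CYSTINE = 5500, 1490, 125

def extinction_coefficient(n_trp, n_tyr, n_cys, all_cys_paired):
    """Estimated molar extinction coefficient of a protein at 280 nm."""
    # A cystine is a pair of disulfide-bonded Cys, so an odd Cys stays unpaired.
    cystines = n_cys // 2 if all_cys_paired else 0
    return EPS_TRP * n_trp + EPS_TYR * n_tyr + EPS_CYSTINE * cystines

def abs_01_percent(ext_coeff, mol_weight):
    """Absorbance of a 1 g/L (0.1%) solution in a 1 cm cell."""
    return ext_coeff / mol_weight

if __name__ == "__main__":
    for paired in (True, False):
        eps = extinction_coefficient(4, 5, 5, all_cys_paired=paired)
        print(eps, round(abs_01_percent(eps, 55252.6), 3))
```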

Secondary structure prediction

By SOPMA result for UNK_158250

       10        20        30        40        50        60        70
        |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee

RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee

GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh

ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc

PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh

LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc

GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc

TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters:
Window width: 17
Similarity threshold: 8
Number of states: 4
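Each SOPMA percentage is the number of residues assigned to a state divided by the sequence length. The sketch below tallies a one-letter state string like the prediction lines above (h = helix, e = extended strand, t = turn, c = coil); `STATE_NAMES` and `state_percentages` are illustrative names, and characters outside those four states are ignored.

```python
from collections import Counter

# Four-state SOPMA alphabet used in the prediction lines above.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand",
               "t": "Beta turn", "c": "Random coil"}

def state_percentages(prediction):
    """Map each state name to (count, percentage rounded to 2 decimals)."""
    states = [s for s in prediction if s in STATE_NAMES]
    counts = Counter(states)
    total = len(states)
    return {name: (counts[s], round(100.0 * counts[s] / total, 2))
            for s, name in STATE_NAMES.items()}

if __name__ == "__main__":
    # Rebuild a string with the reported totals: 235 h, 80 e, 36 t, 172 c.
    pred = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172
    print(state_percentages(pred))
```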

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
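The Score column is a pairwise percent identity. The sketch below shows one simple way to compute identity from two aligned rows, counting only the columns where neither row has a gap. ClustalW2 derives its scores from its own pairwise alignment step and normalisation, so this simplified version is not guaranteed to reproduce the table; `percent_identity` is an illustrative name.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identical residues over columns where neither row is a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

if __name__ == "__main__":
    # Segments of HBEAG_HBVA4 and HBEAG_HBVA5 from the alignment (one V/F difference).
    hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
    hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
    print(round(percent_identity(hbva4, hbva5), 2))
```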

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420


P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
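The guide tree is written in Newick format: parentheses group sister clades and the number after each colon is a branch length. The minimal recursive-descent parser below recovers the leaves and their terminal branch lengths. `parse_newick` and `leaves` are hypothetical helpers that handle only unquoted labels and numeric lengths, which is all this ClustalW2 guide tree uses.

```python
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

def parse_newick(text):
    """Return the tree as nested (name, branch_length, children) tuples."""
    text = text.strip()
    if not text.endswith(";"):
        text += ";"
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                          # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1                      # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                          # consume ")"
        name = read_until(":,();")
        length = None
        if text[pos] == ":":
            pos += 1                          # consume ":"
            length = float(read_until(",();"))
        return (name, length, children)

    return parse_node()

def leaves(node):
    """List (name, branch_length) for every terminal node, left to right."""
    name, length, children = node
    if not children:
        return [(name, length)]
    return [leaf for child in children for leaf in leaves(child)]

if __name__ == "__main__":
    for name, length in leaves(parse_newick(GUIDE_TREE)):
        print(f"{name:20s} {length:.5f}")
```

The long terminal branch of Q8IVS8|GLCTK_HUMAN (0.59519) reflects how distant the human glycerate kinase is from the viral HBeAg sequences, which matches its low pairwise scores (2 to 3) in the table.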

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template, as it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV:

Open the template structure (PDB file) from File, then choose SwissModel > Load Raw Target Sequence.

Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose SwissModel > Submit Modeling Request. (A new browser window opens with the PDB file loaded; enter the e-mail ID at which the modeled structure should be received.)

Open the new structure received by e-mail and remove the template by selecting the target.

Choose File > Save > Layer (PDB).

Open SwissModel and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill in these fields:

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Click on the model bars.

Fig: Structure of the template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server (SAVS) greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative (homology) modeling process. A Ramachandran plot displays the backbone dihedral angles φ (phi) against ψ (psi) for each residue in a protein structure and shows which combinations of φ and ψ are sterically allowed for a polypeptide. Because φ and ψ are torsion angles about the N-Cα and Cα-C bonds, the plot highlights residues whose backbone conformation is strained or unusual; it does not report bond lengths or distances between successive Cα atoms directly.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
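The φ and ψ values on a Ramachandran plot are ordinary dihedral (torsion) angles defined by four consecutive backbone atoms: φ uses C(i-1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1). The pure-Python sketch below implements the standard atan2 formula, assuming coordinates are given as (x, y, z) tuples in Å; `dihedral` and `phi_psi` are illustrative names, not part of SAVS or PROCHECK.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) of one residue from five consecutive backbone atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)

if __name__ == "__main__":
    # Toy geometry: a quarter turn about the y axis gives -90 degrees.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
```

In practice the five atom coordinates would be read from the ATOM records of the modelled PDB file; plotting every residue's (φ, ψ) pair gives the Ramachandran plot that PROCHECK summarises below.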

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics
Residues in most favoured regions [A,B,L]                  990   85.6%
Residues in additional allowed regions [a,b,l,p]           104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
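Each region percentage is the region's residue count divided by the 1156 non-glycine, non-proline residues. The sketch below recomputes the four percentages and applies the over-90% guideline quoted above; `ramachandran_summary` is an illustrative name, not a PROCHECK function.

```python
def ramachandran_summary(most_favoured, additional, generous, disallowed,
                         threshold=90.0):
    """Return (total, rounded percentages per region, meets_guideline)."""
    counts = (most_favoured, additional, generous, disallowed)
    total = sum(counts)
    percentages = [round(100.0 * n / total, 1) for n in counts]
    # PROCHECK guideline: a good model has over 90% in the most favoured regions.
    meets_guideline = 100.0 * most_favoured / total > threshold
    return total, percentages, meets_guideline

if __name__ == "__main__":
    print(ramachandran_summary(990, 104, 11, 51))
```

For this model the most favoured fraction is 85.6%, below the 90% guideline, so the check returns False.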

Docking result by Hex software

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge = -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge = -10
Warning: Using PDB CONECT records to define non-standard bonds


>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins
Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)


R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln   Etotal     Eshape     Eforce     Eair       RMS     Bumps
----   ---------  ---------  ---------  ---------  ------  -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
---------------------------------------------------------------------------
   1    1 000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to look at this more closely by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing projects on HBV, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis. A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we performed homology modelling, and as a step forward we present a theoretical model built with available online modelling tools.

Because the HBeAg (glycerate kinase) protein coded by the gene is one of the secondary causes of HBV pathogenicity, we tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for developing drugs.


Future prospects

The work presented in this report may be just a stepping stone towards future discoveries in this area; it is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on earth. It includes methods for collecting and analysing data, as well as for interpreting those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them, and the same method can be applied to other infectious diseases; hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Number of amino acids: 523

Molecular weight: 55252.6

Theoretical pI: 6.25

Amino acid composition:

Ala (A)  74  14.1%
Arg (R)  33   6.3%
Asn (N)  11   2.1%
Asp (D)  21   4.0%
Cys (C)   5   1.0%
Gln (Q)  32   6.1%
Glu (E)  28   5.4%
Gly (G)  51   9.8%
His (H)  16   3.1%
Ile (I)  15   2.9%
Leu (L)  81  15.5%
Lys (K)  10   1.9%
Met (M)  12   2.3%
Phe (F)  10   1.9%
Pro (P)  28   5.4%
Ser (S)  27   5.2%
Thr (T)  22   4.2%
Trp (W)   4   0.8%
Tyr (Y)   5   1.0%
Val (V)  38   7.3%
Pyl (O)   0   0.0%
Sec (U)   0   0.0%
(B)       0   0.0%
(Z)       0   0.0%
(X)       0   0.0%

Total number of negatively charged residues (Asp + Glu): 49

Total number of positively charged residues (Arg + Lys): 43
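The composition percentages and charged-residue totals above can be recomputed from the residue counts, or from the sequence itself. A minimal sketch, assuming the counts are typed in from the table; the helper names are hypothetical, and this is not ProtParam's own code.

```python
from collections import Counter

# Residue counts from the ProtParam composition table
COMPOSITION = {
    "A": 74, "R": 33, "N": 11, "D": 21, "C": 5, "Q": 32, "E": 28,
    "G": 51, "H": 16, "I": 15, "L": 81, "K": 10, "M": 12, "F": 10,
    "P": 28, "S": 27, "T": 22, "W": 4, "Y": 5, "V": 38,
}


def composition_from_sequence(sequence: str) -> Counter:
    """Count one-letter residue codes in a protein sequence."""
    return Counter(sequence.upper())


def summarize(counts) -> dict:
    """Length, per-residue percentages and charged-residue totals."""
    length = sum(counts.values())
    return {
        "length": length,
        "percent": {aa: round(100.0 * n / length, 1) for aa, n in counts.items()},
        "negative": counts.get("D", 0) + counts.get("E", 0),  # Asp + Glu
        "positive": counts.get("R", 0) + counts.get("K", 0),  # Arg + Lys
    }


if __name__ == "__main__":
    s = summarize(COMPOSITION)
    print(s["length"], s["negative"], s["positive"], s["percent"]["L"])
```

With the table's counts this gives a length of 523, 49 negatively and 43 positively charged residues, and 15.5% Leu, matching the values reported above.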

Atomic composition

Carbon C 2435

Hydrogen H 3967

Nitrogen N 711

Oxygen O 719

Sulfur S 17

Formula C2435H3967N711O719S17

Total number of atoms 7849
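The atomic composition and total atom count are consistent with the formula: summing its subscripts gives 7849, and the 17 sulfur atoms match the 5 Cys and 12 Met residues in the composition table. A small formula parser, written only as an illustration (the function name is hypothetical):

```python
import re


def parse_formula(formula: str) -> dict[str, int]:
    """Split a formula such as 'C2H6O' into element counts."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


FORMULA = "C2435H3967N711O719S17"

if __name__ == "__main__":
    atoms = parse_formula(FORMULA)
    print(atoms, sum(atoms.values()))
```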

Extinction coefficients

Extinction coefficients are in units of M-1 cm-1 at 280 nm measured in water

Ext. coefficient 29700

Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450

Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
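Both coefficients follow from the Trp, Tyr and Cys counts in the composition table (W = 4, Y = 5, C = 5). The sketch below assumes the per-residue values that ProtParam documents for this estimate (5500 for Trp, 1490 for Tyr and 125 per cystine, in M-1 cm-1 at 280 nm); with 5 Cys, "all Cys paired" means two cystines. The helper names are hypothetical.

```python
# Molar absorption coefficients at 280 nm (M^-1 cm^-1) used by ProtParam
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125


def extinction_coefficient(n_trp: int, n_tyr: int, n_cys: int,
                           all_cys_paired: bool = True) -> int:
    """Estimated epsilon at 280 nm; every two Cys form one cystine if paired."""
    cystines = n_cys // 2 if all_cys_paired else 0
    return n_trp * EPS_TRP + n_tyr * EPS_TYR + cystines * EPS_CYSTINE


def abs_0_1_percent(epsilon: float, mol_weight: float) -> float:
    """Absorbance of a 1 g/l solution over 1 cm: epsilon / molecular weight."""
    return epsilon / mol_weight


if __name__ == "__main__":
    for paired in (True, False):
        eps = extinction_coefficient(4, 5, 5, paired)
        print(eps, round(abs_0_1_percent(eps, 55252.6), 3))
```

Running it prints 29700 0.538 and then 29450 0.533, the two ProtParam values above; the only difference is the 2 x 125 contributed by two cystines.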

Secondary structure prediction

By SOPMA result for UNK_158250


        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee


RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFEccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeeeGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQeccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchhELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALhhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtcccccPRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLchhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhhLAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRGhhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeecccccGRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH

ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%


Parameters: Window width: 17, Similarity threshold: 8, Number of states: 4
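The SOPMA summary percentages are simply the share of residues assigned to each state. The sketch below only tallies a prediction string (one letter per residue); it does not reproduce SOPMA's prediction itself (window width 17, similarity threshold 8), and the function and dictionary names are hypothetical. The toy string in the usage example reuses only the state totals above, not the real per-residue order.

```python
from collections import Counter

STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand",
               "t": "Beta turn", "c": "Random coil"}


def state_percentages(prediction: str) -> dict[str, float]:
    """Percentage of residues in each predicted state, to two decimals."""
    counts = Counter(prediction.lower())
    total = len(prediction)
    return {STATE_NAMES.get(s, s): round(100.0 * n / total, 2)
            for s, n in counts.items()}


if __name__ == "__main__":
    toy = "h" * 235 + "e" * 80 + "t" * 36 + "c" * 172  # totals from SOPMA
    print(state_percentages(toy))
```

For the 523-residue protein this reproduces 44.93% helix, 15.3% extended strand, 6.88% beta turn and 32.89% random coil.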

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10

2. Alignment score: 28565

3. Sequence format: Pearson

4. Sequence type: aa

5. Output file: clustalw2-20080510-09552541.output

6. Alignment file: clustalw2-20080510-09552541.aln

7. Guide tree file: clustalw2-20080510-09552541.dnd

8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
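The ClustalW2 pairwise scores are percent identities between pairs of sequences, which the program then turns into distances for its guide tree. As a simplified illustration only (not ClustalW2's exact scoring, which is computed over its own pairwise alignments), percent identity for an already aligned pair can be computed by ignoring gap columns; the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Identical residues as a percentage of columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def distance(a: str, b: str) -> float:
    """Simple dissimilarity (1 - identity) of the kind used to build guide trees."""
    return 1.0 - percent_identity(a, b) / 100.0


if __name__ == "__main__":
    # Residues 1-30 of two HBeAg rows from the multiple alignment
    hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
    hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
    print(round(percent_identity(hbva4, hbva5), 1))
```

On this 30-column fragment the two HBV rows differ at a single position (V/F), giving 96.6%, in line with the 97-98 scores ClustalW2 reports for the HBVA sequences.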

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q91C37|HBEAG_HBVA6 ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117P0C692|HBEAG_HBVA2 QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117

51

P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

52

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
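Percent identity between two rows of the alignment can be computed column by column. The Python sketch below is a simplification under stated assumptions: it counts only columns where neither row has a gap, and it works on rows of the multiple alignment, whereas ClustalW's own pairwise scores come from separate pairwise alignments, so its numbers need not match the score table. The example uses one 60-column block (HBVA4 vs HBVA6) from the alignment; `percent_identity` is our own name.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # gapped columns are not compared
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# One block copied from the alignment above: 32 residues followed by 28 gap columns.
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY" + "-" * 28
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY" + "-" * 28
print(round(percent_identity(hbva4, hbva6), 2))
```

In this block the two rows differ at only two of 32 compared positions (Q/E and A/T), which is in line with the very high HBVA-versus-HBVA scores in the table.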

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
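The guide tree is given in Newick notation: nested parentheses group sequences, and the number after each colon is a branch length. The sketch below reads the tree with its colons, commas and decimal points restored (our reconstruction of the extraction-damaged text) using a minimal hand-written recursive-descent parser. It is an illustration, not ClustalW code; established libraries such as Biopython's `Bio.Phylo` can also read Newick files. It lists each sequence with the length of its terminal branch.

```python
from typing import List, Optional, Tuple


class Node:
    """A tree node with an optional label and branch length."""

    def __init__(self) -> None:
        self.name: str = ""
        self.length: Optional[float] = None
        self.children: List["Node"] = []


def parse_newick(text: str) -> Node:
    """Minimal recursive-descent Newick parser (labels may contain '|')."""
    text = text.strip().rstrip(";")
    pos = 0

    def subtree() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            node.children.append(subtree())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                node.children.append(subtree())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        start = pos  # optional label runs up to the next delimiter
        while pos < len(text) and text[pos] not in ":,()":
            pos += 1
        node.name = text[start:pos]
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":" and read the branch length
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node.length = float(text[start:pos])
        return node

    root = subtree()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return root


def leaves(node: Node) -> List[Tuple[str, Optional[float]]]:
    """Return (label, terminal branch length) for every leaf, left to right."""
    if not node.children:
        return [(node.name, node.length)]
    result: List[Tuple[str, Optional[float]]] = []
    for child in node.children:
        result.extend(leaves(child))
    return result


GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

tree = parse_newick(GUIDE_TREE)
for label, length in leaves(tree):
    print(f"{label:20s} {length:.5f}")
```

The long terminal branch of Q8IVS8|GLCTK_HUMAN (0.59519) compared with the HBV sequences (about 0.01) reflects how different that sequence is from the HBeAg sequences in the alignment.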

Phylogram

Tertiary structure prediction

PDB entry 1QGT-C was selected as the template, as it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

1. Open the template structure (PDB file) from File, then choose Swiss model > load the raw target sequence.

2. Choose Fit > fit raw sequence, then magic fit, then iterative fit.

3. Choose File > Save > Layer (PDB).

4. Choose File > Save > Project (PDB).

5. Choose Swiss model > submit modeling request. (A new browser window opens with the PDB file loaded; enter an email address to receive the modelled structure.)

6. Open the new structure (received by email) and remove the template by selecting the target.

7. Choose File > Save > Layer (PDB).

Open Swiss model and select the load raw sequence option to load the target molecule.

Perform magic fit and iterative fit (provided under Fit) to fit the two sequences.

Save the file as a project.

Select "submit modeling request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields:

Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Click on the model bars.

Fig.: Structure of the template after modeling

Alignment

TARGET  83                                LV GFGKAVLGMA AAAEELLGQH
2b8nA    4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative modelling process. The Ramachandran plot visualises the backbone dihedral angles φ against ψ for each amino acid residue in a protein structure, showing which combinations of φ and ψ are possible for a polypeptide chain. Because many φ/ψ combinations are sterically disallowed, residues that fall outside the favoured and allowed regions point to possible errors in the backbone geometry of a model.

Software

SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure

The target protein structure obtained after homology modelling using Deep View and Modeller is given as input to SAVS.
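Each point on a Ramachandran plot is a pair of backbone dihedral angles: φ is the dihedral C(i-1)-N-Cα-C and ψ is N-Cα-C-N(i+1). PROCHECK computes these itself; the Python sketch below only illustrates the geometry, using the standard atan2 formulation of a signed dihedral angle from four atom positions. The function names are our own, not PROCHECK's.

```python
import math
from typing import Sequence, Tuple

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)  # unit vector along the central bond
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec3, n: Vec3, ca: Vec3, c: Vec3,
            n_next: Vec3) -> Tuple[float, float]:
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

For real structures the five backbone coordinates per residue would come from the model's PDB file; a plot of all (φ, ψ) pairs then gives the Ramachandran plot that PROCHECK summarises.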

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result
------
Plot statistics                                      SCORE    %age
Residues in most favoured regions [A,B,L]              990   85.6%
Residues in additional allowed regions [a,b,l,p]       104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]    11    1.0%
Residues in disallowed regions                          51    4.4%
                                                      ----  ------
Number of non-glycine and non-proline residues        1156  100.0%
Number of end-residues (excl. Gly and Pro)               8
Number of glycine residues (shown as triangles)        127  (59)
Number of proline residues                              60
                                                      ----
Total number of residues                              1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
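The percentages in the PROCHECK table are each region's residue count divided by the 1156 non-glycine, non-proline residues. A short Python check of that arithmetic (variable names are ours), which also compares the most-favoured figure with the 90% benchmark quoted by PROCHECK; by this reading the model's 85.6% falls below that benchmark:

```python
# Residue counts copied from the PROCHECK "Plot statistics" table above.
REGION_COUNTS = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}
END_RESIDUES, GLYCINES, PROLINES = 8, 127, 60
GOOD_MODEL_THRESHOLD = 90.0  # "over 90% in the most favoured regions"

# Percentages are taken over non-glycine, non-proline residues only.
non_gly_non_pro = sum(REGION_COUNTS.values())
percentages = {
    region: round(100.0 * count / non_gly_non_pro, 1)
    for region, count in REGION_COUNTS.items()
}
total_residues = non_gly_non_pro + END_RESIDUES + GLYCINES + PROLINES
meets_benchmark = percentages["most favoured [A,B,L]"] > GOOD_MODEL_THRESHOLD

for region, pct in percentages.items():
    print(f"{region:35s} {pct:5.1f}%")
print("total residues:", total_residues, "| meets 90% benchmark:", meets_benchmark)
```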

Docking result (Hex software)

Fig.: Ligand & Receptor (2B8N)

Fig.: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS


Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025


Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050


MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050


LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI


VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052


MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005


LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025


ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds


>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 46.4 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)

R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s  GF 73.01 s  FFT 277.87 s  Scan 15.45 s  FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N 2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1 000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
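The decimal points in the Hex log appear to have been lost in extraction; reading the timing figures as 53.66 s, 73.01 s, 277.87 s and 15.45 s, and the distance range as 0.00 to 19.50 in steps of 0.75, is our assumption. The short Python cross-check below shows that this reading is self-consistent: the distance grid has 27 steps (matching the R values listed in the log), the four timings sum to about 420 s ("7 min 0 sec"), and dividing the 1616412672 sampled orientations by these times reproduces the logged rates (3848574/s overall, 5817091/s for the FFT stage) to within about 0.01%. All variable names are ours.

```python
# Translational sampling read from the log: 0.00 to 19.50 in steps of 0.75 (assumed decimals).
R_MIN, R_MAX, R_STEP = 0.0, 19.50, 0.75
n_distances = int(round((R_MAX - R_MIN) / R_STEP)) + 1
distances = [round(R_MIN + i * R_STEP, 2) for i in range(n_distances)]

# Timing breakdown from the log, read with two decimal places (assumed).
timings_s = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
total_s = sum(timings_s.values())  # the log reports "7 min 0 sec"

ORIENTATIONS = 1_616_412_672  # "Total 6D space ... = 1616412672"
overall_rate = ORIENTATIONS / total_s       # log: 3848574 per second
fft_rate = ORIENTATIONS / timings_s["FFT"]  # log: FFT Rate 5817091 per second

print(n_distances, distances[:3], distances[-1])
print(f"total {total_s:.2f} s, overall {overall_rate:,.0f}/s, FFT {fft_rate:,.0f}/s")
```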

75

Conclusion


After analysing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to look more closely at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing projects on HBV, we hope it will be possible in the near future to draw conclusive decisions about the true picture of its evolution, and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structure of a protein present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

As our study indicates that the HBeAg (Glycerate kinase) protein is one of the secondary causes of HBV pathogenicity, we docked this protein with an appropriate ligand in order to inhibit its activity, on the basis of which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone towards further discoveries. The present work is a small finding within a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as interpreting those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in HBV.

The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Nitrogen N 711

Oxygen O 719

Sulfur S 17

Formula C2435H3967N711O719S17

Total number of atoms 7849
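The atom total and an approximate molecular mass follow directly from the formula line. A minimal sketch, assuming standard average atomic masses (the mass table is our assumption, not ProtParam output); ProtParam itself sums residue masses, so its reported weight may differ slightly.

```python
import re

# Assumed average atomic masses in g/mol (standard values, rounded)
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula: str) -> dict:
    """Split a simple formula such as 'C2435H3967N711O719S17' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


counts = parse_formula("C2435H3967N711O719S17")
total_atoms = sum(counts.values())                       # 2435 + 3967 + 711 + 719 + 17
mass = sum(AVERAGE_MASS[el] * n for el, n in counts.items())
print(counts)
print(total_atoms)       # 7849
print(round(mass, 1))    # approximate average mass in Da
```

The computed total of 7849 matches the "Total number of atoms" line, and the mass comes out at roughly 55.25 kDa.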

Extinction coefficients

Extinction coefficients are in units of M-1 cm-1, at 280 nm measured in water.

Ext. coefficient 29700
Abs 0.1% (=1 g/l) 0.538, assuming ALL Cys residues appear as half cystines

Ext. coefficient 29450
Abs 0.1% (=1 g/l) 0.533, assuming NO Cys residues appear as half cystines
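These values can be reproduced from the sequence. ProtParam sums the molar absorbance at 280 nm of Trp (5500), Tyr (1490) and cystine (125 per pair of Cys), and divides by the molecular weight to obtain Abs 0.1%. The sketch below applies that rule to the 523-residue Q8IVS8 sequence as it appears in the SOPMA output and the alignment; the molecular weight constant is our assumption, derived from the formula C2435H3967N711O719S17 with average atomic masses.

```python
# Molar extinction coefficients at 280 nm (M^-1 cm^-1) used by ProtParam
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125  # per disulfide-bonded pair of Cys

# Q8IVS8 (GLCTK_HUMAN), 523 residues, copied from the alignment in this report
SEQUENCE = (
    "MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS"
    "LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM"
    "ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS"
    "ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI"
    "LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT"
    "CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT"
    "PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG"
    "GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI"
    "ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR"
)

MW = 55252.8  # assumption: average mass (Da) derived from the molecular formula


def extinction_coefficients(seq: str) -> tuple:
    """Return (all Cys paired as cystines, no cystines) coefficients."""
    w, y, c = seq.count("W"), seq.count("Y"), seq.count("C")
    no_cystines = w * EXT_TRP + y * EXT_TYR
    all_cystines = no_cystines + (c // 2) * EXT_CYSTINE
    return all_cystines, no_cystines


eps_all, eps_none = extinction_coefficients(SEQUENCE)
print(len(SEQUENCE), eps_all, eps_none)                  # 523 29700 29450
print(round(eps_all / MW, 3), round(eps_none / MW, 3))   # 0.538 0.533
```

The sequence contains 4 Trp, 5 Tyr and 5 Cys; five Cys can form at most two cystines, so the two coefficients differ by 2 × 125 = 250.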

Secondary structure prediction

By SOPMA result for UNK_158250


        10        20        30        40        50        60        70
         |         |         |         |         |         |         |
MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALSLDPGGRQLKV
hhhhhhhhhhhccccccceeetcchhhhhhhhhhhhhhhhhhhhhhhhcccthhhhhhhhhcttcceeee
RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
Alpha helix (Hh): 235 is 44.93%
3₁₀ helix (Gg): 0 is 0.00%
Pi helix (Ii): 0 is 0.00%
Beta bridge (Bb): 0 is 0.00%
Extended strand (Ee): 80 is 15.30%
Beta turn (Tt): 36 is 6.88%
Bend region (Ss): 0 is 0.00%
Random coil (Cc): 172 is 32.89%
Ambiguous states (?): 0 is 0.00%
Other states: 0 is 0.00%

Parameters: Window width 17, Similarity threshold 8, Number of states 4
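Each SOPMA percentage is the number of residues assigned to a state divided by the sequence length. A small sketch of that arithmetic; the `composition` and `percent` helpers are hypothetical and `composition` works on a state string such as the lowercase h/e/t/c lines above.

```python
from collections import Counter


def composition(states: str) -> dict:
    """Percentage of positions in each state of a SOPMA-style string (h, e, t, c, ...)."""
    counts = Counter(states)
    return {state: round(100.0 * n / len(states), 2) for state, n in counts.items()}


def percent(n: int, length: int) -> float:
    return round(100.0 * n / length, 2)


# Residue counts reported by SOPMA for the 523-residue sequence
LENGTH = 523
reported = {"Alpha helix (h)": 235, "Extended strand (e)": 80,
            "Beta turn (t)": 36, "Random coil (c)": 172}

assert sum(reported.values()) == LENGTH   # every residue gets exactly one state
for state, n in reported.items():
    print(f"{state}: {percent(n, LENGTH):.2f}%")
```

The four non-zero states account for all 523 residues, which is why the percentages add up to 100%.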

Multiple sequence alignment

ClustalW2 Results

1 Number of sequences 10

2 Alignment score 28565

3 Sequence format Pearson

4 Sequence type aa

5 Output file clustalw2-20080510-09552541.output

6 Alignment file clustalw2-20080510-09552541.aln

7 Guide tree file clustalw2-20080510-09552541.dnd

8 Your input file clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
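ClustalW describes these pairwise scores as percent identities: the number of identical positions divided by the number of residue pairs compared, with gap positions excluded. A minimal sketch of that calculation for two already-aligned rows; it illustrates the idea and is not ClustalW2's own code. A score computed on one alignment window will not in general equal the table value, which ClustalW2 derives from its own full-length pairwise alignments.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped, so the
    denominator is the number of residue-residue columns compared.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: not counted
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# One 32-column window of the multiple alignment (HBVA4 and HBVA6 rows)
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
print(percent_identity(hbva4, hbva6))  # 93.75
```

Only the first two columns differ in this window, giving 30/32 identical positions.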

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
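The guide tree is in Newick format: nested parentheses group sequences into clades, and each `:number` is a branch length. A library such as Biopython's `Bio.Phylo` would normally be used to read it; the minimal hand-written parser below (the names `parse_newick`, `leaves` and `tree_distance` are our own) is only a sketch to show how the path length between two sequences is read off the tree.

```python
from typing import List, Optional


class Node:
    """One node of the tree; leaves carry the sequence names."""

    def __init__(self) -> None:
        self.name = ""
        self.length = 0.0               # branch length to the parent node
        self.children: List["Node"] = []


def parse_newick(text: str) -> Node:
    """Parse a Newick string with optional labels and ':length' values."""
    text = text.strip()
    pos = 0

    def subtree() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1                                   # consume '('
            while True:
                node.children.append(subtree())
                if text[pos] == ",":
                    pos += 1                           # another sibling follows
                elif text[pos] == ")":
                    pos += 1                           # end of this clade
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at {pos}")
        start = pos                                    # optional label
        while pos < len(text) and text[pos] not in ":,();":
            pos += 1
        node.name = text[start:pos]
        if pos < len(text) and text[pos] == ":":       # optional branch length
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            node.length = float(text[start:pos])
        return node

    return subtree()


def leaves(node: Node) -> List[str]:
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def path_to(node: Node, name: str) -> Optional[List[Node]]:
    """Nodes from `node` down to the leaf called `name`, or None if absent."""
    if not node.children and node.name == name:
        return [node]
    for child in node.children:
        below = path_to(child, name)
        if below is not None:
            return [node] + below
    return None


def tree_distance(root: Node, a: str, b: str) -> float:
    """Sum of branch lengths on the path between two leaves."""
    pa, pb = path_to(root, a), path_to(root, b)
    if pa is None or pb is None:
        raise KeyError("leaf name not found in tree")
    i = 0
    while i < min(len(pa), len(pb)) and pa[i] is pb[i]:
        i += 1                                         # skip shared ancestors
    return sum(n.length for n in pa[i:]) + sum(n.length for n in pb[i:])


GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

tree = parse_newick(GUIDE_TREE)
print(len(leaves(tree)))   # 10
print(round(tree_distance(tree, "P03154|HBEAG_DHBV1", "P0C6J9|HBEAG_DHBV3"), 5))
```

For example, the two duck HBV sequences (DHBV1, DHBV3) are separated by 0.01176 + 0.01119 = 0.02295, whereas GLCTK_HUMAN sits on a long 0.59519 branch of its own.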

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template, as it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure (PDB file) from File, then choose Swiss model > Load raw target sequence.

Choose Fit > Fit raw sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss model > Submit modeling request (a new browser window opens with the PDB file loaded; give an e-mail ID to receive the modelled structure).

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).

Open Swiss model and select the Load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, in order to fit the two sequences.

Save the file as the project.

Select “Submit modeling request” under Swiss model to submit it for modelling.

Homology modelling

Optimise Mode Request submission form

Please fill these fields:

Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044 Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52


Fig. Structure of template after modelling

Alignment

TARGET 83                                 LV GFGKAVLGMA AAAEELLGQH
2b8nA  4   peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA  53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394 ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modelling process. A Ramachandran plot visualises the backbone dihedral angles φ (phi) against ψ (psi) for each amino acid residue in a protein structure, showing which combinations of φ and ψ are sterically possible for a polypeptide. Residues whose φ/ψ pairs fall outside the allowed regions point to possible errors in the backbone of the model.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure: The target protein structure obtained after homology modelling using Deep View and modeller is given as input to SAVS.
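Each point on a Ramachandran plot is a pair of torsion angles computed from backbone atom coordinates: φ(i) uses C(i-1), N(i), CA(i), C(i) and ψ(i) uses N(i), CA(i), C(i), N(i+1). The sketch below shows the standard atan2-based torsion calculation; it is not taken from SAVS or PROCHECK, the helper names are our own, and the coordinates in the example are made-up points rather than atoms from the model.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec, k: float) -> Vec:
    return (a[0] * k, a[1] * k, a[2] * k)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))   # unit vector along the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))           # b0 projected onto the plane normal to b1
    w = _sub(b2, _scale(b1, _dot(b2, b1)))           # b2 projected onto the same plane
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# phi(i) = dihedral(C(i-1), N(i), CA(i), C(i)); psi(i) = dihedral(N(i), CA(i), C(i), N(i+1))
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))   # -90.0
```

Projecting both outer bonds onto the plane perpendicular to the central bond and using atan2 gives the sign directly, which is what separates, for example, right-handed helix (negative φ) from left-handed regions on the plot.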

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics                                         Residues       %
Residues in most favoured regions [A,B,L]                    990    85.6%
Residues in additional allowed regions [a,b,l,p]             104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]          11     1.0%
Residues in disallowed regions                                51     4.4%
                                                            ----   ------
Number of non-glycine and non-proline residues              1156   100.0%
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
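The plot statistics are proportions of the 1156 non-glycine, non-proline residues; glycine, proline and chain-end residues are counted separately. The sketch below recomputes the percentages from the counts in the table and applies the 90% benchmark quoted above (the variable names are our own). By that criterion the model's 85.6% in the most favoured regions falls short of what a good quality model would be expected to reach.

```python
# Ramachandran-region counts from the PROCHECK summary
counts = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}

# Percentages are taken over non-glycine, non-proline residues only
non_gly_non_pro = sum(counts.values())
percent = {region: round(100.0 * n / non_gly_non_pro, 1) for region, n in counts.items()}

# End residues, glycines and prolines are excluded from the region statistics
total_residues = non_gly_non_pro + 8 + 127 + 60

GOOD_MODEL_MIN = 90.0   # benchmark for the most favoured regions
meets_benchmark = percent["most favoured [A,B,L]"] > GOOD_MODEL_MIN

print(non_gly_non_pro, total_residues)   # 1156 1351
print(percent)
print(meets_benchmark)                   # False
```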

Docking result (by Hex software)

Fig. Ligand & Receptor (2B8N)

Fig. After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25

Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP:N Radius = 1.40 Charge = -0.52; ASP:CA Radius = 1.50 Charge = 0.25; ASP:C Radius = 1.40 Charge = 0.53; ASP:O Radius = 1.50 Charge = -0.50; ASP:CB Radius = 1.70 Charge = -0.21; ASP:CG Radius = 1.40 Charge = 0.62; ASP:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP:N Radius = 1.40 Charge = -0.52; ASP:CA Radius = 1.50 Charge = 0.25; ASP:C Radius = 1.40 Charge = 0.53; ASP:O Radius = 1.50 Charge = -0.50; ASP:CB Radius = 1.70 Charge = -0.21; ASP:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS:N Radius = 1.40 Charge = -0.52; LYS:CA Radius = 1.50 Charge = 0.23; LYS:C Radius = 1.40 Charge = 0.53; LYS:O Radius = 1.50 Charge = -0.50; LYS:CB Radius = 1.70 Charge = 0.04; LYS:CG Radius = 1.70 Charge = 0.05; LYS:CD Radius = 1.70 Charge = 0.05; LYS:CE Radius = 1.70 Charge = 0.22; LYS:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE:N Radius = 1.40 Charge = -0.52; MSE:CA Radius = 1.50 Charge = 0.14; MSE:C Radius = 1.40 Charge = 0.53; MSE:O Radius = 1.50 Charge = -0.50; MSE:CB Radius = 1.70 Charge = 0.04; MSE:CG Radius = 1.70 Charge = 0.09; MSE:SE Radius = 1.90 Charge = 0.32; MSE:CE Radius = 1.90 Charge = 0.01; MSE:H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR:N Radius = 1.40 Charge = -0.52; THR:CA Radius = 1.50 Charge = 0.27; THR:C Radius = 1.40 Charge = 0.53; THR:O Radius = 1.50 Charge = -0.50; THR:CB Radius = 1.50 Charge = 0.21; THR:H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 2635096, Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 2635096, Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255, Eforce=0.00)

R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00  R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 5366 s, GF 7301 s, FFT 27787 s, Scan 1545 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136, Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N 2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair       RMS  Bumps
 ----  --------- --------- --------- --------- -------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100
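Each "Fractional charge" warning in the log above is followed by the per-atom partial charges Hex assigned to that residue. A minimal Python sketch can check an entry, assuming (this is not stated in the log) that the reported value is simply the sum of the listed charges. Because the log prints every charge to two decimals, the recomputed sum can differ from the logged value by about 0.01. The helper name `residue_charge` is hypothetical.

```python
# Hypothetical check of a Hex "Fractional charge" warning.
# Assumption: the warning reports the sum of the per-atom partial charges
# printed beneath it; each charge is rounded to two decimals in the log.


def residue_charge(atom_charges: dict[str, float]) -> float:
    """Sum the partial charges of one residue's atoms."""
    return sum(atom_charges.values())


# Partial charges copied from the log entry for residue B 404 THR,
# whose warning reports a fractional charge of 0.23.
THR_B404 = {"N": -0.52, "CA": 0.27, "C": 0.53, "O": -0.50, "CB": 0.21, "H": 0.25}

# Formal-charge totals reported by Hex for 2B8N.
POSITIVE, NEGATIVE = 104, 114

if __name__ == "__main__":
    total = residue_charge(THR_B404)
    print(f"THR B 404 summed partial charge: {total:.2f} (log reports 0.23)")
    print(f"Net formal charge: {POSITIVE - NEGATIVE} (log reports -10)")
```

The same check applied to the complete LYS entries (logged as 0.34) gives 0.35, which is consistent with two-decimal rounding of the individual atom charges.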

Conclusion

After analysing the protein sequences of hepatitis B virus, we conclude that, although the sequences are all closely related, they play an important role in survival in different species. It would be worth examining this more closely at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene-sequencing projects on HBV, we hope it will become possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we performed homology modelling and, as a step forward, present a theoretical model built with available online modelling tools.

As studied above, the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for developing drugs.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding on a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, new vistas will open for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them, and the same method can be applied to other infectious diseases, bringing us closer to a world free of disease.

The work presented is only a small part of a big issue, and much work remains to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


RDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAMERAGKQEMLLKPHSRVQVFE
ccccccccceeeeeeccchhhhhhhhhhhhhhhhcctteeeeccccccccchttchheeeccccceeeee
GAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGSALLPAPIPPVTLEEKQTLTRLLAARGATIQ
eccccccccchhhhhhhhhhhhhhccttceeeeeetttcceeeeccccccchhhhhhhhhhhhhttcchh
ELNTIRKALSQLKGGGLAQAAYPAQVVSLILSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAAL
hhhhhhhhhhhhttcchhhhccchhheeeeeeccttccceeeecccccccccchhhhhhhhhhhtccccc
PRSVKTVLSRADSDPHGPHTCGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGL
chhhhhhhhhtcccccccccchhhhheeehcchhhhhhhhhhhhhttcceeeeehhhhtchhhhhhhhhh
LAHVARTRLTPSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG
hhhhhhcttcccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhccccceeeeettcceeeeeccccc
GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDIATFLAHNDSH
ccchhhhhhhhhhhttccccccceeeeeccccccccccchhhheecthhhhhhhhttcchhhhhhccccc
TFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR
hhhhhhhttcheeeecccccccchheeeeecct

Sequence length: 523

SOPMA:
  Alpha helix (Hh): 235 is 44.93%
  3₁₀ helix (Gg): 0 is 0.00%
  Pi helix (Ii): 0 is 0.00%
  Beta bridge (Bb): 0 is 0.00%
  Extended strand (Ee): 80 is 15.30%
  Beta turn (Tt): 36 is 6.88%
  Bend region (Ss): 0 is 0.00%
  Random coil (Cc): 172 is 32.89%
  Ambiguous states (?): 0 is 0.00%
  Other states: 0 is 0.00%

Parameters: window width 17, similarity threshold 8, number of states 4
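The SOPMA percentages are plain tallies: the number of residues predicted in each state divided by the sequence length (for example, 235 helix residues out of 523 gives 44.93%). A short Python sketch of that tally is shown below; the function name `composition` is hypothetical, and the fragment used is the final row of 33 predicted states from the output above.

```python
from collections import Counter

# One-letter state codes as labelled in the SOPMA summary above.
STATES = {
    "h": "Alpha helix",
    "g": "3-10 helix",
    "i": "Pi helix",
    "b": "Beta bridge",
    "e": "Extended strand",
    "t": "Beta turn",
    "s": "Bend region",
    "c": "Random coil",
}


def composition(prediction: str) -> dict[str, tuple[int, float]]:
    """Return {state name: (count, percentage)} for a secondary-structure string."""
    counts = Counter(prediction.lower())
    total = len(prediction)
    return {
        name: (counts[code], round(100.0 * counts[code] / total, 2))
        for code, name in STATES.items()
    }


if __name__ == "__main__":
    # Final 33 predicted states from the SOPMA output above
    fragment = "hhhhhhhttcheeeecccccccchheeeeecct"
    for name, (count, pct) in composition(fragment).items():
        print(f"{name:<16} {count:>3} {pct:6.2f}%")
    # Whole-sequence check against the reported helix content
    print(round(100.0 * 235 / 523, 2))
```

Run on the full 523-letter prediction string, the same function would reproduce the summary table.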

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
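The Score column reports a percent-identity value from ClustalW2's pairwise stage. One common way to compute such a score is to divide the number of identical positions by the number of aligned positions where neither sequence has a gap; ClustalW2's exact normalisation may differ, so the Python sketch below (with a hypothetical `percent_identity` helper) is illustrative only. The example rows are 32-residue segments of the HBVA2 and HBVA3 sequences from the multiple alignment that follows.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two equal-length aligned strings, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    hbva2 = "QAILCWGELMTLATWVGNNLQDPASRDLVVNY"
    hbva3 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
    print(f"{percent_identity(hbva2, hbva3):.1f}%")
```

These two segments differ at a single position (Q versus E), so the sketch gives 31/32, or about 96.9%. This is in line with the 98 reported for the full HBVA2/HBVA3 pair.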

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
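Clustal output normally places a consensus line under each alignment block. In that line, '*' marks a fully conserved column, and ':' and '.' mark conserved substitutions. No consensus line is reproduced above. A simplified Python sketch that marks only fully identical, ungapped columns is shown below. `conservation_line` is a hypothetical name, and the ':'/'.' substitution groups are deliberately left out. The demo rows rebuild three rows of the block ending at column 47/48/57 using the same gap counts.

```python
def conservation_line(rows: list[str], gap: str = "-") -> str:
    """Mark with '*' every column where all aligned rows carry the same residue."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        first = column[0]
        conserved = first != gap and all(res == first for res in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


if __name__ == "__main__":
    block = [
        "DIDP" + "-" * 24 + "YKEFGATVELLSF" + "-" * 19,    # HBVA4
        "DIDP" + "-" * 24 + "YKEFGSSYQLLNF" + "-" * 19,    # ASHV
        "DSCL" + "-" * 24 + "YMDINASRALANVYD" + "-" * 17,  # DHBV1
    ]
    for row in block:
        print(row)
    print(conservation_line(block))
```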

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
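ClustalW2 writes its guide tree in Newick format (the .dnd file listed in the results above). In this copy of the report, the colons, commas and decimal points were lost during extraction. The string in the code below restores them in the standard Newick layout; that restoration is an assumption. A small recursive-descent parser in Python then lists each sequence with its terminal branch length. The function names `parse_newick` and `leaves` are hypothetical, written for illustration.

```python
# Guide tree with Newick punctuation restored (assumed layout).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text: str):
    """Parse a ';'-terminated Newick string into nested (children, name, length) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
        start = pos
        while text[pos] not in ":,();":
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return children, name, length

    return node()


def leaves(tree):
    """Yield (name, branch length) for every terminal node."""
    children, name, length = tree
    if not children:
        yield name, length
    for child in children:
        yield from leaves(child)


if __name__ == "__main__":
    for name, length in leaves(parse_newick(GUIDE_TREE)):
        print(f"{name:<20} {length:.5f}")
```

The root of this tree has three children, which is how an unrooted neighbour-joining guide tree is usually written. The long terminal branch of GLCTK_HUMAN (0.59519) is consistent with its low pairwise scores (2 to 3) against every HBeAg sequence in the scores table.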

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out

Steps involved in SPDBV

1. Open the template structure from File (PDB file), then choose Swiss model > load the raw target sequence.
2. Choose Fit > fit raw sequence, then Magic Fit, then Iterative Fit.
3. Choose File > Save > Layer (PDB).
4. Choose File > Save > Project (PDB).
5. Choose Swiss model > submit modeling request (a new browser window opens with the PDB file loaded; enter the e-mail ID at which the modelled structure should be received).
6. Open the new structure (received by e-mail) and remove the template by selecting the target.
7. Choose File > Save > Layer (PDB).

Open Swiss model and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under FIT, to fit the two sequences.

Save the file as a project.

Select "submit modeling request" under Swiss model to submit it for modeling.

Homologous modeling

Optimise Mode Request submission form

Please fill these fields

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52
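The model information gives a modelled range of residues 83 to 514 for the Q8IVS8 target. According to the ClustalW2 scores table, that target is 523 residues long. A trivial Python helper (hypothetical name `model_coverage`) converts such an inclusive range into the share of the target that the model actually covers, a figure that helps put the 34% template identity in context.

```python
def model_coverage(first: int, last: int, target_length: int) -> tuple[int, float]:
    """Return (residues covered, percent of target) for an inclusive residue range."""
    if not 1 <= first <= last <= target_length:
        raise ValueError("residue range must lie within the target sequence")
    covered = last - first + 1  # both end residues are included
    return covered, 100.0 * covered / target_length


if __name__ == "__main__":
    covered, pct = model_coverage(83, 514, 523)
    print(f"{covered} of 523 residues modelled ({pct:.1f}%)")
```

For this model, 432 of the 523 residues, or about 82.6%, fall inside the modelled range.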


click on model bars

Fig: Structure of the template after modeling

Alignment

TARGET   83  LV GFGKAVLGMA AAAEELLGQH
2b8nA     4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET  155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET  255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET  305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394  ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server (SAVS) greatly simplifies the computational analysis of protein structure and sequence. Stereochemical validation of model structures is an important part of the comparative (homology) modelling process. A Ramachandran plot displays the backbone dihedral angles φ (phi) against ψ (psi) for each residue in a protein structure, and so shows which φ/ψ conformations a polypeptide chain can adopt. Residues that fall outside the allowed regions of the plot indicate strained or possibly incorrect backbone geometry in the model.

Software

SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure

The target protein structure obtained by homology modelling with Deep View (Swiss-PdbViewer) and MODELLER was given as input to SAVS.
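The φ and ψ angles on a Ramachandran plot are torsion angles computed from backbone atom coordinates: φ from C(i-1), N, Cα and C of residue i, and ψ from N, Cα and C of residue i plus N(i+1). The Python sketch below illustrates that calculation with the standard atan2 torsion formula. It is not how SAVS or PROCHECK are implemented, and the input format (a list of dicts with the hypothetical keys "N", "CA" and "C" holding (x, y, z) coordinates) is an assumption made for brevity in place of a real PDB parser.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees for four points (IUPAC sign convention)."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)          # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues):
    """Return (phi, psi) for each residue; None where a neighbour is missing.

    Hypothetical input: residues in chain order, each a dict with "N", "CA"
    and "C" coordinates. Assumes consecutive residues are bonded (no chain
    breaks are detected).
    """
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(residues) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles


# Toy check: the back bond is rotated 90 degrees clockwise from the front bond.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))  # 90.0
```

For a real model the coordinates would first be read from the model's PDB file; the first residue has no φ and the last has no ψ, which is why they are reported as None.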

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result

Plot statistics

                                                      Residues  Percent
Residues in most favoured regions [A,B,L]                  990    85.6%
Residues in additional allowed regions [a,b,l,p]           104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]        11     1.0%
Residues in disallowed regions                              51     4.4%
                                                          ----   ------
Number of non-glycine and non-proline residues            1156   100.0%
Number of end-residues (excl. Gly and Pro)                   8
Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions. By this criterion the model falls short: 85.6% of its non-glycine, non-proline residues lie in the most favoured regions, and 4.4% (51 residues) lie in disallowed regions.
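The percentages in the plot statistics above, and the PROCHECK guideline of over 90% of residues in the most favoured regions, can be checked directly from the residue counts. A minimal Python sketch; the region labels are shorthand chosen here rather than PROCHECK output keys, and, as in the PROCHECK table, only non-glycine, non-proline residues are counted.

```python
# Residue counts from the PROCHECK "Plot statistics" block (non-Gly, non-Pro).
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Percentage of counted residues falling in each Ramachandran region."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


def meets_procheck_guideline(counts, threshold=90.0):
    """True if more than `threshold` per cent lie in the most favoured regions."""
    return region_percentages(counts)["most favoured"] > threshold


pct = region_percentages(counts)
for region, value in pct.items():
    print(f"{region:20s} {value:5.1f}%")
print("total residues:", sum(counts.values()))
print("meets >90% guideline:", meets_procheck_guideline(counts))
```

Run as-is, it prints 85.6%, 9.0%, 1.0% and 4.4% for the four regions (1156 residues in total) and reports that the 90% guideline is not met.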

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N. Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 2635096, Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N. Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 2635096, Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N: Tag = 2B8N

Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27*812*1] x FFT[64*24*48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255, Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00  R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s  GF 73.01 s  FFT 277.87 s  Scan 15.45 s  FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136, Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
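The size of the 6D search reported in the Hex log can be reproduced from the other figures it prints: 27 distance values (0.00 to 19.50 in steps of 0.75), 812 receptor rotations and 1 ligand rotation give the 21924 3D FFTs that Hex reports, each over 73728 grid points. The Python sketch below is only a consistency check. Splitting the bracketed "Iterate" and "FFT" figures into 27 x 812 x 1 and 64 x 24 x 48 is an interpretation, chosen because it matches the logged totals; Hex does not spell it out this way.

```python
# Values transcribed from the Hex log; FFT_GRID is an assumed factorisation.
R_MIN, R_MAX, R_STEP = 0.00, 19.50, 0.75   # translational distance scan
RECEPTOR_ROTATIONS = 812                    # "Receptor 812" (N=16 increments)
LIGAND_ROTATIONS = 1                        # "Ligand 1"
FFT_GRID = (64, 24, 48)                     # assumed 3D FFT dimensions


def search_space(r_min, r_max, r_step, rec_rot, lig_rot, fft_grid):
    """Return (distance steps, FFT iterations, points per FFT, total orientations)."""
    n_distances = round((r_max - r_min) / r_step) + 1   # both end points included
    n_iterations = n_distances * rec_rot * lig_rot      # one 3D FFT per iteration
    fft_points = 1
    for dim in fft_grid:
        fft_points *= dim
    return n_distances, n_iterations, fft_points, n_iterations * fft_points


n_dist, n_iter, n_fft, total = search_space(
    R_MIN, R_MAX, R_STEP, RECEPTOR_ROTATIONS, LIGAND_ROTATIONS, FFT_GRID)
print(n_dist, n_iter, n_fft, total)   # 27 21924 73728 1616412672
```

The product matches the 1616412672 orientations in the log, which also shows why the run took several minutes even though the receptor and ligand are the same structure.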

Conclusion

After analysing the protein sequences of hepatitis B virus (HBV), we conclude that although the sequences are closely related, they play an important role in the survival of the virus in different species. It would be worthwhile to examine this more closely at the gene level, and a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We observed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is complete, we hope it will be possible to draw firm conclusions about the evolution of the virus and to identify the genes responsible for pathogenesis. A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein in the different species, we performed homology modelling and present a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand to inhibit its activity, as a basis on which drugs could be developed.

Future prospects

The work presented in this report may be a stepping stone towards such discoveries, although it addresses only a small part of a much larger problem.

Phylogenetics is the field of biology that identifies and studies the relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as the interpretation of the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

As new drug targets are identified, they will open new avenues for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also point to other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them, and the same method can be applied to other infectious diseases, bringing us closer to a healthier, disease-free world.

Much work remains to establish a sound phylogenetic relationship and a complete cure for hepatitis B, but we hope that these findings will prove useful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


Parameters: Window width: 17, Similarity threshold: 8, Number of states: 4

Multiple sequence alignment

ClustalW2 Results

1. Number of sequences: 10
2. Alignment score: 28565
3. Sequence format: Pearson
4. Sequence type: aa
5. Output file: clustalw2-20080510-09552541.output
6. Alignment file: clustalw2-20080510-09552541.aln
7. Guide tree file: clustalw2-20080510-09552541.dnd
8. Your input file: clustalw2-20080510-09552541.input

Scores Table

SeqA  Name                Len(aa)  SeqB  Name                Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN  523      2     Q64896|HBEAG_ASHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      3     P03154|HBEAG_DHBV1  305      3
1     Q8IVS8|GLCTK_HUMAN  523      4     P0C6J9|HBEAG_DHBV3  305      3
1     Q8IVS8|GLCTK_HUMAN  523      5     P03153|HBEAG_GSHV   217      3
1     Q8IVS8|GLCTK_HUMAN  523      6     P0C692|HBEAG_HBVA2  214      3
1     Q8IVS8|GLCTK_HUMAN  523      7     P0C625|HBEAG_HBVA3  214      3
1     Q8IVS8|GLCTK_HUMAN  523      8     P17099|HBEAG_HBVA4  214      3
1     Q8IVS8|GLCTK_HUMAN  523      9     Q81105|HBEAG_HBVA5  214      3
1     Q8IVS8|GLCTK_HUMAN  523      10    Q91C37|HBEAG_HBVA6  214      2
2     Q64896|HBEAG_ASHV   217      3     P03154|HBEAG_DHBV1  305      21
2     Q64896|HBEAG_ASHV   217      4     P0C6J9|HBEAG_DHBV3  305      21
2     Q64896|HBEAG_ASHV   217      5     P03153|HBEAG_GSHV   217      91
2     Q64896|HBEAG_ASHV   217      6     P0C692|HBEAG_HBVA2  214      66
2     Q64896|HBEAG_ASHV   217      7     P0C625|HBEAG_HBVA3  214      65
2     Q64896|HBEAG_ASHV   217      8     P17099|HBEAG_HBVA4  214      65
2     Q64896|HBEAG_ASHV   217      9     Q81105|HBEAG_HBVA5  214      65
2     Q64896|HBEAG_ASHV   217      10    Q91C37|HBEAG_HBVA6  214      65
3     P03154|HBEAG_DHBV1  305      4     P0C6J9|HBEAG_DHBV3  305      97
3     P03154|HBEAG_DHBV1  305      5     P03153|HBEAG_GSHV   217      24
3     P03154|HBEAG_DHBV1  305      6     P0C692|HBEAG_HBVA2  214      26
3     P03154|HBEAG_DHBV1  305      7     P0C625|HBEAG_HBVA3  214      27
3     P03154|HBEAG_DHBV1  305      8     P17099|HBEAG_HBVA4  214      25
3     P03154|HBEAG_DHBV1  305      9     Q81105|HBEAG_HBVA5  214      26
3     P03154|HBEAG_DHBV1  305      10    Q91C37|HBEAG_HBVA6  214      26
4     P0C6J9|HBEAG_DHBV3  305      5     P03153|HBEAG_GSHV   217      25
4     P0C6J9|HBEAG_DHBV3  305      6     P0C692|HBEAG_HBVA2  214      26
4     P0C6J9|HBEAG_DHBV3  305      7     P0C625|HBEAG_HBVA3  214      27
4     P0C6J9|HBEAG_DHBV3  305      8     P17099|HBEAG_HBVA4  214      25
4     P0C6J9|HBEAG_DHBV3  305      9     Q81105|HBEAG_HBVA5  214      26
4     P0C6J9|HBEAG_DHBV3  305      10    Q91C37|HBEAG_HBVA6  214      26
5     P03153|HBEAG_GSHV   217      6     P0C692|HBEAG_HBVA2  214      70
5     P03153|HBEAG_GSHV   217      7     P0C625|HBEAG_HBVA3  214      69
5     P03153|HBEAG_GSHV   217      8     P17099|HBEAG_HBVA4  214      69
5     P03153|HBEAG_GSHV   217      9     Q81105|HBEAG_HBVA5  214      69
5     P03153|HBEAG_GSHV   217      10    Q91C37|HBEAG_HBVA6  214      69
6     P0C692|HBEAG_HBVA2  214      7     P0C625|HBEAG_HBVA3  214      98
6     P0C692|HBEAG_HBVA2  214      8     P17099|HBEAG_HBVA4  214      98
6     P0C692|HBEAG_HBVA2  214      9     Q81105|HBEAG_HBVA5  214      98
6     P0C692|HBEAG_HBVA2  214      10    Q91C37|HBEAG_HBVA6  214      98
7     P0C625|HBEAG_HBVA3  214      8     P17099|HBEAG_HBVA4  214      98
7     P0C625|HBEAG_HBVA3  214      9     Q81105|HBEAG_HBVA5  214      97
7     P0C625|HBEAG_HBVA3  214      10    Q91C37|HBEAG_HBVA6  214      98
8     P17099|HBEAG_HBVA4  214      9     Q81105|HBEAG_HBVA5  214      97
8     P17099|HBEAG_HBVA4  214      10    Q91C37|HBEAG_HBVA6  214      98
9     Q81105|HBEAG_HBVA5  214      10    Q91C37|HBEAG_HBVA6  214      97
===========================================================================
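ClustalW documents these pairwise scores as percent identity: the number of identical residues in the pairwise alignment divided by the number of residues compared, with gap positions excluded. ClustalW then converts each score to a distance (1 minus score/100) to build the guide tree. The Python sketch below illustrates that calculation for two sequences that are already aligned; it does not perform the pairwise alignment step itself, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":      # gap positions are not compared
            continue
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


def to_distance(score: float) -> float:
    """Convert a percent-identity score to differences per site."""
    return 1.0 - score / 100.0


# Fragments of HBEAG_HBVA4 and HBEAG_HBVA6 taken from one block of the
# alignment (not the full sequences, so the value differs from the table).
hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
hbva6 = "ETILCWGELMTLATWVGNNLEDPASRDLVVNY"
score = percent_identity(hbva4, hbva6)
print(score, to_distance(score))   # 93.75 0.0625
```

The two fragments differ only at their first two positions (Q/E and A/T), giving 30 identities out of 32 compared residues; over the full 214-residue sequences the table reports 98.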

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

52

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
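The guide tree is written in Newick format: nested parentheses group clusters, commas separate siblings, and the number after a colon is a branch length. The sketch below is a minimal parser written for illustration (the names parse_newick, root_to_leaf and GUIDE_TREE are hypothetical, not part of Clustal), and it handles only names and branch lengths, not quoted labels or comments. GUIDE_TREE is our reconstruction of the tree above, assuming the colons, commas, decimal points and closing semicolon were lost in extraction. Because a guide tree is built only to order the progressive alignment, the root-to-leaf sums it prints are a rough indication of divergence rather than a phylogeny.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed reconstruction of the Clustal guide tree (punctuation restored).
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,"
    "(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,"
    "(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,"
    "Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,"
    "P0C692|HBEAG_HBVA2:0.00445):0.00022,"
    "P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


@dataclass
class Node:
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a Newick string with names and branch lengths only."""
    s = "".join(text.split()).rstrip(";")  # drop whitespace and the final ';'
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if s[pos] == "(":
            pos += 1  # skip '('
            node.children.append(parse_node())
            while s[pos] == ",":
                pos += 1  # skip ','
                node.children.append(parse_node())
            pos += 1  # skip ')'
        node.name = read_until(",():")  # leaf label (empty for internal nodes)
        if pos < len(s) and s[pos] == ":":
            pos += 1  # skip ':'
            node.length = float(read_until(",()"))
        return node

    return parse_node()


def root_to_leaf(node: Node, acc: float = 0.0) -> dict[str, float]:
    """Sum of branch lengths from the root down to every leaf."""
    total = acc + node.length
    if not node.children:
        return {node.name: total}
    out: dict[str, float] = {}
    for child in node.children:
        out.update(root_to_leaf(child, total))
    return out


tree = parse_newick(GUIDE_TREE)
for name, dist in sorted(root_to_leaf(tree).items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {dist:.5f}")
```

A recursive-descent parser is used because Newick nesting maps directly onto recursion; for real work a library tree parser would be the safer choice.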

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template; it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer (Deep View) was launched and the following procedure was carried out:

Steps involved in SPDBV

1. Open the template structure (PDB file) from the File menu, then, from the SwissModel menu, load the raw target sequence.
2. From the Fit menu, choose Fit Raw Sequence, then Magic Fit, then Iterative Fit.
3. Choose File > Save > Layer (PDB).
4. Choose File > Save > Project (PDB).
5. Choose SwissModel > Submit Modeling Request. A new browser window opens with the PDB file loaded; enter the e-mail ID at which the modeled structure should be received.
6. Open the new structure (received by e-mail), select the target and remove the template.
7. Choose File > Save > Layer (PDB).

Open SwissModel and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under Fit, to fit the two sequences.

Save the file as a project.

Select "Submit Modeling Request" under SwissModel to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill in these fields:

Your e-mail address: Gunjan300@gmail.com (MUST be correct)

Your name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in: C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS-MODEL Workspace

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Click on the model bars.

Fig: Structure of the template after modeling

Alignment

TARGET   83                               LV GFGKAVLGMA AAAEELLGQH
2b8nA     4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET  155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET  255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET  305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394 ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss
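Sequence identity, as reported for the template above, is the share of aligned positions at which the target and template carry the same residue. The sketch below uses one common definition, identical residues divided by columns where neither sequence has a gap; the function name percent_identity is hypothetical, and SWISS-MODEL may use a different denominator, so this is not expected to reproduce the reported 34% exactly. The comparison is case-insensitive because the template rows are printed in lower case.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns that are residues in both rows ('-' and '.' are gaps,
    # spaces are the 10-column group separators used in the alignment above).
    pairs = [(a.upper(), b.upper()) for a, b in zip(seq_a, seq_b)
             if a not in "-. " and b not in "-. "]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


# One 10-column group of the TARGET/2b8nA alignment (TARGET 235-244 vs 2b8nA).
print(percent_identity("QVVSLILSDV", "kvvalvlsdv"))  # 70.0
```

Applied to whole alignment rows instead of one group, the same function gives a rough check on the identity figure a modeling server reports.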

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative modeling process. A Ramachandran plot visualizes the backbone dihedral angles of the amino-acid residues in a protein structure by plotting φ (the torsion about the N-Cα bond) against ψ (the torsion about the Cα-C bond) for each residue. Because steric clashes between atoms rule out many (φ, ψ) combinations, the plot shows which conformations are favoured, allowed or disallowed, and residues that fall in disallowed regions point to possible errors in the model.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling with Deep View and MODELLER is given as input to SAVS.
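The φ and ψ angles on a Ramachandran plot are ordinary torsion (dihedral) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The pure-Python sketch below computes such an angle from Cartesian coordinates with the standard atan2 formulation; it is an illustration only, not SAVS or PROCHECK code, and the function names and toy coordinates are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: central bond along x, first atom on +y, last atom on +z.
print(round(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)), 1))  # 90.0
```

For a residue i, φ would be dihedral(C_prev, N, CA, C) and ψ would be dihedral(N, CA, C, N_next), using that residue's backbone atom coordinates read from the PDB file.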

SAVES results for proj_gunjan.pdb

PROCHECK summary

RAMACHANDRAN PLOT

Result

Plot statistics                                                   Residues       %
Residues in most favoured regions        [A,B,L]                       990    85.6
Residues in additional allowed regions   [a,b,l,p]                     104     9.0
Residues in generously allowed regions   [~a,~b,~l,~p]                  11     1.0
Residues in disallowed regions                                          51     4.4
                                                                      ----   -----
Number of non-glycine and non-proline residues                        1156   100.0
Number of end-residues (excl. Gly and Pro)                               8
Number of glycine residues (shown as triangles)                        127 (59)
Number of proline residues                                              60
                                                                      ----
Total number of residues                                              1351

Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good-quality model would be expected to have over 90% of its residues in the most favoured regions. By this criterion the model, with 85.6% of residues in the most favoured regions and 4.4% in disallowed regions, falls short of the expected quality.
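The percentages in the PROCHECK summary follow directly from the residue counts, taken over non-glycine, non-proline residues only (glycine, proline and end residues are reported separately). A minimal sketch of that arithmetic, using the counts from the summary above; the function name ramachandran_percentages is hypothetical.

```python
def ramachandran_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of non-Gly, non-Pro residues in each region, rounded to 0.1%."""
    total = sum(counts.values())
    return {region: round(100 * n / total, 1) for region, n in counts.items()}


# Counts from the PROCHECK summary (non-glycine, non-proline residues).
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}
pct = ramachandran_percentages(counts)
print(sum(counts.values()))  # 1156
print(pct)  # {'most favoured': 85.6, 'additional allowed': 9.0, 'generously allowed': 1.0, 'disallowed': 4.4}
print(pct["most favoured"] >= 90.0)  # False: below the 90% benchmark
```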

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)

Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25

Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.


>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 4.64 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair.

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27*812*1] x FFT[64*24*48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)

R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00
R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape  Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
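The size of the 6D search space in the Hex log can be checked by hand. In this sketch we read Iterate[27*812*1] as 27 sampled distances (0.00 to 19.50 in steps of 0.75) times 812 receptor rotations times 1 ligand rotation, and FFT[64*24*48] as the number of samples in each 3D FFT. That reading of the log fields is our assumption rather than something stated in Hex documentation, but it reproduces the logged totals exactly; the function names are hypothetical.

```python
import math


def distance_steps(r_min: float, r_max: float, step: float) -> int:
    """Number of sampled intermolecular distances, counting both ends of the range."""
    return round((r_max - r_min) / step) + 1


def search_space(n_dist: int, n_rec_rot: int, n_lig_rot: int,
                 fft_dims: tuple[int, int, int]) -> tuple[int, int]:
    """Return (number of 3D FFTs, total 6D orientations) under the assumed reading."""
    n_ffts = n_dist * n_rec_rot * n_lig_rot
    return n_ffts, n_ffts * math.prod(fft_dims)


n_dist = distance_steps(0.00, 19.50, 0.75)
n_ffts, total = search_space(n_dist, 812, 1, (64, 24, 48))
print(n_dist, n_ffts, total)  # 27 21924 1616412672

# Throughput over the logged 7 min 0 sec (420 s): about 3.85 million orientations/s,
# close to the 3848574/s reported in the log.
print(round(total / 420))
```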

Conclusion

After analysing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to look at the matter more closely at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene-sequencing projects on HBV, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis. A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we used homology modelling; as a step forward, we present a theoretical model built with available online modelling tools.

Since our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes of HBV pathogenicity, we tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.

Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a much larger problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, new vistas will open for further drug development. The findings of our docking will be useful in finding a cure for the infectious disease bird flu, and they will also open new avenues for finding other possible drug targets in influenza A virus.

The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for bird flu. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.



Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


8 Your input file: clustalw2-20080510-09552541input

Scores Table

SeqA  Name                 Len(aa)  SeqB  Name                 Len(aa)  Score
===========================================================================
1     Q8IVS8|GLCTK_HUMAN   523      2     Q64896|HBEAG_ASHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      3     P03154|HBEAG_DHBV1   305      3
1     Q8IVS8|GLCTK_HUMAN   523      4     P0C6J9|HBEAG_DHBV3   305      3
1     Q8IVS8|GLCTK_HUMAN   523      5     P03153|HBEAG_GSHV    217      3
1     Q8IVS8|GLCTK_HUMAN   523      6     P0C692|HBEAG_HBVA2   214      3
1     Q8IVS8|GLCTK_HUMAN   523      7     P0C625|HBEAG_HBVA3   214      3
1     Q8IVS8|GLCTK_HUMAN   523      8     P17099|HBEAG_HBVA4   214      3
1     Q8IVS8|GLCTK_HUMAN   523      9     Q81105|HBEAG_HBVA5   214      3
1     Q8IVS8|GLCTK_HUMAN   523      10    Q91C37|HBEAG_HBVA6   214      2
2     Q64896|HBEAG_ASHV    217      3     P03154|HBEAG_DHBV1   305      21
2     Q64896|HBEAG_ASHV    217      4     P0C6J9|HBEAG_DHBV3   305      21
2     Q64896|HBEAG_ASHV    217      5     P03153|HBEAG_GSHV    217      91
2     Q64896|HBEAG_ASHV    217      6     P0C692|HBEAG_HBVA2   214      66
2     Q64896|HBEAG_ASHV    217      7     P0C625|HBEAG_HBVA3   214      65
2     Q64896|HBEAG_ASHV    217      8     P17099|HBEAG_HBVA4   214      65
2     Q64896|HBEAG_ASHV    217      9     Q81105|HBEAG_HBVA5   214      65
2     Q64896|HBEAG_ASHV    217      10    Q91C37|HBEAG_HBVA6   214      65
3     P03154|HBEAG_DHBV1   305      4     P0C6J9|HBEAG_DHBV3   305      97
3     P03154|HBEAG_DHBV1   305      5     P03153|HBEAG_GSHV    217      24
3     P03154|HBEAG_DHBV1   305      6     P0C692|HBEAG_HBVA2   214      26
3     P03154|HBEAG_DHBV1   305      7     P0C625|HBEAG_HBVA3   214      27
3     P03154|HBEAG_DHBV1   305      8     P17099|HBEAG_HBVA4   214      25
3     P03154|HBEAG_DHBV1   305      9     Q81105|HBEAG_HBVA5   214      26
3     P03154|HBEAG_DHBV1   305      10    Q91C37|HBEAG_HBVA6   214      26
4     P0C6J9|HBEAG_DHBV3   305      5     P03153|HBEAG_GSHV    217      25
4     P0C6J9|HBEAG_DHBV3   305      6     P0C692|HBEAG_HBVA2   214      26
4     P0C6J9|HBEAG_DHBV3   305      7     P0C625|HBEAG_HBVA3   214      27
4     P0C6J9|HBEAG_DHBV3   305      8     P17099|HBEAG_HBVA4   214      25
4     P0C6J9|HBEAG_DHBV3   305      9     Q81105|HBEAG_HBVA5   214      26
4     P0C6J9|HBEAG_DHBV3   305      10    Q91C37|HBEAG_HBVA6   214      26
5     P03153|HBEAG_GSHV    217      6     P0C692|HBEAG_HBVA2   214      70
5     P03153|HBEAG_GSHV    217      7     P0C625|HBEAG_HBVA3   214      69
5     P03153|HBEAG_GSHV    217      8     P17099|HBEAG_HBVA4   214      69
5     P03153|HBEAG_GSHV    217      9     Q81105|HBEAG_HBVA5   214      69
5     P03153|HBEAG_GSHV    217      10    Q91C37|HBEAG_HBVA6   214      69
6     P0C692|HBEAG_HBVA2   214      7     P0C625|HBEAG_HBVA3   214      98
6     P0C692|HBEAG_HBVA2   214      8     P17099|HBEAG_HBVA4   214      98
6     P0C692|HBEAG_HBVA2   214      9     Q81105|HBEAG_HBVA5   214      98
6     P0C692|HBEAG_HBVA2   214      10    Q91C37|HBEAG_HBVA6   214      98
7     P0C625|HBEAG_HBVA3   214      8     P17099|HBEAG_HBVA4   214      98
7     P0C625|HBEAG_HBVA3   214      9     Q81105|HBEAG_HBVA5   214      97
7     P0C625|HBEAG_HBVA3   214      10    Q91C37|HBEAG_HBVA6   214      98
8     P17099|HBEAG_HBVA4   214      9     Q81105|HBEAG_HBVA5   214      97
8     P17099|HBEAG_HBVA4   214      10    Q91C37|HBEAG_HBVA6   214      98
9     Q81105|HBEAG_HBVA5   214      10    Q91C37|HBEAG_HBVA6   214      97
===========================================================================
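The scores table is easier to compare as a matrix. The Python sketch below is an illustration, not part of the original ClustalW workflow: it copies the 45 scores into an upper-triangle list, expands them into a symmetric matrix, and reports each sequence's highest-scoring partner. It assumes the Score column is a percent identity (which is how ClustalW reports its pairwise scores) and uses 1 - score/100 as a simple distance; the helper names are hypothetical.

```python
# Sequence order follows the numbering (1-10) of the ClustalW scores table.
NAMES = ["GLCTK_HUMAN", "HBEAG_ASHV", "HBEAG_DHBV1", "HBEAG_DHBV3", "HBEAG_GSHV",
         "HBEAG_HBVA2", "HBEAG_HBVA3", "HBEAG_HBVA4", "HBEAG_HBVA5", "HBEAG_HBVA6"]

# Upper triangle of the scores table: UPPER[i][k] is the score of
# sequence i against sequence i + 1 + k (0-based, in NAMES order).
UPPER = [
    [3, 3, 3, 3, 3, 3, 3, 3, 2],
    [21, 21, 91, 66, 65, 65, 65, 65],
    [97, 24, 26, 27, 25, 26, 26],
    [25, 26, 27, 25, 26, 26],
    [70, 69, 69, 69, 69],
    [98, 98, 98, 98],
    [98, 97, 98],
    [97, 98],
    [97],
]


def identity_matrix(upper):
    """Expand the upper triangle into a full symmetric matrix (diagonal = 100)."""
    n = len(upper) + 1
    m = [[100.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, row in enumerate(upper):
        for k, score in enumerate(row):
            j = i + 1 + k
            m[i][j] = m[j][i] = float(score)
    return m


def to_distances(m):
    """Assumed conversion: distance = 1 - identity / 100."""
    return [[1.0 - v / 100.0 for v in row] for row in m]


def nearest_neighbours(names, m):
    """Map each sequence to its highest-scoring partner (first one wins ties)."""
    result = {}
    for i, name in enumerate(names):
        j = max((k for k in range(len(names)) if k != i), key=lambda k: m[i][k])
        result[name] = (names[j], m[i][j])
    return result


if __name__ == "__main__":
    matrix = identity_matrix(UPPER)
    for name, (other, score) in nearest_neighbours(NAMES, matrix).items():
        print(f"{name:12s} -> {other:12s} ({score:.0f})")
```

Run as a script, it prints one line per sequence; for example, HBEAG_DHBV1 pairs with HBEAG_DHBV3 (97) and HBEAG_ASHV with HBEAG_GSHV (91).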

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4  ------------------------------------------------------------
Q91C37|HBEAG_HBVA6  ------------------------------------------------------------
P0C692|HBEAG_HBVA2  ------------------------------------------------------------
P0C625|HBEAG_HBVA3  ------------------------------------------------------------
Q81105|HBEAG_HBVA5  ------------------------------------------------------------
Q64896|HBEAG_ASHV   ------------------------------------------------------------
P03153|HBEAG_GSHV   ------------------------------------------------------------
P03154|HBEAG_DHBV1  ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3  ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN  MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3  ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5  ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV   ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV   ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
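CLUSTAL output splits every sequence into 60-column blocks, so any further analysis starts by re-joining the rows. The Python sketch below (an illustration, not a ClustalW feature) does that and then lists the columns where every sequence carries the same residue. The example input is a trimmed excerpt of two rows from the alignment above, with columns that are gaps in both rows dropped; the function names are hypothetical.

```python
EXCERPT = """CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4   MQLFHLCLIISCT-CPTVQASKLCLGWLWGM
Q64896|HBEAG_ASHV    MYLFHLCLVFACVSCPTVQASKLCLGWLWDM

P17099|HBEAG_HBVA4   DIDPYKEFGATVELLSF
Q64896|HBEAG_ASHV    DIDPYKEFGSSYQLLNF
"""


def parse_clustal(text):
    """Join the rows of each sequence across all blocks of a CLUSTAL alignment.

    Assumes every data line looks like 'name  segment [position]'; the
    CLUSTAL header, blank lines and indented conservation lines are skipped.
    """
    seqs = {}  # dicts keep insertion order (Python 3.7+), i.e. alignment order
    for line in text.splitlines():
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        name, segment = line.split()[:2]
        seqs[name] = seqs.get(name, "") + segment
    return seqs


def conserved_columns(seqs):
    """0-based indices of columns where every sequence has the same non-gap residue."""
    rows = list(seqs.values())
    first = rows[0]
    return [i for i, res in enumerate(first)
            if res != "-" and all(row[i] == res for row in rows)]


if __name__ == "__main__":
    aligned = parse_clustal(EXCERPT)
    cols = conserved_columns(aligned)
    width = len(next(iter(aligned.values())))
    print(f"{len(cols)} of {width} columns fully conserved")
```

The same two functions work unchanged on the full ten-sequence alignment once it is saved as text.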

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
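The guide tree is written in Newick format. The minimal Python parser below is a sketch, not the parser ClustalW uses. It reads the tree into nested dictionaries and sums the branch lengths from the root to each leaf. The tree string assumes that the ':' and ',' separators of the original guide-tree output were lost when the report was extracted, and it restores them.

```python
import re

# Guide tree from the ClustalW run above, with ':' and ',' separators written out.
GUIDE_TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)

# Tokens: parentheses, commas, colons, the final ';', or a label / number.
TOKEN = re.compile(r"[(),:;]|[^(),:;\s]+")


def parse_newick(text):
    """Parse a Newick string into nested dicts with 'name', 'length', 'children'."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def node():
        children, name, length = [], "", 0.0
        if peek() == "(":
            take()  # consume "("
            children.append(node())
            while peek() == ",":
                take()
                children.append(node())
            if take() != ")":
                raise ValueError("unbalanced parentheses in Newick string")
        if peek() not in ("(", ")", ",", ":", ";", None):
            name = take()
        if peek() == ":":
            take()
            length = float(take())
        return {"name": name, "length": length, "children": children}

    return node()


def leaf_depths(tree, depth=0.0, out=None):
    """Sum of branch lengths from the root node to every leaf."""
    out = {} if out is None else out
    depth += tree["length"]
    if not tree["children"]:
        out[tree["name"]] = depth
    for child in tree["children"]:
        leaf_depths(child, depth, out)
    return out


if __name__ == "__main__":
    for leaf, d in sorted(leaf_depths(parse_newick(GUIDE_TREE)).items(), key=lambda kv: kv[1]):
        print(f"{leaf:20s} {d:.5f}")
```

With this tree, Q8IVS8|GLCTK_HUMAN lies farthest from the root node as written (about 0.955), in line with its very low pairwise scores against the HBeAg sequences.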

Phylogram

Tertiary structure prediction

PDB entry 1QGT (chain C) was selected as the template; it showed approximately 85.6% identity with the target sequence. The template structure was downloaded from the PDB.

Swiss-PdbViewer (SPDBV) was launched and the following procedure was carried out.

Steps involved in SPDBV

Open the template structure (PDB file) from File.

Choose Swiss model > Load raw target sequence.

Choose Fit > Fit raw sequence, then Magic fit, then Iterative fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss model > Submit modeling request. A new browser window opens with the PDB file loaded; enter the e-mail ID at which the modeled structure should be received.

Open the new structure received by e-mail and remove the template by selecting the target.

Choose File > Save > Layer (PDB).


Open Swiss model and select the Load raw sequence option to load the target molecule.

Perform Magic fit and Iterative fit, provided under Fit, to fit the two sequences.


Save the file as a project.

Select "Submit modeling request" under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields


Your Email address: Gunjan300gmailcom (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52


Click on the model bars.

Fig: Structure of the template after modeling

Alignment

TARGET   83 LV GFGKAVLGMA AAAEELLGQH
2b8nA     4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET  155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET  255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET  305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394 ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss
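Sequence identity between target and template can be estimated directly from an alignment such as the one above. The Python sketch below counts identical residues over columns where neither sequence has a gap. That is only one of several conventions and is an assumption here, not necessarily how SWISS-MODEL computes its "Sequence identity" value; the function name is hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Spaces (used to group residues in blocks of ten) are ignored and the
    comparison is case-insensitive, since the template row is lower case.
    """
    a, b = a.replace(" ", ""), b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x.upper(), y.upper()) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# One 50-column block of the TARGET / 2b8nA alignment (target residues 155-204).
TARGET = "ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA"
TEMPLATE = "trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk"

if __name__ == "__main__":
    print(f"{percent_identity(TARGET, TEMPLATE):.1f}% identity in this block")
```

On this single block it gives 40.0%, which need not match the 34% reported for the whole model because it covers only 50 columns.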

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot shows the backbone dihedral angles φ (phi) against ψ (psi) for each amino acid residue in a protein structure, and so shows which φ/ψ combinations a polypeptide can adopt. Because φ and ψ describe rotation about the N-Cα and Cα-C bonds of the backbone, the plot reveals whether each residue of the desired protein adopts a sterically allowed conformation.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure: The target protein structure obtained after homology modeling using DeepView and Modeller is given as input to SAVS.
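The φ and ψ values plotted by PROCHECK are torsion angles, each defined by four consecutive backbone atoms: φ by C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The Python sketch below computes a signed torsion angle from four coordinates with the standard atan2 formulation. It is an illustration, not code taken from SAVS or PROCHECK, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (between -180 and 180) for four points.

    For residue i: phi = dihedral(C[i-1], N[i], CA[i], C[i]) and
                   psi = dihedral(N[i], CA[i], C[i], N[i+1]).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]  # unit vector along the central bond
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates (not from the model): a -90 degree torsion.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
```

Applying `dihedral` to every residue's backbone atoms in a PDB file yields the (φ, ψ) pairs that a Ramachandran plot places in the favoured, allowed or disallowed regions.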

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result
------
Plot statistics                                          Score   %-age

Residues in most favoured regions       [A,B,L]            990   85.6%
Residues in additional allowed regions  [a,b,l,p]          104    9.0%
Residues in generously allowed regions  [~a,~b,~l,~p]       11    1.0%
Residues in disallowed regions                              51    4.4%
                                                          ----  ------
Number of non-glycine and non-proline residues            1156  100.0%

Number of end-residues (excl. Gly and Pro)                   8

Number of glycine residues (shown as triangles)            127
Number of proline residues                                  60
                                                          ----
Total number of residues                                  1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
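The percentages in the Procheck summary follow directly from the residue counts. The Python sketch below (a hypothetical helper, not part of PROCHECK) recomputes them from the 1156 non-glycine, non-proline residues and applies the "over 90% in the most favoured regions" guideline quoted above.

```python
def ramachandran_summary(counts, threshold=90.0):
    """Return (total, percentages rounded to 0.1, meets_guideline).

    counts maps each Ramachandran region to its number of non-Gly,
    non-Pro residues; the guideline is > threshold % in the most
    favoured regions.
    """
    total = sum(counts.values())
    percentages = {region: round(100.0 * n / total, 1) for region, n in counts.items()}
    meets = 100.0 * counts["most favoured"] / total > threshold
    return total, percentages, meets


# Residue counts from the Procheck plot statistics above.
COUNTS = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

if __name__ == "__main__":
    total, pct, good = ramachandran_summary(COUNTS)
    print(total, pct, "meets 90% guideline" if good else "below 90% guideline")
```

It reproduces 85.6%, 9.0%, 1.0% and 4.4%, and flags that 85.6% in the most favoured regions is below the 90% guideline.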

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS

Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52MSE
    MSEN Radius = 1.40 Charge = -0.52  MSECA Radius = 1.50 Charge = 0.14  MSEC Radius = 1.40 Charge = 0.53  MSEO Radius = 1.50 Charge = -0.50  MSECB Radius = 1.70 Charge = 0.04  MSECG Radius = 1.70 Charge = 0.09  MSESE Radius = 1.90 Charge = 0.32  MSECE Radius = 1.90 Charge = 0.01  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82ASP
    ASPN Radius = 1.40 Charge = -0.52  ASPCA Radius = 1.50 Charge = 0.25  ASPC Radius = 1.40 Charge = 0.53  ASPO Radius = 1.50 Charge = -0.50  ASPCB Radius = 1.70 Charge = -0.21  ASPCG Radius = 1.40 Charge = 0.62  ASPH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291MSE
    MSEN Radius = 1.40 Charge = -0.52  MSECA Radius = 1.50 Charge = 0.14  MSEC Radius = 1.40 Charge = 0.53  MSEO Radius = 1.50 Charge = -0.50  MSECB Radius = 1.70 Charge = 0.04  MSECG Radius = 1.70 Charge = 0.09  MSESE Radius = 1.90 Charge = 0.32  MSECE Radius = 1.90 Charge = 0.01  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSCE Radius = 1.70 Charge = 0.22  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375MSE
    MSEN Radius = 1.40 Charge = -0.52  MSECA Radius = 1.50 Charge = 0.14  MSEC Radius = 1.40 Charge = 0.53  MSEO Radius = 1.50 Charge = -0.50  MSECB Radius = 1.70 Charge = 0.04  MSECG Radius = 1.70 Charge = 0.09  MSESE Radius = 1.90 Charge = 0.32  MSECE Radius = 1.90 Charge = 0.01  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSCE Radius = 1.70 Charge = 0.22  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52MSE
    MSEN Radius = 1.40 Charge = -0.52  MSECA Radius = 1.50 Charge = 0.14  MSEC Radius = 1.40 Charge = 0.53  MSEO Radius = 1.50 Charge = -0.50  MSECB Radius = 1.70 Charge = 0.04  MSECG Radius = 1.70 Charge = 0.09  MSESE Radius = 1.90 Charge = 0.32  MSECE Radius = 1.90 Charge = 0.01  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSCE Radius = 1.70 Charge = 0.22  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81ASP
    ASPN Radius = 1.40 Charge = -0.52  ASPCA Radius = 1.50 Charge = 0.25  ASPC Radius = 1.40 Charge = 0.53  ASPO Radius = 1.50 Charge = -0.50  ASPCB Radius = 1.70 Charge = -0.21  ASPH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291MSE
    MSEN Radius = 1.40 Charge = -0.52  MSECA Radius = 1.50 Charge = 0.14  MSEC Radius = 1.40 Charge = 0.53  MSEO Radius = 1.50 Charge = -0.50  MSECB Radius = 1.70 Charge = 0.04  MSECG Radius = 1.70 Charge = 0.09  MSESE Radius = 1.90 Charge = 0.32  MSECE Radius = 1.90 Charge = 0.01  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSCE Radius = 1.70 Charge = 0.22  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375MSE
    MSEN Radius = 1.40 Charge = -0.52  MSECA Radius = 1.50 Charge = 0.14  MSEC Radius = 1.40 Charge = 0.53  MSEO Radius = 1.50 Charge = -0.50  MSECB Radius = 1.70 Charge = 0.04  MSECG Radius = 1.70 Charge = 0.09  MSESE Radius = 1.90 Charge = 0.32  MSECE Radius = 1.90 Charge = 0.01  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404THR
    THRN Radius = 1.40 Charge = -0.52  THRCA Radius = 1.50 Charge = 0.27  THRC Radius = 1.40 Charge = 0.53  THRO Radius = 1.50 Charge = -0.50  THRCB Radius = 1.50 Charge = 0.21  THRH Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52MSE
    MSEN Radius = 1.40 Charge = -0.52  MSECA Radius = 1.50 Charge = 0.14  MSEC Radius = 1.40 Charge = 0.53  MSEO Radius = 1.50 Charge = -0.50  MSECB Radius = 1.70 Charge = 0.04  MSECG Radius = 1.70 Charge = 0.09  MSESE Radius = 1.90 Charge = 0.32  MSECE Radius = 1.90 Charge = 0.01  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82ASP
    ASPN Radius = 1.40 Charge = -0.52  ASPCA Radius = 1.50 Charge = 0.25  ASPC Radius = 1.40 Charge = 0.53  ASPO Radius = 1.50 Charge = -0.50  ASPCB Radius = 1.70 Charge = -0.21  ASPCG Radius = 1.40 Charge = 0.62  ASPH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291MSE
    MSEN Radius = 1.40 Charge = -0.52  MSECA Radius = 1.50 Charge = 0.14  MSEC Radius = 1.40 Charge = 0.53  MSEO Radius = 1.50 Charge = -0.50  MSECB Radius = 1.70 Charge = 0.04  MSECG Radius = 1.70 Charge = 0.09  MSESE Radius = 1.90 Charge = 0.32  MSECE Radius = 1.90 Charge = 0.01  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSCE Radius = 1.70 Charge = 0.22  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375MSE
    MSEN Radius = 1.40 Charge = -0.52  MSECA Radius = 1.50 Charge = 0.14  MSEC Radius = 1.40 Charge = 0.53  MSEO Radius = 1.50 Charge = -0.50  MSECB Radius = 1.70 Charge = 0.04  MSECG Radius = 1.70 Charge = 0.09  MSESE Radius = 1.90 Charge = 0.32  MSECE Radius = 1.90 Charge = 0.01  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSCE Radius = 1.70 Charge = 0.22  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52MSE
    MSEN Radius = 1.40 Charge = -0.52  MSECA Radius = 1.50 Charge = 0.14  MSEC Radius = 1.40 Charge = 0.53  MSEO Radius = 1.50 Charge = -0.50  MSECB Radius = 1.70 Charge = 0.04  MSECG Radius = 1.70 Charge = 0.09  MSESE Radius = 1.90 Charge = 0.32  MSECE Radius = 1.90 Charge = 0.01  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSCE Radius = 1.70 Charge = 0.22  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81ASP
    ASPN Radius = 1.40 Charge = -0.52  ASPCA Radius = 1.50 Charge = 0.25  ASPC Radius = 1.40 Charge = 0.53  ASPO Radius = 1.50 Charge = -0.50  ASPCB Radius = 1.70 Charge = -0.21  ASPH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291MSE
    MSEN Radius = 1.40 Charge = -0.52  MSECA Radius = 1.50 Charge = 0.14  MSEC Radius = 1.40 Charge = 0.53  MSEO Radius = 1.50 Charge = -0.50  MSECB Radius = 1.70 Charge = 0.04  MSECG Radius = 1.70 Charge = 0.09  MSESE Radius = 1.90 Charge = 0.32  MSECE Radius = 1.90 Charge = 0.01  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370LYS
    LYSN Radius = 1.40 Charge = -0.52  LYSCA Radius = 1.50 Charge = 0.23  LYSC Radius = 1.40 Charge = 0.53  LYSO Radius = 1.50 Charge = -0.50  LYSCB Radius = 1.70 Charge = 0.04  LYSCG Radius = 1.70 Charge = 0.05  LYSCD Radius = 1.70 Charge = 0.05  LYSCE Radius = 1.70 Charge = 0.22  LYSH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375MSE
    MSEN Radius = 1.40 Charge = -0.52  MSECA Radius = 1.50 Charge = 0.14  MSEC Radius = 1.40 Charge = 0.53  MSEO Radius = 1.50 Charge = -0.50  MSECB Radius = 1.70 Charge = 0.04  MSECG Radius = 1.70 Charge = 0.09  MSESE Radius = 1.90 Charge = 0.32  MSECE Radius = 1.90 Charge = 0.01  MSEH Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404THR
    THRN Radius = 1.40 Charge = -0.52  THRCA Radius = 1.50 Charge = 0.27  THRC Radius = 1.40 Charge = 0.53  THRO Radius = 1.50 Charge = -0.50  THRCB Radius = 1.50 Charge = 0.21  THRH Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60 Å

Contouring surface for molecule 2B8N. Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N. Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)

R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 0.00
R = 0.40

R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal     Eshape     Eforce     Eair       RMS               Bumps
----  ---------  ---------  ---------  ---------  ----------------  -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100
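The Hex log above reports its search as thousands of 3D FFTs. Hex's own search is built on spherical polar Fourier correlations, which this sketch does not reproduce; instead it swaps in the simpler Cartesian-grid idea (in the style of Katchalski-Katzir) to show why an FFT makes an exhaustive translational scan cheap: one forward/inverse transform pair scores every shift at once. It assumes NumPy is available and that receptor and ligand have already been mapped onto equal-sized 3D grids; the function name `fft_translation_scan` and the toy grids are hypothetical.

```python
import numpy as np


def fft_translation_scan(receptor: np.ndarray, ligand: np.ndarray):
    """Score every cyclic translation t of the ligand grid against the receptor grid.

    score[t] = sum_x receptor[x + t] * ligand[x], computed for all t at once
    with the correlation theorem instead of an explicit loop over shifts.
    (Hypothetical helper: a Cartesian FFT scan, not Hex's spherical polar method.)
    """
    if receptor.shape != ligand.shape:
        raise ValueError("receptor and ligand grids must have the same shape")
    # Correlation theorem: receptor spectrum times the complex conjugate of
    # the ligand spectrum, transformed back to real space.
    spectrum = np.fft.fftn(receptor) * np.conj(np.fft.fftn(ligand))
    scores = np.fft.ifftn(spectrum).real
    best = np.unravel_index(np.argmax(scores), scores.shape)
    return scores, tuple(int(i) for i in best)


# Toy example: a two-voxel "ligand" whose best placement is a shift of (3, 4, 5).
receptor = np.zeros((8, 8, 8))
receptor[3, 4, 5] = receptor[4, 4, 5] = 1.0
ligand = np.zeros((8, 8, 8))
ligand[0, 0, 0] = ligand[1, 0, 0] = 1.0

scores, best_shift = fft_translation_scan(receptor, ligand)
print(best_shift, round(float(scores[best_shift]), 6))  # prints (3, 4, 5) 2.0
```

In grid-based FFT docking a scan like this is repeated for every sampled ligand rotation and the best-scoring poses are kept; because Hex organises rotations and translations differently, the FFT counts in the log above are not directly comparable to this toy.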

Conclusion


After analysing the protein sequences of hepatitis B virus (HBV), we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to look at this more closely by studying the sequences at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene-sequencing project on HBV is finished, we hope it will be possible to draw firm conclusions about the true picture of its evolution in the near future, and that the genes responsible for pathogenesis can also be identified. A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structures of proteins present in the different species, we carried out homology modelling. As a further step, we present a theoretical model built with freely available online modelling tools.

Our study indicates that the gene product HBeAg (glycerate kinase) is one of the secondary causes of HBV pathogenicity, so we tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for developing drugs.

Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a much larger problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for the infectious disease hepatitis B, and may also open new avenues for finding other possible drug targets in the hepatitis B virus.

The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is only a small part of a big problem, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We nevertheless hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package


6    P0C692|HBEAG_HBVA2    214    10    Q91C37|HBEAG_HBVA6    214    98
7    P0C625|HBEAG_HBVA3    214     8    P17099|HBEAG_HBVA4    214    98
7    P0C625|HBEAG_HBVA3    214     9    Q81105|HBEAG_HBVA5    214    97
7    P0C625|HBEAG_HBVA3    214    10    Q91C37|HBEAG_HBVA6    214    98
8    P17099|HBEAG_HBVA4    214     9    Q81105|HBEAG_HBVA5    214    97
8    P17099|HBEAG_HBVA4    214    10    Q91C37|HBEAG_HBVA6    214    98
9    Q81105|HBEAG_HBVA5    214    10    Q91C37|HBEAG_HBVA6    214    97
===========================================================================
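The pairwise scores above are, roughly, percent identities between each pair of sequences. The sketch below shows that kind of calculation on two rows that are already aligned: it skips every column where either row has a gap and reports identical residues as a percentage of the columns compared. The function `percent_identity` is hypothetical and the gap convention is an assumption; ClustalW first builds its own pairwise alignments, so this sketch will not reproduce its numbers exactly.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an existing alignment.

    Columns where either row has a gap are skipped, so the denominator is
    the number of residue-residue pairs actually compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gap columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# 30-column fragments of the HBVA4 and HBVA5 rows from the alignment below
hbva4 = "MQLFHLCLIISCT-CPTVQASKLCLGWLWG"
hbva5 = "MQLFHLCLIISCT-CPTFQASKLCLGWLWG"
print(f"{percent_identity(hbva4, hbva5):.1f}%")  # prints 96.6%
```

On this fragment the single V/F difference gives 28 identities out of 29 compared columns; the score of 97 listed for HBVA4 versus HBVA5 is computed over the full-length pairwise alignment.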

Alignment

CLUSTAL 2.0.5 multiple sequence alignment

P17099|HBEAG_HBVA4      ------------------------------------------------------------
Q91C37|HBEAG_HBVA6      ------------------------------------------------------------
P0C692|HBEAG_HBVA2      ------------------------------------------------------------
P0C625|HBEAG_HBVA3      ------------------------------------------------------------
Q81105|HBEAG_HBVA5      ------------------------------------------------------------
Q64896|HBEAG_ASHV       ------------------------------------------------------------
P03153|HBEAG_GSHV       ------------------------------------------------------------
P03154|HBEAG_DHBV1      ------------------------------------------------------------
P0C6J9|HBEAG_DHBV3      ------------------------------------------------------------
Q8IVS8|GLCTK_HUMAN      MAAALQVLPRLARAPLHPLLWRGSVARLASSMALAEQARQLFESAVGAVLPGPMLHRALS 60

P17099|HBEAG_HBVA4      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q91C37|HBEAG_HBVA6      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C692|HBEAG_HBVA2      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
P0C625|HBEAG_HBVA3      ----------------------MQLFHLCLIISCT-CPTVQASKLCLGWLWG-------M 30
Q81105|HBEAG_HBVA5      ----------------------MQLFHLCLIISCT-CPTFQASKLCLGWLWG-------M 30
Q64896|HBEAG_ASHV       ----------------------MYLFHLCLVFACVSCPTVQASKLCLGWLWD-------M 31
P03153|HBEAG_GSHV       ----------------------MYLFHLCLVFACVPCPTVQASKLCLGWLWD-------M 31
P03154|HBEAG_DHBV1      ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3      ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN      LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4      DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6      DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2      DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3      DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5      DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV       DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV       DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1      DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3      DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN      ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3      --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5      --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV       --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV       --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1      --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3      --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN      ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4      QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6      ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2      QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3      QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5      QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV       QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV       QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1      QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3      QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN      LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4      -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3      -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5      -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV       -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV       -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1      TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3      TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN      CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5      NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV       NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV       NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1      DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3      DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN      PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3      RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5      RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV       RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV       RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1      RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3      RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN      GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4      -------------------------------------------
Q91C37|HBEAG_HBVA6      -------------------------------------------
P0C692|HBEAG_HBVA2      -------------------------------------------
P0C625|HBEAG_HBVA3      -------------------------------------------
Q81105|HBEAG_HBVA5      -------------------------------------------
Q64896|HBEAG_ASHV       -------------------------------------------
P03153|HBEAG_GSHV       -------------------------------------------
P03154|HBEAG_DHBV1      -------------------------------------------
P0C6J9|HBEAG_DHBV3      -------------------------------------------
Q8IVS8|GLCTK_HUMAN      ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
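To inspect the guide tree programmatically, the sketch below parses the Newick string above and lists each sequence's root-to-tip distance. It assumes the branch lengths are ClustalW's five-decimal values, as written above. The parser and its names (`parse_newick`, `tip_distances`) are hypothetical and only handle labels and branch lengths; for real work a library such as Biopython's `Bio.Phylo` would be the usual choice. Note that a ClustalW guide tree only orders the progressive alignment, so these distances should not be read as a proper phylogeny.

```python
from typing import Dict, List

GUIDE_TREE = ("((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
              "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
              "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
              ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
              ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);")


class Node:
    def __init__(self) -> None:
        self.name = ""
        self.length = 0.0
        self.children: List["Node"] = []


def parse_newick(text: str) -> Node:
    """Minimal recursive-descent Newick parser (labels and branch lengths only)."""
    text = text.strip().rstrip(";")
    pos = 0

    def subtree() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(text) and text[pos] == "(":
            pos += 1                                  # consume "("
            node.children.append(subtree())
            while pos < len(text) and text[pos] == ",":
                pos += 1                              # consume ","
                node.children.append(subtree())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1                                  # consume ")"
        start = pos                                   # optional label
        while pos < len(text) and text[pos] not in ":,()":
            pos += 1
        node.name = text[start:pos]
        if pos < len(text) and text[pos] == ":":      # optional ":length"
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node.length = float(text[start:pos])
        return node

    return subtree()


def tip_distances(node: Node, acc: float = 0.0) -> Dict[str, float]:
    """Sum of branch lengths from the root to every leaf."""
    total = acc + node.length
    if not node.children:
        return {node.name: total}
    out: Dict[str, float] = {}
    for child in node.children:
        out.update(tip_distances(child, total))
    return out


tree = parse_newick(GUIDE_TREE)
distances = tip_distances(tree)
for name, dist in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} {dist:.5f}")
```

The root has three children (ClustalW writes its unrooted guide tree with a three-way split at the top), and GLCTK_HUMAN sits farthest from the HBV sequences, as the alignment would suggest.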

Phylogram

Tertiary structure prediction

PDB entry 1QGT-C was selected as the template because it showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV:

1. Open the template structure (PDB file) from File, then choose Swiss model > Load raw target sequence.
2. Choose Fit > Fit raw sequence, then Magic Fit, then Iterative Fit.
3. Choose File > Save > Layer (PDB).
4. Choose File > Save > Project (PDB).
5. Choose Swiss model > Submit modeling request. (A new browser window opens with the project file loaded; enter the e-mail address at which the modelled structure is to be received.)
6. Open the new structure (received by e-mail) and remove the template by selecting the target.
7. Choose File > Save > Layer (PDB).

Open Swiss model and select the load raw sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under FIT, to fit the two sequences.

Save the file as the project.

Select “submit modeling request” under Swiss model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields:

Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044  Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Click on the model bars.

Fig: Structure of template after modeling

Alignment

TARGET   83 LV GFGKAVLGMA AAAEELLGQH
2b8nA     4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET  155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET  255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET  305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394 ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss

Model Validation

INTRODUCTION

Structure Analysis and Validation Server greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative (homology) modelling process. A Ramachandran plot visualises the backbone dihedral angle φ (phi) against ψ (psi) for each amino-acid residue in a protein structure, and shows which combinations of φ and ψ are possible for a polypeptide. Because backbone bond lengths and bond angles are nearly fixed, these two torsions (about the N-Cα and Cα-C bonds) largely determine the local backbone conformation, so the position of each residue's point on the plot shows whether its conformation lies in a favoured, allowed or disallowed region.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS/)

Procedure: The target protein structure obtained after homology modelling with DeepView and Modeller was given as input to SAVS.
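Each point on the Ramachandran plot described above is just a pair of torsion angles, so it may help to see how one is computed from coordinates. The sketch below uses the standard atan2 formula for a signed dihedral; φ uses the atoms C(i-1), N(i), Cα(i), C(i) and ψ uses N(i), Cα(i), C(i), N(i+1). The helper `dihedral` is hypothetical and written for illustration; validation servers such as SAVS/PROCHECK compute these angles internally, and this is not their code.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Signed torsion angle in degrees (-180..180) about the p1-p2 bond.

    phi: C(i-1), N(i), CA(i), C(i);  psi: N(i), CA(i), C(i), N(i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: central bond along z, first atom on +x, last atom on +y.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 3))  # prints 90.0
```

In practice the four points would be read from the modelled PDB file for each residue, and the resulting (φ, ψ) pairs plotted against the allowed regions.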

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result
-----
Plot statistics                                                  SCORE     %age
Residues in most favoured regions       [A,B,L]                    990     85.6%
Residues in additional allowed regions  [a,b,l,p]                  104      9.0%
Residues in generously allowed regions  [~a,~b,~l,~p]               11      1.0%
Residues in disallowed regions                                      51      4.4%
                                                                  ----     ------------------
Number of non-glycine and non-proline residues                    1156    100.0%

Number of end-residues (excl. Gly and Pro)                           8

Number of glycine residues (shown as triangles)                    127 (59)
Number of proline residues                                          60
                                                                  ----
Total number of residues                                          1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
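The percentages in the PROCHECK summary are simple ratios over the 1156 non-glycine, non-proline residues. The sketch below recomputes them from the listed counts and applies the 90% rule of thumb quoted above; `ramachandran_summary` is a hypothetical helper for illustration, not part of PROCHECK.

```python
from typing import Dict, Tuple

# Residue counts copied from the PROCHECK summary (non-Gly, non-Pro residues only).
REGION_COUNTS: Dict[str, int] = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}


def ramachandran_summary(counts: Dict[str, int],
                         core_key: str = "most favoured [A,B,L]",
                         threshold: float = 90.0) -> Tuple[int, Dict[str, float], bool]:
    """Return (total residues, percentage per region, core region above threshold?)."""
    total = sum(counts.values())
    percent = {region: 100.0 * n / total for region, n in counts.items()}
    return total, percent, percent[core_key] > threshold


total, percent, good = ramachandran_summary(REGION_COUNTS)
for region, p in percent.items():
    print(f"{region:35s} {p:5.1f}%")
print("total:", total, "| over 90% in most favoured:", good)
```

By this criterion, 85.6% of residues in the most favoured regions falls short of the 90% expected for a good-quality model, and the 4.4% in disallowed regions would be worth inspecting residue by residue.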

Docking result by Hex software

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059

Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS

63

Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025

64

Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues, Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues, Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60 Å

Contouring surface for molecule 2B8N, Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096, Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N, Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096, Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 46.4 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)

R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 01616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 0.00
R = 0.40

R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100
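The R = lines in the docking log show the search stepping the intermolecular distance across the range set at the start of the run (0.00 to 19.50 with steps of 0.75). As an illustration only, the Python sketch below regenerates that distance grid. It also checks one piece of arithmetic: the log's 812 initial receptor rotational increments, multiplied by the number of distances, equals the reported "Done 21924 3D FFTs". That match is our reading of the log, not something Hex documents here. The sketch does not reproduce Hex's FFT-based docking search, and the function name `distance_grid` is hypothetical.

```python
from typing import List


def distance_grid(r_min: float, r_max: float, step: float) -> List[float]:
    """Evenly spaced distances from r_min to r_max inclusive (hypothetical helper).

    Each value is computed as r_min + i * step rather than by repeated
    addition, so floating-point rounding errors do not accumulate.
    """
    n_steps = round((r_max - r_min) / step)
    return [round(r_min + i * step, 2) for i in range(n_steps + 1)]


if __name__ == "__main__":
    # Range and step copied from the Hex log: 0.00 to 19.50 with steps of 0.75.
    grid = distance_grid(0.00, 19.50, 0.75)
    print(len(grid), "distances:", grid)
    # The log reports 812 initial receptor rotational increments and
    # "Done 21924 3D FFTs"; 27 distances x 812 rotations gives the same count.
    print(len(grid) * 812)  # 21924
```

The grid has 27 points, one for each R = line printed during the initial search.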


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We have noticed that the same genes are present in all strains, which shows that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we performed homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, providing a basis on which drugs can be developed.


Future prospects

The work presented in this report may be just a stepping stone toward such discoveries; it is a small finding within a much larger problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study could be useful in finding a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV.

The docking results can be used to design new lead compounds and so aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and so we can look forward to a better, disease-free world.

The work presented is only a small part of a larger problem, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. However, we hope that these findings will go a long way and prove fruitful to anyone working in a similar area.



Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

Phylip Phylogeny Inference Package


P03154|HBEAG_DHBV1  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
P0C6J9|HBEAG_DHBV3  ----------------------MWNLRITPLSFGAACQGIFTSTLLLSCVTVPLVCTIVY 38
Q8IVS8|GLCTK_HUMAN  LDPGGRQLKVRDRNFQLRQNLYLVGFGKAVLGMAAAAEELLGQHLVQGVISVPKGIRAAM 120

P17099|HBEAG_HBVA4  DIDP------------------------YKEFGATVELLSF------------------- 47
Q91C37|HBEAG_HBVA6  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C692|HBEAG_HBVA2  DIDP------------------------YKEFGATVELLSF------------------- 47
P0C625|HBEAG_HBVA3  DIDP------------------------YKEFGATVELLSF------------------- 47
Q81105|HBEAG_HBVA5  DIDP------------------------YKEFGATVELLSF------------------- 47
Q64896|HBEAG_ASHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03153|HBEAG_GSHV   DIDP------------------------YKEFGSSYQLLNF------------------- 48
P03154|HBEAG_DHBV1  DSCL------------------------YMDINASRALANVYD----------------- 57
P0C6J9|HBEAG_DHBV3  DSCL------------------------YMDINASRALANVYD----------------- 57
Q8IVS8|GLCTK_HUMAN  ERAGKQEMLLKPHSRVQVFEGAEDNLPDRDALRAALAIQQLAEGLTADDLLLVLISGGGS 180

P17099|HBEAG_HBVA4  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q91C37|HBEAG_HBVA6  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C692|HBEAG_HBVA2  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
P0C625|HBEAG_HBVA3  --LPSDFFPSVRDLLDTASALYREALES--------------------PEHCSPHHTALR 85
Q81105|HBEAG_HBVA5  --LPSDFFPSVRDLXDTASALYREALES--------------------PEHCSPHHTALR 85
Q64896|HBEAG_ASHV   --LPLDFFPELNALVDTATALYEEELTG--------------------REHCSPHHTAIR 86
P03153|HBEAG_GSHV   --LPLDFFPDLNALVDTAAALYEEELTG--------------------REHCSPHHTAIR 86
P03154|HBEAG_DHBV1  --LPDDFFPKIDDLVRDAKDALEPYWKSDSIK-----------KHVLIATHFVDLIEDFW 104
P0C6J9|HBEAG_DHBV3  --LPDDFFPKIDDLVRDAKDALEPYWRSDSIK-----------KHVLIATHFVDLIEDFW 104
Q8IVS8|GLCTK_HUMAN  ALLPAPIPPVTLEEKQTLTRLLAARGATIQELNTIRKALSQLKGGGLAQAAYPAQVVSLI 240

P17099|HBEAG_HBVA4  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q91C37|HBEAG_HBVA6  ETILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
P0C692|HBEAG_HBVA2  QAILCWGELMTLATWVGNNLQDPASRDLVVNY---------------------------- 117
P0C625|HBEAG_HBVA3  QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q81105|HBEAG_HBVA5  QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117
Q64896|HBEAG_ASHV   QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118
P03153|HBEAG_GSHV   QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117
P03154|HBEAG_DHBV1  QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164
P0C6J9|HBEAG_DHBV3  QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164
Q8IVS8|GLCTK_HUMAN  LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4  -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q91C37|HBEAG_HBVA6  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C692|HBEAG_HBVA2  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
P0C625|HBEAG_HBVA3  -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q81105|HBEAG_HBVA5  -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164
Q64896|HBEAG_ASHV   -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165
P03153|HBEAG_GSHV   -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164
P03154|HBEAG_DHBV1  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224
P0C6J9|HBEAG_DHBV3  TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224
Q8IVS8|GLCTK_HUMAN  CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q91C37|HBEAG_HBVA6  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C692|HBEAG_HBVA2  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
P0C625|HBEAG_HBVA3  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q81105|HBEAG_HBVA5  NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195
Q64896|HBEAG_ASHV   NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199
P03153|HBEAG_GSHV   NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198
P03154|HBEAG_DHBV1  DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
P0C6J9|HBEAG_DHBV3  DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284
Q8IVS8|GLCTK_HUMAN  PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

P17099|HBEAG_HBVA4  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q91C37|HBEAG_HBVA6  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C692|HBEAG_HBVA2  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
P0C625|HBEAG_HBVA3  RRRSPSPRRRRSQSRESQC----------------------------------------- 214
Q81105|HBEAG_HBVA5  RRRSQSPRRRRSQSRESQC----------------------------------------- 214
Q64896|HBEAG_ASHV   RRRSQSPRRR-PQSPASNC----------------------------------------- 217
P03153|HBEAG_GSHV   RRRSQSPRRRRSQSPASNC----------------------------------------- 217
P03154|HBEAG_DHBV1  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
P0C6J9|HBEAG_DHBV3  RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305
Q8IVS8|GLCTK_HUMAN  GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4  -------------------------------------------
Q91C37|HBEAG_HBVA6  -------------------------------------------
P0C692|HBEAG_HBVA2  -------------------------------------------
P0C625|HBEAG_HBVA3  -------------------------------------------
Q81105|HBEAG_HBVA5  -------------------------------------------
Q64896|HBEAG_ASHV   -------------------------------------------
P03153|HBEAG_GSHV   -------------------------------------------
P03154|HBEAG_DHBV1  -------------------------------------------
P0C6J9|HBEAG_DHBV3  -------------------------------------------
Q8IVS8|GLCTK_HUMAN  ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523
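Pairs of rows in this alignment are often summarised by a percent identity, the same kind of figure quoted for template selection in this report. The Python sketch below is a minimal illustration under one stated assumption: identity means the number of identical columns divided by the number of columns where neither row has a gap. BLAST, ClustalW and other tools may normalise differently, so their figures need not match this one. The function name `percent_identity` is our own.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity between two rows of the same gapped alignment.

    Assumed definition: identical columns divided by the columns in which
    neither row has a gap ('-'). Other tools may normalise differently.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = identical = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore columns with a gap in either sequence
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


if __name__ == "__main__":
    # One 32-column segment of the HBVA4 and ASHV rows from the alignment.
    hbva4 = "QAILCWGELMTLATWVGNNLEDPASRDLVVNY"
    ashv = "QALVCWEELTRLIAWMSANINSEEVRRVIVAH"
    print(f"{percent_identity(hbva4, ashv):.1f}% identity")
```

In this segment, 11 of the 32 columns match (about 34.4%). A whole-row comparison would also pass through the gap-only columns, which the function skips.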

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
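The guide tree is written in Newick format. Nested parentheses group sequences into clusters, and a number after a colon gives the length of the branch above a node. The Python sketch below is a minimal recursive-descent reader. It is written for illustration and supports only what this tree uses: no quoted labels, comments, or spaces inside names. The helper names `parse_newick` and `leaves` are hypothetical. Full parsers exist elsewhere, for example the `Bio.Phylo` module in Biopython.

```python
from typing import Iterator, List, Optional, Tuple

# A node is (name, branch_length, children); leaves have no children.
Node = Tuple[str, Optional[float], List["Node"]]

TREE = (
    "((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
    "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
    "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
    ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
    ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);"
)


def parse_newick(text: str) -> Node:
    """Parse a simple, well-formed Newick string into nested tuples."""
    s = text.strip().rstrip(";")
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        children: List[Node] = []
        if s[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while s[pos] == ",":
                pos += 1  # consume "," between siblings
                children.append(parse_node())
            pos += 1  # consume ")"
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]  # empty for unlabelled internal nodes
        length: Optional[float] = None
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return (name, length, children)

    return parse_node()


def leaves(node: Node) -> Iterator[Tuple[str, Optional[float]]]:
    """Yield (leaf name, branch length) pairs from left to right."""
    name, length, children = node
    if not children:
        yield name, length
    for child in children:
        yield from leaves(child)


if __name__ == "__main__":
    for name, length in leaves(parse_newick(TREE)):
        print(f"{name:20s} {length}")
```

Run on the guide tree, the parser returns ten leaves. The longest terminal branch, 0.59519, belongs to the human glycerate kinase sequence Q8IVS8|GLCTK_HUMAN, which is consistent with it being the outlier among the HBeAg sequences.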

Phylogram

Tertiary structure prediction

PDB entry 1QGT-C was selected as the template because it showed around 85.6% identity with the target sequence. The template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out.

Steps involved in SPDBV:

Open the template structure from File (PDB file), then choose Swiss model > load the raw target sequence.

Choose Fit > fit raw sequence, then magic fit, then iterative fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss model > submit modeling request (a new browser window opens with the PDB file loaded; enter the e-mail ID at which the modeled structure is to be received).

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).


Open Swiss model and select the "load raw sequence" option to load the target molecule

Perform magic fit and iterative fit, provided under FIT, to fit the two sequences


Save the file as a project

Select "submit modeling request" under Swiss model to submit it for modeling

Homology modeling

Optimise Mode Request submission form

Please fill these fields


Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044 Title: Q8IVS8

SWISS MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence Identity [%]: 34

E-value: 2.70e-52


Fig: Structure of the template after modeling

Alignment

TARGET 83  LV GFGKAVLGMA AAAEELLGQH
2b8nA 4    peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET     hh sss sssss hhh
2b8nA      hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53   xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET     hhhhhhhhh ssssss sssss hhhhh
2b8nA      hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET     hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET     h hhhhhh hhh sss hhhhhh sssssss
2b8nA      h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET     h hhhhhhhhhh hh hhhh sss sssss
2b8nA      h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET     sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET     hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA      hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET     sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA      sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394  ksgallitgp tgtnvndlii gliv-
TARGET     h sss sssss ssss
2b8nA      sss sssss ssss

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot shows the backbone dihedral angles φ (phi) against ψ (psi) for each amino acid residue in a protein structure, and so shows which combinations of φ and ψ are possible for a polypeptide. Each residue contributes one point, so residues that fall outside the allowed regions point to possible problems in the model. The angles are computed from the coordinates of consecutive backbone atoms: φ of residue i from C(i-1), N(i), Cα(i) and C(i), and ψ from N(i), Cα(i), C(i) and N(i+1).

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling with Deep View and Modeller is given as input to SAVS.
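The φ and ψ values on a Ramachandran plot are ordinary torsion (dihedral) angles, each defined by four consecutive backbone atoms. The Python sketch below shows that calculation. It assumes the atom coordinates are already available as (x, y, z) tuples, and its function names are illustrative, not part of SAVS or PROCHECK. It uses the standard atan2 form of the dihedral angle, which returns a signed value between -180° and 180°.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec3, n: Vec3, ca: Vec3, c: Vec3, n_next: Vec3) -> Tuple[float, float]:
    """Backbone phi and psi of residue i from the neighbouring backbone atoms."""
    phi = dihedral(c_prev, n, ca, c)
    psi = dihedral(n, ca, c, n_next)
    return phi, psi


if __name__ == "__main__":
    # Toy coordinates: p3 rotated 90 degrees about the z axis relative to p0.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing `phi_psi` for every residue and plotting the pairs gives the kind of Ramachandran plot that SAVS reports.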

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics                                         Score  %-tage
Residues in most favoured regions [A,B,L]                 990   85.6%
Residues in additional allowed regions [a,b,l,p]          104    9.0%
Residues in generously allowed regions [~a,~b,~l,~p]       11    1.0%
Residues in disallowed regions                             51    4.4%
                                                         ----  ------
Number of non-glycine and non-proline residues           1156  100.0%
Number of end-residues (excl. Gly and Pro)                  8
Number of glycine residues (shown as triangles)           127 (59)
Number of proline residues                                 60
                                                         ----
Total number of residues                                 1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions.
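The %-tage column in the PROCHECK summary is each region's count divided by the 1156 non-glycine, non-proline residues. The Python sketch below recomputes that column from the counts in the table and applies the "over 90% in the most favoured regions" rule of thumb quoted above. The dictionary keys and function names are our own labels, not PROCHECK output fields.

```python
from typing import Dict

# Residue counts per Ramachandran region, copied from the PROCHECK summary.
COUNTS: Dict[str, int] = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}


def region_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of non-Gly, non-Pro residues in each region (one decimal)."""
    total = sum(counts.values())  # 1156 non-glycine, non-proline residues here
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_core_guideline(pct: Dict[str, float], threshold: float = 90.0) -> bool:
    """Rule of thumb quoted by PROCHECK: over 90% in the most favoured regions."""
    return pct["most favoured [A,B,L]"] > threshold


if __name__ == "__main__":
    pct = region_percentages(COUNTS)
    for region, value in pct.items():
        print(f"{region:34s} {value:5.1f}%")
    print("over 90% in most favoured regions:", meets_core_guideline(pct))
```

With these counts, 85.6% of residues fall in the most favoured regions, below the 90% guideline, and 4.4% fall in disallowed regions.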

Docking result by Hex software

Fig: Ligand & Receptor (2B8N)

Fig: After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25

Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050

65

MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

66

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI

67

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052

68

MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005

69

LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025

70

ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds

71

gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60 Å

Contouring surface for molecule 2B8N. Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N. Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 50.00 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N: Tag = 2B8N

Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00, R = 0.75, R = 1.50, R = 2.25, R = 3.00, R = 3.75, R = 4.50, R = 5.25, R = 6.00, R = 6.75, R = 7.50, R = 8.25, R = 9.00, R = 9.75, R = 10.50, R = 11.25, R = 12.00, R = 12.75, R = 13.50, R = 14.25, R = 15.00, R = 15.75, R = 16.50, R = 17.25, R = 18.00, R = 18.75, R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00, R = 0.40, R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair    RMS   Bumps
----  --------  --------  --------  --------  -----   -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec
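The size of the search reported in the log can be checked with simple arithmetic. Interpreting "Receptor 812" and "Ligand 1" as the numbers of rotational increments (an assumption), 27 distance steps (0.00 to 19.50 in steps of 0.75) times 812 times 1 gives the 21,924 3D FFTs the log reports, and dividing the 1,616,412,672 orientations by that number leaves exactly 73,728 orientations per FFT. The Python sketch below reuses only numbers printed in the log and also recovers the reported rate of roughly 3.85 million orientations per second over the 7 min search.

```python
# Numbers copied from the Hex log above.
DIST_START, DIST_END, DIST_STEP = 0.00, 19.50, 0.75  # "Setting distance range"
RECEPTOR_ROTATIONS = 812                             # "Receptor 812 (19Mb)"
LIGAND_ROTATIONS = 1                                 # "Ligand 1 (1Mb)"
TOTAL_ORIENTATIONS = 1_616_412_672                   # "Total 6D space"
SEARCH_SECONDS = 7 * 60                              # "in 7 min 0 sec"

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
n_distances = round((DIST_END - DIST_START) / DIST_STEP) + 1
n_ffts = n_distances * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS
points_per_fft, remainder = divmod(TOTAL_ORIENTATIONS, n_ffts)
orientations_per_second = TOTAL_ORIENTATIONS / SEARCH_SECONDS

print(n_distances, n_ffts, points_per_fft, remainder)  # 27 21924 73728 0
print(f"{orientations_per_second:,.0f} orientations/s")
```

The zero remainder confirms that the total is an exact multiple of the FFT count, which supports this reading of the log.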

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------ ------ ------ ------ ---- ------ ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp   RMS
---- ---- ------ ------ ------ ------ ---- ------ ------ --- -----
   1    1 000000    0.0    0.0    0.0  0.0    0.0    0.0  -1 -1.00
-----------------------------------------------------------------
   1    1 000000    0.0    0.0    0.0  0.0    0.0    0.0  -1 -1.00
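The "3D FFT" lines in the log refer to scanning all translations at once with fast Fourier transforms instead of scoring each placement separately. Hex's own scoring is based on spherical polar Fourier expansions and is not reproduced here; the sketch below instead swaps in the simpler Cartesian-grid shape-complementarity scheme of Katchalski-Katzir et al. purely to illustrate the idea. The grid encoding (receptor surface +1, receptor core -15, ligand +1), the toy cubes and all function names are hypothetical choices for illustration. It scores every cyclic translation of the ligand in one FFT correlation and checks the result against a brute-force loop.

```python
import numpy as np

def receptor_grid(mask, core_penalty=-15.0):
    """Receptor grid: +1 on the surface layer, core_penalty in the core, 0 outside.

    A voxel counts as core when it and its six face neighbours all lie inside
    the mask. np.roll wraps around the box edges, so the molecule is assumed
    to sit in a zero-padded grid."""
    mask = np.asarray(mask, dtype=bool)
    core = mask.copy()
    for axis in range(mask.ndim):
        for shift in (1, -1):
            core &= np.roll(mask, shift, axis=axis)
    grid = np.zeros(mask.shape)
    grid[mask & ~core] = 1.0
    grid[core] = core_penalty
    return grid

def fft_scores(receptor, ligand):
    """score[t] = sum_x receptor[x] * ligand[x - t] for every cyclic shift t,
    computed for all shifts at once via the correlation theorem."""
    spectrum = np.fft.fftn(receptor) * np.conj(np.fft.fftn(ligand))
    return np.real(np.fft.ifftn(spectrum))

def brute_force_scores(receptor, ligand):
    """The same quantity, one shift at a time (for checking only)."""
    scores = np.zeros(receptor.shape)
    for t in np.ndindex(*receptor.shape):
        shifted = np.roll(ligand, t, axis=tuple(range(ligand.ndim)))
        scores[t] = np.sum(receptor * shifted)
    return scores

# Toy example: a 4x4x4 "receptor" cube and a 2x2x2 "ligand" cube in an 8^3 box.
rec_mask = np.zeros((8, 8, 8), dtype=bool)
rec_mask[2:6, 2:6, 2:6] = True
lig_mask = np.zeros((8, 8, 8), dtype=bool)
lig_mask[0:2, 0:2, 0:2] = True

R = receptor_grid(rec_mask)
L = lig_mask.astype(float)
scores = fft_scores(R, L)
best_shift = np.unravel_index(np.argmax(scores), scores.shape)
print("best shift:", best_shift, "score:", round(float(scores[best_shift]), 3))
```

Positive scores reward ligand voxels that touch the receptor surface, while the large negative core value penalises interpenetration; the FFT replaces an N-cubed loop over shifts with a few transforms per rotation, which is why the log reports one 3D FFT per rotation and distance step.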


Conclusion


After analyzing the protein sequences of Hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at this by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we performed homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

Since our study suggests that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity, we tried to dock this protein with an appropriate ligand in order to inhibit its activity, on the basis of which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone toward such discoveries; it is a small contribution to a large problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in finding a cure for the infectious disease hepatitis B, and they may also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. Still, we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.


Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

Phylip Phylogeny Inference Package



P0C625|HBEAG_HBVA3 QAILCWGELMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q81105|HBEAG_HBVA5 QAILCWGKLMTLATWVGNNLEDPASRDLVVNY---------------------------- 117Q64896|HBEAG_ASHV QALVCWEELTRLIAWMSANINSEEVRRVIVAH---------------------------- 118P03153|HBEAG_GSHV QALVCWEELTRLITWMSENT-TEEVRRIIVDH---------------------------- 117P03154|HBEAG_DHBV1 QTTQGMHEIAESLRAVIPPTTTPVPPGYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPI 164P0C6J9|HBEAG_DHBV3 QTTQGMHEIAEALRAVIPPTTTPVPQGYLIQHDEAEEIPLGDLFKHQEERIVSFQPDYPI 164Q8IVS8|GLCTK_HUMAN LSDVVGDPVEVIASGPTVASSHNVQDCLHILNRYGLRAALPRSVKTVLSRADSDPHGPHT 300

P17099|HBEAG_HBVA4 -------------VNTNMGLKIRQLLWFRISYLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q91C37|HBEAG_HBVA6 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C692|HBEAG_HBVA2 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164P0C625|HBEAG_HBVA3 -------------VNTNVGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q81105|HBEAG_HBVA5 -------------VNTNMGLKIRQLLWFHISCLTFGRETVLEYLVSFGVWIRTPPAYRPP 164Q64896|HBEAG_ASHV -------------VNDTWGLKVRQNLWFHLSCLTFGQHTVQEFLVSFGVRIRTPAPYRPP 165P03153|HBEAG_GSHV -------------VNNTWGLKVRQTLWFHLSCLTFGQHTVQEFLVSFGVWIRTPAPYRPP 164P03154|HBEAG_DHBV1 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEAQVTNYISRLRTWLSTPEKYRGR 224P0C6J9|HBEAG_DHBV3 TARIHAHLKAYAKINEESLDRARRLLWWHYNCLLWGEANVTNYISRLRTWLSTPERYRGR 224Q8IVS8|GLCTK_HUMAN CGHVLNVIIGSNVLALAEAQRQAEALGYQAVVLSAAMQGDVKSMAQFYGLLAHVARTRLT 360

P17099|HBEAG_HBVA4 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q91C37|HBEAG_HBVA6 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C692|HBEAG_HBVA2 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195P0C625|HBEAG_HBVA3 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q81105|HBEAG_HBVA5 NAPILSTLPETTVVRRRDRG-----------------------------RSPRRRTPSPR 195Q64896|HBEAG_ASHV NAPILSTLPEHTVIRRRGSARVV--------------------------RSPRRRTPSPR 199P03153|HBEAG_GSHV NAPILSTLPEHTVIRRRGGSRAA--------------------------RSPRRRTPSPR 198P03154|HBEAG_DHBV1 DAPTIEAITRPIQVAQGGRKTTTGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284P0C6J9|HBEAG_DHBV3 DAPTIEAITRPIQVAQGGRKTTSGTRKPRGLEPRRRKVKTTVVYGRRRSKSRERRAPTPQ 284Q8IVS8|GLCTK_HUMAN PSMAGASVEEDAQLHELAAELQIPDLQLEEALETMAWGRGPVCLLAGGEPTVQLQGSGRG 420

52

P17099|HBEAG_HBVA4 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q91C37|HBEAG_HBVA6 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C692|HBEAG_HBVA2 RRRSQSPRRRRSQSRESQC----------------------------------------- 214P0C625|HBEAG_HBVA3 RRRSPSPRRRRSQSRESQC----------------------------------------- 214Q81105|HBEAG_HBVA5 RRRSQSPRRRRSQSRESQC----------------------------------------- 214Q64896|HBEAG_ASHV RRRSQSPRRR-PQSPASNC----------------------------------------- 217P03153|HBEAG_GSHV RRRSQSPRRRRSQSPASNC----------------------------------------- 217P03154|HBEAG_DHBV1 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305P0C6J9|HBEAG_DHBV3 RAGSPLPRSSSSHHRSPSPRK--------------------------------------- 305Q8IVS8|GLCTK_HUMAN GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI 480

P17099|HBEAG_HBVA4 -------------------------------------------Q91C37|HBEAG_HBVA6 -------------------------------------------P0C692|HBEAG_HBVA2 -------------------------------------------P0C625|HBEAG_HBVA3 -------------------------------------------Q81105|HBEAG_HBVA5 -------------------------------------------Q64896|HBEAG_ASHV -------------------------------------------P03153|HBEAG_GSHV -------------------------------------------P03154|HBEAG_DHBV1 -------------------------------------------P0C6J9|HBEAG_DHBV3 -------------------------------------------Q8IVS8|GLCTK_HUMAN ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR 523

Guide Tree((((((Q8IVS8|GLCTK_HUMAN059519(P03154|HBEAG_DHBV1001176P0C6J9|HBEAG_DHBV3001119)036054)021341(Q64896|HBEAG_ASHV005849P03153|HBEAG_GSHV002446)012844)014364Q81105|HBEAG_HBVA5001168)000175P0C625|HBEAG_HBVA3000818)

000110P0C692|HBEAG_HBVA2000445)

53

000022P17099|HBEAG_HBVA4000942Q91C37|HBEAG_HBVA6000927)

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as template which showed around 856 identity with target sequence and the template structure was downloaded from the PDB

Swiss-PdbViewer was launched and the following procedure was carried out

Steps involved in SPDBV

open the template structure from file (pdb file) choose icon - Swiss model-load the raw target sequencersquo

choose icon -fit-fit raw sequence then magic fit then iterative fit

choose icon -file - save-layer(pdb)

choose icon -file - save-project(pdb)

choose icon - Swiss model-submit modeling request(A new browser will be opened loading the pdb file and give the Email ID for receiving the modeled structure)

open the new structure (received from Email) - remove the template-by selecting the target

choose icon -file - save-layer(pdb)

54

Open Swiss model and select load raw sequence option to load target molecule

Perform magic fit iterative fit provided under FIT in order to fit the two sequences

55

Save the file as the project

Select ldquosubmit modeling requestrdquo under Swiss model to submit it for modeling

Homologous modeling

Optimise Mode Request submission form

Please fill these fields

56

Your Email address Gunjan300gmailcom (MUST be correct)

Your Name Gunjan

Request title Gunjan project Will be added to the results header

Your SWISS-MODEL project file can be found in

CDocuments and SettingsuserDesktopproj_kumarpdb

Workunit P000044 TitleQ8IVS8

SWISS MODEL WORKSPACE

Model information

modelled residue range 83 to 514

based on template 2b8nA (253 Aring)

Sequence Identity [] 34

Evalue 270e-52

57

click on model bars

Fig structure of template after modeling

Alignment

TARGET  83  LV GFGKAVLGMA AAAEELLGQH
2b8nA    4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET      hh sss sssss hhh
2b8nA       hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA   53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET      hhhhhhhhh ssssss sssss hhhhh
2b8nA       hhhhhhhhh ssssss sssss hhhhh

TARGET 155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA  101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET      hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA  151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET      h hhhhhh hhh sss hhhhhh sssssss
2b8nA       h hhhhhh hhh sss hhhhhh sssssss

TARGET 255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA  201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET      h hhhhhhhhhh hh hhhh sss sssss
2b8nA       h hhhhhhhhhh hhh hhhh ss

TARGET 305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA  244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET      sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA  294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET      hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA       hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA  344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET      sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA       sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA  394  ksgallitgp tgtnvndlii gliv-
TARGET      h sss sssss ssss
2b8nA       sss sssss ssss
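SWISS-MODEL reports a sequence identity of 34% for this target-template alignment. Percent identity is defined in several ways in the literature. The minimal sketch below uses one of them: identical residues divided by the aligned columns in which neither sequence has a gap. It is an illustration only, not SWISS-MODEL's own calculation. The TARGET rows in the printout lost their gap characters during extraction, so the example is restricted to the block starting at TARGET 155 / 2b8nA 101. In that block both rows contain 50 residues and neither has a gap.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Spaces (used only for 10-residue grouping in the printout) are ignored, and
    the comparison is case-insensitive because the template is printed in lower case.
    """
    a = seq_a.replace(" ", "").upper()
    b = seq_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Block starting at TARGET 155 / 2b8nA 101 (50 columns, no gaps in either row).
target = "ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA"
template = "trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk"
print(f"{percent_identity(target, template):.1f}%")  # prints 40.0%
```

This single 50-column window gives 20 identical residues out of 50 (40.0%). That figure is higher than the 34% reported for the whole alignment, so one window should not be taken as representative of the full model.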

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server (SAVS) greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative modelling process. A Ramachandran plot shows the backbone dihedral angles φ (phi, rotation about the N-Cα bond) against ψ (psi, rotation about the Cα-C bond) for each amino acid residue in a protein structure. It therefore shows which φ/ψ combinations are sterically possible for a polypeptide. Residues whose φ/ψ pairs fall outside the favoured and allowed regions indicate strained or incorrectly modelled parts of the backbone.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modelling with Deep View and Modeller is given as input to SAVS.
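The φ and ψ values on a Ramachandran plot are ordinary torsion (dihedral) angles. φ is defined by the backbone atoms C(i-1), N(i), Cα(i) and C(i), and ψ by N(i), Cα(i), C(i) and N(i+1). The sketch below computes such an angle from four atom coordinates using the standard atan2 formulation. It is illustrative only, not the code used by SAVS or PROCHECK. The helper names and the coordinates in the example are hypothetical test points, not atoms taken from the model.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle in degrees (-180..180) about the p1-p2 bond of the chain p0-p1-p2-p3."""
    b0 = _sub(p0, p1)                                # p1 -> p0
    b1 = _sub(p2, p1)                                # central bond p1 -> p2
    b2 = _sub(p3, p2)                                # p2 -> p3
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))   # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev: Vec3, n: Vec3, ca: Vec3, c: Vec3) -> float:
    """Backbone phi: C(i-1) - N(i) - CA(i) - C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n: Vec3, ca: Vec3, c: Vec3, n_next: Vec3) -> float:
    """Backbone psi: N(i) - CA(i) - C(i) - N(i+1)."""
    return dihedral(n, ca, c, n_next)


# Hypothetical test points, not atoms from the model.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))))  # prints 180 (trans)
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1))))   # prints -90
```

Applied to every residue of a model, `phi` and `psi` give the point pairs that a Ramachandran plot displays.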

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT


Result

Plot statistics                                                 Score   %-tage
Residues in most favoured regions [A,B,L]                         990    85.6%
Residues in additional allowed regions [a,b,l,p]                  104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]               11     1.0%
Residues in disallowed regions                                     51     4.4%
                                                                 ----   ------
Number of non-glycine and non-proline residues                   1156   100.0%
Number of end-residues (excl. Gly and Pro)                          8
Number of glycine residues (shown as triangles)                   127 (59)
Number of proline residues                                         60
                                                                 ----
Total number of residues                                         1351

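Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good-quality model would be expected to have over 90% of its residues in the most favoured regions. This model has 85.6% of its residues in the most favoured regions, which is below that benchmark.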
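The percentages in the PROCHECK table come from the residue counts alone. The denominator is the 1156 non-glycine, non-proline residues. The short sketch below recomputes them and applies the 90% most-favoured benchmark quoted above. The function names are hypothetical helpers for illustration and are not part of PROCHECK.

```python
from typing import Dict

# Residue counts from the PROCHECK "Plot statistics" table
# (non-glycine, non-proline residues only).
REGION_COUNTS: Dict[str, int] = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}


def region_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of residues in each Ramachandran region, rounded to 0.1."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_core_benchmark(counts: Dict[str, int], threshold: float = 90.0) -> bool:
    """True if more than `threshold` per cent of residues lie in the most favoured regions."""
    total = sum(counts.values())
    return 100.0 * counts["most favoured"] / total > threshold


for region, pct in region_percentages(REGION_COUNTS).items():
    print(f"{region:>20}: {pct:5.1f}%")
print("meets 90% benchmark:", meets_core_benchmark(REGION_COUNTS))
```

The recomputed values match the table (85.6, 9.0, 1.0 and 4.4%). Because 85.6% is below 90%, the benchmark check returns `False` for this model.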

Docking result (Hex software)

Fig: Ligand and receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52  ASP CA Radius = 1.50 Charge = 0.25  ASP C Radius = 1.40 Charge = 0.53  ASP O Radius = 1.50 Charge = -0.50  ASP CB Radius = 1.70 Charge = -0.21  ASP CG Radius = 1.40 Charge = 0.62  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS CE Radius = 1.70 Charge = 0.22  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS CE Radius = 1.70 Charge = 0.22  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS CE Radius = 1.70 Charge = 0.22  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52  ASP CA Radius = 1.50 Charge = 0.25  ASP C Radius = 1.40 Charge = 0.53  ASP O Radius = 1.50 Charge = -0.50  ASP CB Radius = 1.70 Charge = -0.21  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52  LYS CA Radius = 1.50 Charge = 0.23  LYS C Radius = 1.40 Charge = 0.53  LYS O Radius = 1.50 Charge = -0.50  LYS CB Radius = 1.70 Charge = 0.04  LYS CG Radius = 1.70 Charge = 0.05  LYS CD Radius = 1.70 Charge = 0.05  LYS CE Radius = 1.70 Charge = 0.22  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52  MSE CA Radius = 1.50 Charge = 0.14  MSE C Radius = 1.40 Charge = 0.53  MSE O Radius = 1.50 Charge = -0.50  MSE CB Radius = 1.70 Charge = 0.04  MSE CG Radius = 1.70 Charge = 0.09  MSE SE Radius = 1.90 Charge = 0.32  MSE CE Radius = 1.90 Charge = 0.01  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52  THR CA Radius = 1.50 Charge = 0.27  THR C Radius = 1.40 Charge = 0.53  THR O Radius = 1.50 Charge = -0.50  THR CB Radius = 1.50 Charge = 0.21  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N: Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 46.4 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00  R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape  Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
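The Hex log reports its search-space bookkeeping without explanation: 21924 3D FFTs covering 1616412672 orientations, 812 receptor rotations and 1 ligand rotation, a distance scan from 0 to 19.50 in steps of 0.75, and a 0.60 Å skin grid. The sketch below re-derives several of those numbers as a consistency check. It assumes that one 3D FFT is performed per (distance step, receptor rotation, ligand rotation) combination, and that every skin grid cell is a cube whose side equals the grid spacing. Both are assumptions inferred from the numbers, not statements taken from the Hex documentation. The function names are hypothetical.

```python
# Numbers copied from the Hex 5.0 log.
TOTAL_ORIENTATIONS = 1616412672  # "Done 21924 3D FFTs for 1616412672 orientations"
LOGGED_FFTS = 21924
RECEPTOR_ROTATIONS = 812         # "Initial rotational increments (N=16) Receptor 812"
LIGAND_ROTATIONS = 1             # "... Ligand 1"
GRID_SPACING = 0.60              # skin grid spacing in angstrom


def distance_steps(r_max: float = 19.50, step: float = 0.75) -> int:
    """Number of intermolecular distances scanned from 0 to r_max inclusive."""
    # Work in hundredths of an angstrom so floating-point error cannot drop a step.
    return len(range(0, round(r_max * 100) + 1, round(step * 100)))


def expected_ffts() -> int:
    """Assumption: one 3D FFT per (distance, receptor rotation, ligand rotation)."""
    return distance_steps() * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS


def skin_volume(cells: int, spacing: float = GRID_SPACING) -> float:
    """Assumption: each skin grid cell is a cube of side `spacing`."""
    return cells * spacing ** 3


per_fft, remainder = divmod(TOTAL_ORIENTATIONS, LOGGED_FFTS)
print(distance_steps(), expected_ffts(), per_fft, remainder)
print(round(skin_volume(201019), 2), round(skin_volume(216848), 2))
```

Under these assumptions, the 27 distance steps times 812 receptor rotations reproduce the 21924 FFTs. Each FFT then covers exactly 73728 translational and rotational samples, and the cell counts reproduce the logged exterior and interior skin volumes.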


Conclusion


After analysing the protein sequences of Hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be worthwhile to look more closely at the matter at the gene level, and a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible to draw conclusions about the true picture of its evolution in the near future and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with the available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein, which is coded by the gene, is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a much larger problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking should be useful in the search for a cure for the infectious disease hepatitis B, and they will also open new avenues for finding other possible drug targets in hepatitis B virus.

The docking results can be used to design new lead compounds and so aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them, and to other infectious diseases, so that we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations


CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


P17099|HBEAG_HBVA4    RRRSQSPRRRRSQSRESQC-----------------------------------------  214
Q91C37|HBEAG_HBVA6    RRRSQSPRRRRSQSRESQC-----------------------------------------  214
P0C692|HBEAG_HBVA2    RRRSQSPRRRRSQSRESQC-----------------------------------------  214
P0C625|HBEAG_HBVA3    RRRSPSPRRRRSQSRESQC-----------------------------------------  214
Q81105|HBEAG_HBVA5    RRRSQSPRRRRSQSRESQC-----------------------------------------  214
Q64896|HBEAG_ASHV     RRRSQSPRRR-PQSPASNC-----------------------------------------  217
P03153|HBEAG_GSHV     RRRSQSPRRRRSQSPASNC-----------------------------------------  217
P03154|HBEAG_DHBV1    RAGSPLPRSSSSHHRSPSPRK---------------------------------------  305
P0C6J9|HBEAG_DHBV3    RAGSPLPRSSSSHHRSPSPRK---------------------------------------  305
Q8IVS8|GLCTK_HUMAN    GRNQELALRVGAELRRWPLGPIDVLFLSGGTDGQDGPTEAAGAWVTPELASQAAAEGLDI  480

P17099|HBEAG_HBVA4    -------------------------------------------
Q91C37|HBEAG_HBVA6    -------------------------------------------
P0C692|HBEAG_HBVA2    -------------------------------------------
P0C625|HBEAG_HBVA3    -------------------------------------------
Q81105|HBEAG_HBVA5    -------------------------------------------
Q64896|HBEAG_ASHV     -------------------------------------------
P03153|HBEAG_GSHV     -------------------------------------------
P03154|HBEAG_DHBV1    -------------------------------------------
P0C6J9|HBEAG_DHBV3    -------------------------------------------
Q8IVS8|GLCTK_HUMAN    ATFLAHNDSHTFFCCLQGGAHLLHTGMTGTNVMDTHLLFLRPR  523

Guide Tree

((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168):0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445):0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);
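The guide tree is written in Newick format. Nested parentheses group sequences into clades, and the number after each colon is a branch length. The sketch below is a small hand-written recursive-descent parser. It is not Biopython's `Bio.Phylo` and not ClustalW's own code. It reads the tree with the colons, commas and decimal points restored (an assumption about what the extraction removed) and sums the branch lengths from the tree's top-level node to each sequence. ClustalW guide trees are unrooted, so that top-level node is an arbitrary reference point. The names `Node`, `parse_newick` and `root_to_tip` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# ClustalW guide tree from the report, with colons, commas and decimal points restored.
GUIDE_TREE = ("((((((Q8IVS8|GLCTK_HUMAN:0.59519,(P03154|HBEAG_DHBV1:0.01176,"
              "P0C6J9|HBEAG_DHBV3:0.01119):0.36054):0.21341,(Q64896|HBEAG_ASHV:0.05849,"
              "P03153|HBEAG_GSHV:0.02446):0.12844):0.14364,Q81105|HBEAG_HBVA5:0.01168)"
              ":0.00175,P0C625|HBEAG_HBVA3:0.00818):0.00110,P0C692|HBEAG_HBVA2:0.00445)"
              ":0.00022,P17099|HBEAG_HBVA4:0.00942,Q91C37|HBEAG_HBVA6:0.00927);")


@dataclass
class Node:
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a Newick string such as '(A:0.1,B:0.2);' into a tree of Node objects."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":                      # internal node: parse its children
            pos += 1
            node.children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos                               # optional label (leaf name)
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        node.name = text[start:pos]
        if pos < len(text) and text[pos] == ":":  # optional branch length
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node.length = float(text[start:pos])
        return node

    root = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return root


def root_to_tip(node: Node, acc: float = 0.0) -> dict[str, float]:
    """Summed branch length from the top-level node to every leaf."""
    if not node.children:
        return {node.name: acc + node.length}
    out: dict[str, float] = {}
    for child in node.children:
        out.update(root_to_tip(child, acc + node.length))
    return out


distances = root_to_tip(parse_newick(GUIDE_TREE))
for name, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{name:<20} {d:.5f}")
```

On this tree the longest root-to-tip distance, about 0.955, belongs to Q8IVS8|GLCTK_HUMAN. The five HBEAG_HBVA sequences all lie within about 0.015 of the top-level node.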

Phylogram

Tertiary structure prediction

PDB entry 1QGT-C was selected as the template. It showed around 85.6% identity with the target sequence, and the template structure was downloaded from the PDB.

Swiss-PdbViewer was launched and the following procedure was carried out:

Steps involved in SPDBV

Open the template structure from File (PDB file), then choose Swiss-Model > Load Raw Target Sequence.

Choose Fit > Fit Raw Sequence, then Magic Fit, then Iterative Fit.

Choose File > Save > Layer (PDB).

Choose File > Save > Project (PDB).

Choose Swiss-Model > Submit Modeling Request. A new browser window opens with the PDB file loaded; give the e-mail address at which the modelled structure should be received.

Open the new structure (received by e-mail) and remove the template by selecting the target.

Choose File > Save > Layer (PDB).


Open the Swiss-Model menu and select the Load Raw Sequence option to load the target molecule.

Perform Magic Fit and Iterative Fit, provided under the Fit menu, to fit the two sequences.


Save the file as a project.

Select “Submit Modeling Request” under Swiss-Model to submit it for modeling.

Homology modeling

Optimise Mode Request submission form

Please fill these fields:

Your E-mail address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in

CDocuments and SettingsuserDesktopproj_kumarpdb

Workunit P000044 TitleQ8IVS8

SWISS MODEL WORKSPACE

Model information

modelled residue range 83 to 514

based on template 2b8nA (253 Aring)

Sequence Identity [] 34

Evalue 270e-52

57

click on model bars

Fig structure of template after modeling

Alignment

TARGET 83 LV GFGKAVLGMA AAAEELLGQH
2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET hh sss sssss hhh
2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET hhhhhhhhh ssssss sssss hhhhh
2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET h hhhhhh hhh sss hhhhhh sssssss
2b8nA h hhhhhh hhh sss hhhhhh sssssss

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET h hhhhhhhhhh hh hhhh sss sssss
2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL
2b8nA 394 ksgallitgp tgtnvndlii gliv-
TARGET h sss sssss ssss
2b8nA sss sssss ssss
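The "Sequence identity [%]: 34" figure in the model information is the share of aligned positions at which target and template carry the same residue. Tools differ in the denominator (aligned columns, the shorter sequence, or the full alignment length), so the sketch below shows only one convention: it assumes identity is counted over columns where neither row has a gap. `percent_identity` is a hypothetical helper, not a SWISS-MODEL function. Because the column spacing of the alignment above was lost, the example simply takes the third 10-residue chunk of the TARGET 155 and 2b8nA 101 rows and assumes those chunks occupy the same columns.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither row has a gap.

    Assumes both strings are rows of the same pairwise alignment, so they have
    equal length. Letters are compared case-insensitively because the template
    row in the alignment above is printed in lowercase.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are left out of the denominator
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Third 10-residue chunk of the TARGET 155 / 2b8nA 101 rows (assumed aligned).
    print(percent_identity("ISGGGSALLP", "lsgggsslfe"))  # 60.0
```

On that single chunk the identity is 60%, well above the 34% reported for the whole model; identity measured over short windows varies considerably along an alignment.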

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ (phi) and ψ (psi) of each amino acid residue in a protein structure, plotting φ against ψ, and so shows which φ/ψ conformations are possible for a polypeptide. Most φ/ψ combinations are excluded by steric clashes between atoms, so residues that fall outside the allowed regions of the plot point to likely errors in a model.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained after homology modeling with DeepView and SWISS-MODEL is given as input to SAVS.
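The φ and ψ values on a Ramachandran plot are backbone torsion angles: φ of residue i is defined by the atoms C(i-1), N(i), CA(i), C(i), and ψ by N(i), CA(i), C(i), N(i+1). Below is a minimal sketch of the standard atan2 formula for a torsion angle, assuming the four atoms' coordinates have already been read from the PDB file as (x, y, z) triples. `dihedral` and its helpers are illustrative names, not part of SAVS or PROCHECK. The sign follows the usual IUPAC convention: positive when the near bond must be turned clockwise to eclipse the far bond, viewed along p1 towards p2.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> tuple:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> tuple:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees, in [-180, 180], for the atom chain p0-p1-p2-p3.

    phi of residue i: dihedral(C(i-1), N(i), CA(i), C(i))
    psi of residue i: dihedral(N(i), CA(i), C(i), N(i+1))
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy geometry: central bond along z, first atom on +x, last atom on +y.
    print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))  # 90.0
```

Using atan2 rather than an arccos of the normalised dot product keeps the sign of the angle, which matters because the allowed regions of the plot are not symmetric about φ = 0 or ψ = 0.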

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics                                         Residues      %
Residues in most favoured regions [A,B,L]                    990   85.6
Residues in additional allowed regions [a,b,l,p]             104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]          11    1.0
Residues in disallowed regions                                51    4.4
                                                            ----  -----
Number of non-glycine and non-proline residues              1156  100.0
Number of end-residues (excl. Gly and Pro)                     8
Number of glycine residues (shown as triangles)              127 (59)
Number of proline residues                                    60
                                                            ----
Total number of residues                                    1351

Based on an analysis of 118 structures with an R-factor no greater than 20%, a good-quality model would be expected to have over 90% of its residues in the most favoured regions. With 85.6% of residues in the most favoured regions, this model falls below that benchmark.
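The percentages in the PROCHECK table are each region's residue count divided by the 1156 non-glycine, non-proline residues. Glycine and proline are left out because their backbone conformations differ from those of the other residues, and end residues are excluded because they lack a φ or ψ angle. The short sketch below recomputes the percentages from the counts in the table and applies the 90% rule of thumb quoted above; the dictionary keys and function names are illustrative only.

```python
# Residue counts copied from the PROCHECK summary above.
REGION_COUNTS = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}

# Rule of thumb quoted above: a good model has over 90% in the most favoured regions.
GOOD_MODEL_THRESHOLD = 90.0


def region_percentages(counts: dict) -> dict:
    """Share of each region among the non-glycine, non-proline residues."""
    total = sum(counts.values())  # 1156 for this model
    return {region: 100.0 * n / total for region, n in counts.items()}


def meets_benchmark(percentages: dict, threshold: float = GOOD_MODEL_THRESHOLD) -> bool:
    """True if the most favoured share exceeds the threshold."""
    return percentages["most favoured [A,B,L]"] > threshold


if __name__ == "__main__":
    pct = region_percentages(REGION_COUNTS)
    for region, value in pct.items():
        print(f"{region:35s} {value:5.1f}%")
    print("meets 90% benchmark:", meets_benchmark(pct))
```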

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.14; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.09; SE Radius = 1.90 Charge = 0.32; CE Radius = 1.90 Charge = 0.01; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.25; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = -0.21; CG Radius = 1.40 Charge = 0.62; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.14; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.09; SE Radius = 1.90 Charge = 0.32; CE Radius = 1.90 Charge = 0.01; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; CE Radius = 1.70 Charge = 0.22; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.14; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.09; SE Radius = 1.90 Charge = 0.32; CE Radius = 1.90 Charge = 0.01; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; CE Radius = 1.70 Charge = 0.22; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.14; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.09; SE Radius = 1.90 Charge = 0.32; CE Radius = 1.90 Charge = 0.01; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; CE Radius = 1.70 Charge = 0.22; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.25; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = -0.21; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.14; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.09; SE Radius = 1.90 Charge = 0.32; CE Radius = 1.90 Charge = 0.01; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; CE Radius = 1.70 Charge = 0.22; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.14; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.09; SE Radius = 1.90 Charge = 0.32; CE Radius = 1.90 Charge = 0.01; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.27; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.50 Charge = 0.21; H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file.

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements.
PDB structure has biological symmetry elements.
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.14; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.09; SE Radius = 1.90 Charge = 0.32; CE Radius = 1.90 Charge = 0.01; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.25; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = -0.21; CG Radius = 1.40 Charge = 0.62; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.14; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.09; SE Radius = 1.90 Charge = 0.32; CE Radius = 1.90 Charge = 0.01; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; CE Radius = 1.70 Charge = 0.22; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.14; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.09; SE Radius = 1.90 Charge = 0.32; CE Radius = 1.90 Charge = 0.01; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; CE Radius = 1.70 Charge = 0.22; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.14; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.09; SE Radius = 1.90 Charge = 0.32; CE Radius = 1.90 Charge = 0.01; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; CE Radius = 1.70 Charge = 0.22; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.25; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = -0.21; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.14; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.09; SE Radius = 1.90 Charge = 0.32; CE Radius = 1.90 Charge = 0.01; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.23; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.05; CD Radius = 1.70 Charge = 0.05; CE Radius = 1.70 Charge = 0.22; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.14; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.70 Charge = 0.04; CG Radius = 1.70 Charge = 0.09; SE Radius = 1.90 Charge = 0.32; CE Radius = 1.90 Charge = 0.01; H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR: N Radius = 1.40 Charge = -0.52; CA Radius = 1.50 Charge = 0.27; C Radius = 1.40 Charge = 0.53; O Radius = 1.50 Charge = -0.50; CB Radius = 1.50 Charge = 0.21; H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096, Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096, Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup: 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255, Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00  R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 5366 s, GF 7301 s, FFT 27787 s, Scan 1545 s, FFT Rate 5817091s
Estart = 8621255 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136, Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 14350136 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search
Docked structures 2B8N:2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ------  ------  ------  ----  ---  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------ ------ ------ ------ ---- ------ ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------ ------ ------ ------ ---- ------ ------ --- -----
   1    1 000000    0.0    0.0    0.0  0.0    0.0    0.0  -1 -1.00
-------------------------------------------------------------------
   1    1 000000    0.0    0.0    0.0  0.0    0.0    0.0  -1 -1.00
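The log shows Hex scanning a 6D rotation-plus-translation space with Fourier correlations ("3D FFT", "skin coefficients to N = 25"). Hex itself works with spherical polar Fourier expansions of the molecular surfaces, which this note does not reproduce. As a simpler, swapped-in illustration of why FFTs make an exhaustive search affordable, the sketch below uses the classic Cartesian-grid approach (Katchalski-Katzir style): both molecules are digitised onto a 3D grid, and a single FFT correlation scores every translation of the ligand for one fixed orientation; a full search repeats this for each ligand rotation. The function names are hypothetical, the grids are toy data, and real docking scores weight surface and interior cells differently (for example, penalising interior overlap as a steric clash) instead of the plain overlap counted here. NumPy is assumed to be available.

```python
import numpy as np


def fft_translation_scores(receptor: np.ndarray, ligand: np.ndarray) -> np.ndarray:
    """Overlap score for every cyclic grid translation t of the ligand:

        score[t] = sum over x of receptor[x + t] * ligand[x]

    evaluated for all t at once via the correlation theorem,
    IFFT(FFT(receptor) * conj(FFT(ligand))), in O(N log N) for N grid cells.
    """
    if receptor.shape != ligand.shape:
        raise ValueError("receptor and ligand grids must have the same shape")
    spectrum = np.fft.fftn(receptor) * np.conj(np.fft.fftn(ligand))
    return np.real(np.fft.ifftn(spectrum))


def best_translation(receptor: np.ndarray, ligand: np.ndarray) -> tuple:
    """Grid shift (in cells) with the highest score, and that score."""
    scores = fft_translation_scores(receptor, ligand)
    index = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(i) for i in index), float(scores[index])


if __name__ == "__main__":
    # Toy 16^3 grids: a 2x2x2 "pocket" in the receptor and a 2x2x2 ligand at the origin.
    receptor = np.zeros((16, 16, 16))
    receptor[5:7, 6:8, 7:9] = 1.0
    ligand = np.zeros((16, 16, 16))
    ligand[0:2, 0:2, 0:2] = 1.0
    print(best_translation(receptor, ligand))
```

A direct evaluation would cost one full grid sum per translation; the FFT route scores all 16^3 translations with three transforms, which is why FFT-based programs can cover the very large numbers of orientations reported in the log.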

Conclusion

After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible to draw conclusions about the true picture of its evolution in the near future, and that the genes responsible for pathogenesis can also be identified.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structures of the proteins present in the different species, we used homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

As we studied, the HBeAg (glycerate kinase) protein coded by this gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, on the basis of which drugs could be developed.

Future prospects

The work presented in this report may be a stepping stone towards such discoveries; it is a small finding within a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. This includes methods for collecting and analysing data, as well as interpreting those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of a protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking may be useful in finding a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and much work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package

  • Prevention
  • Using ProtParam-for primary structure
  • PROTOCOL FOLLOWED
    • Template selection and sequence alignment
      • Introduction to Docking
        • By ProtParam
          • GLCTK_HUMAN (Q8IVS8)
          • By SOPMA result for UNK_158250
          • Scores Table
          • Alignment
          • Guide Tree
          • Phylogram
          • Procheck summary
            • Conclusion
Page 54: Project Report on Hepatitis Virus

000022P17099|HBEAG_HBVA4000942Q91C37|HBEAG_HBVA6000927)

Phylogram

Tertiary structure prediction

pdb 1QGT-C was selected as template which showed around 856 identity with target sequence and the template structure was downloaded from the PDB

Swiss-PdbViewer was launched and the following procedure was carried out

Steps involved in SPDBV

open the template structure from file (pdb file) choose icon - Swiss model-load the raw target sequencersquo

choose icon -fit-fit raw sequence then magic fit then iterative fit

choose icon -file - save-layer(pdb)

choose icon -file - save-project(pdb)

choose icon - Swiss model-submit modeling request(A new browser will be opened loading the pdb file and give the Email ID for receiving the modeled structure)

open the new structure (received from Email) - remove the template-by selecting the target

choose icon -file - save-layer(pdb)

54

Open Swiss model and select load raw sequence option to load target molecule

Perform magic fit iterative fit provided under FIT in order to fit the two sequences

55

Save the file as the project

Select ldquosubmit modeling requestrdquo under Swiss model to submit it for modeling

Homologous modeling

Optimise Mode Request submission form

Please fill these fields

56

Your Email address Gunjan300gmailcom (MUST be correct)

Your Name Gunjan

Request title Gunjan project Will be added to the results header

Your SWISS-MODEL project file can be found in

CDocuments and SettingsuserDesktopproj_kumarpdb

Workunit P000044 TitleQ8IVS8

SWISS MODEL WORKSPACE

Model information

modelled residue range 83 to 514

based on template 2b8nA (253 Aring)

Sequence Identity [] 34

Evalue 270e-52

57

click on model bars

Fig structure of template after modeling

Alignment TARGET 83 LV GFGKAVLGMA AAAEELLGQH2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr TARGET hh sss sssss hhh2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt TARGET hhhhhhhhh ssssss sssss hhhhh2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS

2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias TARGET h hhhhhh hhh sss hhhhhh sssssss 2b8nA h hhhhhh hhh sss hhhhhh sssssss

58

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv TARGET h hhhhhhhhhh hh hhhh sss sssss2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv TARGET hhh ssssssssss s hhhhhhhhh hhh ss2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL 2b8nA 394 ksgallitgp tgtnvndlii gliv- TARGET h sss sssss ssss 2b8nA sss sssss ssss

Model Validation

INTRODUCTION

Structure Analysis and Validation Server greatly simplifies the computational analysis of protein structure and sequence. Stereochemical validation of model structures is an important part of the comparative (homology) modelling process. A Ramachandran plot visualises the backbone dihedral angles φ (phi, rotation about the N-Cα bond) against ψ (psi, rotation about the Cα-C bond) for each amino acid residue in a protein structure, showing which φ/ψ conformations the polypeptide adopts. Because steric clashes make many φ/ψ combinations unfavourable, residues that fall outside the allowed regions of the plot point to strained or possibly incorrect parts of a model.

Software: SAVS (http://nihserver.mbi.ucla.edu/SAVS)

Procedure: The target protein structure obtained by homology modelling with DeepView and Modeller was given as input to SAVS.
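To make the φ/ψ definitions concrete, the sketch below computes them from backbone coordinates using only the Python standard library. It is a minimal illustration, not how SAVS or PROCHECK are implemented: it assumes each residue is given as a dict with hypothetical keys `'N'`, `'CA'` and `'C'` holding (x, y, z) tuples in chain order with no chain breaks, and it uses the common projection-plus-`atan2` formula for a signed torsion angle.

```python
import math


# Minimal 3-vector helpers so the example needs only the standard library.
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues):
    """Return (phi, psi) per residue.

    phi(i) = C(i-1) - N(i) - CA(i) - C(i);  psi(i) = N(i) - CA(i) - C(i) - N(i+1).
    Terminal residues lack one of the two angles, reported as None.
    """
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i + 1 < len(residues):
            psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```

In practice the coordinates would be read from the ATOM records of the model PDB file (parsing is omitted here); plotting each (φ, ψ) pair on the -180..180 grid gives the Ramachandran plot.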

SAVES results for proj_gunjan.pdb

Procheck summary

RAMACHANDRAN PLOT

Result

Plot statistics                                          Residues      %
Residues in most favoured regions [A,B,L]                     990   85.6
Residues in additional allowed regions [a,b,l,p]              104    9.0
Residues in generously allowed regions [~a,~b,~l,~p]           11    1.0
Residues in disallowed regions                                 51    4.4
                                                             ----  -----
Number of non-glycine and non-proline residues               1156  100.0
Number of end-residues (excl. Gly and Pro)                      8
Number of glycine residues (shown as triangles)               127
Number of proline residues                                     60
                                                             ----
Total number of residues                                     1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% of its residues in the most favoured regions.
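The percentages in the PROCHECK table follow directly from the residue counts: each region's count is divided by the 1156 non-glycine, non-proline residues. The short sketch below reproduces that arithmetic and applies the 90% rule of thumb quoted above. The dictionary keys and function names are illustrative only, not part of PROCHECK's output format.

```python
# Counts copied from the PROCHECK "Plot statistics" table above.
PROCHECK_COUNTS = {
    "most favoured [A,B,L]": 990,
    "additional allowed [a,b,l,p]": 104,
    "generously allowed [~a,~b,~l,~p]": 11,
    "disallowed": 51,
}


def region_percentages(counts):
    """Percentage of non-Gly, non-Pro residues in each Ramachandran region (1 d.p.)."""
    total = sum(counts.values())
    return {region: round(100.0 * n / total, 1) for region, n in counts.items()}


def meets_benchmark(counts, threshold=90.0):
    """Rule of thumb: a good model has over 90% of residues in the most favoured regions."""
    total = sum(counts.values())
    return 100.0 * counts["most favoured [A,B,L]"] / total > threshold


if __name__ == "__main__":
    for region, pct in region_percentages(PROCHECK_COUNTS).items():
        print(f"{region:36s} {pct:5.1f}%")
    print("meets 90% benchmark:", meets_benchmark(PROCHECK_COUNTS))
```

By this check the model has 85.6% of residues in the most favoured regions, so it falls short of the 90% benchmark, and the 51 residues (4.4%) in disallowed regions would be the first places to inspect.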

Docking result (Hex software)

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS

Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE: MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP: ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP CG Radius = 1.40 Charge = 0.62, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE: MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25

Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE: MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE: MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP: ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE: MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE: MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR: THR N Radius = 1.40 Charge = -0.52, THR CA Radius = 1.50 Charge = 0.27, THR C Radius = 1.40 Charge = 0.53, THR O Radius = 1.50 Charge = -0.50, THR CB Radius = 1.50 Charge = 0.21, THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE: MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP: ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP CG Radius = 1.40 Charge = 0.62, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE: MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE: MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE: MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP: ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE: MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS: LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE: MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR: THR N Radius = 1.40 Charge = -0.52, THR CA Radius = 1.50 Charge = 0.27, THR C Radius = 1.40 Charge = 0.53, THR O Radius = 1.50 Charge = -0.50, THR CB Radius = 1.50 Charge = 0.21, THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins
Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N 2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair       RMS   Bumps
 ----  --------  --------  --------  --------  --------  -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined
Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
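The log shows Hex building exterior and interior "skin" grids for each molecule and then scoring candidate poses by how well the skins match. Hex itself expands these skins in spherical polar basis functions and correlates them over rotations and translations with FFTs; the toy sketch below swaps in a much simpler technique, an exhaustive 2D grid search, only to illustrate the shape-complementarity idea. Its scoring rule (one point per receptor cell a ligand cell touches, a large penalty for overlapping the receptor interior), the `clash_penalty` value, the grid shapes and all function names are illustrative assumptions, not Hex's energy function.

```python
from itertools import product

# 4-connected neighbourhood on a 2D grid (a toy stand-in for 3D surface contact).
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def contact_count(cell, receptor):
    """Number of receptor cells directly touching a free grid cell."""
    x, y = cell
    return sum((x + dx, y + dy) in receptor for dx, dy in NEIGHBOURS)


def score_pose(ligand, receptor, shift, clash_penalty=-15.0):
    """Reward surface contacts; heavily penalise overlap with the receptor interior."""
    sx, sy = shift
    total = 0.0
    for x, y in ligand:
        cell = (x + sx, y + sy)
        if cell in receptor:
            total += clash_penalty
        else:
            total += contact_count(cell, receptor)
    return total


def dock(ligand, receptor, search_range):
    """Exhaustive translational search (no rotations); returns the best (shift, score)."""
    best_shift, best_score = None, float("-inf")
    for shift in product(search_range, repeat=2):
        score = score_pose(ligand, receptor, shift)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score


# Hypothetical example: a 5x4 receptor block with a one-cell-wide slot cut into
# its top edge, and a two-cell rod-shaped ligand that fits the slot.
RECEPTOR = {(x, y) for x in range(5) for y in range(4)} - {(2, 2), (2, 3)}
LIGAND = [(0, 0), (0, 1)]

if __name__ == "__main__":
    shift, score = dock(LIGAND, RECEPTOR, range(-3, 8))
    print(shift, score)  # the rod drops into the slot: (2, 2) 5.0
```

A brute-force loop like this scales with the number of translations times the grid size, which is why real docking programs such as Hex replace it with FFT-based correlations and also search over rotations.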


Conclusion

After analysing the protein sequences of hepatitis B virus, we conclude that although they are closely related, they play an important role in survival in different species. It would be worthwhile to look at the matter more closely at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis. Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To find the unknown structure of the protein present in the different species, we performed homology modelling, and as a step forward we present a theoretical model built with available online modelling tools.

Our study indicated that the HBeAg (glycerate kinase) protein encoded by this gene is one of the secondary factors in HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.

Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding within a big issue.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B and may open new avenues for finding other possible drug targets in HBV. Docking results of this kind can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. Similar methods can also be applied to other infectious diseases, and hence we can look forward to a healthier, disease-free world.

The work presented is just a small part of a big issue, and a great deal of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package

Page 55: Project Report on Hepatitis Virus

Open Swiss model and select load raw sequence option to load target molecule

Perform magic fit iterative fit provided under FIT in order to fit the two sequences

55

Save the file as the project

Select ldquosubmit modeling requestrdquo under Swiss model to submit it for modeling

Homologous modeling

Optimise Mode Request submission form

Please fill these fields

56

Your Email address Gunjan300gmailcom (MUST be correct)

Your Name Gunjan

Request title Gunjan project Will be added to the results header

Your SWISS-MODEL project file can be found in

CDocuments and SettingsuserDesktopproj_kumarpdb

Workunit P000044 TitleQ8IVS8

SWISS MODEL WORKSPACE

Model information

modelled residue range 83 to 514

based on template 2b8nA (253 Aring)

Sequence Identity [] 34

Evalue 270e-52

57

click on model bars

Fig structure of template after modeling

Alignment TARGET 83 LV GFGKAVLGMA AAAEELLGQH2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr TARGET hh sss sssss hhh2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt TARGET hhhhhhhhh ssssss sssss hhhhh2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS

2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias TARGET h hhhhhh hhh sss hhhhhh sssssss 2b8nA h hhhhhh hhh sss hhhhhh sssssss

58

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv TARGET h hhhhhhhhhh hh hhhh sss sssss2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv TARGET hhh ssssssssss s hhhhhhhhh hhh ss2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL 2b8nA 394 ksgallitgp tgtnvndlii gliv- TARGET h sss sssss ssss 2b8nA sss sssss ssss

Model Validation

INTRODUCTION

Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process Ramachandran plot is a way to visualize dihedral angles φ against ψ of amino acid residues in protein structure It shows the possible conformations of φ and ψ angles for a polypeptide The Ramachandran plot displays the psi and phi backbone conformational angles for each residue in a protein The distance between two succession alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in desired protein can be determined using this plot Software

59

SAVS httpnihservermbiuclaeduSAVSProcedureThe target protein structure obtained after homology modeling using deep view and modeler is given as input for SAVS

SAVES results for proj_gunjanpdb

Procheck summary

RAMCHANDRAN POLT

60

Result ----- Plot statistics SCORE ageResidues in most favoured regions [ABL] 990 856Residues in additional allowed regions [ablp] 104 90Residues in generously allowed regions [~a~b~l~p] 11 10Residues in disallowed regions 51 44 ---- ---- ------------------Number of non-glycine and non-proline residues 1156 1000Number of end-residues (excl Gly and Pro) 8Number of glycine residues (shown as triangles) 127 (59)

Number of proline residues 60 ----Total number of residues 1351

61

Based on an analysis of 118 structures oand R-factor no greater than 20 a good quality model would be expectedto have over 90 in the most favoured regions

Docking result ---- by Hex software

FigLigand amp Receptor (2B8N)

62

Fig after docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP CG Radius = 1.40 Charge = 0.62
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52
  ASP CA Radius = 1.50 Charge = 0.25
  ASP C Radius = 1.40 Charge = 0.53
  ASP O Radius = 1.50 Charge = -0.50
  ASP CB Radius = 1.70 Charge = -0.21
  ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52
  LYS CA Radius = 1.50 Charge = 0.23
  LYS C Radius = 1.40 Charge = 0.53
  LYS O Radius = 1.50 Charge = -0.50
  LYS CB Radius = 1.70 Charge = 0.04
  LYS CG Radius = 1.70 Charge = 0.05
  LYS CD Radius = 1.70 Charge = 0.05
  LYS CE Radius = 1.70 Charge = 0.22
  LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52
  MSE CA Radius = 1.50 Charge = 0.14
  MSE C Radius = 1.40 Charge = 0.53
  MSE O Radius = 1.50 Charge = -0.50
  MSE CB Radius = 1.70 Charge = 0.04
  MSE CG Radius = 1.70 Charge = 0.09
  MSE SE Radius = 1.90 Charge = 0.32
  MSE CE Radius = 1.90 Charge = 0.01
  MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52
  THR CA Radius = 1.50 Charge = 0.27
  THR C Radius = 1.40 Charge = 0.53
  THR O Radius = 1.50 Charge = -0.50
  THR CB Radius = 1.50 Charge = 0.21
  THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=0.00)

R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00  R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS  Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
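The size of the rigid-body search reported by Hex follows from multiplying the number of sampled distances, the rotational increments and the FFT grid. The Python sketch below redoes that arithmetic. It assumes the distance samples run from 0.00 to 19.50 in steps of 0.75 (27 values, matching the R = ... lines of the log), and it reads the flattened "Iterate[278121] x FFT[642448]" as 27 x 812 x 1 iterations and a 64 x 24 x 48 FFT grid. That reading is our interpretation, supported only by the fact that it reproduces the logged 21924 FFTs and 1616412672 orientations; the constant and function names are hypothetical.

```python
import math

# Numbers read from the Hex 5.0 log above. Splitting the flattened
# "Iterate[278121] x FFT[642448]" into 27 x 812 x 1 and 64 x 24 x 48
# is an interpretation, chosen because it reproduces the logged totals.
R_MIN, R_MAX, R_STEP = 0.00, 19.50, 0.75  # intermolecular distance range and step
RECEPTOR_ROTATIONS = 812                  # initial rotational increments, receptor
LIGAND_ROTATIONS = 1                      # initial rotational increments, ligand
FFT_GRID = (64, 24, 48)                   # assumed 3D FFT grid dimensions


def distance_steps(r_min: float, r_max: float, step: float) -> int:
    """Number of sampled intermolecular distances, including both end points."""
    return round((r_max - r_min) / step) + 1


def search_space() -> tuple[int, int]:
    """Return (number of 3D FFTs, number of 6D orientations scored)."""
    n_fft = distance_steps(R_MIN, R_MAX, R_STEP) * RECEPTOR_ROTATIONS * LIGAND_ROTATIONS
    return n_fft, n_fft * math.prod(FFT_GRID)


if __name__ == "__main__":
    n_fft, total = search_space()
    print(n_fft, total)  # 21924 1616412672
    # Average throughput over the logged "7 min 0 sec" search time.
    print(f"{total / (7 * 60):.0f} orientations per second")
```

The throughput computed this way (about 3.85 million orientations per second) is close to the logged 3848574/s; the small difference presumably comes from the log rounding the elapsed time to whole seconds.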


Conclusion


After analysing the protein sequences of Hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at the matter by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which indicates that they evolved together.

With the completion of the ongoing gene sequencing project on HBV, we hope that in the near future it will be possible to draw conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we performed homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for developing drugs.


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting those results as new biological information.
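Distance-based tree-building methods start from pairwise distances between aligned sequences. As a hypothetical illustration of that data-analysis step (not the procedure used in this report), the simplest such measure, the p-distance (the fraction of compared sites that differ), can be computed as below. The function names and the toy sequences are made up for the example, and gap columns are simply skipped, which is only one of several possible conventions.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) columns")
    diffs = sum(1 for x, y in pairs if x != y)
    return diffs / len(pairs)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every pair of named, pre-aligned sequences."""
    return {(m, n): p_distance(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)}


if __name__ == "__main__":
    # Hypothetical toy alignment; not HBV data.
    toy = {
        "strain_A": "ACGTACGTAC",
        "strain_B": "ACGTACGAAC",
        "strain_C": "ACTTAC-TAG",
    }
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```

A matrix like this is what methods such as UPGMA or neighbour joining take as input when building a guide tree or phylogram.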

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study will be useful in finding a cure for the infectious disease bird flu, and they will also open new avenues for finding other possible drug targets in influenza A virus.

The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens so that possible therapeutic sites can be identified in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a good phylogenetic relationship and a full-fledged cure for bird flu. However, we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package


Save the file as a project.

Select "Submit Modeling Request" under SWISS-MODEL to submit it for modeling.

Homology modeling

Optimise Mode: request submission form

Please fill in these fields:

Your Email address: Gunjan300@gmail.com (MUST be correct)

Your Name: Gunjan

Request title: Gunjan project (will be added to the results header)

Your SWISS-MODEL project file can be found in:

C:\Documents and Settings\user\Desktop\proj_kumar.pdb

Workunit: P000044   Title: Q8IVS8

SWISS-MODEL WORKSPACE

Model information

Modelled residue range: 83 to 514

Based on template: 2b8nA (2.53 Å)

Sequence identity [%]: 34

E-value: 2.70e-52

Click on the model bars.

Fig.: Structure of template after modeling

Alignment

TARGET   83                                LV GFGKAVLGMA AAAEELLGQH
2b8nA     4  peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr
TARGET       hh sss sssss hhh
2b8nA        hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET  105  LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA
2b8nA    53  xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt
TARGET       hhhhhhhhh ssssss sssss hhhhh
2b8nA        hhhhhhhhh ssssss sssss hhhhh

TARGET  155  ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA
2b8nA   101  trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk
TARGET       hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh
2b8nA        hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET  205  RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS
2b8nA   151  sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias
TARGET       h hhhhhh hhh sss hhhhhh sssssss
2b8nA        h hhhhhh hhh sss hhhhhh sssssss

TARGET  255  GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV
2b8nA   201  gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv
TARGET       h hhhhhhhhhh hh hhhh sss sssss
2b8nA        h hhhhhhhhhh hhh hhhh ss

TARGET  305  LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV
2b8nA   244  eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke
TARGET       sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh
2b8nA        sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET  355  ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL
2b8nA   294  vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv
TARGET       hhh ssssssssss s hhhhhhhhh hhh ss
2b8nA        hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET  405  LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ
2b8nA   344  ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk
TARGET       sssssss sss s hhhhhhh h hhhh hhhhhhh
2b8nA        sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET  455  DGPTEAAGAW VTPELASQAA AEGL
2b8nA   394  ksgallitgp tgtnvndlii gliv-
TARGET       h sss sssss ssss
2b8nA        sss sssss ssss
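The sequence identity reported in the model information (34%) summarizes the whole target-template alignment. A minimal Python sketch of one common definition, identity over aligned columns where neither row has a gap, is shown below. SWISS-MODEL may normalize differently (for example by the length of the shorter sequence), so this is not expected to reproduce 34% exactly; the function and constant names are hypothetical. Applied to the third alignment block (TARGET 155 against 2b8nA 101), which has no gaps, it gives 40.0%.

```python
def percent_identity(target: str, template: str) -> float:
    """Percent identity over aligned columns where neither row has a gap.

    Case is ignored because the template row is printed in lower case.
    Gaps are '-' (or blank padding); such columns are not counted.
    """
    if len(target) != len(template):
        raise ValueError("aligned rows must have equal length")
    cols = [(a.upper(), b.upper()) for a, b in zip(target, template)
            if a not in "- " and b not in "- "]
    if not cols:
        return 0.0
    same = sum(1 for a, b in cols if a == b)
    return 100.0 * same / len(cols)


# Third block of the alignment above, with the 10-residue groups joined.
BLOCK3_TARGET = "ALAIQQLAEG" "LTADDLLLVL" "ISGGGSALLP" "APIPPVTLEE" "KQTLTRLLAA"
BLOCK3_TEMPLATE = "trrvlelvdq" "lnendtvlfl" "lsgggsslfe" "lplegvslee" "iqkltsallk"

if __name__ == "__main__":
    print(percent_identity(BLOCK3_TARGET, BLOCK3_TEMPLATE))  # 40.0
```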

Model Validation

INTRODUCTION

The Structure Analysis and Validation Server greatly simplifies the computational analysis of the molecular structure and sequence of proteins. Stereochemical validation of model structures is an important part of the comparative molecular modeling process. A Ramachandran plot visualizes the backbone dihedral angles φ (phi) against ψ (psi) of the amino acid residues in a protein structure: the background marks the sterically possible φ/ψ combinations for a polypeptide, and each residue of the protein is plotted as one point. Because steric clashes rule out many φ/ψ combinations, residues that fall outside the favoured and allowed regions flag possible errors in the backbone geometry of the model.
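The φ/ψ angles on a Ramachandran plot are ordinary torsion (dihedral) angles over four consecutive backbone atoms: φ over C(i-1), N(i), CA(i), C(i) and ψ over N(i), CA(i), C(i), N(i+1). A minimal Python sketch of the standard projection-and-atan2 formula, using the IUPAC sign convention, is given below. It is an illustration, not the code used by SAVES or PROCHECK, and the helper names are hypothetical; in practice the coordinates would be read from the model's PDB file.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(a: Vec, k: float) -> Vec:
    return (a[0] * k, a[1] * k, a[2] * k)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    b1 = scale(b1, 1.0 / math.sqrt(dot(b1, b1)))  # unit vector along the central bond
    v = sub(b0, scale(b1, dot(b0, b1)))           # b0 projected onto the plane normal to b1
    w = sub(b2, scale(b1, dot(b2, b1)))           # b2 projected onto the same plane
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev: Vec, n: Vec, ca: Vec, c: Vec) -> float:
    """phi of residue i: C(i-1) - N(i) - CA(i) - C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n: Vec, ca: Vec, c: Vec, n_next: Vec) -> float:
    """psi of residue i: N(i) - CA(i) - C(i) - N(i+1)."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Toy coordinates (not from the model): a quarter turn about the z axis.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # close to 90
```

Using atan2 rather than acos keeps the sign of the angle, which matters on a Ramachandran plot because, for example, right-handed α-helices sit at negative φ while left-handed helices sit at positive φ.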



67

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052

68

MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005

69

LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025

70

ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds

71

gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins. Grid = 0.60 Å
Contouring surface for molecule 2B8N. Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds
Contouring surface for molecule 2B8N. Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00  R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75  R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25, Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N 2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00, Emax = 0.00
Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair              RMS  Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
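The log above reports 104 positively and 114 negatively charged residues, a net formal charge of 104 - 114 = -10. The sketch below shows a rough sequence-level version of that count. It assumes that Lys and Arg count as positive and Asp and Glu as negative, and it ignores His, the chain termini and modified residues such as MSE. Hex counts the residues actually present in the loaded PDB structure and may apply different rules, so the numbers need not match. The function names and the FASTA record in the example are hypothetical and are not taken from 2B8N.

```python
def parse_fasta(text):
    """Return {header: sequence} for a FASTA-formatted string."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Assumed charge assignment at neutral pH (His, termini and MSE are ignored)
POSITIVE = set("KR")  # Lys, Arg
NEGATIVE = set("DE")  # Asp, Glu


def formal_charge_counts(seq):
    """Return (n_positive, n_negative, net_charge) for a one-letter sequence."""
    seq = seq.upper()
    pos = sum(aa in POSITIVE for aa in seq)
    neg = sum(aa in NEGATIVE for aa in seq)
    return pos, neg, pos - neg


# Hypothetical toy record split over two lines, as in a FASTA file
fasta = ">toy\nKRDE\nEAK\n"
for name, seq in parse_fasta(fasta).items():
    print(name, formal_charge_counts(seq))  # toy (3, 3, 0)
```

The same two functions could be pointed at the `>2B8N A` and `>2B8N B` records printed in the log. Any difference from Hex's 104/114 would then come from the assumptions listed above.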

Conclusion


After analyzing the protein sequences of Hepatitis B virus, we conclude that, although they are all closely related, they play an important role in the virus's survival in different species. It would be interesting to examine the matter more closely at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing projects on HBV are finished, we hope it will be possible in the near future to draw firm conclusions about the true course of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we used homology modelling. As a further step, we present a theoretical model built with available online modelling tools.
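Homology modelling starts from a template whose sequence is sufficiently similar to the target, and sequence identity over the aligned region is the usual first measure of that similarity. The sketch below computes percent identity for two already-aligned sequences of equal length, with '-' marking gaps. Identical columns are divided by the number of columns where neither sequence has a gap; this is one common convention among several, so modelling servers may report slightly different values. The function name and the example sequences are hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity of two aligned sequences of equal length ('-' = gap).

    Identical residues are counted over the columns where neither sequence
    has a gap; other tools may use a different denominator.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


# Hypothetical aligned fragments: 3 of the 4 gap-free columns are identical
print(percent_identity("AC-GT", "ACAGA"))  # 75.0
```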

Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.
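Hex, which produced the docking log in this report, scores orientations with spherical polar Fourier correlations, which is beyond a short example. The simpler grid-based idea that FFT docking builds on can be illustrated with a Katchalski-Katzir-style shape-complementarity score. Receptor surface cells reward contact, interior cells penalise overlap, and the score of a pose is the sum of the receptor weights under the ligand. The sketch below scans integer translations by brute force, whereas FFT methods evaluate all translations at once through the correlation theorem; it is an illustration of the scoring idea and does not reproduce Hex. The weights (+1 and -15), the toy molecules and the function names are hypothetical.

```python
from itertools import product

# Hypothetical grid weights in the spirit of Katchalski-Katzir scoring
SURFACE, INTERIOR = 1, -15


def kk_score(receptor, ligand, shift):
    """Score the ligand translated by `shift` against the receptor grid.

    receptor: {(x, y, z): weight}; cells not listed are solvent (weight 0).
    ligand: iterable of occupied (x, y, z) cells.
    Each ligand cell landing on a receptor surface cell adds +1; each one
    landing in the receptor interior adds the (negative) interior weight.
    """
    dx, dy, dz = shift
    return sum(receptor.get((x + dx, y + dy, z + dz), 0) for x, y, z in ligand)


def best_translation(receptor, ligand, radius):
    """Brute-force scan of every integer shift in [-radius, radius]^3."""
    span = range(-radius, radius + 1)
    best = max(product(span, span, span),
               key=lambda s: kk_score(receptor, ligand, s))
    return best, kk_score(receptor, ligand, best)


# Toy receptor: a row of interior cells covered by a row of surface cells
receptor = {(x, 0, 0): INTERIOR for x in (-1, 0, 1)}
receptor.update({(x, 1, 0): SURFACE for x in (-1, 0, 1)})
ligand = [(0, 0, 0), (1, 0, 0)]  # two-cell toy ligand

shift, score = best_translation(receptor, ligand, radius=2)
print(shift, score)  # (-1, 1, 0) 2
```

In the example, placing the ligand inside the receptor (shift (0, 0, 0)) scores -30, while resting it on the surface row scores 2. This is the qualitative behaviour a shape-complementarity function is meant to have.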


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding on a big issue.


Phylogenetics is the field of biology that identifies and seeks to understand the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting those results as new biological information.
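Distance-based tree building is one of the most common ways to turn sequence data into a phylogeny. The sketch below implements UPGMA, which repeatedly joins the two closest clusters and averages the remaining distances by cluster size. The result is an ultrametric (constant-rate) tree written in Newick format. It assumes a precomputed table of pairwise distances. The OTU names and distances in the example are hypothetical, and UPGMA is not necessarily the method used for the trees in this report.

```python
from itertools import combinations


def upgma(labels, dist):
    """Return a rooted UPGMA tree in Newick format.

    labels: OTU names; dist: {(a, b): distance} giving each pair once (either order).
    Branch lengths are differences in cluster heights (ultrametric tree).
    """
    # cluster key -> (newick string, number of OTUs, height of the cluster)
    clusters = {name: (name, 1, 0.0) for name in labels}
    d = {frozenset(p): dist.get(p, dist.get(p[::-1])) for p in combinations(labels, 2)}
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = sorted(pair)
        na, sa, ha = clusters.pop(a)
        nb, sb, hb = clusters.pop(b)
        h = d.pop(pair) / 2.0             # new node sits halfway between them
        new = f"({na}:{h - ha:.2f},{nb}:{h - hb:.2f})"
        # Size-weighted average distance from the merged cluster to every other one
        for other in clusters:
            da = d.pop(frozenset((a, other)))
            db = d.pop(frozenset((b, other)))
            d[frozenset((new, other))] = (da * sa + db * sb) / (sa + sb)
        clusters[new] = (new, sa + sb, h)
    return next(iter(clusters)) + ";"


# Hypothetical distances between four OTUs (e.g. strains)
labels = ["A", "B", "C", "D"]
dist = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
        ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
print(upgma(labels, dist))  # (((A:1.00,B:1.00):2.00,C:3.00):2.00,D:5.00);
```

UPGMA assumes a roughly constant rate of change across lineages. When that assumption is doubtful, neighbour joining is a common alternative that does not produce an ultrametric tree.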

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structures.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and they may also open new avenues for finding other possible drug targets in the hepatitis B virus.

The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, bringing us closer to a better, disease-free world.

The work presented here is only a small part of a big issue, and a great deal of work remains to be done to establish a reliable phylogenetic relationship and a complete cure for hepatitis B. We nevertheless hope that these findings will go a long way and prove fruitful to anyone working in a similar area.



Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package

Page 57: Project Report on Hepatitis Virus

Your Email address Gunjan300gmailcom (MUST be correct)

Your Name Gunjan

Request title Gunjan project Will be added to the results header

Your SWISS-MODEL project file can be found in

CDocuments and SettingsuserDesktopproj_kumarpdb

Workunit P000044 TitleQ8IVS8

SWISS MODEL WORKSPACE

Model information

modelled residue range 83 to 514

based on template 2b8nA (253 Aring)

Sequence Identity [] 34

Evalue 270e-52

57

click on model bars

Fig structure of template after modeling

Alignment TARGET 83 LV GFGKAVLGMA AAAEELLGQH2b8nA 4 peslkklaie ivkksieavf pdravk--et lpklnldrvi lvavgkaawr TARGET hh sss sssss hhh2b8nA hhhhhhhh hhhhhhh hhhhhh hh sss sssss hhh

TARGET 105 LVQGVISVPK GIRAAMERAG KQEMLLKPHS RVQVFEGAED NLPDRDALRA2b8nA 53 xakaayevlg kkirkgvvvt kyghsegpid dfeiyeagh- pvpdentikt TARGET hhhhhhhhh ssssss sssss hhhhh2b8nA hhhhhhhhh ssssss sssss hhhhh

TARGET 155 ALAIQQLAEG LTADDLLLVL ISGGGSALLP APIPPVTLEE KQTLTRLLAA2b8nA 101 trrvlelvdq lnendtvlfl lsgggsslfe lplegvslee iqkltsallk TARGET hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh2b8nA hhhhhhhh ssssss ss hhhh sss hhh hhhhhhhhhh

TARGET 205 RGATIQELNT IRKALSQLKG GGLAQAAYPA QVVSLILSDV VGDPVEVIAS

2b8nA 151 sgasieeint vrkhlsqvkg grfaervfpa kvvalvlsdv lgdrldvias TARGET h hhhhhh hhh sss hhhhhh sssssss 2b8nA h hhhhhh hhh sss hhhhhh sssssss

58

TARGET 255 GPTVASSHNV QDCLHILNRY GLRAALPRSV KTVLSRADSD PHGPHTCGHV2b8nA 201 gpawpdssts edalkvleky giets--esv krailqetpk hls-----nv TARGET h hhhhhhhhhh hh hhhh sss sssss2b8nA h hhhhhhhhhh hhh hhhh ss

TARGET 305 LNVIIGSNVL ALAEAQRQAE ALGYQAVVLS AAMQGDVKSM AQFYGLLAHV2b8nA 244 eihlignvqk vcdeakslak ekgfnaeiit tsldcearea grfiasixke TARGET sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

2b8nA sssss hh hhhhhhhhhh h sssss sss hhhh hhhhhhhhhh

TARGET 355 ARTRLTPSMA GASVEEDAQL HELAAELQIP DLQLEEALET MAWGRGPVCL2b8nA 294 vkfkdrplkk paalifgget vvhvkgngig grnqelalsa aialegiegv TARGET hhh ssssssssss s hhhhhhhhh hhh ss2b8nA hhh ssssssssss s hhhhhhhhh hhhh ss

TARGET 405 LAGGEPTVQL QGSGRGGRNQ ELALRVGAEL RRWPLGPIDV LFLSGGTDGQ2b8nA 344 ilcsagtdgt dgptdaaggi vdgstaktlk axgedpyqyl knndsynalk TARGET sssssss sss s hhhhhhh h hhhh hhhhhhh2b8nA sssssss ssss s hhhhhhh hh hhhh hh hhhhh

TARGET 455 DGPTEAAGAW VTPELASQAA AEGL 2b8nA 394 ksgallitgp tgtnvndlii gliv- TARGET h sss sssss ssss 2b8nA sss sssss ssss

Model Validation

INTRODUCTION

Structure Analysis and Validation Server greatly simplifies computational analysis of the molecular structure and sequence of proteins The stereochemical validation of model structures of proteins is an important part of the comparative molecular modeling process Ramachandran plot is a way to visualize dihedral angles φ against ψ of amino acid residues in protein structure It shows the possible conformations of φ and ψ angles for a polypeptide The Ramachandran plot displays the psi and phi backbone conformational angles for each residue in a protein The distance between two succession alpha carbon atoms in the backbone chain and the angles between the two bonds of such atoms in desired protein can be determined using this plot Software

59

SAVS httpnihservermbiuclaeduSAVSProcedureThe target protein structure obtained after homology modeling using deep view and modeler is given as input for SAVS

SAVES results for proj_gunjanpdb

Procheck summary

RAMCHANDRAN POLT

60

Result ----- Plot statistics SCORE ageResidues in most favoured regions [ABL] 990 856Residues in additional allowed regions [ablp] 104 90Residues in generously allowed regions [~a~b~l~p] 11 10Residues in disallowed regions 51 44 ---- ---- ------------------Number of non-glycine and non-proline residues 1156 1000Number of end-residues (excl Gly and Pro) 8Number of glycine residues (shown as triangles) 127 (59)

Number of proline residues 60 ----Total number of residues 1351

61

Based on an analysis of 118 structures oand R-factor no greater than 20 a good quality model would be expectedto have over 90 in the most favoured regions

Docking result ---- by Hex software

FigLigand amp Receptor (2B8N)

62

Fig after docking

DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059

Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS

63

Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025

64

Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050

65

  MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb, ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
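The "Fractional charge" warnings in this log appear when the partial charges Hex assigns to a residue's atoms do not add up to a whole number. That happens, for example, when side-chain atoms are missing from the PDB file (LYS B 62 stops at CD, with no CE or NZ), and presumably also for the non-standard residue MSE (selenomethionine). The minimal Python sketch below sums the per-atom charges for LYS B 62, reading the printed values as two-decimal charges. Because those printed charges are themselves rounded, the hand-computed total (0.13) can differ from the value Hex reports (0.12) by about 0.01. The helper name `fractional_part` is hypothetical and not part of Hex.

```python
# Sum the per-atom partial charges printed by Hex for one residue and report
# how far the total is from the nearest whole number. Values are taken from
# the LYS B 62 block of the log (side chain truncated after CD).

lys_b62_charges = {
    "N": -0.52, "CA": 0.23, "C": 0.53, "O": -0.50,
    "CB": 0.04, "CG": 0.05, "CD": 0.05, "H": 0.25,
}


def fractional_part(charges: dict[str, float]) -> float:
    """Return the residue's net charge minus the nearest integer, to 2 d.p."""
    total = sum(charges.values())
    return round(total - round(total), 2)


print(round(sum(lys_b62_charges.values()), 2))  # 0.13
print(fractional_part(lys_b62_charges))          # 0.13
```

A complete residue with a proper integer charge would give 0.0 here, which is why only the truncated or non-standard residues are flagged.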


>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60 Å

Contouring surface for molecule 2B8N: Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N: Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape = 86212.55, Eforce = 0.00)


R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape = 143501.36, Eforce = 0.00)
R = 0.00 R = 0.40


R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal     Eshape     Eforce     Eair       RMS        Bumps
----  ---------  ---------  ---------  ---------  ---------  -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1 000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
---------------------------------------------------------------------------
   1    1 000000  0.0     0.0     0.0     0.0     0.0     0.0    -1  -1.00
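The size of the initial FFT search reported in this log can be checked by hand. Reading the distance range as 0.00 to 19.50 in steps of 0.75 gives 27 distance steps, and the receptor was sampled at 812 rotational increments against 1 for the ligand. The minimal Python sketch below assumes that the bracketed values in the "Total 6D space" line are 27, 812 and 1 iterations and a 64 × 24 × 48 FFT grid; only the arithmetic itself supports this reading. It also assumes the Hex, GF, FFT and Scan timings are seconds with two decimal places (53.66, 73.01, 277.87 and 15.45 s) and checks that they add up to the reported search time.

```python
# Reproduce the size of Hex's initial 6D FFT search from figures in the log.
# Assumption: Iterate[...] = distance steps x receptor rotations x ligand
# rotations, and FFT[...] = the three dimensions of each 3D FFT grid.

r_min, r_max, r_step = 0.00, 19.50, 0.75               # distance range from the log
distance_steps = round((r_max - r_min) / r_step) + 1   # R = 0.00, 0.75, ..., 19.50
receptor_rotations = 812                               # "Receptor 812"
ligand_rotations = 1                                   # "Ligand 1"
fft_grid = (64, 24, 48)                                # assumed FFT grid dimensions

n_ffts = distance_steps * receptor_rotations * ligand_rotations
points_per_fft = fft_grid[0] * fft_grid[1] * fft_grid[2]
total_orientations = n_ffts * points_per_fft

# Timing breakdown, read as seconds with two decimals (an assumption).
timings_s = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
minutes, seconds = divmod(round(sum(timings_s.values())), 60)

print(distance_steps, n_ffts, total_orientations)  # 27 21924 1616412672
print(f"{minutes} min {seconds} sec")               # 7 min 0 sec
```

The products match the "Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec" line, which is the main evidence for this reading of the bracketed values.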


Conclusion


After analysing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be worthwhile to examine the matter more closely at the gene level, since a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is complete, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we used homology modelling. As a further step, we present a theoretical model built with available online modelling tools.
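Homology modelling depends on how closely the target sequence matches the chosen template. The minimal Python sketch below computes one common measure, percent identity over the aligned columns where neither sequence has a gap; other conventions (for example, dividing by the full alignment length) give different numbers. It assumes the alignment is supplied as two gapped strings of equal length with "-" as the gap symbol. The example sequences are hypothetical and are not taken from this report's alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    # Compare case-insensitively, since aligners often mix upper and lower case.
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment: 9 ungapped columns, 7 identical.
print(round(percent_identity("LVAVG-KAAW", "LVAIG-KQAW"), 1))  # 77.8
```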

Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand to inhibit its activity, which could serve as a basis for developing drugs.


Future prospects

The work presented in this report may be only a stepping stone towards future discoveries in this area; it is a small finding within a much larger problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.
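As an illustration of how pairwise data can be turned into a tree, the sketch below implements UPGMA (unweighted pair group method with arithmetic mean), one simple distance-based clustering method. It is not presented as the method used in this report. It assumes a distance has already been computed for every pair of sequences, and the strain names and distances are hypothetical. UPGMA also assumes roughly equal evolutionary rates across lineages, so on real data it can give a different topology from methods such as neighbour joining.

```python
from itertools import combinations


def upgma(names: list[str], dist: dict[frozenset, float]) -> tuple[str, list]:
    """Cluster sequences with UPGMA.

    dist maps frozenset({x, y}) to the distance between x and y for every pair.
    Returns a Newick-style topology string and the list of merges as
    (cluster_a, cluster_b, node_height) in the order they were joined.
    """
    sizes = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(dist)                        # working copy; grows as clusters merge
    merges = []
    while len(sizes) > 1:
        # Join the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2          # node height in an ultrametric tree
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        for c in sizes:
            # Size-weighted average distance from the new cluster to each other one.
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
        merges.append((a, b, height))
    return next(iter(sizes)) + ";", merges


# Hypothetical strains and distances (for example, differences per 100 sites).
names = ["S1", "S2", "S3", "S4"]
pairs = {("S1", "S2"): 2, ("S1", "S3"): 6, ("S1", "S4"): 8,
         ("S2", "S3"): 6, ("S2", "S4"): 8, ("S3", "S4"): 8}
dist = {frozenset(p): float(v) for p, v in pairs.items()}

tree, merges = upgma(names, dist)
print(tree)  # (S4,(S3,(S1,S2)));
```

With these distances, S1 and S2 join first at height 1.0, S3 joins them at 3.0, and S4 joins last at 4.0.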

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled structure of the protein.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B and may open new avenues for finding other possible drug targets in HBV.

The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so we can look forward to a better, disease-free world.

The work presented is only a small part of a large problem, and much work remains to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.



Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package



MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds
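Each "Fractional charge" warning flags a residue whose listed atomic partial charges do not add up to a whole number. As a minimal sketch, assuming the logged value is simply the sum of the per-atom charges printed beneath it, the Python below re-adds the charges for A 52 MSE and B 81 ASP (note that B 81 ASP is listed without its CG atom). The helper names are hypothetical and not part of Hex. The sums come out at +0.36 and -0.20, within 0.01 of the logged +0.35 and -0.21; we assume the gap comes from the two-decimal rounding of the printed charges.

```python
# Per-atom partial charges copied from the Hex log (rounded to 2 decimals).
# Assumption: the "Fractional charge" value is the sum of these atomic charges.
MSE_A52 = {"N": -0.52, "CA": 0.14, "C": 0.53, "O": -0.50, "CB": 0.04,
           "CG": 0.09, "SE": 0.32, "CE": 0.01, "H": 0.25}
ASP_B81 = {"N": -0.52, "CA": 0.25, "C": 0.53, "O": -0.50, "CB": -0.21,
           "H": 0.25}


def residue_charge(atom_charges: dict[str, float]) -> float:
    """Return the net charge of a residue as the rounded sum of its atomic charges."""
    return round(sum(atom_charges.values()), 2)


def is_fractional(charge: float, tol: float = 0.05) -> bool:
    """True if the charge is further than `tol` from the nearest integer."""
    return abs(charge - round(charge)) > tol


if __name__ == "__main__":
    checks = [("A 52 MSE", MSE_A52, 0.35), ("B 81 ASP", ASP_B81, -0.21)]
    for label, atoms, logged in checks:
        q = residue_charge(atoms)
        print(f"{label}: summed {q:+.2f}, logged {logged:+.2f}, "
              f"fractional={is_fractional(q)}")
```

The tolerance in `is_fractional` is an arbitrary illustrative choice; any value well below 0.5 separates rounding noise from a genuinely non-integer residue charge.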

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60 Å
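The distance range set here (0.00 to 19.50 Å in 0.75 Å steps) fixes the intermolecular separations R that Hex scans; the 27 "R =" values printed during the FFT search are exactly these shells. A minimal Python sketch that enumerates them is shown below. It counts whole steps first to avoid floating-point drift, and the function name is our own, not part of Hex.

```python
def distance_shells(r_min: float, r_max: float, step: float) -> list[float]:
    """Return the intermolecular distances r_min, r_min + step, ..., r_max."""
    # Count whole steps first so repeated float addition cannot overshoot r_max.
    n_steps = round((r_max - r_min) / step)
    return [round(r_min + i * step, 2) for i in range(n_steps + 1)]


shells = distance_shells(0.00, 19.50, 0.75)
print(len(shells), shells[0], shells[1], shells[-1])
```

With these settings the list has 27 entries, from 0.0 to 19.5, which matches the count of "R =" lines in the 3D FFT stage of the log.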

Contouring surface for molecule 2B8N, Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N, Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 46.4 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
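The skin volumes agree with a simple cell count. Each grid cell is a cube of side 0.60 Å, so its volume is 0.60³ = 0.216 Å³. This gives 201019 × 0.216 ≈ 43420.10 for the exterior skin and 216848 × 0.216 ≈ 46839.17 for the interior skin. The two cell counts also add up to the 417867 cells used in the skin integration. The sketch below reproduces this arithmetic. It is a consistency check we wrote, not Hex's own volume routine, and it assumes the volumes are reported in Å³.

```python
GRID_SPACING = 0.60       # Å, from "Calculating surface skins, Grid = 0.60 Å"
EXTERIOR_CELLS = 201019   # exterior skin grid cells reported by Hex
INTERIOR_CELLS = 216848   # interior skin grid cells reported by Hex


def skin_volume(n_cells: int, spacing: float = GRID_SPACING) -> float:
    """Volume (assumed Å^3) covered by n_cells cubic grid cells of the given spacing."""
    return round(n_cells * spacing ** 3, 2)


exterior = skin_volume(EXTERIOR_CELLS)
interior = skin_volume(INTERIOR_CELLS)
print(exterior, interior, EXTERIOR_CELLS + INTERIOR_CELLS)
```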

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[27*812*1] x FFT[64*24*48] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 kJ/mol (Eshape = 86212.55, Eforce = 0.00)

R = 0.00  R = 0.75  R = 1.50  R = 2.25  R = 3.00  R = 3.75  R = 4.50  R = 5.25  R = 6.00
R = 6.75  R = 7.50  R = 8.25  R = 9.00  R = 9.75  R = 10.50  R = 11.25  R = 12.00  R = 12.75
R = 13.50  R = 14.25  R = 15.00  R = 15.75  R = 16.50  R = 17.25  R = 18.00  R = 18.75  R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy = 0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
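The headline numbers of the 6D search are internally consistent. We interpret the Iterate and FFT terms as 27 distance shells × 812 receptor rotations × 1 ligand rotation, evaluated on a 64 × 24 × 48 FFT grid. That reading of the log is our own, since the separators were lost in extraction. Under it, the product reproduces both the 21924 3D FFTs and the 1616412672 orientations, and dividing by the logged phase times recovers the reported rates. The constant names below are hypothetical. A minimal Python check:

```python
# Numbers copied from the Hex log; the factorisation of Iterate/FFT is our reading.
N_DISTANCES = 27          # R = 0.00 ... 19.50 in 0.75 steps
N_RECEPTOR_ROT = 812      # "Initial rotational increments (N=16) Receptor 812"
N_LIGAND_ROT = 1          # "... Ligand 1"
FFT_GRID = (64, 24, 48)   # assumed reading of the FFT[...] term
PHASE_SECONDS = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}


def search_size() -> tuple[int, int]:
    """Return (number of 3D FFTs, total orientations sampled)."""
    n_ffts = N_DISTANCES * N_RECEPTOR_ROT * N_LIGAND_ROT
    points_per_fft = FFT_GRID[0] * FFT_GRID[1] * FFT_GRID[2]
    return n_ffts, n_ffts * points_per_fft


n_ffts, n_orient = search_size()
total_s = sum(PHASE_SECONDS.values())              # close to "7 min 0 sec"
fft_rate = n_orient / PHASE_SECONDS["FFT"]         # compare with "FFT Rate 5817091/s"
overall_rate = n_orient / total_s                  # compare with "(3848574/s)"
print(n_ffts, n_orient, f"{total_s:.2f} s", round(fft_rate), round(overall_rate))
```

The computed rates differ from the logged ones only in the fifth significant figure, which is what we would expect from using the rounded phase times.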

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 kJ/mol (Eshape = 143501.36, Eforce = 0.00)
R = 0.00  R = 0.40  R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)

Starting orientation [alpha=0] (Energy = 143501.36) ranked 5 in the search

Docked structures 2B8N:2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy = 0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln    Etotal    Eshape    Eforce      Eair       RMS  Bumps
---- --------- --------- --------- --------- --------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------- --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape  Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------- --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0     0.0  -1 -1.00

Conclusion

After analysing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to examine this more closely at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We observed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is complete, we hope it will be possible in the near future to draw firm conclusions about the true course of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we used homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

As our study indicates, the HBeAg (glycerate kinase) protein is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could provide a basis for developing drugs.

Future prospects

The work presented in this report may be a stepping stone towards future discoveries; it is a small finding on a big issue.

Phylogenetics is the field of biology that identifies and seeks to understand the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B and may open new avenues for finding other possible drug targets in HBV. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so we can look forward to a better, disease-free world.

The work presented here is just a small part of a big issue, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove useful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package



Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052

68

MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005

69

LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52, THR CA Radius = 1.50 Charge = 0.27, THR C Radius = 1.40 Charge = 0.53, THR O Radius = 1.50 Charge = -0.50, THR CB Radius = 1.50 Charge = 0.21, THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV

Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N, Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N, Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N:2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal     Eshape     Eforce     Eair       RMS              Bumps
----  ---------  ---------  ---------  ---------  ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
---------------------------------------------------------------------------
   1    1  000000     0.0     0.0     0.0     0.0     0.0    0.0  -1 -1.00
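Several figures in the Hex log above can be cross-checked by hand. The short Python sketch below is an illustration under stated assumptions, not part of Hex or of the authors' workflow: it reads the log values with their decimal points restored (atomic charges such as -0.52, a grid spacing of 0.60 Å), and it reads the reported 6D search space as 27 distance steps × 812 receptor rotations × 1 ligand rotation over a 64 × 24 × 48 FFT grid. The helper names `net_charge`, `skin_volume` and `search_space` are hypothetical.

```python
# Sanity checks on figures reported in the Hex 5.0 docking log.
# Assumption: decimal points were lost in text extraction, so values are
# read here as e.g. Charge = -0.52 and Grid = 0.60 A.

# Per-atom partial charges printed for the incomplete residue B 81 ASP.
ASP_B81 = {"N": -0.52, "CA": 0.25, "C": 0.53, "O": -0.50, "CB": -0.21, "H": 0.25}


def net_charge(atom_charges):
    """Sum the per-atom partial charges of one residue."""
    return sum(atom_charges.values())


def skin_volume(n_cells, grid_spacing):
    """Volume in cubic angstroms covered by n cubic grid cells."""
    return n_cells * grid_spacing ** 3


def search_space(r_max, r_step, n_receptor_rot, n_ligand_rot, fft_dims):
    """Return (number of 3D FFTs, total orientations) for the scan."""
    n_distances = round(r_max / r_step) + 1  # R = 0.00, 0.75, ..., 19.50
    n_ffts = n_distances * n_receptor_rot * n_ligand_rot
    nx, ny, nz = fft_dims
    return n_ffts, n_ffts * nx * ny * nz


if __name__ == "__main__":
    # Hex reported -0.21 for this residue.
    print(f"B 81 ASP net charge: {net_charge(ASP_B81):.2f}")
    # Hex reported 43420.10 and 46839.17.
    print(f"exterior skin: {skin_volume(201019, 0.60):.2f}")
    print(f"interior skin: {skin_volume(216848, 0.60):.2f}")
    # Hex reported 21924 FFTs for 1616412672 orientations.
    n_ffts, total = search_space(19.50, 0.75, 812, 1, (64, 24, 48))
    print(n_ffts, total)
    # Hex reported 3848574/s over 7 min 0 sec.
    print(f"{total / 420:.0f} orientations/s")
```

Because the per-atom charges are printed rounded to two decimals, the recomputed net charge of B 81 ASP (-0.20) differs from the -0.21 that Hex reports by about 0.01. The skin volumes (cells × 0.60³ Å³) match the log to two decimal places, and 27 × 812 × 1 = 21924 FFTs of 64 × 24 × 48 points give exactly the 1616412672 orientations Hex lists; dividing by 420 s gives roughly 3.85 million orientations per second, close to the rate Hex prints.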


Conclusion


After analysing the protein sequences of hepatitis B virus (HBV), we conclude that, although these proteins are closely related, they play an important role in survival in different species. It would be worthwhile to examine this more closely at the gene level, and a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We observed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing projects on HBV are complete, we hope it will be possible in the near future to draw firm conclusions about the true course of HBV evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we performed homology modelling and present a theoretical model built with freely available online modelling tools.

Because the HBeAg (glycerate kinase) protein, encoded by the corresponding gene, is one of the secondary causes of HBV pathogenicity, we attempted to dock this protein with an appropriate ligand to inhibit its activity, which could form a basis for drug development.


Future prospects

The work presented in this report may be a stepping stone towards future discoveries; it is a small finding within a much larger problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as the interpretation of those results as new biological information.
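Distance-based tree building, one common family of phylogenetic methods, starts from pairwise distances between aligned sequences. The Python sketch below shows the simplest such measure, the p-distance (the fraction of aligned, ungapped sites at which two sequences differ). It is an illustrative assumption, not the procedure used in this study: the function names and the toy sequences are hypothetical, and the input is assumed to be already aligned with '-' as the gap character.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        differences += a != b  # True counts as 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(seqs):
    """Pairwise p-distances for every pair of named, aligned sequences."""
    names = sorted(seqs)
    return {
        (x, y): p_distance(seqs[x], seqs[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


if __name__ == "__main__":
    toy = {"s1": "MQLF-HLC", "s2": "MQLFPHLS", "s3": "MKLFPHLS"}  # hypothetical
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```

A matrix like this is the usual input to clustering methods such as UPGMA or neighbour joining; p-distance ignores multiple substitutions at the same site, so corrected distances are generally preferred for more divergent sequences.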

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in hepatitis B virus.

The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so we can look forward to a better, disease-free world.

The work presented is only a small part of a large problem, and much work remains to be done to establish a reliable phylogenetic relationship and a full-fledged cure for hepatitis B. Nevertheless, we hope that these findings will go a long way and will prove fruitful to anyone working in a similar area.


Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

Phylip Phylogeny Inference Package

Page 60: Project Report on Hepatitis Virus

SAVS httpnihservermbiuclaeduSAVSProcedureThe target protein structure obtained after homology modeling using deep view and modeler is given as input for SAVS

SAVES results for proj_gunjanpdb

Procheck summary

RAMCHANDRAN POLT

60

Result ----- Plot statistics SCORE ageResidues in most favoured regions [ABL] 990 856Residues in additional allowed regions [ablp] 104 90Residues in generously allowed regions [~a~b~l~p] 11 10Residues in disallowed regions 51 44 ---- ---- ------------------Number of non-glycine and non-proline residues 1156 1000Number of end-residues (excl Gly and Pro) 8Number of glycine residues (shown as triangles) 127 (59)

Number of proline residues 60 ----Total number of residues 1351

61

Based on an analysis of 118 structures oand R-factor no greater than 20 a good quality model would be expectedto have over 90 in the most favoured regions

Docking result ---- by Hex software

FigLigand amp Receptor (2B8N)

62

Fig after docking

DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059

Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS

63

Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025

64

Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050

65

MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

66

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI

67

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052

68

MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005

69

LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025

70

ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds

71

gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A

72

Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)
R = 000 R = 075 R = 150 R = 225 R = 300 R = 375 R = 450 R = 525 R = 600 R = 675 R = 750 R = 825 R = 900 R = 975 R = 1050 R = 1125 R = 1200 R = 1275 R = 1350 R = 1425 R = 1500 R = 1575 R = 1650 R = 1725 R = 1800 R = 1875 R = 1950
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 000 R = 040 R = 075
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----
------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models  Etotal  Eshape  Eforce  Eair    Vshape  Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100
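Simple arithmetic can check the size of the 6D search that Hex reports. The sketch below is our own reading of the log, not Hex code. It assumes that list separators and decimal points were stripped during extraction. Under that reading, "Iterate[278121]" means 27 x 812 x 1 (27 intermolecular distances, 812 receptor rotations, 1 ligand rotation) and "FFT[642448]" means a 64 x 24 x 48 FFT. It also reads the distance range "000 to 1950 with steps of 075" as 0.00 to 19.50 in steps of 0.75. With these assumptions the numbers reproduce the 21924 3D FFTs and 1616412672 orientations given in the log above.

```python
# Hypothetical reconstruction of the Hex 5.0 search-space arithmetic.
# Assumption: separators/decimal points were lost in extraction, so
# "Iterate[278121]" -> 27 x 812 x 1 and "FFT[642448]" -> 64 x 24 x 48.

n_distances = 27         # R = 0.00 ... 19.50 in steps of 0.75 (assumed reading)
n_receptor_rot = 812     # "Receptor 812" in the log
n_ligand_rot = 1         # "Ligand 1" in the log
fft_grid = 64 * 24 * 48  # grid points per 3D FFT (assumed reading)

# Intermolecular distances scanned by the translational search.
distances = [round(i * 0.75, 2) for i in range(n_distances)]

# One 3D FFT per (distance, receptor rotation, ligand rotation) combination.
n_ffts = n_distances * n_receptor_rot * n_ligand_rot
total_orientations = n_ffts * fft_grid

# "7 min 0 sec" is rounded in the log, so the rate is only approximate.
elapsed_s = 7 * 60
rate = total_orientations / elapsed_s

print(distances[0], distances[-1], len(distances))
print(n_ffts, total_orientations)
print(f"about {rate:,.0f} orientations per second")
```

The computed rate of about 3.85 million orientations per second agrees with the "(3848574/s)" figure in the log to within the rounding of the elapsed time.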


Conclusion


After analysing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be worth examining this more closely at the gene level, and a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing HBV gene sequencing project is complete, we hope it will be possible to draw firm conclusions about the true course of HBV evolution in the near future and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we used homology modelling, and we went a step further by presenting a theoretical model built with available online modelling tools.
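Homology modelling starts by choosing a template structure whose sequence is similar enough to the target, and sequence identity over the target-template alignment is the usual first measure of that similarity. The report does not state which identity measure or cutoff was used. The following is therefore only an illustrative sketch of one common convention, which excludes gap columns from the denominator. The two aligned fragments are hypothetical toy sequences, not the report's actual alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns with a gap ('-') in either sequence are excluded from the
    denominator; this is one common convention, and tools differ here.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


# Hypothetical target/template alignment (toy fragments, not from the report).
target = "PESLKKLAIE-IVKK"
template = "PESLRKLAVEAIVKK"
print(f"identity = {percent_identity(target, template):.1f}%")  # identity = 85.7%
```

Here 12 of the 14 gap-free columns match. Other conventions, such as dividing by the full alignment length or by the shorter sequence, give different numbers for the same alignment, so the convention should be reported alongside the value.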

Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore docked this protein with an appropriate ligand to inhibit its activity, which could serve as a basis for developing drugs.


Future prospects

The work presented in this report may be a stepping stone towards such discoveries; it is a small finding within a much larger problem.


Phylogenetics is the field of biology that identifies and seeks to understand the evolutionary relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.
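The report does not say which tree-building method was used. As an illustration only, a minimal distance-based workflow has three steps. First, compute the p-distance (fraction of differing sites) between each pair of aligned protein sequences. Second, apply the Poisson correction d = -ln(1 - p). Third, cluster the taxa with UPGMA. The sequences below are hypothetical toy data, not HBV sequences, and the code is a sketch rather than the procedure followed in this work.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned, equal-length sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(1 for x, y in sites if x != y) / len(sites)


def poisson_distance(p: float) -> float:
    """Poisson-corrected protein distance d = -ln(1 - p); requires p < 1."""
    return -math.log(1.0 - p)


def upgma(names, dist):
    """Cluster taxa with UPGMA and return the tree as a Newick-style string.

    `dist` maps frozenset({a, b}) -> distance for every pair of names.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of taxa
    d = dict(dist)
    while len(clusters) > 1:
        # Join the closest pair of current clusters.
        i, j = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_i, size_j = clusters.pop(i), clusters.pop(j)
        new = f"({i},{j})"
        # Distance to the merged cluster is the size-weighted average.
        for k in clusters:
            d[frozenset((new, k))] = (
                size_i * d[frozenset((i, k))] + size_j * d[frozenset((j, k))]
            ) / (size_i + size_j)
        clusters[new] = size_i + size_j
    return next(iter(clusters))


# Toy aligned sequences (hypothetical, not HBV data).
seqs = {
    "S1": "MKTAYIAK",
    "S2": "MKTAYIAR",
    "S3": "MKSAYLAR",
    "S4": "MQSGYLGR",
}
names = list(seqs)
dist = {
    frozenset((a, b)): poisson_distance(p_distance(seqs[a], seqs[b]))
    for a, b in combinations(names, 2)
}
tree = upgma(names, dist)
print(tree + ";")  # (S4,(S3,(S1,S2)));
```

UPGMA assumes a roughly constant rate of evolution across lineages. Neighbour-joining relaxes that assumption, and PHYLIP's `neighbor` program offers both methods.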

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structures.

As new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in finding a cure for hepatitis B, and may also open new avenues for identifying other possible drug targets in HBV. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so that we can look forward to a healthier, disease-free world.

The work presented is only a small part of a large problem, and much work remains to establish a reliable phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and prove useful to anyone working in a similar area.


Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Result

Plot statistics                                         No. of residues   %-age
Residues in most favoured regions [A,B,L]                          990    85.6%
Residues in additional allowed regions [a,b,l,p]                   104     9.0%
Residues in generously allowed regions [~a,~b,~l,~p]                11     1.0%
Residues in disallowed regions                                      51     4.4%
                                                                  ----   ------
Number of non-glycine and non-proline residues                    1156   100.0%
Number of end-residues (excl. Gly and Pro)                           8
Number of glycine residues (shown as triangles)                    127 (59)
Number of proline residues                                          60
                                                                  ----
Total number of residues                                          1351

Based on an analysis of 118 structures of resolution of at least 2.0 Angstroms and R-factor no greater than 20%, a good quality model would be expected to have over 90% in the most favoured regions.
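The percentages in the PROCHECK table follow directly from the residue counts. The short Python sketch below is illustrative only and is not part of PROCHECK. It recomputes the percentages from the four Ramachandran-region counts above and applies the 90% guideline just quoted. On these counts the most-favoured fraction is 85.6%, which is below that guideline.

```python
# Ramachandran-plot counts copied from the PROCHECK summary above
# (non-glycine, non-proline residues only).
counts = {
    "most favoured": 990,
    "additional allowed": 104,
    "generously allowed": 11,
    "disallowed": 51,
}

total = sum(counts.values())
percentages = {region: 100.0 * n / total for region, n in counts.items()}

for region, n in counts.items():
    print(f"{region:>20}: {n:4d} residues ({percentages[region]:5.1f}%)")

# Guideline quoted by PROCHECK: a good-quality model is expected to have
# over 90% of residues in the most favoured regions.
GOOD_MODEL_THRESHOLD = 90.0
meets_guideline = percentages["most favoured"] > GOOD_MODEL_THRESHOLD
print(f"total = {total}, meets 90% guideline: {meets_guideline}")
```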

Docking result by Hex software

Fig: Ligand & Receptor (2B8N)

Fig: After docking

DoHex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059

Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled. Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues 7597 atoms 1 models)
Warning: Fractional charge (035) for non-terminal residue A 52MSE
  MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning: Fractional charge (041) for non-terminal residue A 82ASP
  ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning: Fractional charge (035) for non-terminal residue A 291MSE
  MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning: Fractional charge (034) for non-terminal residue A 318LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (035) for non-terminal residue A 375MSE
  MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning: Fractional charge (012) for non-terminal residue B 8LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (012) for non-terminal residue B 9LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (034) for non-terminal residue B 17LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (035) for non-terminal residue B 52MSE
  MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning: Fractional charge (012) for non-terminal residue B 62LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (034) for non-terminal residue B 66LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (-021) for non-terminal residue B 81ASP
  ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025
Warning: Fractional charge (035) for non-terminal residue B 291MSE
  MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning: Fractional charge (034) for non-terminal residue B 370LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (035) for non-terminal residue B 375MSE
  MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning: Fractional charge (023) for non-terminal residue B 404THR
  THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues 7597 atoms 1 models)
Warning: Fractional charge (035) for non-terminal residue A 52MSE
  MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning: Fractional charge (041) for non-terminal residue A 82ASP
  ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025
Warning: Fractional charge (035) for non-terminal residue A 291MSE
  MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning: Fractional charge (034) for non-terminal residue A 318LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (035) for non-terminal residue A 375MSE
  MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning: Fractional charge (012) for non-terminal residue B 8LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (012) for non-terminal residue B 9LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (034) for non-terminal residue B 17LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (035) for non-terminal residue B 52MSE
  MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning: Fractional charge (012) for non-terminal residue B 62LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (034) for non-terminal residue B 66LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (-021) for non-terminal residue B 81ASP
  ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025
Warning: Fractional charge (035) for non-terminal residue B 291MSE
  MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning: Fractional charge (034) for non-terminal residue B 370LYS
  LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025
Warning: Fractional charge (035) for non-terminal residue B 375MSE
  MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025
Warning: Fractional charge (023) for non-terminal residue B 404THR
  THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 000 rounded to 000
Setting distance range = 000 to 1950 with steps of 075
Calculating surface skins Grid = 060A

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 140A Apolar probe = 140A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds

Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)

73

R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1

3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040

74

R = 075Estart = 14350136 -gt rank 5

Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100

75

Conclusion

76

After analyzing protein sequence of Hepatitis B virus we come to conclusion that though they all

are closely related they have an important role in survival in different species It is interesting to

have closer look at the matter by studying at the gene level A phylogenetic analysis can be very

helpful in understanding the evolutionary pattern

We have noticed that same genes are present in all strains this shows that are they

evolved together

With the finishing of the ongoing gene sequencing project on HBV we

hope it will be possible to draw conclusive decision about the true picture of evolution in near

future and gene responsible for pathogenesis can also be identified

Complete inference can only be drawn based on a comprehensive list

of the gene products and their function

In order to find out unknown structure of protein present in the

different species we do homology modelling We forward step to present a theoretical model

using available online modelling tools

As we study that HBeAG (Glycerate kinase ) protein that is coded by

gene is one of the second reasons of pathogenicity of HBV So we tried to dock this protein with

appropriate ligand in order to inhibit their activity on the basis of which the drugs have to be

developed

77

Future prospects

The work presented in this report might just be a stepping stone for any such discoveries The

present work might be small finding of big issue

78

Phylogenetics is that field of biology which deals with identifying and understanding the

relationships between the many different kinds of life on earth This includes methods for

collecting and analysing data as well as interpretation of those results as new biological

information

The purpose of modelling is to help the Drug developers and Biotechnologists to develop the

drug more efficiently and with more effectiveness in future by analysing the modelled structure

of protein

As the new drugs target would be identified it will open new vistas for further drug

development The finding of our docking will be useful in finding a cure for the infectious disease

bird flu also it will open new avenues for finding other possible drug targets in influenza A virus

The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process

Finally similar process can be applied on other pathogens and hence possible therapeutic sites

can be identified in them Similar method can also be applied to other infectious diseases and

hence we can look forward to a better disease free world

The work presented is just a small part of big issue and lots of work still needs to be done to

establish a good phylogenetic relationship and full fledged cure for bird flu But we are hoping

that these findings will go long way and will prove fruitful to any going in a similar area

79

BIBLIOGRAPHY

[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses

80

[2] - F V Chisari C Ferrari


Based on an analysis of 118 structures of resolution of at least 2.0 Å and R-factor no greater than 20%, a good-quality model would be expected to have over 90% of residues in the most favoured regions.
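The PROCHECK quality criterion above reduces to a single percentage. The sketch below assumes that the four Ramachandran region counts (most favoured, additional allowed, generously allowed, disallowed) have already been read from a PROCHECK summary. It also assumes that those counts cover only non-glycine, non-proline residues. With those counts, the check can be written as follows. The function name `favoured_fraction` and the example counts are hypothetical and are not results from this report.

```python
def favoured_fraction(core: int, allowed: int, generous: int, disallowed: int,
                      threshold: float = 90.0) -> tuple[float, bool]:
    """Percentage of residues in the most favoured Ramachandran regions.

    The four counts are assumed to cover non-glycine, non-proline residues,
    the population PROCHECK uses for its Ramachandran statistics.
    Returns the percentage and whether it exceeds the quality threshold.
    """
    total = core + allowed + generous + disallowed
    if total == 0:
        raise ValueError("no residues to evaluate")
    pct = 100.0 * core / total
    return pct, pct > threshold


# Hypothetical counts, for illustration only (not this report's model).
pct, good = favoured_fraction(core=279, allowed=17, generous=3, disallowed=1)
print(f"{pct:.1f}% in most favoured regions; good-quality model: {good}")
# 93.0% in most favoured regions; good-quality model: True
```

The comparison is strict ("over 90%"), so a model with exactly 90.0% would not pass this sketch's test.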

Docking result by Hex software

Fig.: Ligand & Receptor (2B8N)

Fig.: After docking

Hex 5.0 starting at Fri May 16 09:16:19 2008 on host WORK-A7E7353059
Running HEX_STARTUP file C:\Program Files\Hex 5.0\data\startup_v5.mac
Disc Cache enabled
Using directory C:\Program Files\Hex 5.0\cache
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file
Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP CG Radius = 1.40 Charge = 0.62; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60 Å
Contouring surface for molecule 2B8N: Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds
Contouring surface for molecule 2B8N: Polar probe = 1.40 Å, Apolar probe = 1.40 Å
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds
Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds
Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27, 812, 1] x FFT[64, 24, 48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N:2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 1/1
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal  Eshape  Eforce  Eair  RMS  Bumps
----  ------  ------  ------  ----  ---  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------ ------ ------ ------ ------ ------ ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair   Vshape Vclash Bmp RMS
---- ---- ------ ------ ------ ------ ------ ------ ------ --- -----
   1    1 000000    0.0    0.0    0.0    0.0    0.0    0.0  -1 -1.00
---------------------------------------------------------------------
   1    1 000000    0.0    0.0    0.0    0.0    0.0    0.0  -1 -1.00
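The headline numbers in the Hex log factor cleanly, which helps in reading what the search did. The short calculation below uses only values printed in the log:

- 1616412672 orientations in total
- 21924 3D FFTs
- a distance range of 0.00 to 19.50 in steps of 0.75

Two readings are our interpretation, not something Hex states directly here. The first treats the 73728 points per FFT as a 64 × 24 × 48 grid. The second treats the remaining factor of 812 as the receptor rotational increments reported in the log.

```python
# Values copied from the Hex log for the 2B8N:2B8N run.
total_orientations = 1616412672  # "Total 6D space"
n_ffts = 21924                   # "Done 21924 3D FFTs"
r_max, r_step = 19.50, 0.75      # distance range 0.00 to 19.50, step 0.75

# Every 3D FFT scores the same number of grid points.
points_per_fft = total_orientations // n_ffts
assert points_per_fft * n_ffts == total_orientations  # divides exactly

# Intermolecular distances sampled: R = 0.00, 0.75, ..., 19.50.
n_distances = round(r_max / r_step) + 1

# FFTs left per distance step (our reading: receptor rotational increments).
ffts_per_distance = n_ffts // n_distances

print(points_per_fft, 64 * 24 * 48)    # 73728 73728
print(n_distances, ffts_per_distance)  # 27 812
```

The same arithmetic accounts for the reported throughput. About 1.6 × 10^9 orientations in 7 min 0 sec is roughly 3.85 million orientations per second.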


Conclusion


After analysing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at this by studying it at the gene level, since a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene-sequencing project on HBV is finished, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we used homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.
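Homology modelling starts by choosing a template of known structure that is sufficiently similar to the target sequence. The sketch below shows one common similarity measure: percent identity over the alignment columns where neither sequence has a gap. Other definitions divide by the length of the shorter sequence instead. Several details are assumptions rather than part of this report:

- The function name `percent_identity` and the two aligned fragments are hypothetical.
- The gap character is assumed to be `-`.
- The figure of roughly 30% identity is often quoted as a lower bound for reliable models. That is a general rule of thumb, not a threshold taken from this report.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences carry a residue ('-' marks a gap).
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Hypothetical aligned fragments: a target sequence and a candidate template.
target = "PESLKKLAIEIV-KSIE"
template = "PEALKRLAIE-VKKAIE"
print(f"{percent_identity(target, template):.1f}% identity")  # 80.0% identity
```

In this example, 15 columns are gap-free and 12 of them match, giving 80%. When ranking several candidate templates, one would compute this value for each alignment and prefer the highest-identity template that also covers the region of interest.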

Our study indicates that the HBeAg (glycerate kinase) protein encoded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be a stepping stone towards such discoveries. It is a small finding within a much larger problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.
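Distance-based tree building is among the simplest of the phylogenetic methods described above. The sketch below computes p-distances between aligned sequences and clusters them with UPGMA. UPGMA is the average-linkage method that assumes a constant rate of change and so produces an ultrametric tree. The result is written as a Newick tree. UPGMA is used here only as an illustration and is not necessarily the method used in this work. The function names and the four short "strain" sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA and return a rooted Newick tree."""
    # Key: Newick string of a cluster; value: (number of leaves, node height).
    clusters = {name: (1, 0.0) for name in seqs}
    dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}

    while len(clusters) > 1:
        pair = min(dist, key=dist.get)   # closest pair of clusters
        a, b = sorted(pair)
        n_a, h_a = clusters.pop(a)
        n_b, h_b = clusters.pop(b)
        height = dist.pop(pair) / 2      # new node sits at half their distance
        merged = f"({a}:{height - h_a:.3f},{b}:{height - h_b:.3f})"

        # Size-weighted average distance from the merged cluster to the others.
        for other in clusters:
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (n_a * d_a + n_b * d_b) / (n_a + n_b)
        clusters[merged] = (n_a + n_b, height)

    return next(iter(clusters)) + ";"


# Hypothetical aligned sequences, for illustration only.
aligned = {
    "strain1": "ACGTACGTAC",
    "strain2": "ACGTACGTTC",
    "strain3": "ACGAACCTTC",
    "strain4": "TCGAACCTTA",
}
print(upgma(aligned))
# ((strain1:0.050,strain2:0.050):0.125,(strain3:0.100,strain4:0.100):0.075);
```

Because UPGMA assumes a molecular clock, every leaf ends up the same distance from the root (0.175 here). Methods such as neighbour-joining, available in packages like PHYLIP, relax that assumption.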

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV.

The docking results could be used to design new lead compounds and so aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, bringing us closer to a disease-free world.

The work presented is only a small part of a big problem, and much work still needs to be done to establish a robust phylogenetic relationship and a full-fledged cure for hepatitis B. Nevertheless, we hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


BIBLIOGRAPHY

[1] Lansing M. Prescott, John P. Harley and Donald A. Klein. Microbiology, 6th edition. McGraw-Hill Higher Education. Human Diseases Caused by Viruses.

[2] F. V. Chisari and C. Ferrari. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California 92037, USA.

[3] C. Seeger and W. S. Mason. Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA. c_seeger@fccc.edu

[4] PubMed.

[5] Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA. Reprint requests to Barry Honig, Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.

[6] Al-Lazikani, B., Sheinerman, F.B. and Honig, B. 2001. Combining multiple structure and sequence alignments to improve sequence detection and alignment: Application to the SH2 domains of Janus kinases. Proc. Natl. Acad. Sci. 98: 14796–14801.

Aloy, P., Querol, E., Aviles, F.X. and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J. Mol. Biol. 311: 395–408.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D. et al. 2000. InterPro: An integrated documentation resource for protein families, domains and functional sites. Bioinformatics.

[7] Chemogenomics Laboratory, Research Group on Biomedical Informatics, Institut Municipal d'Investigació Mèdica and Universitat Pompeu Fabra, Passeig Marítim de la Barceloneta 37-49, 08003 Barcelona (Catalonia), Spain.

[8] Computational Sciences, Department of Chemistry, Nerviano Medical Sciences, Viale Pasteur 10, 20014 Nerviano (MI), Italy. romano.kroemer@sanofi-aventis.com


Abbreviations

CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package

Page 63: Project Report on Hepatitis Virus

Fig after docking

DoHex 50 starting at Fri May 16 091619 2008 on host WORK-A7E7353059

Running HEX_STARTUP file CProgram FilesHex 50datastartup_v5macDisc Cache enabled Using directory CProgram FilesHex 50cacheAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYS Warning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYS

63

Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025

64

Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52, THR CA Radius = 1.50 Charge = 0.27, THR C Radius = 1.40 Charge = 0.53, THR O Radius = 1.50 Charge = -0.50, THR CB Radius = 1.50 Charge = 0.21, THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253 LYS
Warning: Can't add all hydrogens to incomplete residue A 316 HIS
Warning: Can't add all hydrogens to incomplete residue A 318 LYS
Warning: Can't add all hydrogens to incomplete residue A 393 LYS
Warning: Can't add all hydrogens to incomplete residue B 8 LYS
Warning: Can't add all hydrogens to incomplete residue B 9 LYS
Warning: Can't add all hydrogens to incomplete residue B 17 LYS
Warning: Can't add all hydrogens to incomplete residue B 34 LYS
Warning: Can't add all hydrogens to incomplete residue B 36 ASN
Warning: Can't add all hydrogens to incomplete residue B 62 LYS
Warning: Can't add all hydrogens to incomplete residue B 65 ARG
Warning: Can't add all hydrogens to incomplete residue B 66 LYS
Warning: Can't add all hydrogens to incomplete residue B 316 HIS
Warning: Can't add all hydrogens to incomplete residue B 370 LYS
Warning: Can't add all hydrogens to incomplete residue B 380 TYR
Warning: Can't add all hydrogens to incomplete residue B 404 THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82 ASP
  ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP CG Radius = 1.40 Charge = 0.62, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52, ASP CA Radius = 1.50 Charge = 0.25, ASP C Radius = 1.40 Charge = 0.53, ASP O Radius = 1.50 Charge = -0.50, ASP CB Radius = 1.70 Charge = -0.21, ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52, LYS CA Radius = 1.50 Charge = 0.23, LYS C Radius = 1.40 Charge = 0.53, LYS O Radius = 1.50 Charge = -0.50, LYS CB Radius = 1.70 Charge = 0.04, LYS CG Radius = 1.70 Charge = 0.05, LYS CD Radius = 1.70 Charge = 0.05, LYS CE Radius = 1.70 Charge = 0.22, LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52, MSE CA Radius = 1.50 Charge = 0.14, MSE C Radius = 1.40 Charge = 0.53, MSE O Radius = 1.50 Charge = -0.50, MSE CB Radius = 1.70 Charge = 0.04, MSE CG Radius = 1.70 Charge = 0.09, MSE SE Radius = 1.90 Charge = 0.32, MSE CE Radius = 1.90 Charge = 0.01, MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52, THR CA Radius = 1.50 Charge = 0.27, THR C Radius = 1.40 Charge = 0.53, THR O Radius = 1.50 Charge = -0.50, THR CB Radius = 1.50 Charge = 0.21, THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
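Each "Fractional charge" warning above lists the partial charge Hex assigned to every atom it found in the flagged residue. As a minimal sketch, assuming the charges exactly as printed (with the decimal point after the first digit, so -052 reads as -0.52), the Python below sums them for one residue of each flagged type and compares the total with the charge Hex reports. Every total is non-integer: the lysine, aspartate and threonine entries appear to list fewer side-chain atoms than a complete residue, and MSE is the non-standard selenomethionine residue. The dictionary labels and the tolerance are illustrative choices, not Hex parameters.

```python
# Per-atom partial charges as listed in the Hex "Fractional charge" warnings,
# read with the decimal point after the first digit (e.g. -052 -> -0.52).
# Keys are illustrative labels (residue name, chain, residue number).
LOG_CHARGES = {
    "MSE A 375": [-0.52, 0.14, 0.53, -0.50, 0.04, 0.09, 0.32, 0.01, 0.25],  # N CA C O CB CG SE CE H
    "LYS B 8": [-0.52, 0.23, 0.53, -0.50, 0.04, 0.05, 0.05, 0.25],          # N CA C O CB CG CD H
    "LYS B 17": [-0.52, 0.23, 0.53, -0.50, 0.04, 0.05, 0.05, 0.22, 0.25],   # ... CD CE H
    "ASP B 81": [-0.52, 0.25, 0.53, -0.50, -0.21, 0.25],                    # N CA C O CB H
    "ASP A 82": [-0.52, 0.25, 0.53, -0.50, -0.21, 0.62, 0.25],              # ... CB CG H
    "THR B 404": [-0.52, 0.27, 0.53, -0.50, 0.21, 0.25],                    # N CA C O CB H
}

# Residue charge that Hex printed in each warning line.
REPORTED = {
    "MSE A 375": 0.35,
    "LYS B 8": 0.12,
    "LYS B 17": 0.34,
    "ASP B 81": -0.21,
    "ASP A 82": 0.41,
    "THR B 404": 0.23,
}


def check(tolerance: float = 0.02) -> dict:
    """Sum each residue's listed charges and compare with the reported value.

    The log prints charges to two decimals, so a sum of rounded values can
    differ from Hex's own total by a small rounding error; `tolerance`
    absorbs that difference.
    """
    results = {}
    for name, charges in LOG_CHARGES.items():
        total = round(sum(charges), 2)
        reported = REPORTED[name]
        results[name] = (total, reported, abs(total - reported) <= tolerance)
    return results


for name, (total, reported, ok) in check().items():
    # All of these totals are non-integer, matching the warning's wording.
    print(f"{name:10s} sum={total:+.2f} reported={reported:+.2f} agrees={ok}")
```

The remaining one-hundredth differences (for example 0.36 summed versus 0.35 reported for MSE) are what one would expect from adding values that were themselves rounded to two decimals.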

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)
Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface: Area = 26350.96, Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair
------------------------------------------------------------------------------
Docking 1 pair of starting orientations
Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N: Tag = 2B8N
Ligand 2B8N: Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space: Iterate[27 x 812 x 1] x FFT[64 x 24 x 48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 86212.55 kJ/mol (Eshape=86212.55, Eforce=0.00)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 86212.55 -> rank 1
3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00
Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds
Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 kJ/mol (Eshape=143501.36, Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 143501.36 -> rank 5
Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761/s)
Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search
Docked structures 2B8N/2B8N in a total of 7 min 5 sec
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range: Emin = 0.00, Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln  Etotal     Eshape     Eforce     Eair       RMS      Bumps
----  ---------  ---------  ---------  ---------  -------  -----
------------------------------------------------------------------------------
Saving top 500 orientations
Docking done in a total of 8 min 11 sec
------------------------------------------------------------------------------
No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

----  ----  -------  -------  -------  -------  -------  -------  ------  ---  -----
Clst  Soln  Models   Etotal   Eshape   Eforce   Eair     Vshape   Vclash  Bmp  RMS
----  ----  -------  -------  -------  -------  -------  -------  ------  ---  -----
   1     1  000000       0.0      0.0      0.0      0.0      0.0     0.0   -1  -1.00
---------------------------------------------------------------------------
   1     1  000000       0.0      0.0      0.0      0.0      0.0     0.0   -1  -1.00
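Several numbers in this log can be cross-checked against one another, which also shows where their decimal points belong (the distance range ends at 19.50 and the grid spacing is 0.60 A). The Python sketch below recomputes the number of sampled distances, the number of 3D FFTs, the total number of orientations in the "Total 6D space" line, the sum of the four timing phases, the overall throughput and the exterior skin volume from its grid-cell count. All inputs are copied from the log; reading the FFT term as 64 x 24 x 48 orientations per 3D FFT is an assumption inferred from the arithmetic, not taken from Hex documentation, and the function names are hypothetical.

```python
# Cross-check of figures printed in the Hex 5.0 docking log.
# Every input is copied from the log; splitting the "Total 6D space" line into
# (distances x receptor rotations x ligand rotations) x (64 x 24 x 48)
# is an interpretation inferred from the arithmetic, not a documented formula.
from math import prod


def distance_steps(r_min: float, r_max: float, step: float) -> int:
    """Count sampled intermolecular distances from r_min to r_max inclusive."""
    return round((r_max - r_min) / step) + 1


def search_space(n_dist: int, n_rec: int, n_lig: int, fft_dims: tuple) -> tuple:
    """Return (number of 3D FFTs, orientations per FFT, total orientations)."""
    n_ffts = n_dist * n_rec * n_lig
    per_fft = prod(fft_dims)
    return n_ffts, per_fft, n_ffts * per_fft


n_dist = distance_steps(0.00, 19.50, 0.75)          # R = 0.00 ... 19.50
n_ffts, per_fft, total = search_space(n_dist, 812, 1, (64, 24, 48))

phase_seconds = 53.66 + 73.01 + 277.87 + 15.45      # Hex, GF, FFT, Scan
rate = total / (7 * 60)                             # "7 min 0 sec"
exterior_volume = 201019 * 0.60 ** 3                # cells x (grid spacing)^3

print(n_dist, n_ffts, per_fft, total)
print(f"phases sum to {phase_seconds:.2f} s; about {rate:,.0f} orientations/s")
print(f"exterior skin volume = {exterior_volume:.2f}")
```

The 27 distances, 21924 FFTs and 1616412672 orientations match the log exactly, the four phases add up to the reported 7 minutes, and the recomputed rate (about 3.85 million orientations per second) is close to the 3848574 figure in the log, presumably because Hex divides by the unrounded elapsed time.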

Conclusion

After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be interesting to examine this more closely at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We observed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene-sequencing project on HBV is complete, we hope it will be possible in the near future to draw firm conclusions about the true course of its evolution and to identify the genes responsible for pathogenesis. Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein, coded by a viral gene, is one of the secondary causes of HBV pathogenicity. We therefore docked this protein with an appropriate ligand, with the aim of inhibiting its activity, so that drugs can be developed on this basis.

Future prospects

The work presented in this report may be just a stepping stone toward future discoveries; it is a small finding within a much larger problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.
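As a small illustration of the "analysing data" step mentioned above, the following Python sketch computes pairwise p-distances, i.e. the fraction of aligned positions at which two sequences differ, skipping columns that contain a gap. A matrix of such distances is the kind of input that distance-based tree-building methods, such as those in the PHYLIP package, start from. The three sequences are hypothetical toy strings, not HBV data, and the sketch is not the procedure used in this report.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def distance_matrix(seqs: dict) -> dict:
    """Symmetric matrix of p-distances keyed by (name1, name2)."""
    matrix = {(name, name): 0.0 for name in seqs}
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        d = p_distance(s1, s2)
        matrix[(n1, n2)] = matrix[(n2, n1)] = d
    return matrix


# Hypothetical toy alignment (not HBV data).
aligned = {
    "s1": "MKTAYIAKQR",
    "s2": "MKTAYIAKQK",
    "s3": "MKT-YLAKQK",
}
m = distance_matrix(aligned)
print(m[("s1", "s2")])  # 0.1
```

Here s1 and s2 differ at one of ten positions (0.1), while s3 is compared with the others over nine gap-free columns.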

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in hepatitis B virus. The docking results can be used to design new lead compounds and can therefore aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented is just a small part of a large problem, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package

84

85

86

  • Prevention
  • Using ProtParam-for primary structure
  • PROTOCOL FOLLOWED
    • Template selection and sequence alignment
      • Introduction to Docking
        • By ProtParam
          • GLCTK_HUMAN (Q8IVS8)
          • By SOPMA result for UNK_158250
          • Scores Table
          • Alignment
          • Guide Tree
          • Phylogram
          • Procheck summary
            • Conclusion
Page 64: Project Report on Hepatitis Virus

Warning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025

64

Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005

LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050

65

MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

66

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI

67

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052

68

MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005

69

  LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
  ASP N Radius = 1.40 Charge = -0.52; ASP CA Radius = 1.50 Charge = 0.25; ASP C Radius = 1.40 Charge = 0.53; ASP O Radius = 1.50 Charge = -0.50; ASP CB Radius = 1.70 Charge = -0.21; ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
  LYS N Radius = 1.40 Charge = -0.52; LYS CA Radius = 1.50 Charge = 0.23; LYS C Radius = 1.40 Charge = 0.53; LYS O Radius = 1.50 Charge = -0.50; LYS CB Radius = 1.70 Charge = 0.04; LYS CG Radius = 1.70 Charge = 0.05; LYS CD Radius = 1.70 Charge = 0.05; LYS CE Radius = 1.70 Charge = 0.22; LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
  MSE N Radius = 1.40 Charge = -0.52; MSE CA Radius = 1.50 Charge = 0.14; MSE C Radius = 1.40 Charge = 0.53; MSE O Radius = 1.50 Charge = -0.50; MSE CB Radius = 1.70 Charge = 0.04; MSE CG Radius = 1.70 Charge = 0.09; MSE SE Radius = 1.90 Charge = 0.32; MSE CE Radius = 1.90 Charge = 0.01; MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
  THR N Radius = 1.40 Charge = -0.52; THR CA Radius = 1.50 Charge = 0.27; THR C Radius = 1.40 Charge = 0.53; THR O Radius = 1.50 Charge = -0.50; THR CB Radius = 1.50 Charge = 0.21; THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
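The fractional-charge warnings can be cross-checked by adding up the per-atom charges that the Hex docking log lists for each residue. The sketch below assumes, without confirmation from the Hex documentation, that each warning reports the plain sum of the charges of the atoms actually present in the residue. The charges are copied from the log with decimal points where the extracted text lost them (for example, -0.52 for the backbone N); the dictionary and function names are illustrative only.

```python
"""Re-add the per-atom charges printed in the docking log and compare them
with the fractional charge reported in each warning (assumed to be a plain sum)."""

# Atom name -> partial charge, as listed in the log.
LYS = {"N": -0.52, "CA": 0.23, "C": 0.53, "O": -0.50, "CB": 0.04,
       "CG": 0.05, "CD": 0.05, "CE": 0.22, "H": 0.25}
MSE = {"N": -0.52, "CA": 0.14, "C": 0.53, "O": -0.50, "CB": 0.04,
       "CG": 0.09, "SE": 0.32, "CE": 0.01, "H": 0.25}
ASP = {"N": -0.52, "CA": 0.25, "C": 0.53, "O": -0.50, "CB": -0.21, "H": 0.25}
THR = {"N": -0.52, "CA": 0.27, "C": 0.53, "O": -0.50, "CB": 0.21, "H": 0.25}

# Residue B 62 LYS is listed without its CE atom.
LYS_NO_CE = {name: q for name, q in LYS.items() if name != "CE"}

# (residue label, atoms present, fractional charge reported in the warning)
CASES = [
    ("B 17 LYS", LYS, 0.34),
    ("B 62 LYS", LYS_NO_CE, 0.12),
    ("B 52 MSE", MSE, 0.35),
    ("B 81 ASP", ASP, -0.21),
    ("B 404 THR", THR, 0.23),
]


def residue_charge(atoms: dict[str, float]) -> float:
    """Net charge of the atoms present in the residue, rounded to 2 decimals."""
    return round(sum(atoms.values()), 2)


for label, atoms, reported in CASES:
    total = residue_charge(atoms)
    # Every printed charge is itself rounded, so allow a small mismatch.
    status = "ok" if abs(total - reported) <= 0.02 else "MISMATCH"
    print(f"{label}: sum = {total:+.2f}, reported {reported:+.2f} -> {status}")
```

Each sum lands within 0.01 of the reported value, which is consistent with the individual charges having been rounded for display. None of the listed lysines includes an NZ atom, so their side chains are incomplete in the input structure.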

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60 A

Contouring surface for molecule 2B8N
Polar probe = 1.40 A, Apolar probe = 1.40 A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40 A, Apolar probe = 1.40 A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10, interior skin volume = 46839.17
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds
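The skin volumes in the log follow directly from its grid statistics. Assuming each skin grid cell is a cube whose edge equals the 0.60 Å grid spacing reported on the "Calculating surface skins" line (our reading of the numbers, not a description of Hex internals), each volume is simply the cell count multiplied by the cell volume. The sketch reproduces the exterior and interior skin volumes (43420.10 and 46839.17 Å³ once the decimal points lost in extraction are restored) and the 417867 integrated cells.

```python
# Assumption: skin volume = number of grid cells x volume of one cubic cell.
GRID_SPACING = 0.60               # Angstrom, from the "Grid" setting in the log
CELL_VOLUME = GRID_SPACING ** 3   # 0.216 cubic Angstrom per cell

exterior_cells = 201019
interior_cells = 216848

exterior_volume = exterior_cells * CELL_VOLUME
interior_volume = interior_cells * CELL_VOLUME

print(f"exterior skin volume = {exterior_volume:.2f} A^3")
print(f"interior skin volume = {interior_volume:.2f} A^3")
print(f"cells integrated     = {exterior_cells + interior_cells}")
```

That the printed volumes match to the last digit suggests the grid spacing is the only scale factor involved.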

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27*812*1] x FFT[64*24*48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds
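The size of the 6D search space can be reproduced from other numbers in the same log. We read the bracketed factors on the "Total 6D space" line as 27 × 812 × 1 (distance steps × receptor rotations × ligand rotations) and 64 × 24 × 48 (FFT grid); this reading is an assumption, chosen because it reproduces both the printed total of 1616412672 and the 21924 3D FFTs reported later. The 27 distance steps follow from the range 0.00 to 19.50 Å in steps of 0.75 Å.

```python
from math import prod

# Distance scan from the log: 0.00 to 19.50 A in steps of 0.75 A.
r_min, r_max, r_step = 0.00, 19.50, 0.75
distance_steps = round((r_max - r_min) / r_step) + 1   # both end points included

receptor_rotations = 812   # "Initial rotational increments ... Receptor 812"
ligand_rotations = 1       # "... Ligand 1"
fft_grid = (64, 24, 48)    # assumed split of the FFT factor

iterations = distance_steps * receptor_rotations * ligand_rotations
fft_points = prod(fft_grid)

print("distance steps:", distance_steps)
print("3D FFTs needed:", iterations)
print("orientations searched:", iterations * fft_points)
```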

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 KJ/mol (Eshape = 8621255, Eforce = 0.00)

R = 0.00, R = 0.75, R = 1.50, R = 2.25, R = 3.00, R = 3.75, R = 4.50, R = 5.25, R = 6.00, R = 6.75, R = 7.50, R = 8.25, R = 9.00, R = 9.75, R = 10.50, R = 11.25, R = 12.00, R = 12.75, R = 13.50, R = 14.25, R = 15.00, R = 15.75, R = 16.50, R = 17.25, R = 18.00, R = 18.75, R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy = 0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00
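The timing breakdown of the FFT scan is also self-consistent. Reading the stage times on the scan summary line as 53.66, 73.01, 277.87 and 15.45 seconds (our restoration of the decimal points lost in extraction), the stages add up to the reported 7 min 0 sec, and dividing the 1616412672 orientations by the elapsed times gives rates within about 0.01 per cent of the printed 3848574/s overall and 5817091/s for the FFT stage. The stage labels are copied from the log; we do not interpret "GF" here.

```python
# Stage times (seconds) from the scan summary line of the log.
stages = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
orientations = 1_616_412_672
elapsed = 7 * 60 + 0  # "in 7 min 0 sec"

total = sum(stages.values())
overall_rate = orientations / elapsed
fft_rate = orientations / stages["FFT"]

print(f"sum of stages = {total:.2f} s (reported {elapsed} s)")
print(f"overall rate  = {overall_rate:.0f} orientations/s")
print(f"FFT-only rate = {fft_rate:.0f} orientations/s")
```

The small residual differences are what rounding the displayed times to two decimals would produce.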

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape = 14350136, Eforce = 0.00)
R = 0.00, R = 0.40, R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy = 14350136) ranked 5 in the search

Docked structures 2B8N 2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy = 0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair               RMS  Bumps
 ----  --------  --------  --------  --------  ----------------  -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

Clst  Soln  Models  Etotal  Eshape  Eforce  Eair  Vshape  Vclash  Bmp   RMS
----  ----  ------  ------  ------  ------  ----  ------  ------  ---  ----
   1     1  000000      00      00      00    00      00      00   -1  -100
---------------------------------------------------------------------------
   1     1  000000      00      00      00    00      00      00   -1  -100

Conclusion

After analysing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to examine this more closely at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We observed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing HBV gene-sequencing project is complete, we hope it will be possible in the near future to draw firm conclusions about the true course of HBV evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.
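Homology modelling starts by choosing a template structure whose sequence is close to the target's. The report does not state how its templates were chosen; the sketch below illustrates one common first criterion, percent sequence identity over an existing alignment, using hypothetical toy sequences and template names. In practice, alignment coverage and the quality of the template structure also matter.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical toy alignment: one target and two candidate templates.
target = "ACDEFGHIKL"
candidates = {
    "template_1": "ACDEFGHIKV",
    "template_2": "ACD-FGMIKW",
}

scores = {name: percent_identity(target, seq) for name, seq in candidates.items()}
best = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.1f}% identity")
print("selected template:", best)
```

Gapped columns are skipped here, so a template with many gaps can score well on a short overlap; checking coverage alongside identity guards against that.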

As we found in our study, the HBeAg (glycerate kinase) gene product is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand that could inhibit its activity, as a basis on which drugs could be developed.

Future prospects

The work presented in this report may be only a stepping stone towards further discoveries; it is a small finding within a much larger problem.

Phylogenetics is the field of biology that identifies and seeks to understand the evolutionary relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.
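Distance-based methods are one common way to turn aligned sequences into the data a phylogenetic tree is built from. The report does not specify its distance model, so the sketch below is a generic illustration with hypothetical toy sequences and strain names: it computes the p-distance (fraction of differing sites) for each pair, applies the Poisson correction d = -ln(1 - p) often used for protein sequences, and reports the closest pair, which UPGMA would merge first.

```python
from itertools import combinations
from math import log


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, ignoring columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def poisson_distance(p: float) -> float:
    """Poisson-corrected protein distance d = -ln(1 - p); undefined for p = 1."""
    return -log(1.0 - p)


# Hypothetical aligned protein fragments from three strains.
aligned = {
    "strain_A": "ACDEFGHIKL",
    "strain_B": "ACDEFGHIKV",
    "strain_C": "ACDQFGMIRV",
}

distances = {
    (u, v): poisson_distance(p_distance(aligned[u], aligned[v]))
    for u, v in combinations(aligned, 2)
}
for (u, v), d in distances.items():
    print(f"{u} - {v}: d = {d:.3f}")

# UPGMA would join the closest pair first.
closest = min(distances, key=distances.get)
print("closest pair:", closest)
```

The correction matters more as sequences diverge: for p = 0.1 it changes the distance only slightly, while for p = 0.4 it raises it to about 0.51.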

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new avenues for further drug development. The findings of our docking may be useful in the search for a cure for hepatitis B, and may also help identify other possible drug targets in HBV. The docking results can be used to design new lead compounds and hence can aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, and hence we can look forward to a better, disease-free world.

The work presented here is only a small part of a large problem, and much work still needs to be done to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


NCBI National Centre for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

Phylip Phylogeny Inference Package

84

85

86

  • Prevention
  • Using ProtParam-for primary structure
  • PROTOCOL FOLLOWED
    • Template selection and sequence alignment
      • Introduction to Docking
        • By ProtParam
          • GLCTK_HUMAN (Q8IVS8)
          • By SOPMA result for UNK_158250
          • Scores Table
          • Alignment
          • Guide Tree
          • Phylogram
          • Procheck summary
            • Conclusion
Page 66: Project Report on Hepatitis Virus

MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404THR
THR N Radius = 1.40 Charge = -0.52
THR CA Radius = 1.50 Charge = 0.27
THR C Radius = 1.40 Charge = 0.53
THR O Radius = 1.50 Charge = -0.50
THR CB Radius = 1.50 Charge = 0.21
THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Assuming C:\Program Files\Hex 5.0\examples\2B8N.pdb is a PDB file

Opened PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb ID = 2B8N
Warning: Can't add all hydrogens to incomplete residue A 253LYS
Warning: Can't add all hydrogens to incomplete residue A 316HIS
Warning: Can't add all hydrogens to incomplete residue A 318LYS
Warning: Can't add all hydrogens to incomplete residue A 393LYS
Warning: Can't add all hydrogens to incomplete residue B 8LYS
Warning: Can't add all hydrogens to incomplete residue B 9LYS
Warning: Can't add all hydrogens to incomplete residue B 17LYS
Warning: Can't add all hydrogens to incomplete residue B 34LYS
Warning: Can't add all hydrogens to incomplete residue B 36ASN
Warning: Can't add all hydrogens to incomplete residue B 62LYS
Warning: Can't add all hydrogens to incomplete residue B 65ARG
Warning: Can't add all hydrogens to incomplete residue B 66LYS
Warning: Can't add all hydrogens to incomplete residue B 316HIS
Warning: Can't add all hydrogens to incomplete residue B 370LYS
Warning: Can't add all hydrogens to incomplete residue B 380TYR
Warning: Can't add all hydrogens to incomplete residue B 404THR
PDB structure has crystal symmetry elements
PDB structure has biological symmetry elements
Loaded PDB file C:\Program Files\Hex 5.0\examples\2B8N.pdb (927 residues, 7597 atoms, 1 models)
Warning: Fractional charge (0.35) for non-terminal residue A 52MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.41) for non-terminal residue A 82ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP CG Radius = 1.40 Charge = 0.62
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 291MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404THR
THR N Radius = 1.40 Charge = -0.52
THR CA Radius = 1.50 Charge = 0.27
THR C Radius = 1.40 Charge = 0.53
THR O Radius = 1.50 Charge = -0.50
THR CB Radius = 1.50 Charge = 0.21
THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues
Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00 rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 286 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 434 seconds
[716573496115915533068511]
Surface traversal done in 023 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 281 seconds
Contoured 338888 triangles (169444 vertices) in 130 seconds
Culled 128559 short edges in 6 cycles in 436 seconds
[716573496115915533068511]
Surface traversal done in 022 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 027 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 234 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010 interior skin volume = 4683917
Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N
Receptor 2B8N Tag = 2B8N
Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)
Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 091 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 000 s 66 Mb memory
Estart = 8621255 KJ/mol (Eshape=8621255 Eforce=000)
R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s)
Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 KJ/mol (Eshape=14350136 Eforce=000)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11
Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined
Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
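The log above reports 104 positively and 114 negatively charged residues, giving a net formal charge of 104 - 114 = -10 for the loaded structure. As a rough sketch of that bookkeeping from sequence alone, the hypothetical helpers below parse FASTA text and count Lys/Arg as +1 and Asp/Glu as -1. This rule is an assumption: Hex works from the loaded PDB structure, and the log does not say how it treats His, chain termini or incomplete residues, so a sequence-based count need not reproduce its numbers.

```python
# Hypothetical helpers. The charge rule (K/R = +1, D/E = -1, His neutral)
# is an assumption, not necessarily the rule Hex applies.
POSITIVE = frozenset("KR")
NEGATIVE = frozenset("DE")


def parse_fasta(text: str) -> dict[str, str]:
    """Return {header: sequence} for FASTA text; sequence lines may be wrapped."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def formal_charge_summary(fasta_text: str) -> tuple[int, int, int]:
    """Count positive and negative residues over all chains; return (pos, neg, net)."""
    pos = neg = 0
    for seq in parse_fasta(fasta_text).values():
        seq = seq.upper()
        pos += sum(1 for aa in seq if aa in POSITIVE)
        neg += sum(1 for aa in seq if aa in NEGATIVE)
    return pos, neg, pos - neg
```

Counting per chain and summing mirrors the way the log reports a single total for both chains of the structure.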

Conclusion


After analysing the protein sequences of Hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be worth taking a closer look at this at the gene level. A phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
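How closely two aligned protein sequences are related is usually quantified with a pairwise distance before any tree is built. The report does not state which distance measure it used, so the sketch below is only an illustration: it computes the p-distance (the fraction of aligned, gap-free positions that differ) and the Poisson correction d = -ln(1 - p) often applied to amino-acid data. The function names are hypothetical, and any sequences passed in must already be aligned to the same length.

```python
import math
from itertools import product


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected amino-acid distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # every compared site differs: the correction is undefined
    return -math.log(1.0 - p)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Poisson distances for every ordered pair of named, pre-aligned sequences."""
    return {(x, y): poisson_distance(seqs[x], seqs[y]) for x, y in product(seqs, repeat=2)}
```

The correction matters mainly for distant sequences: for closely related strains p is small and -ln(1 - p) is nearly equal to p.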

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is completed, we hope it will be possible in the near future to draw firm conclusions about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of the proteins present in the different species, we carried out homology modelling. As a step forward, we present a theoretical model built with available online modelling tools.
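Homology modelling starts from an alignment between the target sequence and a candidate template of known structure, and sequence identity is one common criterion for choosing that template. The sketch below shows global (Needleman-Wunsch) alignment with deliberately simple, assumed scores (match +1, mismatch -1, linear gap -2) plus a percent-identity helper. Real modelling servers typically use substitution matrices such as BLOSUM62 and affine gap penalties, and the report does not say which scheme its online tools applied; the function names are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty. Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical, non-gap positions as a percentage of all alignment columns."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)
```

Percent identity has several definitions (dividing by alignment length, by the shorter sequence, or by gap-free columns); this sketch divides by the alignment length, so values from different tools may not be directly comparable.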

Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.
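The docking in this report was run with Hex, whose log records a 6D rotation and translation search driven by FFTs. Hex itself is built on spherical polar Fourier correlations; as a simpler, hedged illustration of the same idea, the sketch below uses the Cartesian grid approach of Katchalski-Katzir and co-workers. Receptor surface voxels score +1, receptor interior voxels carry a large penalty (the -15 used here is an assumed value), the ligand is a 0/1 occupancy grid, and one FFT-based correlation scores every translation of a single, fixed ligand orientation at once. Rotations and electrostatics are left out, and all function names are hypothetical.

```python
import numpy as np


def receptor_grid(mask: np.ndarray, interior_penalty: float = -15.0) -> np.ndarray:
    """Katchalski-Katzir style receptor grid from a 3D boolean occupancy mask.

    Surface voxels score 1, interior voxels get a large negative penalty and
    empty space is 0. A voxel counts as interior when all six face neighbours
    are occupied (np.roll wraps around, so keep the molecule off the edges).
    """
    mask = mask.astype(bool)
    interior = mask.copy()
    for axis in range(3):
        for shift in (1, -1):
            interior &= np.roll(mask, shift, axis=axis)
    grid = np.zeros(mask.shape)
    grid[mask & ~interior] = 1.0
    grid[interior] = interior_penalty
    return grid


def translational_scores(receptor: np.ndarray, ligand: np.ndarray) -> np.ndarray:
    """score[t] = sum_x receptor[x] * ligand[x - t] for every cyclic shift t, via FFT."""
    return np.real(np.fft.ifftn(np.fft.fftn(receptor) * np.conj(np.fft.fftn(ligand))))


def best_translation(receptor: np.ndarray, ligand: np.ndarray):
    """Return the highest-scoring shift (as a tuple of ints) and its score."""
    scores = translational_scores(receptor, ligand)
    t = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(v) for v in t), float(scores[t])
```

A full search would repeat this for many ligand rotations and keep the best-scoring (rotation, translation) pairs. Because both np.roll and the FFT are cyclic, the molecules should sit well inside a grid padded with empty space so that shifted copies do not wrap around.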


Future prospects

The work presented in this report may be just a stepping stone towards such discoveries; it is a small finding on a big issue.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.
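The report does not state which tree-building method its phylogenetic analysis used. As one example of how relationships are inferred from pairwise distances, the sketch below implements UPGMA (average-linkage clustering weighted by cluster size) and returns a rooted tree in Newick format with branch lengths. The input format (a dict keyed by frozenset pairs) and the internal cluster labels are illustrative choices, not taken from any specific package.

```python
from itertools import combinations


def upgma(names: list[str], dist: dict[frozenset, float]) -> str:
    """Build a rooted UPGMA tree and return it as a Newick string.

    dist maps frozenset({a, b}) -> distance for every pair of names.
    """
    # each cluster label maps to (newick text, number of leaves, node height)
    clusters = {n: (n, 1, 0.0) for n in names}
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    counter = 0
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        a, b = sorted(pair)
        (na, sa, ha), (nb, sb, hb) = clusters.pop(a), clusters.pop(b)
        height = d.pop(pair) / 2.0  # ultrametric: new node sits halfway
        counter += 1
        new = f"_c{counter}"  # hypothetical internal label
        for other in clusters:
            # size-weighted average of the distances from the two merged clusters
            d[frozenset((new, other))] = (
                sa * d.pop(frozenset((a, other))) + sb * d.pop(frozenset((b, other)))
            ) / (sa + sb)
        clusters[new] = (f"({na}:{height - ha:g},{nb}:{height - hb:g})", sa + sb, height)
    (tree, _, _), = clusters.values()
    return tree + ";"
```

UPGMA assumes a roughly constant rate of evolution across lineages (a molecular clock); when rates vary between strains, neighbour joining is usually the safer distance method.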

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.
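One common way to analyse a modelled structure is to superpose it on its template (or on an experimental structure) and report the root-mean-square deviation (RMSD) of matched atoms, for example C-alpha atoms. The report does not say that it computed an RMSD; the sketch below is an illustration using the Kabsch algorithm, and it assumes the two coordinate arrays are already matched atom for atom, in the same order. The function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid-body superposition."""
    if p.shape != q.shape or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    # remove translation by centring both sets on their centroids
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # SVD of the 3x3 covariance matrix H = P^T Q
    u, _, vt = np.linalg.svd(p.T @ q)
    # flip the last axis if needed so the result is a proper rotation, not a reflection
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

The reflection check matters: without it, a mirror-image model could be "superposed" with an improper rotation and report a misleadingly low RMSD.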

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in the hepatitis B virus. The docking results can be used to design new lead compounds and so aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so that we can look forward to a better, disease-free world.

The work presented is just a small part of a big issue, and a lot of work still needs to be done to establish a well-supported phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package

84

85

86

  • Prevention
  • Using ProtParam-for primary structure
  • PROTOCOL FOLLOWED
    • Template selection and sequence alignment
      • Introduction to Docking
        • By ProtParam
          • GLCTK_HUMAN (Q8IVS8)
          • By SOPMA result for UNK_158250
          • Scores Table
          • Alignment
          • Guide Tree
          • Phylogram
          • Procheck summary
            • Conclusion
Page 67: Project Report on Hepatitis Virus

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bondsgt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGI

67

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052

68

MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005

69

LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52
THR CA Radius = 1.50 Charge = 0.27
THR C Radius = 1.40 Charge = 0.53
THR O Radius = 1.50 Charge = -0.50
THR CB Radius = 1.50 Charge = 0.21
THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues. Net formal charge -10.
Warning: Using PDB CONECT records to define non-standard bonds.
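The fractional-charge warnings above appear because some residues in the 2B8N file are missing side-chain atoms. For example, no NZ atom is listed for the lysines, and B 62 LYS also lacks CE. As a result, the per-atom partial charges no longer add up to a whole number. The short Python sketch below sums the charges listed for B 17 LYS and B 62 LYS to show the effect. It rests on three assumptions: the values are read as two-decimal charges in units of e, and the helper names `net_charge` and `is_fractional` are hypothetical. Because the printed charges are rounded, the sums (about 0.35 and 0.13) can differ by roughly 0.01 from the values Hex reports (0.34 and 0.12).

```python
# Hypothetical sketch: sum the per-atom partial charges (assumed units of e)
# listed in the Hex log for two incomplete lysine residues of 2B8N.
# Charges are read as two-decimal values (an assumption).
LYS_B17 = {
    "N": -0.52, "CA": 0.23, "C": 0.53, "O": -0.50, "CB": 0.04,
    "CG": 0.05, "CD": 0.05, "CE": 0.22, "H": 0.25,
}
# Residue B 62 LYS is listed with the same atoms except CE.
LYS_B62 = {atom: q for atom, q in LYS_B17.items() if atom != "CE"}


def net_charge(atoms: dict[str, float]) -> float:
    """Return the sum of the per-atom partial charges of one residue."""
    return sum(atoms.values())


def is_fractional(total: float, tol: float = 0.05) -> bool:
    """True when the total lies further than `tol` from a whole number."""
    return abs(total - round(total)) > tol


for label, atoms in [("B 17 LYS", LYS_B17), ("B 62 LYS", LYS_B62)]:
    total = net_charge(atoms)
    print(f"{label}: net charge {total:+.2f}, fractional: {is_fractional(total)}")
```

Running it prints a net charge of +0.35 for B 17 LYS and +0.13 for B 62 LYS, and flags both as fractional. A complete residue with a whole-number total would not be flagged.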

>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins: Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 26350.96 Volume = 157921.76
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 43420.10 interior skin volume = 46839.17
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672
Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds

Starting 3D FFT N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s 66 Mb memory
Estart = 86212.55 KJ/mol (Eshape=86212.55 Eforce=0.00)

R = 0.00
R = 0.75
R = 1.50
R = 2.25
R = 3.00
R = 3.75
R = 4.50
R = 5.25
R = 6.00
R = 6.75
R = 7.50
R = 8.25
R = 9.00
R = 9.75
R = 10.50
R = 11.25
R = 12.00
R = 12.75
R = 13.50
R = 14.25
R = 15.00
R = 15.75
R = 16.50
R = 17.25
R = 18.00
R = 18.75
R = 19.50
Hex 53.66 s GF 73.01 s FFT 277.87 s Scan 15.45 s FFT Rate 5817091/s
Estart = 86212.55 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 143501.36 KJ/mol (Eshape=143501.36 Eforce=0.00)
R = 0.00
R = 0.40
R = 0.75
Estart = 143501.36 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=143501.36) ranked 5 in the search

Docked structures 2B8N 2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
Soln Etotal Eshape Eforce Eair RMS Bumps
---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks.
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
1 1 000000 00 00 00 00 00 00 -1 -100
---------------------------------------------------------------------------
1 1 000000 00 00 00 00 00 00 -1 -100
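The distance scan in the log can be checked with simple arithmetic. Assume the distance range is 0.00 to 19.50 with steps of 0.75, read as two-decimal values like the rest of the log. A scan inclusive of both ends then has 19.50 / 0.75 + 1 = 27 positions, which matches the 27 "R =" entries printed during the first search. The helper `distance_steps` below is a hypothetical illustration, not part of Hex. It computes each value as `r_min + i * step` rather than by repeated addition, which avoids accumulating floating-point error.

```python
def distance_steps(r_min: float, r_max: float, step: float) -> list[float]:
    """Return evenly spaced distances r_min, r_min + step, ..., r_max."""
    n_intervals = round((r_max - r_min) / step)
    # Multiply instead of adding repeatedly so rounding error does not build up.
    return [round(r_min + i * step, 2) for i in range(n_intervals + 1)]


steps = distance_steps(0.00, 19.50, 0.75)
print(len(steps), steps[:4], steps[-1])
```

This prints `27 [0.0, 0.75, 1.5, 2.25] 19.5`.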

Conclusion


After analyzing the protein sequences of Hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be worthwhile to look at the matter more closely at the gene level, where a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We observed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is finished, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.
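Template selection for homology modelling commonly starts by measuring how similar the target sequence is to each candidate template. The report does not state which identity measure its online tools used, so the following is only an illustration. The Python sketch computes percent identity over the aligned, non-gap columns of two sequences that have already been aligned. The function name `percent_identity` and the `-` gap convention are assumptions.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if gap not in (a, b)]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Opening residues of chains A and B of 2B8N, as printed in the Hex log.
print(percent_identity("PESLKKLAIEIVKK", "PESLKKLAIEIVKK"))  # 100.0
print(percent_identity("AC-GT", "ACTGA"))  # 75.0
```

Ignoring gap columns is one common convention. Other tools divide by the full alignment length or by the shorter sequence, which gives lower values for gapped alignments.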

Our study indicates that the HBeAG (Glycerate kinase) protein coded by the gene is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, which could serve as a basis for developing drugs.


Future prospects

The work presented in this report may be just a stepping stone toward such discoveries. It is a small finding within a much larger problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.
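One of the simplest distance-based methods for building such a tree is UPGMA. It repeatedly merges the two closest clusters and sets the new cluster's distance to every other cluster to the size-weighted average of its members' distances. The Python sketch below is illustrative only and is not necessarily the algorithm used by the tools in this report. The function name `upgma` is hypothetical, the distances in the usage example are made up, and branch lengths are omitted from the Newick output for brevity.

```python
from itertools import combinations


def upgma(names: list[str], dist: dict[frozenset, float]) -> str:
    """Return a UPGMA tree in Newick format (topology only)."""
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)  # pairwise distances keyed by frozenset({x, y})
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        n_a, n_b = sizes.pop(a), sizes.pop(b)
        # Distance from the new cluster is the size-weighted average.
        for c in sizes:
            d[frozenset((merged, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        sizes[merged] = n_a + n_b
    return next(iter(sizes)) + ";"


# Made-up distances between four hypothetical strains A, B, C and D.
names = ["A", "B", "C", "D"]
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(p): v for p, v in pairs.items()}
print(upgma(names, dist))  # (D,(C,(A,B)));
```

UPGMA assumes a roughly constant rate of evolution across lineages. Methods such as neighbour joining, found in packages like PHYLIP, relax that assumption.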

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new vistas for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also open new avenues for finding other possible drug targets in HBV.

The docking results could be used to design new lead compounds and thus aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so that we can look forward to a better, disease-free world.

The work presented is only a small part of a large problem, and much work remains to establish a sound phylogenetic relationship and a full-fledged cure for hepatitis B. We nonetheless hope that these findings will go a long way and prove fruitful to anyone working in a similar area.



Abbreviations

CSA: Catalytic Site Atlas

EMBOSS: European Molecular Biology Open Software Suite

NCBI: National Center for Biotechnology Information

NDB: Nucleic Acid Database

ORF: Open Reading Frame

OTU: Operational Taxonomic Unit

PDB: Protein Data Bank

PHYLIP: Phylogeny Inference Package

Page 68: Project Report on Hepatitis Virus

VDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVAssuming CProgram FilesHex 50examples2B8Npdb is a PDB file

Opened PDB file CProgram FilesHex 50examples2B8Npdb ID = 2B8NWarning Cant add all hydrogens to incomplete residue A 253LYSWarning Cant add all hydrogens to incomplete residue A 316HISWarning Cant add all hydrogens to incomplete residue A 318LYSWarning Cant add all hydrogens to incomplete residue A 393LYSWarning Cant add all hydrogens to incomplete residue B 8LYSWarning Cant add all hydrogens to incomplete residue B 9LYSWarning Cant add all hydrogens to incomplete residue B 17LYSWarning Cant add all hydrogens to incomplete residue B 34LYSWarning Cant add all hydrogens to incomplete residue B 36ASNWarning Cant add all hydrogens to incomplete residue B 62LYSWarning Cant add all hydrogens to incomplete residue B 65ARGWarning Cant add all hydrogens to incomplete residue B 66LYSWarning Cant add all hydrogens to incomplete residue B 316HISWarning Cant add all hydrogens to incomplete residue B 370LYSWarning Cant add all hydrogens to incomplete residue B 380TYRWarning Cant add all hydrogens to incomplete residue B 404THRPDB structure has crystal symmetry elementsPDB structure has biological symmetry elementsLoaded PDB file CProgram FilesHex 50examples2B8Npdb (927 residues 7597 atoms 1 models)Warning Fractional charge (035) for non-terminal residue A 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (041) for non-terminal residue A 82ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025 ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPCG Radius = 140 Charge = 062 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 291MSE MSEN Radius = 140 Charge = -052

68

MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue A 318LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue A 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 8LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050

LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 9LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005

69

LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 17LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 52MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (012) for non-terminal residue B 62LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 66LYS LYSN Radius = 140 Charge = -052

LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (-021) for non-terminal residue B 81ASP ASPN Radius = 140 Charge = -052 ASPCA Radius = 150 Charge = 025

70

ASPC Radius = 140 Charge = 053 ASPO Radius = 150 Charge = -050 ASPCB Radius = 170 Charge = -021 ASPH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 291MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001 MSEH Radius = 000 Charge = 025Warning Fractional charge (034) for non-terminal residue B 370LYS LYSN Radius = 140 Charge = -052 LYSCA Radius = 150 Charge = 023 LYSC Radius = 140 Charge = 053 LYSO Radius = 150 Charge = -050 LYSCB Radius = 170 Charge = 004 LYSCG Radius = 170 Charge = 005 LYSCD Radius = 170 Charge = 005 LYSCE Radius = 170 Charge = 022 LYSH Radius = 000 Charge = 025Warning Fractional charge (035) for non-terminal residue B 375MSE MSEN Radius = 140 Charge = -052 MSECA Radius = 150 Charge = 014 MSEC Radius = 140 Charge = 053 MSEO Radius = 150 Charge = -050 MSECB Radius = 170 Charge = 004 MSECG Radius = 170 Charge = 009 MSESE Radius = 190 Charge = 032 MSECE Radius = 190 Charge = 001

MSEH Radius = 000 Charge = 025Warning Fractional charge (023) for non-terminal residue B 404THR THRN Radius = 140 Charge = -052 THRCA Radius = 150 Charge = 027 THRC Radius = 140 Charge = 053 THRO Radius = 150 Charge = -050 THRCB Radius = 150 Charge = 021 THRH Radius = 000 Charge = 025Counted 104 +ve and 114 -ve formal charged residues Net formal charge -10Warning Using PDB CONECT records to define non-standard bonds

71

gt2B8N APESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVgt2B8N BPESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIVFound 223 MB main memory setting N_MAX=33Check threefold = 0Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 000 rounded to 000Setting distance range = 000 to 1950 with steps of 075Calculating surface skins Grid = 060A

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140AGaussian sampling over 6149 atoms done in 286 seconds Contoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 434 seconds[716573496115915533068511]Surface traversal done in 023 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176Culled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Contouring surface for molecule 2B8NPolar probe = 140A Apolar probe = 140A

72

Gaussian sampling over 6149 atoms done in 281 secondsContoured 338888 triangles (169444 vertices) in 130 secondsCulled 128559 short edges in 6 cycles in 436 seconds[716573496115915533068511]Surface traversal done in 022 seconds - Found 1 surface segmentsPrimary surface Area = 2635096 Volume = 15792176vm 5000 MBCulled 0 small segments in 027 secondsCulling reduced surface complexity by 75 per cent (81770 triangles 40885 vertices)Total contouring time 614 seconds

Sampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 234 secondsSampling surface and interior volumes for molecule 2B8NGenerated 201019 exterior and 216848 interior skin grid cellsExterior skin volume = 4342010 interior skin volume = 4683917Volume sampling done in 136 seconds

Calculating skin coefficients to N = 25Integration applied to 417867 cells 464 per cent of the total grid volumeSkin integration to N = 25 done in 4395 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8NWorking buffer for 1000000 orientations (27Mb)

Total 6D space Iterate[278121] x FFT[642448] = 1616412672Initial rotational increments (N=16) Receptor 812 (19Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 091 seconds

Starting 3D FFT N=16Using Kiss FFT for multi-dimensional DFTs3D FFT setup 000 s 66 Mb memoryEstart = 8621255 KJmol (Eshape=8621255 Eforce=000)

73

R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1

3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040

74

R = 075Estart = 14350136 -gt rank 5

Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100

75

Conclusion

76

After analyzing protein sequence of Hepatitis B virus we come to conclusion that though they all

are closely related they have an important role in survival in different species It is interesting to

have closer look at the matter by studying at the gene level A phylogenetic analysis can be very

helpful in understanding the evolutionary pattern

We have noticed that same genes are present in all strains this shows that are they

evolved together

With the finishing of the ongoing gene sequencing project on HBV we

hope it will be possible to draw conclusive decision about the true picture of evolution in near

future and gene responsible for pathogenesis can also be identified

Complete inference can only be drawn based on a comprehensive list

of the gene products and their function

In order to find out unknown structure of protein present in the

different species we do homology modelling We forward step to present a theoretical model

using available online modelling tools

As we study that HBeAG (Glycerate kinase ) protein that is coded by

gene is one of the second reasons of pathogenicity of HBV So we tried to dock this protein with

appropriate ligand in order to inhibit their activity on the basis of which the drugs have to be

developed

77

Future prospects

The work presented in this report might just be a stepping stone for any such discoveries The

present work might be small finding of big issue

78

Phylogenetics is that field of biology which deals with identifying and understanding the

relationships between the many different kinds of life on earth This includes methods for

collecting and analysing data as well as interpretation of those results as new biological

information

The purpose of modelling is to help the Drug developers and Biotechnologists to develop the

drug more efficiently and with more effectiveness in future by analysing the modelled structure

of protein

As the new drugs target would be identified it will open new vistas for further drug

development The finding of our docking will be useful in finding a cure for the infectious disease

bird flu also it will open new avenues for finding other possible drug targets in influenza A virus

The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process

Finally similar process can be applied on other pathogens and hence possible therapeutic sites

can be identified in them Similar method can also be applied to other infectious diseases and

hence we can look forward to a better disease free world

The work presented is just a small part of big issue and lots of work still needs to be done to

establish a good phylogenetic relationship and full fledged cure for bird flu But we are hoping

that these findings will go long way and will prove fruitful to any going in a similar area

79

BIBLIOGRAPHY

[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses

80

[2] - F V Chisari C Ferrari

Department of Molecular and Experimental Medicine Scripps Research Institute La

Jolla California 92037 USA

[3] -C Seeger W S Mason

Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu

[4]- plumbed

[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular

Biophysics Columbia University New York New York 10032 USA

Reprint requests to Barry Honig Howard Hughes Medical Institute Department of

Biochemistry and Molecular Biophysics Columbia University New York NY 10032

USA

[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining multiple

structure and sequence alignments to improve sequence detection and alignment

Application to the SH2 domains of Janus kinases Proc Natl Acad Sci 98 14796ndash14801 [PubMed]

Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated structure-based

prediction of functional sites in proteins Applications to assessing the validity of

inheriting protein function from homology in genome annotation and to protein docking

J Mol Biol 311 395ndash408 [PubMed]

(77)

Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller W and

Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new generation of protein

database search programs Nucleic Acids Res 25 3389ndash3402 [PubMed]

81

Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas M Bucher

P Cerutti L Corpet F Croning MD et al 2000 InterPromdashAn integrated

documentation resource for protein families domains and functional sites Bioinformatics

[7]- Chemogenomics Laboratory Research Group on Biomedical Informatics Institut

Municipal Investigacioacute Medica and Universitat Pompeu Fabra Passeig Maritim de la

Barceloneta 37-49 08003 Barcelona (Catalonia) Spain

[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences

Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom

82

Abbreviation

83

CSA Catalytic Site Atlas

Emboss European Molecular Biology Open Software Suit

NCBI National Centre for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

Phylip Phylogeny Inference Package

84

85

86

  • Prevention
  • Using ProtParam-for primary structure
  • PROTOCOL FOLLOWED
    • Template selection and sequence alignment
      • Introduction to Docking
        • By ProtParam
          • GLCTK_HUMAN (Q8IVS8)
          • By SOPMA result for UNK_158250
          • Scores Table
          • Alignment
          • Guide Tree
          • Phylogram
          • Procheck summary
            • Conclusion
Page 69: Project Report on Hepatitis Virus

MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue A 318 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue A 375 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 8 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 9 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 17 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 52 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.12) for non-terminal residue B 62 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 66 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (-0.21) for non-terminal residue B 81 ASP
ASP N Radius = 1.40 Charge = -0.52
ASP CA Radius = 1.50 Charge = 0.25
ASP C Radius = 1.40 Charge = 0.53
ASP O Radius = 1.50 Charge = -0.50
ASP CB Radius = 1.70 Charge = -0.21
ASP H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 291 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.34) for non-terminal residue B 370 LYS
LYS N Radius = 1.40 Charge = -0.52
LYS CA Radius = 1.50 Charge = 0.23
LYS C Radius = 1.40 Charge = 0.53
LYS O Radius = 1.50 Charge = -0.50
LYS CB Radius = 1.70 Charge = 0.04
LYS CG Radius = 1.70 Charge = 0.05
LYS CD Radius = 1.70 Charge = 0.05
LYS CE Radius = 1.70 Charge = 0.22
LYS H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.35) for non-terminal residue B 375 MSE
MSE N Radius = 1.40 Charge = -0.52
MSE CA Radius = 1.50 Charge = 0.14
MSE C Radius = 1.40 Charge = 0.53
MSE O Radius = 1.50 Charge = -0.50
MSE CB Radius = 1.70 Charge = 0.04
MSE CG Radius = 1.70 Charge = 0.09
MSE SE Radius = 1.90 Charge = 0.32
MSE CE Radius = 1.90 Charge = 0.01
MSE H Radius = 0.00 Charge = 0.25
Warning: Fractional charge (0.23) for non-terminal residue B 404 THR
THR N Radius = 1.40 Charge = -0.52
THR CA Radius = 1.50 Charge = 0.27
THR C Radius = 1.40 Charge = 0.53
THR O Radius = 1.50 Charge = -0.50
THR CB Radius = 1.50 Charge = 0.21
THR H Radius = 0.00 Charge = 0.25
Counted 104 +ve and 114 -ve formal charged residues, Net formal charge -10
Warning: Using PDB CONECT records to define non-standard bonds
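The fractional-charge warnings in this log can be checked by hand. Assuming each warning reports the sum of the per-atom partial charges listed for that residue (our interpretation, not documented Hex behaviour), the minimal Python sketch below copies the logged charges and adds them up. Because the charges are printed rounded to 0.01, the sums can only be expected to agree with the logged values to within about 0.01. The `net_formal_charge` helper is hypothetical: it uses a simple K/R = +1, D/E = -1 rule that may differ from the rule Hex used when it counted 104 positive and 114 negative residues.

```python
# Sketch: re-derive the "Fractional charge" warnings from the logged atom charges.
# Assumption: a warning reports the sum of a residue's atom charges; the logged
# charges are rounded to 0.01, so sums may differ from the log by about 0.01.

# Per-atom partial charges copied from the log (residue type -> atom -> charge).
LOGGED_CHARGES = {
    "LYS": {"N": -0.52, "CA": 0.23, "C": 0.53, "O": -0.50, "CB": 0.04,
            "CG": 0.05, "CD": 0.05, "CE": 0.22, "H": 0.25},
    "MSE": {"N": -0.52, "CA": 0.14, "C": 0.53, "O": -0.50, "CB": 0.04,
            "CG": 0.09, "SE": 0.32, "CE": 0.01, "H": 0.25},
    "ASP": {"N": -0.52, "CA": 0.25, "C": 0.53, "O": -0.50, "CB": -0.21, "H": 0.25},
    "THR": {"N": -0.52, "CA": 0.27, "C": 0.53, "O": -0.50, "CB": 0.21, "H": 0.25},
}


def residue_charge(atoms, present=None):
    """Sum the partial charges of the atoms present (all listed atoms by default)."""
    names = atoms.keys() if present is None else present
    return sum(atoms[name] for name in names)


def is_fractional(total, tol=0.05):
    """True when a residue's net charge is not within tol of a whole number."""
    return abs(total - round(total)) > tol


def net_formal_charge(sequence):
    """Hypothetical rule: K/R count +1, D/E count -1 (His and termini ignored)."""
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive, negative, positive - negative


for resname, atoms in LOGGED_CHARGES.items():
    total = residue_charge(atoms)
    label = "fractional" if is_fractional(total) else "integral"
    print(f"{resname}: {total:+.2f} ({label})")

# Lysines B 8, B 9 and B 62 are listed without a CE atom in the log.
truncated_lys = residue_charge(
    LOGGED_CHARGES["LYS"], ["N", "CA", "C", "O", "CB", "CG", "CD", "H"])
print(f"LYS without CE: {truncated_lys:+.2f}")
```

Dropping the CE atom (charge 0.22) from a lysine lowers its sum from about 0.35 to about 0.13, which is consistent with the 0.12 warnings logged for residues B 8, B 9 and B 62.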


>2B8N A
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
>2B8N B
PESLKKLAIEIVKKSIEAVFPDRAVKETLPKLNLDRVILVAVGKAAWRMAKAAYEVLGKKIRKGVVVTKYGHSEGPIDDFEIYEAGHPVPDENTIKTTRRVLELVDQLNENDTVLFLLSGGGSSLFELPLEGVSLEEIQKLTSALLKSGASIEEINTVRKHLSQVKGGRFAERVFPAKVVALVLSDVLGDRLDVIASGPAWPDSSTSEDALKVLEKYGIETSESVKRAILQETPKHLSNVEIHLIGNVQKVCDEAKSLAKEKGFNAEIITTSLDCEAREAGRFIASIMKEVKFKDRPLKKPAALIFGGETVVHVKGNGIGGRNQELALSAAIALEGIEGVILCSAGTDGTDGPTDAAGGIVDGSTAKTLKAMGEDPYQYLKNNDSYNALKKSGALLITGPTGTNVNDLIIGLIV
Found 223 MB main memory, setting N_MAX=33
Check threefold = 0
Docking search mode = 6D rotation + translation (optimal)

Using intermolecular distance R12 = 0.00, rounded to 0.00
Setting distance range = 0.00 to 19.50 with steps of 0.75
Calculating surface skins, Grid = 0.60A

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.86 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.34 seconds
[716573496115915533068511]
Surface traversal done in 0.23 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Contouring surface for molecule 2B8N
Polar probe = 1.40A, Apolar probe = 1.40A
Gaussian sampling over 6149 atoms done in 2.81 seconds
Contoured 338888 triangles (169444 vertices) in 1.30 seconds
Culled 128559 short edges in 6 cycles in 4.36 seconds
[716573496115915533068511]
Surface traversal done in 0.22 seconds - Found 1 surface segments
Primary surface Area = 2635096 Volume = 15792176
vm 5000 MB
Culled 0 small segments in 0.27 seconds
Culling reduced surface complexity by 75 per cent (81770 triangles, 40885 vertices)
Total contouring time 6.14 seconds

Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 2.34 seconds
Sampling surface and interior volumes for molecule 2B8N
Generated 201019 exterior and 216848 interior skin grid cells
Exterior skin volume = 4342010, interior skin volume = 4683917
Volume sampling done in 1.36 seconds

Calculating skin coefficients to N = 25
Integration applied to 417867 cells, 464 per cent of the total grid volume
Skin integration to N = 25 done in 43.95 seconds

Docking will output a maximum of 500 solutions per pair

------------------------------------------------------------------------------
Docking 1 pair of starting orientations

Docking receptor 2B8N and ligand 2B8N

Receptor 2B8N Tag = 2B8N

Ligand 2B8N Tag = 2B8N
Working buffer for 1000000 orientations (27Mb)

Total 6D space: Iterate[27,812,1] x FFT[64,24,48] = 1616412672
Initial rotational increments (N=16): Receptor 812 (19Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.91 seconds
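The size of the search reported in this log follows from simple products. Reading the Iterate and FFT factors on the "Total 6D space" line as (27, 812, 1) and (64, 24, 48), which is our assumption about the garbled digits, the 27 intermolecular distances from 0.00 to 19.50 in steps of 0.75 times 812 receptor rotational increments give the 21924 3D FFTs reported later in the log, and each FFT scores 64 x 24 x 48 = 73728 orientations. The sketch below also adds the four logged stage timings (Hex, GF, FFT, Scan), again read with two decimal places, to recover the logged "7 min 0 sec".

```python
# Sketch of the Hex 6D search-space arithmetic. The factor groupings
# (27, 812, 1) and (64, 24, 48) are our reading of the log, not Hex output.
from math import prod


def distance_steps(r_min: float, r_max: float, step: float) -> int:
    """Number of intermolecular distances sampled from r_min to r_max inclusive."""
    return round((r_max - r_min) / step) + 1


def search_space(iterate: tuple[int, ...], fft_grid: tuple[int, ...]) -> tuple[int, int]:
    """Return (number of 3D FFTs, total number of orientations scored)."""
    n_ffts = prod(iterate)
    return n_ffts, n_ffts * prod(fft_grid)


n_r = distance_steps(0.0, 19.50, 0.75)
n_ffts, total = search_space((n_r, 812, 1), (64, 24, 48))

# Per-stage timings in seconds, as read from the log line "Hex ... Scan ...".
stage_times = {"Hex": 53.66, "GF": 73.01, "FFT": 277.87, "Scan": 15.45}
elapsed = sum(stage_times.values())
fft_rate = total / stage_times["FFT"]

print(n_r, n_ffts, total)
print(f"elapsed {elapsed:.2f} s, FFT rate about {fft_rate:.0f} orientations/s")
```

The computed FFT rate comes out within a few hundredths of a per cent of the logged 5817091 per second; the small difference is expected because the logged stage times are rounded.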

Starting 3D FFT, N=16
Using Kiss FFT for multi-dimensional DFTs
3D FFT setup 0.00 s, 66 Mb memory
Estart = 8621255 kJ/mol (Eshape=8621255, Eforce=0.00)

R = 0.00 R = 0.75 R = 1.50 R = 2.25 R = 3.00 R = 3.75 R = 4.50 R = 5.25 R = 6.00 R = 6.75 R = 7.50 R = 8.25 R = 9.00 R = 9.75 R = 10.50 R = 11.25 R = 12.00 R = 12.75 R = 13.50 R = 14.25 R = 15.00 R = 15.75 R = 16.50 R = 17.25 R = 18.00 R = 18.75 R = 19.50
Hex 53.66 s, GF 73.01 s, FFT 277.87 s, Scan 15.45 s, FFT Rate 5817091/s
Estart = 8621255 -> rank 1

3D search found 0/1616412672 within threshold but NOT including start guess
Done 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574/s)
Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Top 1 orientations -> 3 after distance sub-sampling
Working buffer for 3 orientations (1Mb)
Surviving rotational steps (N=25): Receptor 1 (1Mb), Ligand 1 (1Mb)
Loading all coefficient vectors into memory
Coefficient rotations done in 0.00 seconds

Starting docking search with N=25 Nalpha=6464
Estart = 14350136 kJ/mol (Eshape=14350136, Eforce=0.00)
R = 0.00 R = 0.40 R = 0.75
Estart = 14350136 -> rank 5

Main pass found 0 minima within threshold but NOT including start guess
Main pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N 2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=0.00) is at 11
Energy range Emin = 0.00 Emax = 0.00

Docking correlation summary by RMS deviation and steric clashes
-------------------------------------------------------------------------
 Soln    Etotal    Eshape    Eforce      Eair              RMS Bumps
 ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------
Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined. Skipping restraint checks
Clustering found 1 clusters from 1 docking solutions in 0.00 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----
Clst Soln  Models  Etotal  Eshape  Eforce    Eair  Vshape Vclash Bmp   RMS
---- ---- ------- ------- ------- ------- ------- ------- ------ --- -----
   1    1  000000      00      00      00      00      00     00  -1  -100
---------------------------------------------------------------------------
   1    1  000000      00      00      00      00      00     00  -1  -100


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be worthwhile to look more closely at this question at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We noticed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing HBV gene sequencing project is finished, we hope it will be possible in the near future to draw firm conclusions about the true course of its evolution and to identify the genes responsible for pathogenesis.

Complete inferences can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we performed homology modelling, and went a step further by presenting a theoretical model built with available online modelling tools.

Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be just a stepping stone toward such discoveries; it is a small finding within a much larger problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.
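Distance-based phylogenetic methods, such as the neighbour-joining and UPGMA options of the PHYLIP `neighbor` program, start from a matrix of pairwise distances between aligned sequences. The minimal Python sketch below computes the simplest such measure, the p-distance (the fraction of differing sites, ignoring columns where either sequence has a gap). The toy sequences are invented for illustration and are not HBV data; for real analyses, model-corrected protein distances would normally be preferred over raw p-distances.

```python
# Minimal sketch: pairwise p-distances between aligned sequences, the kind of
# matrix that distance-based tree-building methods take as input.
# The sequences below are made-up toy data, not HBV sequences.
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, skipping columns where either has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in sites) / len(sites)


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named, aligned sequences."""
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m, n in combinations(aligned, 2)}


toy = {"s1": "MKVLA-TG", "s2": "MKVIA-TG", "s3": "MRVIASTA"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```

Here s1 and s2 differ at one of seven comparable sites, so they would be joined first in a distance tree, with s3 attaching further away.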

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new avenues for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also help identify other possible drug targets in HBV. The docking results could be used to design new lead compounds and so aid the drug discovery process.

Finally, a similar process can be applied to other pathogens to identify possible therapeutic sites in them. The same method can also be applied to other infectious diseases, so that we can look forward to a better, disease-free world.

The work presented is only a small part of a large problem, and much work remains to be done to establish a reliable phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.


Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package


Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100
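The translational part of the docking search above steps the intermolecular distance R across a fixed grid. As a rough illustration, here is a minimal Python sketch that enumerates such a grid. It assumes that the decimal points lost in extraction place the range at 0.00 to 19.50 Å in steps of 0.75 Å. The function name `distance_grid` is hypothetical, and this is not Hex's own code.

```python
def distance_grid(start: float, stop: float, step: float) -> list[float]:
    """Evenly spaced distances from start to stop (inclusive), rounded to 0.01.

    The number of steps is computed once and each point is start + i * step,
    so rounding errors do not accumulate as they would with repeated addition.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 2) for i in range(n_steps + 1)]


if __name__ == "__main__":
    # Assumed values: the log's "000 to 1950 with steps of 075" read as Angstroms.
    grid = distance_grid(0.0, 19.5, 0.75)
    print(len(grid))  # 27
    print(grid[:3], "...", grid[-1])  # [0.0, 0.75, 1.5] ... 19.5
```

With these assumed values the grid has 27 points. This matches the 27 R values printed during the initial FFT scan.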


Conclusion


After analyzing the protein sequences of hepatitis B virus, we conclude that although they are all closely related, they play an important role in survival in different species. It would be worthwhile to look at this more closely at the gene level, and a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.
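Distance-based phylogenetic analysis usually starts from pairwise distances between aligned sequences. Below is a minimal Python sketch of the simplest such measure, the uncorrected p-distance, which is the proportion of compared sites that differ. It assumes the sequences have already been aligned to equal length, and it skips columns with a gap in either sequence. The function name and the short sequences in the example are hypothetical; they are not HBV data from this report.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differing / compared


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration only.
    print(p_distance("MQLFHL", "MQLFHL"))  # 0.0
    print(p_distance("MQLFHL", "MQIF-L"))  # 1 difference over 5 sites = 0.2
```

For divergent sequences a corrected distance (for proteins, for example a Poisson correction) is normally used instead, because the raw p-distance underestimates the number of substitutions that actually occurred.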

We observed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing HBV gene sequencing project is complete, we hope it will be possible to draw firm conclusions about the true course of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structure of the protein present in the different species, we used homology modelling. As a further step, we present a theoretical model built with available online modelling tools.
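Homology modelling depends on choosing a suitable template of known structure. Below is a minimal Python sketch that ranks candidate templates by sequence identity and target coverage, computed from pairwise alignments. The 30% identity cutoff is a commonly quoted rule of thumb, not a value taken from this report. All template names and alignments are hypothetical.

```python
from typing import NamedTuple, Optional


class TemplateHit(NamedTuple):
    name: str
    identity: float  # identical residues / aligned (ungapped) positions
    coverage: float  # aligned positions / target residues


def score_alignment(target_aln: str, template_aln: str, gap: str = "-") -> tuple[float, float]:
    """Return (identity, coverage) for one gapped pairwise alignment."""
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment strings must have equal length")
    target_len = sum(1 for c in target_aln if c != gap)
    aligned = identical = 0
    for t, s in zip(target_aln, template_aln):
        if t == gap or s == gap:
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if t == s:
            identical += 1
    identity = identical / aligned if aligned else 0.0
    coverage = aligned / target_len if target_len else 0.0
    return identity, coverage


def pick_template(hits: list[TemplateHit], min_identity: float = 0.30) -> Optional[TemplateHit]:
    """Highest-identity hit (ties broken by coverage) at or above the cutoff, else None."""
    usable = [h for h in hits if h.identity >= min_identity]
    return max(usable, key=lambda h: (h.identity, h.coverage), default=None)


if __name__ == "__main__":
    # Hypothetical template names and toy alignments (target first, template second).
    alignments = {
        "templateA": ("MKTAYIAK", "MKTAHIAK"),
        "templateB": ("MKTAYIAK", "MR--YLGK"),
    }
    hits = [TemplateHit(name, *score_alignment(t, s)) for name, (t, s) in alignments.items()]
    for hit in hits:
        print(f"{hit.name}: identity {hit.identity:.1%}, coverage {hit.coverage:.0%}")
    best = pick_template(hits)
    print("selected:", best.name if best else "none above cutoff")
```

Identity alone can favour a short, highly similar fragment. For that reason the sketch also reports coverage of the target and uses it to break ties.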

Our study indicates that the HBeAg (glycerate kinase) protein, encoded by the gene, is one of the secondary causes of HBV pathogenicity. We therefore tried to dock this protein with an appropriate ligand in order to inhibit its activity, as a basis on which drugs could be developed.


Future prospects

The work presented in this report may be only a stepping stone toward future discoveries, since it is a small finding within a much larger problem.


Phylogenetics is the field of biology that deals with identifying and understanding the relationships among the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.
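As an illustration of how pairwise distance data can be turned into a tree, here is a minimal Python sketch of UPGMA (average-linkage) clustering on a distance matrix. UPGMA is a swapped-in example chosen for brevity. This report does not state which tree-building method its phylogenetic analysis used, and the taxon names and distances below are invented. The function returns only the tree topology in Newick format, without branch lengths.

```python
def upgma(names: list[str], matrix: list[list[float]]) -> str:
    """Cluster taxa with UPGMA and return a Newick string (topology only).

    names  -- unique taxon labels
    matrix -- symmetric pairwise distance matrix in the same order as names
    """
    if len(names) != len(matrix) or len(set(names)) != len(names):
        raise ValueError("need unique names matching the matrix size")
    # Active clusters: Newick label -> number of leaves it contains.
    size = {name: 1 for name in names}
    dist: dict[frozenset[str], float] = {
        frozenset((names[i], names[j])): matrix[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    while len(size) > 1:
        # Merge the closest pair of active clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        merged = f"({a},{b})"
        # Average linkage: distance to the new cluster is the size-weighted
        # mean of the distances to its two members.
        for other in size:
            if other in (a, b):
                continue
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (size[a] * d_a + size[b] * d_b) / (size[a] + size[b])
        del dist[pair]
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"


if __name__ == "__main__":
    # Invented distances for four hypothetical taxa.
    taxa = ["A", "B", "C", "D"]
    d = [[0, 2, 6, 10],
         [2, 0, 6, 10],
         [6, 6, 0, 10],
         [10, 10, 10, 0]]
    print(upgma(taxa, d))  # (((A,B),C),D);
```

UPGMA assumes a constant rate of evolution across lineages (a molecular clock). When that assumption is doubtful, methods such as neighbour-joining are generally preferred.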

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in the future by analysing the modelled protein structure.

Once new drug targets are identified, they will open new avenues for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also help identify other possible drug targets in HBV.

The docking results could be used to design new lead compounds and so aid the drug discovery process.

Finally, a similar process can be applied to other pathogens and other infectious diseases to identify possible therapeutic sites in them, which may contribute to a healthier world.

The work presented here is only a small part of a larger problem, and much remains to be done to establish reliable phylogenetic relationships and a full-fledged cure for hepatitis B. We hope that these findings will prove useful to anyone working in a similar area.




Abbreviations

CSA: Catalytic Site Atlas
EMBOSS: European Molecular Biology Open Software Suite
NCBI: National Center for Biotechnology Information
NDB: Nucleic Acid Database
ORF: Open Reading Frame
OTU: Operational Taxonomic Unit
PDB: Protein Data Bank
PHYLIP: Phylogeny Inference Package

Page 74: Project Report on Hepatitis Virus

R = 000R = 075R = 150R = 225R = 300R = 375R = 450R = 525R = 600R = 675R = 750R = 825R = 900R = 975R = 1050R = 1125R = 1200R = 1275R = 1350R = 1425R = 1500R = 1575R = 1650R = 1725R = 1800R = 1875R = 1950Hex 5366 s GF 7301 s FFT 27787 s Scan 1545 s FFT Rate 5817091sEstart = 8621255 -gt rank 1

3D search found 01616412672 within threshold but NOT including start guessDone 21924 3D FFTs for 1616412672 orientations in 7 min 0 sec (3848574s) Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Top 1 orientations -gt 3 after distance sub-samplingWorking buffer for 3 orientations (1Mb)Surviving rotational steps (N=25) Receptor 1 (1Mb) Ligand 1 (1Mb)Loading all coefficient vectors into memoryCoefficient rotations done in 000 seconds

Starting docking search with N=25 Nalpha=6464Estart = 14350136 KJmol (Eshape=14350136 Eforce=000)R = 000R = 040

74

R = 075Estart = 14350136 -gt rank 5

Main pass found 0 minima within threshold but NOT including start guessMain pass done in 0 min 0 sec (1761s)

Starting orientation [alpha=0] (Energy=14350136) ranked 5 in the search

Docked structures 2B8N2B8N in a total of 7 min 5 sec

Best start orientation [alpha=0] (Energy=000) is at 11Energy range Emin = 000 Emax = 000

Docking correlation summary by RMS deviation and steric clashes------------------------------------------------------------------------- Soln Etotal Eshape Eforce Eair RMS Bumps ---- --------- --------- --------- --------- ---------------- -----

------------------------------------------------------------------------------Saving top 500 orientations

Docking done in a total of 8 min 11 sec

------------------------------------------------------------------------------

No AIRs enabled or defined Skipping restraint checksClustering found 1 clusters from 1 docking solutions in 000 seconds

---- ---- ------- ------- ------- ------- ------- ------ --- -----Clst Soln Models Etotal Eshape Eforce Eair Vshape Vclash Bmp RMS---- ---- ------- ------- ------- ------- ------- ------- ------ --- ----- 1 1 000000 00 00 00 00 00 00 -1 -100--------------------------------------------------------------------------- 1 1 000000 00 00 00 00 00 00 -1 -100

75

Conclusion

76

After analysing the protein sequences of hepatitis B virus, we conclude that, although they are all closely related, they play an important role in survival in different species. It would be interesting to take a closer look at this by studying it at the gene level; a phylogenetic analysis can be very helpful in understanding the evolutionary pattern.

We observed that the same genes are present in all strains, which suggests that they evolved together.

Once the ongoing gene sequencing project on HBV is complete, we hope it will be possible in the near future to draw conclusive inferences about the true picture of its evolution and to identify the genes responsible for pathogenesis.

A complete inference can only be drawn from a comprehensive list of the gene products and their functions.

To determine the unknown structures of proteins present in the different species, we performed homology modelling. As a further step, we present a theoretical model built with available online modelling tools.
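Template selection for homology modelling is commonly guided by how similar a candidate template is to the target sequence. The sketch below assumes a pairwise alignment has already been produced, for example by BLAST. It computes percent identity over ungapped columns and picks the best-scoring template. The function name, the toy sequences and the template labels are hypothetical. Other tools normalise identity differently, for example by the full alignment length or the shorter sequence.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Definitions vary between tools (some divide by the full alignment length or
    by the shorter sequence); this sketch uses ungapped columns only.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical candidate templates, each already aligned to the same target.
candidates = {
    "template_A": ("MKT-LLV", "MKSALLI"),
    "template_B": ("MKT-LLV", "AQSAFGI"),
}
scores = {name: percent_identity(*pair) for name, pair in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 1))  # template_A 66.7
```

Identity is only one criterion. Coverage of the target, the quality of the template structure, and checks on the finished model (such as a PROCHECK summary) also matter when choosing a template.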

Our study indicates that the HBeAg (glycerate kinase) protein is one of the secondary factors in the pathogenicity of HBV. We therefore tried to dock this protein with an appropriate ligand to inhibit its activity, which could form the basis for developing drugs.

Future prospects

The work presented in this report may be a stepping stone towards such discoveries; it is a small finding within a much larger problem.

Phylogenetics is the field of biology that deals with identifying and understanding the relationships between the many different kinds of life on Earth. It includes methods for collecting and analysing data, as well as for interpreting the results as new biological information.
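The sketch below illustrates one simple distance-based phylogenetic method, UPGMA clustering on p-distances. It is not necessarily the method used in this report, and the taxon names and sequences are hypothetical toy data. The code turns aligned sequences into pairwise distances and repeatedly merges the closest pair of clusters. It returns a rooted tree topology in Newick notation, omitting branch lengths to keep it short.

```python
from itertools import combinations


def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites among aligned columns without gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def upgma(seqs):
    """Return a rooted tree topology (Newick string, no branch lengths) by UPGMA.

    seqs maps a taxon name to its aligned sequence.
    """
    size = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    while len(size) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair; sorted for stable labels
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):
                # UPGMA update: leaf-count-weighted average of the two old distances
                dist[frozenset((merged, c))] = (
                    size[a] * dist[frozenset((a, c))]
                    + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        for key in [k for k in dist if a in k or b in k]:
            del dist[key]  # drop every distance involving a merged cluster
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"


# Hypothetical aligned toy sequences (not HBV data).
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGTTCGAAA",
    "D": "TCGTTCGAAA",
}
print(upgma(seqs))  # ((A,B),(C,D));
```

A real analysis would use a dedicated package, such as the neighbor program in PHYLIP, which offers both neighbor-joining and UPGMA. It would also usually apply a substitution-corrected distance rather than the raw p-distance.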

The purpose of modelling is to help drug developers and biotechnologists develop drugs more efficiently and effectively in future by analysing the modelled protein structure.

As new drug targets are identified, they will open new avenues for further drug development. The findings of our docking study may be useful in the search for a cure for hepatitis B, and may also help to identify other possible drug targets in HBV.

The docking results could be used to design new lead compounds and thus aid the drug discovery process.

Finally, a similar process could be applied to other pathogens to identify possible therapeutic sites in them, and to other infectious diseases, so that we can look forward to a better, disease-free world.

The work presented here is only a small part of a larger problem, and much work remains to be done to establish a robust phylogenetic relationship and a full-fledged cure for hepatitis B. We hope, however, that these findings will go a long way and prove fruitful to anyone working in a similar area.

Abbreviations

CSA Catalytic Site Atlas

EMBOSS European Molecular Biology Open Software Suite

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package

Page 79: Project Report on Hepatitis Virus

Phylogenetics is that field of biology which deals with identifying and understanding the

relationships between the many different kinds of life on earth This includes methods for

collecting and analysing data as well as interpretation of those results as new biological

information

The purpose of modelling is to help the Drug developers and Biotechnologists to develop the

drug more efficiently and with more effectiveness in future by analysing the modelled structure

of protein

As the new drugs target would be identified it will open new vistas for further drug

development The finding of our docking will be useful in finding a cure for the infectious disease

bird flu also it will open new avenues for finding other possible drug targets in influenza A virus

The docking results can be used to design new lead compounds and hence can aid in the new drug discovery process

Finally similar process can be applied on other pathogens and hence possible therapeutic sites

can be identified in them Similar method can also be applied to other infectious diseases and

hence we can look forward to a better disease free world

The work presented is just a small part of big issue and lots of work still needs to be done to

establish a good phylogenetic relationship and full fledged cure for bird flu But we are hoping

that these findings will go long way and will prove fruitful to any going in a similar area

79

BIBLIOGRAPHY

[1] - Lannsing M PrescottJohn P Harley and Donald A Klein Microbiology 6th edition McGrawHill Higher EducationHuman diseases caused by viruses

80

[2] - F V Chisari C Ferrari

Department of Molecular and Experimental Medicine Scripps Research Institute La

Jolla California 92037 USA

[3] -C Seeger W S Mason

Fox Chase Cancer Center Philadelphia Pennsylvania 19111 USA c_seegerfcccedu

[4]- plumbed

[5]- Howard Hughes Medical Institute Department of Biochemistry and Molecular

Biophysics Columbia University New York New York 10032 USA

Reprint requests to Barry Honig Howard Hughes Medical Institute Department of

Biochemistry and Molecular Biophysics Columbia University New York NY 10032

USA

[6]- Al-Lazikani B Sheinerman FB and Honig B 2001 Combining multiple

structure and sequence alignments to improve sequence detection and alignment

Application to the SH2 domains of Janus kinases Proc Natl Acad Sci 98 14796ndash14801 [PubMed]

Aloy P Querol E Aviles FX and Sternberg MJ 2001 Automated structure-based

prediction of functional sites in proteins Applications to assessing the validity of

inheriting protein function from homology in genome annotation and to protein docking

J Mol Biol 311 395ndash408 [PubMed]

(77)

Altschul SF Madden TL Schaffer AA Zhang J Zhang Z Miller W and

Lipman DJ 1997 Gapped BLAST and PSI-BLAST A new generation of protein

database search programs Nucleic Acids Res 25 3389ndash3402 [PubMed]

81

Apweiler R Attwood TK Bairoch A Bateman A Birney E Biswas M Bucher

P Cerutti L Corpet F Croning MD et al 2000 InterPromdashAn integrated

documentation resource for protein families domains and functional sites Bioinformatics

[7]- Chemogenomics Laboratory Research Group on Biomedical Informatics Institut

Municipal Investigacioacute Medica and Universitat Pompeu Fabra Passeig Maritim de la

Barceloneta 37-49 08003 Barcelona (Catalonia) Spain

[8]- Computational Sciences Department of Chemistry Nerviano Medical Sciences

Viale Pasteur 10 20014 Nerviano (MI) Italy romanokroemersanofi-aventiscom

82

Abbreviation

83

CSA Catalytic Site Atlas

Emboss European Molecular Biology Open Software Suit

NCBI National Center for Biotechnology Information

NDB Nucleic Acid Database

ORF Open Reading Frame

OTU Operational Taxonomic Unit

PDB Protein Data Bank

PHYLIP Phylogeny Inference Package
