Rational Drug Discovery PC session Protein sequence analysis
Biocomputing Primary etc structure X-ray crystallography Structural
genomics Homology modelling Protein Structure QSAR History
Objectives Limitations Statistics Steric Electrostatics Hydrophobic
PC sessions Molecular modelling Theory Drug structure Drug
conformation Docking De Novo ligand design PC sessions 3D QSAR
CoMFA Lead compound Physiological Biochemical Chemical (prodrugs)
Targeting and delivery
Slide 4
7.1. Sequence analysis - why?

Identify new proteins that could be potential drug targets, especially GPCRs.
    Database query: "give me all adrenergic receptor sequences" returns 10 rat sequences and 7 human sequences. Conclusion? 3 more (human sequences) to be found.

Understand the overall function of a newly identified protein.
    A protein shows some similarity to another, well-understood protein. Conclusion? The new protein may have a similar function.

Identify basic structural features.
    A protein contains 7 hydrophobic stretches of ~26 amino acids. Conclusion? It is probably a G-protein coupled receptor.
    A protein contains 12 hydrophobic stretches of ~26 amino acids. Conclusion? It is probably a transporter.

Identify the important residues in a protein.
    All class A amine-like G-protein coupled receptors (e.g. adrenergic, serotonin (5HT), dopamine, histamine, muscarinic) contain a conserved D (aspartate, Asp) on helix 3 that is involved in binding all known drugs.
    At some sequence positions there are key differences between similar receptors that can be exploited to design subtype-specific drugs.

Sequence alignment can be used in homology modelling.
    Build a structural model of a protein from its sequence alignment with a protein of known structure.
7.3. Identity and similarity

Align 2 sequences: ADGVLIIQVG and ADGVLIQVG. There are 2 alternatives:

    ADGVLIIQVG          ADGVLIIQVG
    ||||||        or    |||||| |||
    ADGVLIQVG           ADGVLI-QVG
    Score 6             Score 9 = higher, so better alignment

[Figure: comparing sub-sequences of A (400 residues) and B (650 residues); panels (i) and (ii) each show A against B, with a gap marked.]

If A and B are identical in the regions that match, then alignment is straightforward, even if it is necessary to insert gaps. Generally the sub-sequences are not identical, and so we need a measure of similarity rather than identity.
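The two scores on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function name `identity_score` is made up for illustration, and the score is simply the number of identical aligned residues, with no gap penalty.

```python
def identity_score(top, bottom):
    """Count aligned columns where both residues are identical (a gap never matches)."""
    return sum(1 for x, y in zip(top, bottom) if x == y and x != "-")

# Alternative 1: no gap, so everything after ADGVLI is out of register.
no_gap = identity_score("ADGVLIIQVG", "ADGVLIQVG")
# Alternative 2: one gap in the shorter sequence restores the register.
one_gap = identity_score("ADGVLIIQVG", "ADGVLI-QVG")

print(no_gap, one_gap)  # 6 9
```

Because this score counts identities only, it cannot reward near-matches such as I against L; that is the reason a similarity measure is needed.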
Slide 7
2.4. G-protein coupled receptors

[Figure: G-protein coupled receptor signalling across the plasma membrane. Labels: exterior, cytosol, plasma membrane, stimulatory ligand, inhibitory ligand, receptor (Gs coupled), G, GDP, AC, stimulation. Inset: rhodopsin, X-ray structure.]
Slide 8
2.4. GPCR: CXCR4 chemokine receptor, N-terminal (start) sequences

Same sequence in different organisms; different sequences in the same organism. Note the different lengths. Note the poor alignment at the start (up to about position 40), and the well-conserved N at position 57 (a well-known GPCR motif).
Slide 9
2.4. GPCR alignment: helices 6 & 7
Slide 10
2.4. Notes on previous alignment

Note the examples of different sequences from the same organism.
Note the well-conserved (largely green) helical regions (~185-210, 225-247).
Note the less well-conserved loop region (~215) between transmembrane helix 6 (TM6) and TM7.
Find the conserved CWXP motif and the NPXXY motif: CWLP is at position 199; NPXXY is at position 241.
Are the alternatives to C (position 199) and N (position 241) what you would expect from the amino acid structure (see below)? Yes, the alternatives are similar.
The identification of such motifs is an indication that a new sequence is a GPCR.
Can you see groups of sequences that are more similar to each other? If these are highly similar subtypes of the same receptor (e.g. neurokinin receptor subtype 1 (NK1R), NK2R and NK3R), it could be difficult to design a drug that binds to one and not the others.
Note the predominance of green hydrophobic residues in the transmembrane regions (roughly positions 198-210 (TM6) and 222-248 (TM7)) and of red/blue hydrophilic residues in the loops (~211-221 and ~249+). For the full colour code, examine the alignment itself!
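Searching a new sequence for the CWXP and NPXXY motifs can be sketched with regular expressions, where X (any residue) becomes a dot. The fragment below is invented purely for illustration and is not a real receptor sequence; `MOTIFS` and `find_motifs` are hypothetical names. Positions are counted along the ungapped sequence, so they will not match alignment column numbers such as 199 or 241.

```python
import re

# X means "any residue", which is "." in a regular expression.
MOTIFS = {"CWXP": re.compile(r"CW.P"), "NPXXY": re.compile(r"NP..Y")}

def find_motifs(sequence):
    """Return {motif name: [1-based start positions]} for one ungapped sequence."""
    return {name: [m.start() + 1 for m in pattern.finditer(sequence)]
            for name, pattern in MOTIFS.items()}

# Hypothetical fragment, made up for this example.
fragment = "AFILCWLPYHVTLLNPLIYAF"
print(find_motifs(fragment))  # {'CWXP': [5], 'NPXXY': [15]}
```

An exact pattern like this would miss sequences that carry one of the similar alternatives to C or N, so in practice the conserved positions are better judged from the alignment itself.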
Slide 11
2.4. Sequence alignment and subtype-specificity

This position is N in beta-adrenergic receptors and F in alpha-adrenergic receptors. We know from site-directed mutagenesis (SDM) and structure that it is in the binding site. Beta-selective ligands such as propranolol have an OH group to interact with this N; alpha-adrenergic ligands are more hydrophobic at this point. 5HT receptors also have N at this position and so promiscuously bind propranolol. Knowledge of sequence can therefore be used to design specificity and reduce side-effects.
Slide 12
3.4. What does an alignment mean?

From the Homstrad database:

Superposition of 1oft and 1bip: 4-oxalocrotonate tautomerase from Pseudomonas sp. and Pseudomonas putida, 60 residues, %ID = 76%. At position 6, 1oft has a Y and 1bip has an H.

Gap: the red chain is longer.

1tig and 2ife: translation initiation factor IF-3 from Bacillus stearothermophilus and Escherichia coli.
Slide 13
3.4. What does an alignment mean?

The gap here is because the blue loop is longer than the red loop at this point.

2mbr and 1hsk: Diphospho-N-acetylenolpyruvylglucosamine reductase and UDP-N-acetylenolpyruvoylglucosamine reductase from Escherichia coli and Staphylococcus aureus.
Slide 14
7.6. Pairwise alignment: the dotplot

Align the sequences using the dotplot.
Slide 15
Dotplot: unrelated sequences

These sequences, ASRAILFYLLLIDD and HLWDSAGGQNSTSP, are not related. There is no serious diagonal line. There will inevitably be some dots, as there are only 20 amino acids. A dot does not mean an alignment with 1 identical residue. Is there a weak alignment in the following?

    ASRAILFYLLLIDD---------
    ---------HLWDSAGGQNSTSP

Probably not; even this looks like it has arisen by chance.
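A dotplot is easy to build by computer, which helps when checking a hand-drawn one. The following is a minimal sketch, not the course software: the function names are invented here, a dot is placed wherever two residues are identical, and the longest unbroken diagonal run is reported as a rough indication of whether there is a "serious" diagonal.

```python
def dotplot(a, b):
    """Grid of booleans: True where residue i of a is identical to residue j of b."""
    return [[x == y for y in b] for x in a]

def render(a, b):
    """Text picture of the dotplot: sequence b across the top, a down the side."""
    lines = ["  " + b]
    for residue, row in zip(a, dotplot(a, b)):
        lines.append(residue + " " + "".join("X" if hit else "." for hit in row))
    return "\n".join(lines)

def longest_diagonal(plot):
    """Length of the longest unbroken run of dots along any diagonal."""
    rows = len(plot)
    cols = len(plot[0]) if rows else 0
    run = [[0] * (cols + 1) for _ in range(rows + 1)]
    best = 0
    for i in range(rows):
        for j in range(cols):
            if plot[i][j]:
                run[i + 1][j + 1] = run[i][j] + 1
                best = max(best, run[i + 1][j + 1])
    return best

print(render("ASRAILFYLLLIDD", "HLWDSAGGQNSTSP"))
print(longest_diagonal(dotplot("ASRAILFYLLLIDD", "HLWDSAGGQNSTSP")))  # 1
print(longest_diagonal(dotplot("HIWDSGGAQQSSSD", "HLWDSAGGQNSTSP")))  # 3
```

For the unrelated pair, no two dots are adjacent on a diagonal; for the related pair HIWDSGGAQQSSSD and HLWDSAGGQNSTSP, the run WDS already stands out.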
Slide 16
Alignments from dotplots: simple cases

The following dotplot has been determined; note the diagonal lines. Consider whether the short diagonal regions can be extended. The alignment is therefore

    HIWDSGGAQQSSSD
    |:|||:|:|:|:|
    HLWDSAGGQNSTSP

The %ID = 8*100/14 = 57%. This can only be worked out from the alignment; it cannot be worked out from the dotplot. Note that in this case some of the non-identical amino acids, e.g. {I,L}, {G,A}, are very similar, hence the : symbol. The D and the P at the end are not at all similar, but they should not be missed out.
Slide 17
Dotplots - continued

Alignments do not always start in the top left-hand corner. The alignment is therefore

    YLHIWDSGGAQQSSSDD
      |:|||:|:|:|:|
    --HLWDSAGGQNSTSP-

The %ID = 8*100/14 = 57% (based on the 2nd sequence), or 8*100/17 = 47% (based on the first).
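The dependence of %ID on the chosen length can be made explicit in code. This is a small sketch under the assumption that gaps are written as "-" and that only identical residues count as matches; `percent_identity` is a name made up here.

```python
def percent_identity(top, bottom):
    """Return (%ID based on the shorter sequence, %ID based on the longer sequence)."""
    matches = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    len_top = len(top.replace("-", ""))
    len_bottom = len(bottom.replace("-", ""))
    shorter, longer = sorted((len_top, len_bottom))
    return 100 * matches / shorter, 100 * matches / longer

short_id, long_id = percent_identity("YLHIWDSGGAQQSSSDD", "--HLWDSAGGQNSTSP-")
print(round(short_id), round(long_id))  # 57 47
```

Whenever a %ID is quoted, it is worth stating which length it is based on, since the same 8 matches give two quite different figures.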
Slide 18
Dotplots: alignments with gaps

This dotplot shows two diagonal lines, with two clear local alignments:

    HLWDSA          AGAQQSTS
    ||||||   and    ||:|:|||
    HLWDSA          AGGQNSTS

Joining these together gives

    HLWDSAFFAGAQQSTS          HLWDSAFFAGAQQSTS
    ||||||   |:|:|||    or    |||||   ||:|:|||
    HLWDSA---GGQNSTS          HLWDS---AGGQNSTS

We have to decide, as we can't use the A twice. I chose the 1st; you might choose the 2nd. %ID = 11*100/16 = 69%.
Slide 19
7.6. For you to align using a dot plot

    D4DR_HUMAN  RERKAMRVLP VVVGAFLLCW TPFFVVHITQ
    ACM1_HUMAN  KEKKAARTLS AILLAFILTW TPYNIMVLVS

Hint: you need some squared paper! The correct answer is obvious, but you need to do the exercise so that you can check out the alternatives. The correct answer can be found at http://tinyGRAP.uit.no/famin.html (last checked 2001); the sequences are part of helix 6.
Slide 20
7.7. Pairwise alignment: completed dotplot

[Dotplot panels: identical sequences; highly similar; different but related.]

The alignment is

    EGPRPDSSAGGSSAG          EGPRPDSSAGGSSAG
    |||:||||||         or    |||:||     ||||
    EGPKPDSSAG               EGPKPD-----SSAG
    (C-terminus)             (gap)

or?

%ID = 9*100/10 = 90%: 9 matches over a length of 10 residues.
%ID = 9*100/15 = 60%: 9 matches over a length of 15 residues.
Slide 21
7.8. Global alignment v local alignment

Global alignment
The essence is to score 1 for each X on the dot plot, 0 otherwise. The aim is to find the highest-scoring route (from the alternatives) through the entire grid, starting from the C-terminus, essentially by joining up diagonal lines in the dotplot. A gap penalty is introduced for jumping between parallel lines, as this corresponds to creating a gap. The Needleman and Wunsch algorithm is the best known of this kind.

Local alignment
Similar to the above, but only fragments are considered. Only parts of the protein may be similar.
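The global alignment idea can be sketched as a short dynamic-programming routine in the spirit of Needleman and Wunsch. This is an illustrative sketch rather than the original published procedure: it scores 1 for a match and 0 for a mismatch as described above, and the gap penalty of -1 per gap position is an assumption chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=-1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # leading gaps in b
    for j in range(1, cols):
        score[0][j] = j * gap          # leading gaps in a
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # stay on a diagonal
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Trace the best route back from the end (C-terminus) to the start.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))

total, top, bottom = needleman_wunsch("ADGVLIIQVG", "ADGVLIQVG")
print(total)   # 8, i.e. 9 matches minus 1 for the single gap
print(top)
print(bottom)
```

For ADGVLIIQVG against ADGVLIQVG the gap can sit opposite either of the two adjacent I residues with the same score, so the printed alignment is one of two equally good answers.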
Slide 22
7.9. Database searching

In database searching we effectively carry out lots of pairwise comparisons, but this has to be much faster than an ordinary pairwise alignment.

FASTA
FASTA searches for identical pairs of ~2 residues, with tricks to find the best way to join the pairs together. An alignment will be produced if enough pairs are found. Output from the program includes:
    query sequence: the one entered
    name of database searched (e.g. SWISS-PROT)
    program name + literature reference to be cited
    list of hits (often ~50), incl. unique database identifier (e.g. A1AA_RAT) & ID code (e.g. P23944)
    E-value: a low value indicates that virtually no matches with a similar score could be expected by chance. Look for a value less than 0.01 or preferably 0.001
    alignment

BLAST
The distinction is that BLAST looks for fixed-length hits and extends them if possible. The resulting high-scoring pairs (HSPs) form the basis of the alignment.
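The first stage of such a search, finding identical short words shared by the query and a database sequence, can be sketched as follows. This is not the actual FASTA or BLAST code: the function name, the word length of 2 and the idea of counting hits per dotplot diagonal are simplifying assumptions made for illustration.

```python
from collections import Counter, defaultdict

def shared_word_diagonals(query, subject, k=2):
    """Count identical k-residue words shared by two sequences, per dotplot diagonal."""
    positions = defaultdict(list)
    for i in range(len(query) - k + 1):
        positions[query[i:i + k]].append(i)      # look-up table built once per query
    diagonals = Counter()
    for j in range(len(subject) - k + 1):
        for i in positions.get(subject[j:j + k], []):
            diagonals[i - j] += 1                # same offset = same diagonal
    return diagonals

hits = shared_word_diagonals("YLHIWDSGGAQQSSSDD", "HLWDSAGGQNSTSP")
print(hits.most_common())  # [(2, 2), (1, 1)]
```

The look-up table is what makes this fast: each database sequence is scanned once instead of being compared residue by residue. Here the diagonal with offset 2 collects the most word hits (WD and DS), which corresponds to an alignment in which the second sequence starts 2 residues into the first.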
Slide 23
7.10. 5 amino acid groups - arrange in groups

    Gly, G    Val, V    Tyr, Y    Arg, R    Asp, D
    Cys, C    Ala, A    Trp, W    Lys, K    Glu, E
    Met, M    Ile, I    Phe, F    Ser, S    Asn, N
    Pro, P    Leu, L    His, H    Thr, T    Gln, Q

Group labels used on the slide: H, A, +, -, S, C.
Key: S = small; + = positive; C = cysteine; A = aromatic; - = negative or similar polar. Other groupings are possible.
Slide 24
7.11. Similarity

Above left: identity matrix, as used in the dotplot.
Above right: part of the Dayhoff mutation matrix, based on observed mutations in aligned proteins.
    W is rarer than L, and so a W-W match scores 17 rather than the 6 of an L-L match.
    F is like Y, so an F-Y pairing still scores 7.
    W and V are very different, hence -6.
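Scoring with a similarity matrix instead of an identity matrix amounts to a table look-up per aligned pair. The sketch below contains only the four Dayhoff values quoted on this slide; a real matrix has an entry for every pair of residues, and the names `SLIDE_SCORES`, `pair_score` and `similarity_score` are invented for this example.

```python
# Only the four scores quoted on the slide; a full matrix covers every residue pair.
SLIDE_SCORES = {("W", "W"): 17, ("L", "L"): 6, ("F", "Y"): 7, ("V", "W"): -6}

def pair_score(x, y):
    """Symmetric look-up; raises KeyError for pairs not quoted on the slide."""
    return SLIDE_SCORES[tuple(sorted((x, y)))]

def similarity_score(top, bottom):
    """Sum of substitution scores over the aligned (ungapped) columns."""
    return sum(pair_score(x, y) for x, y in zip(top, bottom))

print(similarity_score("WLF", "WLY"))  # 17 + 6 + 7 = 30
print(similarity_score("WLF", "VLY"))  # -6 + 6 + 7 = 7
```

An identity matrix would score both of these pairs of fragments as 2 and 1 matches out of 3; the similarity scores separate them far more clearly because the rare W is either conserved or replaced by the very different V.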
Slide 25
7.12. Multiple sequence alignment

Two main perspectives:
    1st: based on comparison of amino acid sequences, taking into account amino acid properties
    2nd: takes into account secondary or tertiary structure

Which is the best alignment below?

    Secondary structure: HHHHHH HHHHH (H denotes helix)

    EGPRPDSSAGGSSAGAPD          EGPRPDSSAGGSSAGAPD
    |||:|.||||     |||    or    |||:|.     |||||||
    EGPKPQSSAG-----APD          EGPKPQ-----SSAGAPD

The first creates gaps in secondary structure (not so good); the second is better.

General strategy:
    Pair-wise alignment of all sequences
    Produce a phylogenetic tree to group similar sequences (as right)
    Similar sequences are aligned first, more distantly related ones later
    Gaps in related sequences guide the position of gaps in others
    The alignment may not be optimal and may need manual adjustment
    A similarity matrix (e.g. Dayhoff PAM 250, BLOSUM 60) rather than an identity matrix is used in alignment
    Different methods (e.g. Clustal (ordinary method), T-Coffee, profile methods in Clustal) may give different alignments, so think carefully about an alignment
Slide 26
7.13. Profile methods in multiple sequence alignment

Consensus sequence
In multiple sequence alignment the consensus sequence gives the usual amino acid at a particular position. It is shown as:
    upper case if only one amino acid is present, e.g. A at position 9
    lower case if the majority are one amino acid, e.g. y at position 1
    all residues present if there are equal numbers, e.g. V/L at position 6

Profile
Percentage of each amino acid at each point. At position 1, 3/5 are Y and 2/5 are F, so the profile is 0.6Y, 0.4F.

    Position    1     2     3     4     5     6     7     8     9     10
    Consensus   y     d     g     G     A/I   V/L   v     e     A     t
    Profile     0.6Y  0.6D  0.8G  1.0G  0.4A  0.4V  0.6V  0.6E  1.0A  0.8T
                0.4F  0.4E  0.2-        0.4I  0.4L  0.4-  0.4Q        0.2V
                                        0.2-  0.2-
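The consensus and profile rules can be sketched in code. The five-sequence alignment below is hypothetical: the slide's underlying sequences are not given, so it was constructed only to reproduce the slide's consensus line and the 0.6Y, 0.4F profile at position 1. The function names are likewise made up for this example.

```python
from collections import Counter

def profile(alignment):
    """Fraction of each symbol (residues and '-') in every column."""
    n = len(alignment)
    return [{sym: count / n for sym, count in Counter(col).items()}
            for col in zip(*alignment)]

def consensus(alignment):
    """Upper case if unanimous, lower case if one majority symbol, A/B if tied."""
    out = []
    for col in zip(*alignment):
        counts = Counter(col)
        top = max(counts.values())
        leaders = [sym for sym in counts if counts[sym] == top]  # order of first appearance
        if len(counts) == 1:
            out.append(leaders[0])
        elif len(leaders) == 1:
            out.append(leaders[0].lower())
        else:
            out.append("/".join(leaders))
    return " ".join(out)

# Hypothetical alignment, invented to match the slide's consensus.
alignment = ["YDGGAVVEAT",
             "YDGGAVVEAT",
             "YDGGILVEAT",
             "FEGGIL-QAT",
             "FE-G---QAV"]

print(consensus(alignment))   # y d g G A/I V/L v e A t
print(profile(alignment)[0])  # {'Y': 0.6, 'F': 0.4}
```

The profile keeps the minority residues that the consensus line throws away, which is why it is the more useful object to align distant sequences against.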
Slide 27
7.13. Profile methods in multiple sequence alignment

The profile
Sometimes it is useful to align sequences against the profile, especially if they are very different to each other.

7.14. Prediction from a hidden Markov model

    # Sequence Length: 243
    # Sequence Number of predicted TMHs: 7
    # Sequence Exp number of AAs in TMHs: 156.33216
    # Sequence Exp number, first 60 AAs: 40.14445
    # Sequence Total prob of N-in: 0.00006
    # Sequence POSSIBLE N-term signal sequence
    Sequence  TMHMM2.0  outside    1    9
    Sequence  TMHMM2.0  TMhelix   10   29
    Sequence  TMHMM2.0  inside    30   41
    Sequence  TMHMM2.0  TMhelix   42   64
    Sequence  TMHMM2.0  outside   65   78
    Sequence  TMHMM2.0  TMhelix   79  101
    Sequence  TMHMM2.0  inside   102  120
    Sequence  TMHMM2.0  TMhelix  121  143
    Sequence  TMHMM2.0  outside  144  152
    Sequence  TMHMM2.0  TMhelix  153  175
    Sequence  TMHMM2.0  inside   176  186
    Sequence  TMHMM2.0  TMhelix  187  209
    Sequence  TMHMM2.0  outside  210  218
    Sequence  TMHMM2.0  TMhelix  219  241
    Sequence  TMHMM2.0  inside   242  243

This is a highly sophisticated prediction based on hydrophobicities, known observations etc. From http://www.sbc.su.se/internal.html

The web is extremely important in bioinformatics. Similar programs can predict helices, sheets and turns etc. in globular proteins.
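Output of this kind is easy to read by program, for example to count the predicted transmembrane helices. The sketch below assumes the five-column layout shown on the slide (name, program, region, start, end) and skips the "#" summary lines; `parse_tmhmm` is a hypothetical helper, not part of TMHMM itself.

```python
def parse_tmhmm(text):
    """Return (region, start, end) tuples from TMHMM-style region lines."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if line.startswith("#") or len(fields) != 5:
            continue  # skip summary lines and anything not in the 5-column layout
        _name, _program, region, start, end = fields
        regions.append((region, int(start), int(end)))
    return regions

# Region lines copied from the slide.
output = """\
# Sequence Number of predicted TMHs: 7
Sequence TMHMM2.0 outside 1 9
Sequence TMHMM2.0 TMhelix 10 29
Sequence TMHMM2.0 inside 30 41
Sequence TMHMM2.0 TMhelix 42 64
Sequence TMHMM2.0 outside 65 78
Sequence TMHMM2.0 TMhelix 79 101
Sequence TMHMM2.0 inside 102 120
Sequence TMHMM2.0 TMhelix 121 143
Sequence TMHMM2.0 outside 144 152
Sequence TMHMM2.0 TMhelix 153 175
Sequence TMHMM2.0 inside 176 186
Sequence TMHMM2.0 TMhelix 187 209
Sequence TMHMM2.0 outside 210 218
Sequence TMHMM2.0 TMhelix 219 241
Sequence TMHMM2.0 inside 242 243
"""

helices = [r for r in parse_tmhmm(output) if r[0] == "TMhelix"]
print(len(helices))  # 7
print(helices[0])    # ('TMhelix', 10, 29)
```

Seven predicted helices is the pattern expected for a G-protein coupled receptor, in line with the earlier "7 hydrophobic stretches" question.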
Slide 30
8.1. Drug targeting and delivery

Physical approaches: microspheres
Drugs are enclosed in biodegradable particles that are delivered to fine capillaries, where they get stuck; inject upstream of the target.

Biochemical approaches
Raise an antibody to a specific antigen, e.g. cell markers on tumour cells, then link the drug to the antibody. There are still problems, as antibodies are large; it is preferable to use an antibody fragment, as it is then distributed more easily. The drug must still get inside the cell, so it must be attached via a labile linkage.
Slide 31
8.2. The lead

Finding a lead, so that a major drug development project can start:
    serendipity
    high-throughput screening, e.g. testing compounds from a company's own database
    combinatorial chemistry
    using libraries specifically designed (using molecular modelling etc.) for a given target

Properties of a lead:
    not just active in the primary screen
    the screen must be validated statistically
    passed secondary tests to avoid false positives
    shows promise in a cascade of tests agreed for its selection
    must be active in vivo
    must be patentable: not too similar to a competitor's product

Other desirable properties of a lead:
    potent enough for efficacy at a convenient dose
    selective within a receptor class (e.g. an adrenergic ligand selective for alpha v beta, or for subtype 1 v subtype 2)
    selective between classes, e.g. a subtype 1 antagonist that doesn't act at 5-HT receptors
    toxicity: good therapeutic index, not mutagenic
    active orally; reasonable duration of activity; stable
    need to determine whether metabolites possess activity; are there species anomalies?

QSAR can start once we have a lead.