Drug Discovery: Proteomics, Genomics
Philip E. Bourne, Professor of Pharmacology, UCSD
[email protected] 858-534-8301
SPPS273
Agenda
• Where my perspective comes from
• The interplay between omics, IT and drug discovery
• The omics revolution
• Changes in IT and open science and software licensing
• Applying the new biology to drug discovery
– Example 1 – Drug repositioning
– Example 2 – Determining side-effects
• Words of caution
Some Background
• We work in the area of structural bioinformatics
• We distribute the equivalent of ¼ of the Library of Congress to approx. 250,000 scientists each month
• We are interested in improving the drug discovery process through computationally driven hypotheses on the complete biological system
• Personally:
– Open science advocate
– Started 4 companies
– Spent whole life in the ivory tower
The Source of My Perspective
Observations
• Glass ½ Empty: drug discovery in the traditional sense is in a woeful state
• Glass ½ Full:
– We have an explosion of data and hence a new emerging understanding of complex biological systems
– Information technology is advancing rapidly
• Let optimism rule – let traditional computational chemistry and cheminformatics meet bioinformatics, systems biology and information science to discover drugs in new ways
The Take Home Message
The Drivers of Change – Data & IT
[Figure: Biological Experiment → Data → Information → Knowledge → Discovery, via Collect, Characterize, Compare, Model, Infer. Timeline: 1990 to 2005. Complexity axis: Sequence, Structure, Assembly, Sub-cellular, Cellular, Organ, Higher-life. Technology axis: Computing Power, Sequencing, Data (1 to 10^5), # People/Web Site (10^6, 10^2, 1). Milestones shown: ESTs; E. coli, Yeast and C. elegans Genomes; Human Genome Project; Human Genome; 1 Small Genome/Mo.; Gene Chips; 1000's of GWAS; Virus Structure; Ribosome; Model Metabolic Pathway of E. coli; Genetic Circuits; Neuronal Modeling; Cardiac Modeling; Brain Mapping; Virtual Communities; Blogs; Facebook.]
The Omics Revolution
[Figure: number of released entries by year.]
It’s Not Just About Numbers, It’s About Complexity
Courtesy of the RCSB Protein Data Bank
Metagenomics - 2007
• New type of genomics
• New data (and lots of it) and new types of data
– 17M new (predicted) proteins! 4-5x growth in just a few months and much more coming
– New challenges and exacerbation of old challenges
Metagenomics: Early Results
• More than 99.5% of DNA in every environment studied represents unknown organisms
– Culturable organisms are exceptions, not the rule
• Most genes represent distant homologs of known genes, but there are thousands of new families
• Everything we touch turns out to be a gold mine
• Environments studied:
– Water (ocean, lakes)
– Soil
– Human body (gut, oral cavity, human microbiome)
Metagenomics New Discoveries
[Figure: environmental (red) vs. currently known PTPases (blue); groups labeled 1-4 and higher eukaryotes.]
The Good News and the Bad News
• Good news
– Data pointing towards function are growing at near exponential rates
– IT can handle it on a per dollar basis
• Bad news
– Data are growing at near exponential rates
– Quality is highly variable
– Accurate functional annotation is sparse
Example of the Interplay Between Bioinformatics & Proteomics - The Structural Genomics Pipeline
Basic Steps
• Target Selection
• Crystallomics: Isolation, Expression, Purification, Crystallization
• Data Collection
• Structure Solution
• Structure Refinement
• Functional Annotation
• Publish
Structural biology moves from being functionally driven to genomically driven
[Figure annotations on the pipeline: Fill in protein fold space; Robotics; -ve data; Software engineering; Functional prediction; Not necessarily.]
Towards Open Science
• Open access publishing
• Open source software
• Generation of scientists weaned on social networks
• Blogs, wikis, social bookmarking etc. are becoming a valid form of scientific discourse
http://www.osdd.net/
University Tech Transfer Offices are Slow to Embrace this Change
• Overvalue disclosures
• Inability to market disclosures appropriately
• Protracted negotiations in a fast moving market
• Disable rather than enable startups
So Why is All of This So Important to Drug Discovery?
We are beginning to piece together a complex living system and we need to understand that to do better
A.L. Hopkins Nat. Chem. Biol. 2008 4:682-690
Why Don’t We Do Better? A Couple of Observations
• Gene knockouts only affect phenotype in 10-20% of cases. Why?
– Redundant functions
– Alternative network routes
– Robustness of interaction networks
• 35% of biologically active compounds bind to more than one target
Paolini et al. Nat. Biotechnol. 2006 24:805–815
Why Don’t We Do Better? A Couple of Observations
• Tykerb – Breast cancer
• Gleevec – Leukemia, GI cancers
• Nexavar – Kidney and liver cancer
• Staurosporine – natural product – alkaloid – many uses, e.g., antifungal, antihypertensive
Collins and Workman 2006 Nature Chemical Biology 2 689-700
Implications
• Ehrlich’s philosophy of magic bullets targeting individual chemoreceptors has not been realized
• Stated another way – The notion of one drug, one target, one disease is a little naïve in a complex system
So How Can We Exploit All the New Data We Are Collecting on This Complex System?
Let’s Work Through a Couple of Examples
What if…
• We can characterize a protein-ligand binding site from a 3D structure (primary site) and search for that site on a proteome wide scale?
• We could perhaps find alternative binding sites (off-targets) for existing pharmaceuticals and NCEs?
Exploiting the Structural Proteome
What Do These Off-targets Tell Us?
• Potentially many things:
1. Nothing
2. How to optimize an NCE
3. A possible explanation for a side-effect of a drug already on the market
4. A possible repositioning of a drug to treat a completely different condition
5. The reason a drug failed
6. A multi-target strategy to attack a pathogen
Need to Start with a 3D Drug-Receptor Complex - The PDB Contains Many Examples
Generic Name | Other Name | Treatment | PDBid
Lipitor | Atorvastatin | High cholesterol | 1HWK, 1HW8…
Testosterone | Testosterone | Osteoporosis | 1AFS, 1I9J ..
Taxol | Paclitaxel | Cancer | 1JFF, 2HXF, 2HXH
Viagra | Sildenafil citrate | ED, pulmonary arterial hypertension | 1TBF, 1UDT, 1XOS..
Digoxin | Lanoxin | Congestive heart failure | 1IGJ
A Reverse Engineering Approach to Drug Discovery Across Gene Families
Characterize ligand binding site of primary target (Geometric Potential)
Identify off-targets by ligand binding site similarity (Sequence order independent profile-profile alignment)
Extract known drugs or inhibitors of the primary and/or off-targets
Search for similar small molecules
Dock molecules to both primary and off-targets
Statistical analysis of docking score correlations
…
Xie and Bourne 2009 Bioinformatics 25(12) 305-312
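The last two steps above (dock molecules to both the primary target and the off-target, then analyze how the docking scores correlate) can be illustrated with a minimal sketch. This is not the published method: it assumes a plain Pearson correlation as the statistic, and the function name `pearson` and all score values are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)

# Hypothetical docking scores for the same five small molecules
# (more negative = better predicted binding). These are made-up numbers.
primary_scores = [-9.1, -8.4, -7.9, -6.5, -5.2]
off_target_scores = [-8.8, -8.0, -7.1, -6.9, -4.9]

r = pearson(primary_scores, off_target_scores)
print(f"docking score correlation: {r:.3f}")
```

A high correlation would suggest that molecules favored by the primary site tend to be favored by the off-target site as well, which is only a hypothesis to be tested experimentally.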
The Problem with Tuberculosis
• One third of global population infected
• 1.7 million deaths per year
• 95% of deaths in developing countries
• Anti-TB drugs hardly changed in 40 years
• MDR-TB and XDR-TB pose a threat to human health worldwide
• Development of novel, effective, and inexpensive drugs is an urgent priority
Example 1 – Repositioning The TB Story
Found..
• Evolutionary linkage between:
– NAD-binding Rossmann fold
– S-adenosylmethionine (SAM)-binding domain of SAM-dependent methyltransferases
• Catechol-O-methyl transferase (COMT) is a SAM-dependent methyltransferase
• Entacapone and tolcapone are used as COMT inhibitors in Parkinson’s disease treatment
• Hypothesis:
– Further investigation of NAD-binding proteins may uncover a potential new drug target for entacapone and tolcapone
Kinnings et al. 2009 PLoS Comp Biol 5(7) e1000423
Functional Site Similarity between COMT and InhA
• Entacapone and tolcapone docked onto 215 NAD-binding proteins from different species
• M. tuberculosis enoyl-acyl carrier protein reductase, ENR (InhA), discovered as a potential new drug target
• InhA is the primary target of many existing anti-TB drugs but all are very toxic
• InhA catalyses the final, rate-determining step in the fatty acid elongation cycle
• Alignment of the COMT and InhA binding sites revealed similarities ...
Kinnings et al. 2009 PLoS Comp Biol 5(7) e1000423
Binding Site Similarity between COMT and InhA
[Figure: aligned binding sites. COMT with SAM (cofactor) and BIE (inhibitor); InhA with NAD (cofactor) and 641 (inhibitor).]
Kinnings et al. 2009 PLoS Comp Biol 5(7) e1000423
Summary of the TB Story
• Entacapone and tolcapone shown to have potential for repositioning
• Direct mechanism of action avoids M. tuberculosis resistance mechanisms
• Possess excellent safety profiles with few side effects – already on the market
• In vivo support
• Assay of direct binding of entacapone and tolcapone to InhA reveals a possible lead with no chemical relationship to existing drugs
Kinnings et al. 2009 PLoS Comp Biol 5(7) e1000423
Summary from the TB Alliance – Medicinal Chemistry
• The minimal inhibitory concentration (MIC) of 260 µM is higher than usually considered
• MIC is 65x the estimated plasma concentration
• Have other InhA inhibitors in the pipeline
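The two figures above fix the estimated plasma concentration by simple division. A short worked check, using only the numbers quoted on this slide (the variable names are illustrative):

```python
# Values quoted above: MIC of 260 µM, stated to be 65x the estimated plasma concentration.
mic_um = 260.0
fold_over_plasma = 65.0

# Implied estimated plasma concentration, in µM.
implied_plasma_um = mic_um / fold_over_plasma
print(f"implied plasma concentration: {implied_plasma_um} µM")
```

In other words, the drug would need to reach roughly 65 times its estimated plasma level to inhibit growth, which is why the MIC is a concern.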
Kinnings et al. 2009 PLoS Comp Biol 5(7) e1000423
Predicted protein-ligand interaction network of M. tuberculosis. Proteins that are predicted to have similar binding sites are connected. Squares represent the top 18 most connected proteins.
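The construction described in this caption (connect proteins whose binding sites are predicted to be similar, then pick out the most connected ones) can be sketched as follows. This is an illustrative sketch, not the pipeline used for the figure: the protein names, the p-values, and the default cutoff of 1e-5 are assumptions made for the example.

```python
from collections import defaultdict

def build_network(pair_pvalues, threshold=1e-5):
    """Connect two proteins when their binding site similarity p-value is below threshold."""
    adjacency = defaultdict(set)
    for (a, b), p_value in pair_pvalues.items():
        if p_value < threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def most_connected(adjacency, top_n):
    """Return the top_n proteins by number of neighbors (ties broken by name)."""
    ranked = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    return ranked[:top_n]

# Hypothetical pairwise p-values between made-up proteins.
pair_pvalues = {
    ("protA", "protB"): 3e-7,
    ("protA", "protC"): 8e-6,
    ("protA", "protD"): 2e-8,
    ("protB", "protC"): 4e-6,
    ("protC", "protD"): 5e-4,  # above the cutoff, so no edge
    ("protD", "protE"): 1e-2,  # above the cutoff, so no edge
}

network = build_network(pair_pvalues)
hubs = most_connected(network, top_n=2)
print(hubs)
```

Highly connected proteins in such a network are candidates for polypharmacology, since a ligand that fits one site may fit its neighbors.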
The TB Druggome
Bioinformatics 2009 25(12) 305-312
[Figure: network of drugs and TB proteins with SMAP p-value < 1e-5. Legend: drugs; TB proteins; p < 1e-7, p < 1e-6, p < 1e-5.]
New Ways of Thinking
• Polypharmacology – One or multiple drugs binding to multiple targets for a collective effect aka Dirty Drugs
• Network Pharmacology – Measuring that effect on the whole biological network
Example 2 - The Torcetrapib Story PLoS Comp Biol 2009 5(5) e1000387
Cholesteryl Ester Transfer Protein (CETP)
• Collects triglycerides from very low density or low density lipoproteins (VLDL or LDL) and exchanges them for cholesteryl esters from high density lipoproteins (and vice versa)
• A long tunnel with two major binding sites. Docking studies suggest that it is possible that torcetrapib binds to both of them.
• The torcetrapib binding site is unknown. Docking studies show that both sites can bind torcetrapib with a docking score of around -8.0.
[Figure: CETP transfers between HDL (Good Cholesterol) and LDL (Bad Cholesterol); a CETP inhibitor blocks (X) this transfer.]
PLoS Comp Biol 2009 5(5) e1000387
Off-target | PDB Id | Torcetrapib | Anacetrapib | JTT705 | Complex ligand
CETP | 2OBD | -11.675 / -5.72 | -11.375 / -8.15 | -7.563 / -6.65 | -8.324 (PCW)
Retinoid X receptor | 1YOW | -11.420 / -6.600 | -8.696 / -7.68 | -6.276 / -7.28 | -9.113 (POE)
Retinoid X receptor | 1ZDT | -6.74 | -7.35 | -6.95 |
PPAR delta | 1Y0S | -10.203 / -8.22 | -10.595 / -7.91 | -7.581 / -8.36 | -10.691 (331)
PPAR alpha | 2P54 | -11.036 / -6.67 | -0.835 / -7.27 | -9.599 / -7.78 | -11.404 (735)
PPAR gamma | 1ZEO | -9.515 / -7.31 | >0.0 / -8.25 | -7.204 / -8.11 | -8.075 (C01)
Vitamin D receptor | 1IE8 | >0.0 / -4.73 | >0.0 / -6.25 | -6.628 / -9.70 | -8.354 (KH1) -7.35
Glucocorticoid receptor | 1NHZ | / -4.43 | / -7.08 | / -7.09 |
Glucocorticoid receptor | 1P93 | / -5.63 | / -0.58 | / -9.42 |
Fatty acid binding protein | 2F73 | >0.0 / -4.33 | >0.0 / -7.81 | -7.191 / -8.49 | ???
Fatty acid binding protein | 2PY1 | >0.0 / -6.13 | >0.0 / -6.98 | / -6.33 |
Fatty acid binding protein | 2NNQ | / -6.40 | / -7.64 | / 6.35 |
T-Cell CD1B | 1GZP | -8.815 / -7.02 | -13.515 / -7.15 | -7.590 / -8.02 | -6.519 (GM2)
IL-10 receptor | 1LQS | / -4.59 | / -6.77 | / -5.95 | ???
GM-2 activator | 2AG9 | -9.345 / -6.26 | -9.674 / -6.98 | -8.617 / -6.17 | ??? (MYR) -4.16
Cardiac troponin C (3CA2+) | 1DTL | / -5.83 | / -6.71 | / -5.79 |
Cytochrome bc1 complex | 1PP9 (PEG) | / -6.97 | / -9.07 | / -6.64 |
Cytochrome bc1 complex | 1PP9 (HEM) | / -7.21 | / 8.79 | / -8.94 |
Human cytoglobin | 1V5H | / -4.89 | / -7.00 | / -4.94 |
Docking scores: eHiTS / AutoDock
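One way to read these scores is to ask which off-targets dock torcetrapib at least as well as its primary target, CETP. The sketch below does that for the AutoDock column only, using values copied from the rows with a single PDB entry. The function name `rank_off_targets` is hypothetical, and comparing raw docking scores across different proteins is a crude heuristic assumed here for illustration, not the analysis from the paper.

```python
# AutoDock scores for torcetrapib, copied from the table above
# (rows with a single PDB entry only; more negative = better predicted binding).
autodock_torcetrapib = {
    "CETP (2OBD)": -5.72,
    "PPAR delta (1Y0S)": -8.22,
    "PPAR alpha (2P54)": -6.67,
    "PPAR gamma (1ZEO)": -7.31,
    "Vitamin D receptor (1IE8)": -4.73,
    "T-Cell CD1B (1GZP)": -7.02,
    "IL-10 receptor (1LQS)": -4.59,
    "GM-2 activator (2AG9)": -6.26,
}

def rank_off_targets(scores, primary):
    """List off-targets scoring at least as well as the primary target, best first."""
    reference = scores[primary]
    hits = [
        (name, score)
        for name, score in scores.items()
        if name != primary and score <= reference
    ]
    return sorted(hits, key=lambda item: item[1])

candidates = rank_off_targets(autodock_torcetrapib, primary="CETP (2OBD)")
for name, score in candidates:
    print(f"{name}: {score}")
```

Each flagged protein is a hypothesis about an off-target, to be weighed against the other scoring function and against experiment.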
PLoS Comp Biol 2009 5(5) e1000387
[Figure: proposed off-target network for Torcetrapib, Anacetrapib and JTT705. Nodes: RAS, PPARα, PPARδ, PPARγ, RXR, VDR, FABP, FA. Pathways: JNK/IKK pathway, JNK/NF-KB pathway. Outcomes: High blood pressure, Anti-inflammatory function, Immune response to infection. Links are marked +, – or ?.]
PLoS Comp Biol 2009 5(5) e1000387
Chang et al. 2009 Mol Sys Biol Submitted
The Future?
Modifications to Early Stage Drug Discovery
[Figure: drug discovery pipeline (http://www.celgene.com/images/celgene_drug_arrow.gif), annotated with Off-targets and Systems Biology.]
Some Known Limitations
• Structural coverage of the given proteome
• False hits / poor docking scores
• Literature searching
• It’s a hypothesis – need experimental validation
• Money
Perceived Limitations
• Mistrust of computational approaches
• Bioinformatics was previously oversold
• Omics was previously oversold
• Still too cutting edge
• No interest in drug resistance