
Finding Inspiration in the Protein Data Bank to Chemically Antagonize Readers of the Histone Code


DOI: 10.1002/minf.201000018

Valérie Campagna-Slater[a] and Matthieu Schapira*[a, b]

Abstract: Members of the Royal family of proteins are readers of the histone code that contain aromatic cages capable of recognizing specific sequences and lysine methylation states on histone tails. These binding modules play a key role in epigenetic signalling, and are part of a larger group of epigenetic targets that are becoming increasingly attractive for drug discovery. In the current study, pharmacophore representations of the aromatic cages forming the methyl-lysine (Me-Lys) recognition site were used to search the Protein Data Bank (PDB) for ligand binding pockets possessing similar chemical and geometrical features in unrelated proteins. The small molecules bound to these sites were then extracted from the PDB, and clustered based on fragments binding to the aromatic cages. The compounds collected are numerous and structurally diverse, but point to a limited set of preferred chemotypes; these include quaternary ammonium, sulfonium, and primary, secondary and tertiary amine moieties, as well as aromatic, aliphatic or orthogonal rings, and bicyclic systems. The chemical tool-kit identified can be used to design antagonists of the Royal family and related proteins.

Keywords: Drug design · Protein Data Bank · Protein-ligand interactions · Royal family · Virtual screening

[a] V. Campagna-Slater, M. Schapira, Structural Genomics Consortium, University of Toronto, MaRS Centre, South Tower, 7th floor, 101 College Street, Toronto, Ontario, Canada, M5G 1L7; fax: 416-946-0880; *e-mail: [email protected]

[b] M. Schapira, Department of Pharmacology and Toxicology, University of Toronto, Medical Sciences Building, 1 King’s College Circle, Toronto, Ontario, Canada, M5S 1A8

322 © 2010 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Mol. Inf. 2010, 29, 322 – 331

Full Paper

1 Introduction

Regulation of epigenetic signalling is a rapidly growing avenue of investigation for the development of new therapeutic classes. Histone deacetylase inhibitors have already received regulatory approval in oncology,[1] and other gene families controlling the condensation state of chromatin are expected to emerge soon as valid points of intervention for diverse clinical conditions.[2] These genes can be divided into three categories: writers, readers and erasers of histone marks.[3] The major gene families involved in reading chemical post-translational modifications on histone tails are the bromodomain family and the extended Royal family, which bind lysine acetylation and methylation marks, respectively.[4] The extended Royal family is composed of Tudor-, Chromo-, MBT- and PHD-domain-containing proteins that can recognize peptide substrates with distinct sequences and methylation states. Binding is invariably mediated by an interaction between the substrate’s methylated lysine and an aromatic cage of the binding module (see Ref. [4] for a review).

At present, it is not clear whether developing small-molecule inhibitors of methyl-lysine (Me-Lys) binding domains is a chemically tractable avenue for drug discovery programs, and the epigenetic chemical probes described so far have not targeted this gene class.[5] This may change because (1) high-throughput screening assay techniques are being developed specifically for methylation-dependent peptide–protein interactions,[6] (2) high-resolution crystal structures can provide a framework for structure-based design,[4, 7] and (3) the detailed physico-chemistry driving the central binding event has recently been revealed.[8]

As of November 2009, the Protein Data Bank (PDB) details the interactions between over 9000 unique small-molecule compounds and their protein targets. Typically, new or optimized ligands for a specific target can be designed based on the structures of other compounds bound to the same target or a close homologue. However, this rationale does not apply when no inhibitor is known for a target, let alone for an entire target class, as is the case for methyl-lysine binders. Rather than focusing on genes sharing sequence homology with the targeted protein, an alternative approach is to learn from ligands co-crystallized with proteins sharing the same binding-site chemistry, regardless of overall sequence homology. This strategy relies on the observation that sites of unrelated proteins can be targeted by the same ligand.[9] For instance, the cross-reactivity of celecoxib, a COX-2 inhibitor, for carbonic anhydrase relies on local physicochemical similarities between the binding sites of the two otherwise unrelated targets.[10] This approach was recently applied to design inhibitors of the mitotic kinesin Eg5.[11]

We have previously shown that a combination of two commonly used computational approaches – pocket identification and pharmacophore screening algorithms – could be used to search the PDB for aromatic cages, i.e. sites chemically similar to known methyl-lysine binding elements, and to identify novel putative sites involved in epigenetic signalling.[12] In the current study, we searched the PDB for small-molecule ligands bound to aromatic cages not necessarily related to epigenetic signalling. The compounds collected highlight a limited set of preferred chemotypes, which should guide or inspire discovery chemistry groups working on the Royal family.

2 Results and Discussion

2.1 Pharmacophore Screening of the PDB

Aromatic cage residues from a representative set of Royal family proteins were used to generate 3D pharmacophore queries. We have shown in previous work that 4-point pharmacophore models were optimal for searching the PDB for similar sites (3-point descriptions were too promiscuous, while 5-point models were too restrictive).[12] Here, 4-point pharmacophore models were extracted from representative aromatic cages for each sub-group of the extended Royal family: the Chromo domain of CBX3 (PDB: 3dm1), the double Tudor domain of KDM4A (PDB: 2gfa, 2qqs), the second MBT domain of L3MBTL (PDB: 2rje, 2pqw, 2rhu, 2rhx), and the PHD-finger of BPTF (PDB: 2fsa, 2fuu, 2f6j). Only aromatic and acidic residues surrounding the bound Me-Lys residue were selected to describe the target binding sites. The CBX3, KDM4A and L3MBTL Me-Lys binding sites were described using 3 aromatic centres and one negative centre, while the BPTF binding site was described by 4 aromatic centres (Figure 1).
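The following is only a rough illustration of why the number of points matters; it is our own gloss and not an analysis from the original study. A matching site must reproduce every pairwise distance between the query features. The number of these geometric constraints therefore grows quadratically with the number of points n:

```latex
N_{\mathrm{dist}}(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad N_{\mathrm{dist}}(3) = 3, \quad N_{\mathrm{dist}}(4) = 6, \quad N_{\mathrm{dist}}(5) = 10
```

Moving from 3 to 4 points doubles the number of distance constraints, and a fifth point adds four more. This is consistent with the observed shift from overly promiscuous to overly restrictive queries, although it does not by itself explain that shift.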

In a separate step, a list of 97 856 protein-ligand complexes was extracted from the PDB (covering 9421 different ligands and 43 439 distinct PDB structures). For each protein-ligand complex, aromatic (Phe, Tyr, Trp) and acidic (Asp, Glu) residue side-chains within 6.0 Å of the bound ligand were extracted from the protein structure, and their coordinates were stored in a large SD-formatted file. The resulting virtual library contained 59 057 entries having either at least 3 aromatic residues and at least one acidic residue, or more than 3 aromatic residues.
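The residue-extraction step can be sketched in Python as follows. This is a minimal illustration and not the ICM script used in the study. Several details are assumptions: the `Residue` container, measuring the 6.0 Å distance from any side-chain atom to any ligand atom, and the toy coordinates.

```python
import math
from dataclasses import dataclass, field

# Residue types retained for the site description (aromatic and acidic).
AROMATIC = {"PHE", "TYR", "TRP"}
ACIDIC = {"ASP", "GLU"}


@dataclass
class Residue:
    """Hypothetical container: residue name, number and side-chain atom coordinates (Å)."""
    name: str
    number: int
    side_chain: list = field(default_factory=list)  # [(x, y, z), ...]


def residues_near_ligand(ligand_atoms, residues, cutoff=6.0):
    """Keep aromatic/acidic residues with any side-chain atom within `cutoff` Å of any ligand atom.

    Measuring from *any* side-chain atom is an assumption; the text only states
    'side-chains within 6.0 Å of the bound ligand'.
    """
    keep = AROMATIC | ACIDIC
    return [
        res for res in residues
        if res.name in keep
        and any(math.dist(a, b) <= cutoff for a in res.side_chain for b in ligand_atoms)
    ]


# Toy example: a single ligand atom at the origin.
ligand = [(0.0, 0.0, 0.0)]
residues = [
    Residue("TRP", 1, [(3.5, 0.0, 0.0)]),
    Residue("TYR", 2, [(0.0, 4.0, 0.0)]),
    Residue("LEU", 3, [(1.0, 1.0, 1.0)]),   # close, but not an aromatic/acidic type
    Residue("ASP", 4, [(-4.0, 0.0, 0.0)]),
    Residue("PHE", 5, [(9.0, 0.0, 0.0)]),   # aromatic, but beyond the cutoff
]
site = residues_near_ligand(ligand, residues)
print([(r.name, r.number) for r in site])  # [('TRP', 1), ('TYR', 2), ('ASP', 4)]
```

A brute-force all-atom distance test is adequate at the scale of a single binding site. Scanning tens of thousands of complexes would more likely use a spatial index.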

Figure 1. Representative structures of the four target receptors used in this study: A) The Chromo domain of CBX3 (PDB: 3dm1). B) The double Tudor domain of KDM4A (PDB: 2gfa). C) The second MBT domain of L3MBTL (PDB: 2rje). D) The PHD-finger of BPTF (PDB: 2fsa). In the pharmacophore representations of the binding pockets, aromatic centres (discs) define the relative positions of the Phe, Tyr and Trp side-chains, while negative centres (spheres) define the positions of the Asp and Glu side-chains. The bound histone peptides and methylated lysine side-chains are shown.


Finally, the pharmacophores derived from CBX3, KDM4A, L3MBTL and BPTF (Figure 1) were used as queries to search the pockets extracted from the PDB. This identified sites that match not only the required residue types but also the correct receptor geometry, as previously described (see Computational Methods section).[12] A collection of compounds co-crystallized with sites similar to our representative set was compiled: the CBX3, KDM4A, L3MBTL and BPTF pharmacophore queries retrieved 51, 47, 28 and 24 unique ligands, respectively.
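The geometric part of the match can be illustrated with a brute-force check. The sketch tries every same-typed assignment of site features to the four query features. It accepts an assignment if all six inter-feature distances agree within a tolerance. This is not ICM's algorithm, and several elements are assumptions: the feature labels, the distance-only criterion (feature directions are ignored) and the 0.8 Å tolerance. The tolerance value is borrowed from the "size b-factor" quoted in the figure captions.

```python
import itertools
import math

# A feature is (type, (x, y, z)); "aro" = aromatic centre, "neg" = negative centre (hypothetical labels).


def geometry_match(query, site, tol=0.8):
    """Return {query_index: site_index} for the first same-typed assignment that reproduces
    every query inter-feature distance within `tol` Å, or None if no assignment does."""
    n = len(query)
    for combo in itertools.permutations(range(len(site)), n):
        # Feature types must agree position by position.
        if any(query[i][0] != site[j][0] for i, j in enumerate(combo)):
            continue
        # Every pairwise distance in the query must be reproduced in the site.
        if all(
            abs(math.dist(query[a][1], query[b][1])
                - math.dist(site[combo[a]][1], site[combo[b]][1])) <= tol
            for a, b in itertools.combinations(range(n), 2)
        ):
            return dict(enumerate(combo))
    return None


# Toy 4-point query (3 aromatic centres + 1 negative centre) and a shifted, reordered site
# that also contains one distant distractor feature.
query = [("aro", (0, 0, 0)), ("aro", (4, 0, 0)), ("aro", (0, 4, 0)), ("neg", (0, 0, 4))]
site = [("neg", (10, 10, 14)), ("aro", (10, 14, 10)), ("aro", (14, 10, 10)),
        ("aro", (10, 10, 10)), ("aro", (30, 30, 30))]
print(geometry_match(query, site))  # {0: 3, 1: 1, 2: 2, 3: 0}
```

Enumerating permutations is only practical because a site holds a handful of residues. With ten features there are at most 5040 ordered 4-tuples to test.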

2.2 Analysis of Ligands Bound to Aromatic Cages

2.2.1 Quaternary Ammonium Compounds and Sulfonium Analogue

A large number of aromatic cages retrieved from the PDB were in complex with quaternary ammonium compounds capable of acting as mimetics of the trimethylated lysine side-chain (Figure 2). This highlights the importance of cation–π interactions in the molecular recognition of methylated lysine residues.[8] Figures 2A–E show different examples of quaternary ammonium compounds bound to aromatic cages present in various unrelated proteins. In each case, the PDB identifiers of the ligand and of the structure are indicated. For instance, choline (CHT, PDB: 2reg – Figure 2A), glycine betaine (BET, PDB: 1sw2 – Figure 2B) and p-nitrophenyl-phosphocholine (NCH, PDB: 1dl7 – Figure 2C) are trimethyl ammonium cations. They are therefore direct mimetics of the trimethylated lysine residue, as are most of the compounds shown in Figure 2G. A fascinating example is the co-crystal structure of an antibody bound to p-nitrophenyl-phosphocholine, a mimetic of the phosphocholine hapten carried by pathogenic bacteria and nematodes.[13] The antibody was raised against a phosphocholinated protein and co-crystallized with the hapten mimetic. It recognizes the quaternary ammonium moiety of the hapten via a canonical aromatic cage (Figure 2C): a natural selection process beautifully identifies the chemical complementarity that links readers of histone methyl-marks and quaternary ammonium groups.
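The "aromatic cage" idea can be made concrete by counting how many aromatic ring centroids surround a cationic atom, such as the quaternary nitrogen of choline. The sketch below does only that. The 6.0 Å cation–centroid cutoff is an illustrative assumption and was not a parameter of this study, which located cages through pharmacophore matching instead.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def cage_rings_around(cation, rings, cutoff=6.0):
    """Indices of rings whose centroid lies within `cutoff` Å of the cationic atom.

    `rings` is a list of rings, each a list of ring-atom coordinates. The cutoff
    is an illustrative assumption, not a value taken from the study.
    """
    return [i for i, ring in enumerate(rings) if math.dist(cation, centroid(ring)) <= cutoff]


# Toy example: a cation at the origin; short point lists stand in for ring atoms.
rings = [
    [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)],    # centroid 4.0 Å away
    [(0.0, 4.0, 0.0), (0.0, 5.0, 0.0)],    # centroid 4.5 Å away
    [(9.0, 0.0, 0.0), (11.0, 0.0, 0.0)],   # centroid 10.0 Å away
]
print(cage_rings_around((0.0, 0.0, 0.0), rings))  # [0, 1]
```

A fuller check would also consider the angle between the cation–centroid vector and the ring normal. The sketch leaves that out.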

Other compounds carry two methyl groups and two bulkier substituents: HC7 possesses a dimethyl quaternary ammonium cation bound to an aromatic cage of human choline kinase beta (Figure 2D), while HC6 possesses a cyclic dimethyl quaternary ammonium cation bound to an aromatic cage of human choline kinase alpha (Figure 2E).

An interesting ligand identified as a putative trimethyl-lysine mimetic is dimethylsulfonioacetate (compound 313, Figure 2F). It contains a sulfonium cation bound to an aromatic cage of OpuAC from Bacillus subtilis, and is a close analogue of glycine betaine (BET, Figure 2B). In fact, the substrate-binding protein OpuAC (Figure 2F, PDB: 3chg) of the ABC transporter OpuA is known to bind glycine betaine, proline betaine and dimethylsulfonioacetate (compound 313, Figure 2F).[14] In all three cases, a cation–π interaction with 3 Trp residues is essential for binding. This is similar to the Me-Lys binding domain requirement for a cation–π interaction with an aromatic cage. The interaction between the carboxylate group of the ligand and a neighbouring His side-chain is also essential, although it has no analogue in the Me-Lys requirements. This result strongly suggests that the sulfonium cation could be used as a bioisostere of the ammonium cation when designing ligands for Me-Lys binding domains.

Interestingly, the sulfonium group observed in compound 313 is also present in S-adenosyl methionine, the methyl-donating cofactor used by histone methyltransferases to deposit methyl marks on histone tails (the very methylation marks subsequently read by aromatic cages). Isothermal titration calorimetry indicated that S-adenosyl methionine does not bind to L3MBTL (data not shown). Whether it binds to other readers of the histone code, thereby regulating methyl-mark-mediated signalling, remains an open question.

2.2.2 Tertiary Amines

Several ligands possessing cyclic tertiary amines were also extracted from aromatic cages present in PDB structures (Figure 3). Interestingly, L3MBTL has previously been co-crystallized with 2-(N-morpholino)-ethanesulfonic acid (MES, e.g. PDB: 2rjc), and the first MBT repeat of L3MBTL is known to bind proline,[15] which itself is a cyclic tertiary amine. Several of the selected ligands (Figure 3G: TST, 001, SB1, RAP and 587) were retrieved from the active site of FKBP peptidyl-prolyl cis-trans isomerases (e.g. FKBP12, PDB: 1j4r – Figure 3A), whose substrate is also a proline residue. HEPES, a cyclic tertiary amine containing a piperazine ring, was notably retrieved from the co-crystal structure of the cutA protein from Yersinia pestis (PDB: 3gsd – Figure 3B). These examples suggest that cyclic tertiary amines such as those shown in Figure 3G are suitable scaffolds for developing aromatic cage antagonists.

2.2.3 Primary/Secondary Amines and Imines

Although fewer were selected, some primary and secondary amines were also extracted from aromatic cages in the PDB structures, as well as a few imines (Figure 3). The secondary amine of spermidine is surrounded by an aromatic cage in the Escherichia coli PotD protein co-crystal structure (PDB: 1pot, Figure 3C), as is the secondary amine of 3-(cyclohexylamino)propane-1-sulfonic acid bound to the acetylcholine binding protein (PDB: 2bj0, Figure 3D). Surprisingly, the pharmacophores for L3MBTL did not yield a higher number of mono-methylated lysine analogues than the other pharmacophores, despite the fact that L3MBTL preferentially binds lower methylation states (mono/di).


Figure 2. Pharmacophore screening of the Protein Data Bank identified several quaternary ammonium compounds and one sulfonium compound bound to aromatic cages similar to those which bind methylated lysine residues. Some examples include: A) choline, B) glycine betaine, C) p-nitrophenyl-phosphocholine, D) (2S)-2-[4’-({dimethyl[2-(phosphonooxy)ethyl]ammonio}acetyl)biphenyl-4-yl]-2-hydroxy-4,4-dimethylmorpholin-4-ium, E) (2S,2’S)-2,2’-biphenyl-4,4’-diylbis(2-hydroxy-4,4-dimethylmorpholin-4-ium), and F) dimethylsulfonioacetate. Ligand codes correspond to the PDB Chemical Identifier. The residues forming an aromatic cage around the ligand in the PDB structure are shown with a transparent surface superimposed on top (light yellow: carbon, blue: nitrogen, red: oxygen, gold: sulfur, orange: phosphorus). G) Ammonium and sulfonium compounds selected using pharmacophore size b-factors of 0.8 Å and direction b-factors of 0.5 Å; a shaded area highlights the approximate portion of each ligand that interacts with the aromatic cage.


Figure 3. Pharmacophore screening of the Protein Data Bank identified several tertiary, secondary and primary amines, as well as a few imines, bound to aromatic cages similar to those which bind methylated lysine residues. Some examples include: A) 1-[2,2-difluoro-2-(3,4,5-trimethoxy-phenyl)-acetyl]-piperidine-2-carboxylic acid 4-phenyl-1-(3-pyridin-3-yl-propyl)-butyl ester, B) HEPES, C) spermidine, D) 3-(cyclohexylamino)propane-1-sulfonic acid, E) trans-4-aminomethylcyclohexane-1-carboxylic acid, F) 1,4-diaminobutane. Ligand codes correspond to the PDB Chemical Identifier. The residues forming an aromatic cage around the ligand in the PDB structure are shown with a transparent surface superimposed on top (light yellow: carbon, blue: nitrogen, red: oxygen, gold: sulfur, cyan: fluorine). G) Tertiary, secondary and primary amines, as well as imines, selected using pharmacophore size b-factors of 0.8 Å and direction b-factors of 0.5 Å; a shaded area highlights the approximate portion of each ligand that interacts with the aromatic cage.


Among the primary amines extracted from aromatic cages present in the PDB, two (ACA and AMH, Figure 3G) were bound to Kringle domains, which are known to bind unmethylated lysine residues.[16] Figures 3E and 3F show trans-4-aminomethylcyclohexane-1-carboxylic acid (AMH) bound to the Kringle 1 domain of plasminogen (PLG, PDB: 1ceb), and 1,4-diaminobutane (PUT) bound to S-adenosylmethionine decarboxylase (AMD1, PDB: 3dz6), respectively. In both of these protein-ligand complexes, an acidic residue appears necessary for ligand binding, as it directly interacts with the primary amine. Given this observation, it is not surprising that the pharmacophore derived from the BPTF PHD-finger (Figure 1D) retrieved no aromatic cage bound to a primary amine, since that site is formed of 4 aromatic residues and no acidic residue.
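The dependence on an acidic residue described above could be checked with a simple salt-bridge test. The test looks for carboxylate oxygens of nearby Asp or Glu residues close to the ligand's amine nitrogen. In this sketch, the 4.0 Å cutoff and the residue tuple layout are assumptions. OD1/OD2 and OE1/OE2 are the standard PDB atom names of the Asp and Glu carboxylate oxygens.

```python
import math

# Standard PDB atom names of the carboxylate oxygens of Asp and Glu.
CARBOXYLATE_OXYGENS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}


def salt_bridge_partners(amine_n, residues, cutoff=4.0):
    """Return (resname, resnum) of Asp/Glu residues with a carboxylate O within `cutoff` Å of the amine N.

    `residues` is a list of (resname, resnum, {atom_name: (x, y, z)}); both this layout and
    the 4.0 Å cutoff are illustrative assumptions.
    """
    partners = []
    for resname, resnum, atoms in residues:
        oxygens = CARBOXYLATE_OXYGENS.get(resname)
        if oxygens is None:
            continue  # not an acidic residue
        if any(name in atoms and math.dist(amine_n, atoms[name]) <= cutoff for name in oxygens):
            partners.append((resname, resnum))
    return partners


# Toy example: the amine nitrogen of a ligand at the origin.
pocket = [
    ("ASP", 10, {"OD1": (2.8, 0.0, 0.0), "OD2": (4.0, 1.0, 0.0)}),
    ("GLU", 20, {"OE1": (0.0, 6.0, 0.0), "OE2": (0.0, 7.0, 0.0)}),
    ("TRP", 30, {"NE1": (1.0, 1.0, 1.0)}),
]
print(salt_bridge_partners((0.0, 0.0, 0.0), pocket))  # [('ASP', 10)]
```

For a purely aromatic site such as the BPTF PHD-finger, this function would always return an empty list, whatever the geometry.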

2.2.4 Aromatic Rings

A number of the ligands identified contain aromatic rings bound to the aromatic cage (Figure 4), including several aromatic heterocycles. The π–π stacking interaction between the ligand and the aromatic residues lining these binding pockets undoubtedly plays a crucial role in their binding affinity. However, other interactions most likely contribute to stabilizing such complexes. For instance, compounds such as HLO (PDB: 2jez, Figure 4A) contain a pyridinium ring and may engage in cation–π interactions with the binding pocket in addition to π–π stacking. Rings can also carry a variety of substituents at different positions. For instance, the 4-carboxamide-pyridinium ring of HLO is not only involved in a cation–π interaction; its oxygen atom also forms a hydrogen bond with a neighbouring backbone nitrogen atom.[17] The interactions of ring substituents with atoms not included in the pharmacophore description may not be conserved within the Me-Lys binding domains. Nevertheless, these aromatic rings provide a good scaffold upon which several putative ligands can be designed, by adding varying substituents to take advantage of the nuances of the different Me-Lys binding domains.

In addition to several aromatic 6-membered rings (e.g. Figure 4A–D), a few aromatic 5-membered rings were also retrieved (Figure 4E–F). In the case of histamine bound to the histamine-binding protein from ticks (PDB: 1qft, Figure 4F), both the imidazole ring and the primary amine play a key role in binding: the imidazole ring is surrounded by an aromatic cage, while the primary amine is involved in several hydrogen bonds.[18]

2.2.5 Other Selected Ligands

The pharmacophore derived from the BPTF PHD-finger (which is composed of only 4 aromatic centres and no negative centre) identified 6-chloro-4-cyclohexylsulfanyl-3-propyl-1H-quinolin-2-one (H16, Figure 5A). This molecule contains a hydrophobic cyclohexyl ring bound to an aromatic cage of HIV-1 reverse transcriptase (PDB: 1tkz). This result suggests that aliphatic rings may be considered as ligand scaffolds for Me-Lys binding sites that are not lined with acidic residues. Interestingly, two surveys of organic crystal structures previously demonstrated that aromatic ring–aliphatic ring stacking is common in the Cambridge Structural Database.[19]

The parameters of the pharmacophore queries can be made more permissive. This allows additional binding sites to be retrieved from the PDB and provides a wider array of interesting chemical groups bound to aromatic cages (a few examples are given in Figure 5B–F). For instance, Figure 5B shows an alkene bound to an aromatic cage. This compound (TB9) is bound to the same protein as the one shown in Figure 5A (H16), namely HIV-1 reverse transcriptase (PDB: 1rev). Available structures of HIV-1 reverse transcriptase also show alkynes (e.g. GWB, PDB: 1tkx) and aromatic rings (e.g. PDZ, PDB: 3di6) bound to this aromatic cage, in addition to aliphatic rings and alkenes. This wide range of chemical moieties binding to the aromatic pocket of HIV-1 reverse transcriptase may provide significant inspiration for designing compounds that target the aromatic cages of Me-Lys binding domains. This applies in particular to domains composed entirely of aromatic side-chains, such as the PHD-finger of BPTF.
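The effect of loosening the query parameters can be mimicked with a tolerance sweep. The sketch below uses a deliberately permissive, type-agnostic filter that compares only the sorted lists of pairwise distances. This filter is a swapped-in illustration rather than the ICM matching procedure. The tolerances and toy sites are assumptions.

```python
import itertools
import math


def sorted_distances(points):
    """Sorted list of all pairwise distances between (x, y, z) points."""
    return sorted(math.dist(a, b) for a, b in itertools.combinations(points, 2))


def distance_profile_match(query_pts, site_pts, tol):
    """Permissive pre-filter: compare sorted pairwise-distance lists element by element.

    Feature types and pair identities are ignored, so this is looser than an
    assignment-based geometric match.
    """
    q, s = sorted_distances(query_pts), sorted_distances(site_pts)
    return len(q) == len(s) and all(abs(a - b) <= tol for a, b in zip(q, s))


def count_hits(query_pts, sites, tol):
    """Number of sites passing the filter at tolerance `tol` (Å)."""
    return sum(distance_profile_match(query_pts, site, tol) for site in sites)


query = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4)]
sites = [
    [(10, 10, 10), (14, 10, 10), (10, 14, 10), (10, 10, 14)],  # exact copy, shifted
    [(0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 5.2)],            # one feature displaced by 1.2 Å
    [(0, 0, 0), (8, 0, 0), (0, 8, 0), (0, 0, 8)],              # twice as large
]
for tol in (0.8, 1.5):
    print(tol, count_hits(query, sites, tol))  # 0.8 -> 1, 1.5 -> 2
```

Raising the tolerance admits the displaced site but still rejects the one that is twice as large. This mirrors, in miniature, how more permissive queries widen the retrieved set.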

Other interesting structures identified include the tropane group of cocaine (COC), which optimally occupies an aromatic cage in its co-crystal structure with the acetylcholine binding protein (PDB: 2pgz, Figure 5C). In the complex of compound 230 with factor Xa, the inhibitor's pyridin-2(1H)-one and fluorophenyl rings are orthogonal to each other. Each stacks face-to-face with orthogonal rings of the aromatic cage (PDB: 2phb, Figure 5D). Similarly, the pyridine and 1-methylpyrrolidine rings of nicotine lie approximately perpendicular to one another in the acetylcholine binding protein co-crystal structure (PDB: 1uw6, Figure 5E). These compounds suggest that bicyclic systems and pairs of orthogonal rings may be suitable scaffolds for ligand design.
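The angle between two ring planes quantifies whether the rings are stacked face-to-face or lie roughly perpendicular, as for the ring pairs of compound 230 and nicotine. The sketch below estimates each plane normal with Newell's method and reports the interplanar angle. The 30°/60° classification thresholds and the toy square "rings" are assumptions.

```python
import math


def ring_normal(ring):
    """Plane normal of a (roughly planar) ring via Newell's method; `ring` is an ordered list of (x, y, z)."""
    nx = ny = nz = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(ring, ring[1:] + ring[:1]):
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    return nx, ny, nz


def interplanar_angle(ring_a, ring_b):
    """Angle in degrees (0-90) between the planes of two rings."""
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    dot = abs(sum(p * q for p, q in zip(na, nb)))
    cos_t = min(1.0, dot / (math.hypot(*na) * math.hypot(*nb)))  # clamp rounding error
    return math.degrees(math.acos(cos_t))


def classify(angle, parallel_max=30.0, perpendicular_min=60.0):
    """Coarse label; the thresholds are illustrative assumptions."""
    if angle <= parallel_max:
        return "parallel"
    if angle >= perpendicular_min:
        return "perpendicular"
    return "intermediate"


square_xy = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
square_xz = [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)]
stacked = [(x, y, z + 3.5) for x, y, z in square_xy]
for other in (stacked, square_xz):
    angle = interplanar_angle(square_xy, other)
    print(round(angle, 1), classify(angle))  # "0.0 parallel", then "90.0 perpendicular"
```

Newell's method sums contributions from every ring edge. This makes it less sensitive to slight non-planarity than a normal built from just three atoms.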

N,N-dimethyl-decylamine-N-oxide (DDQ), which was co-crystallized in the choline-binding domain of major pneumococcal autolysin (LytA), was also identified by the pharmacophore search (PDB: 1gvm, Figure 5F). It is yet another compound that may act as a suitable mimic of the Me-Lys side-chain. Interestingly, the N,N-dimethyl-N-oxide group can both form a cation–π interaction and, through its N-oxide oxygen, engage in a hydrogen bond with the aromatic system of a proximal tryptophan.

Several poly-substituted aliphatic rings bound to aromatic cages were also extracted from the PDB. However, most of these compounds (e.g. glucose) rely on their many hydroxyl groups to form numerous hydrogen bonds with the indole ring of Trp residues, the hydroxyl groups of Tyr residues, or additional hydrogen-bond donor or acceptor residues lining the binding pocket. Such large numbers of hydrogen-bond donors and acceptors are typically not observed in the binding sites of Me-Lys binding domains, suggesting that these compounds may not be suitable starting structures for ligand design.

Figure 4. Pharmacophore screening of the Protein Data Bank identified several compounds possessing aromatic rings bound to aromatic cages similar to those which bind methylated lysine residues. Some examples include: A) 1-[[2,4-bis[(E)-hydroxyiminomethyl]pyridin-1-ium-1-yl]methoxymethyl]pyridin-1-ium-4-carboxamide, B) 2,2’:6’,2’’-terpyridine platinum(II), C) 1-(5-cyanopyridin-2-yl)-3-[2-(4-ethoxy-3-fluoro-pyridin-2-yl)ethyl]thiourea, D) 2-[[(benzhydrylamino)-[(4-cyanophenyl)amino]methylidene]amino]ethanoic acid, E) [(2R,3S,4S,5R,6R)-2-[(2R,3S,4S,5S,6S)-2-[(1R,2S)-2-[[6-amino-2-[(1S)-3-amino-1-[[(2S)-2,3-diamino-3-oxo-propyl]amino]-3-oxo-propyl]-5-methyl-pyrimidin-4-yl]carbonylamino]-3-[[(2R,3S,4S)-5-[[(2S,3R)-1-[2-[4-[4-[3-[4-(3-aminopropylamino)butylamino]propylcarbamoyl]-1,3-thiazol-2-yl]-1,3-thiazol-2-yl]-ethylamino]-3-hydroxy-1-oxo-butan-2-yl]amino]-3-hydroxy-4-methyl-5-oxo-pentan-2-yl]amino]-1-(1H-imidazol-4-yl)-3-oxo-propoxy]-4,5-dihydroxy-6-(hydroxymethyl)oxan-3-yl]oxy-3,5-dihydroxy-6-(hydroxymethyl)oxan-4-yl] carbamate, F) histamine. Ligand codes correspond to the PDB Chemical Identifier. The residues forming an aromatic cage around the ligand in the PDB structure are shown with a transparent surface superimposed on top (light yellow: carbon, blue: nitrogen, red: oxygen, gold: sulfur, cyan: fluorine, dark grey: platinum). G) Aromatic compounds selected using pharmacophore size b-factors of 0.8 Å and direction b-factors of 0.5 Å; a shaded area highlights the approximate portion of each ligand which interacts with the aromatic cage.

Figure 5. Pharmacophore screening of the Protein Data Bank identified compounds possessing various chemical moieties bound to aromatic cages similar to those which bind methylated lysine residues. A few examples include: A) 6-chloro-4-cyclohexylsulfanyl-3-propyl-1H-quinolin-2-one, B) 4-chloro-8-methyl-7-(3-methyl-but-2-enyl)-6,7,8,9-tetrahydro-2H-2,7,9A-triaza-benzo[CD]azulene-1-thione, C) cocaine, D) (2R,4R)-N-(4-chlorophenyl)-N’-[2-fluoro-4-(2-oxopyridin-1-yl)phenyl]-4-methoxy-pyrrolidine-1,2-dicarboxamide, E) nicotine, and F) N,N-dimethyl-decylamine-N-oxide. Ligand codes correspond to the PDB Chemical Identifier. The residues forming an aromatic cage around the ligand in the PDB structure are shown with a transparent surface superimposed on top (light yellow: carbon, blue: nitrogen, red: oxygen, gold: sulfur, cyan: fluorine, green: chlorine). G) The 2D chemical structures of the compounds shown in A–F, with a shaded area highlighting the approximate portion of each ligand which interacts with the aromatic cage.

3 Conclusions

Inhibiting the recruitment of methyl-lysine binding modules to histone post-translational marks is one avenue for controlling epigenetic regulation of gene transcription. While this approach rests on solid biological foundations,[7, 20] proof-of-concept has not yet been established. Peptide–protein interactions are historically challenging targets. Joint efforts of medicinal and computational chemists, biochemists, and cellular and structural biologists will be necessary to probe the chemical tractability of methyl-lysine-sensing aromatic cages and to develop a tool-kit for pre-competitive target validation. By screening the PDB using a set of pharmacophores representing the chemical and geometrical properties of Me-Lys binding domains, we compiled a diverse collection of structurally validated chemotypes targeting aromatic cages. The complete ligands identified are not predicted to bind to the extended Royal family. However, the core fragments extracted from them should provide good chemical starting points and assist medicinal chemists in their efforts to design antagonists.

4 Computational Methods

4.1 Describing the Chemical and Geometrical Features of the Target Receptor

Pharmacophores were used to create a computational representation of the target receptors. These pharmacophores were generated using the ICM 3.6-1 software (Molsoft LLC).[21] Each aromatic residue (Phe, Tyr and Trp) was represented by a single aromatic centre, and each acidic residue by a single negative centre. Since ICM generates two aromatic centres for each Trp residue (one for each of the two fused rings), two pharmacophore representations were used for each structure, each retaining a different one of the two Trp aromatic centres. The CBX3 Chromo domain (PDB: 3dm1), the KDM4A double Tudor domain (PDB: 2gfa, 2qqs) and the second L3MBTL MBT domain (PDB: 2rje, 2pqw, 2rhu, 2rhx) were described using three aromatic centres and one negative centre, while the PHD finger of BPTF (PDB: 2fsa, 2fuu, 2f6j) was described by four aromatic centres.
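The residue-to-feature mapping described above can be sketched in Python as follows. This is not ICM's implementation: it assumes, for illustration, that an aromatic centre is the centroid of a ring's atoms and that a negative centre lies midway between the two carboxylate oxygens; the helper names (`residue_options`, `feature_sets`) are hypothetical. Atom names follow the standard PDB nomenclature.

```python
from itertools import product
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
Feature = Tuple[str, Vec]  # (feature type, position)

# Ring atoms (PDB atom names) whose centroid is taken as an aromatic centre.
# Trp has two fused rings, so it yields two alternative aromatic centres.
AROMATIC_RINGS: Dict[str, List[List[str]]] = {
    "PHE": [["CG", "CD1", "CD2", "CE1", "CE2", "CZ"]],
    "TYR": [["CG", "CD1", "CD2", "CE1", "CE2", "CZ"]],
    "TRP": [["CG", "CD1", "NE1", "CE2", "CD2"],           # five-membered ring
            ["CD2", "CE2", "CE3", "CZ2", "CZ3", "CH2"]],  # six-membered ring
}
# Assumption: a negative centre sits midway between the two carboxylate oxygens.
ACID_OXYGENS: Dict[str, List[str]] = {"ASP": ["OD1", "OD2"], "GLU": ["OE1", "OE2"]}


def centroid(points: List[Vec]) -> Vec:
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def residue_options(resname: str, atoms: Dict[str, Vec]) -> List[Feature]:
    """Alternative features one residue can contribute (empty list if none)."""
    if resname in AROMATIC_RINGS:
        return [("aromatic", centroid([atoms[a] for a in ring]))
                for ring in AROMATIC_RINGS[resname]]
    if resname in ACID_OXYGENS:
        return [("negative", centroid([atoms[a] for a in ACID_OXYGENS[resname]]))]
    return []


def feature_sets(residues: List[Tuple[str, Dict[str, Vec]]]) -> List[List[Feature]]:
    """Enumerate pharmacophore descriptions, one feature per contributing residue.

    A cage containing one Trp yields two descriptions, one per Trp ring."""
    options = [residue_options(name, atoms) for name, atoms in residues]
    options = [opts for opts in options if opts]  # drop non-contributing residues
    return [list(combo) for combo in product(*options)]
```

Enumerating one Trp ring at a time mirrors the two-description strategy described above; under this scheme a cage containing two Trp residues would produce four descriptions.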

4.2 Extracting a List of Protein-Ligand Complexes

A list of ligands bound to proteins deposited in the Protein Data Bank (PDB) was downloaded from the PDB’s Ligand Expo in June 2009. This file contained 97 856 protein-ligand complexes, covering 9421 different ligands and 43 439 individual PDB structures.
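For readers reproducing this step, the tally of complexes, distinct ligands and distinct structures can be sketched in Python as below. The input layout is an assumption: each line is taken to hold one PDB ID and one ligand code separated by whitespace, which is a simplification of the actual Ligand Expo files, and `summarize` is a hypothetical helper. Each well-formed line counts as one protein-ligand complex.

```python
from typing import Iterable, Set, Tuple


def summarize(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (complexes, distinct ligands, distinct PDB structures).

    Assumed (hypothetical) line format: "<pdb_id> <ligand_code>"."""
    complexes = 0
    ligands: Set[str] = set()
    structures: Set[str] = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pdb_id, ligand = fields[0].lower(), fields[1].upper()
        complexes += 1
        ligands.add(ligand)
        structures.add(pdb_id)
    return complexes, len(ligands), len(structures)
```

PDB IDs are lower-cased and ligand codes upper-cased so that the same entry written in different cases is not counted twice.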

4.3 Extracting Residues Surrounding the Ligands

The ICM 3.6-1 (Molsoft LLC) scripting language[21] was used to automate a procedure that extracts the residues of predefined types (here, aromatic and acidic) surrounding any given ligand from a protein structure and stores them as an entry in an SD-formatted virtual library of protein sites. For each protein-ligand entry in the list of complexes downloaded from the PDB, all Phe, Tyr, Trp, Asp and Glu residues whose side chains are located within 6.0 Å of the bound ligand were selected, and their coordinates were saved in the virtual library. Sites that did not contain the minimum number of aromatic/acidic residues defined by the pharmacophores describing the target receptor sites were omitted from the virtual library (CBX3, KDM4A and L3MBTL require a minimum of three aromatic residues and one acidic residue, while BPTF requires a minimum of four aromatic residues).
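The selection and filtering rules above can be sketched in Python as follows, as a stand-in for the ICM script rather than a copy of it. Two points are assumptions: a residue is kept when any of its side-chain atoms (every atom other than N, CA, C and O) lies within 6.0 Å of any ligand atom, and a site is kept when it meets the minimum counts of at least one target. The names `extract_site`, `passes_minimum` and `keep_site` are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]
Residue = Tuple[str, Dict[str, Vec]]  # (residue name, {atom name: xyz})

AROMATIC = {"PHE", "TYR", "TRP"}
ACIDIC = {"ASP", "GLU"}
BACKBONE = {"N", "CA", "C", "O"}  # assumption: all other atoms count as side chain
CUTOFF = 6.0  # Å, the distance cut-off given in the text

# Minimum (aromatic, acidic) residue counts per target pharmacophore, from the text.
MIN_COUNTS = {"CBX3": (3, 1), "KDM4A": (3, 1), "L3MBTL": (3, 1), "BPTF": (4, 0)}


def near_ligand(side_chain: Sequence[Vec], ligand: Sequence[Vec],
                cutoff: float = CUTOFF) -> bool:
    """True if any side-chain atom lies within `cutoff` Å of any ligand atom."""
    return any(math.dist(a, b) <= cutoff for a in side_chain for b in ligand)


def extract_site(residues: List[Residue], ligand: Sequence[Vec]) -> List[Residue]:
    """Keep Phe/Tyr/Trp/Asp/Glu residues whose side chains approach the ligand."""
    site = []
    for resname, atoms in residues:
        if resname not in AROMATIC | ACIDIC:
            continue
        side_chain = [xyz for name, xyz in atoms.items() if name not in BACKBONE]
        if side_chain and near_ligand(side_chain, ligand):
            site.append((resname, atoms))
    return site


def passes_minimum(site: List[Residue], target: str) -> bool:
    """Check the site against one target's minimum residue counts."""
    n_aromatic = sum(1 for name, _ in site if name in AROMATIC)
    n_acidic = sum(1 for name, _ in site if name in ACIDIC)
    min_aromatic, min_acidic = MIN_COUNTS[target]
    return n_aromatic >= min_aromatic and n_acidic >= min_acidic


def keep_site(site: List[Residue]) -> bool:
    """A site enters the virtual library if it could satisfy any target query."""
    return any(passes_minimum(site, target) for target in MIN_COUNTS)
```

Writing the retained sites to an SD file, as the original workflow did, is omitted from this sketch.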

4.4 Pharmacophore Screening

The pharmacophores representing the four target receptors were used as queries to screen the virtual library of protein sites assembled from the collection of protein-ligand complexes. This was carried out using the ICM 3.6-1 find pharmacophore command.[21] Relatively restrictive b-factors were used for the pharmacophores in order to extract only sites close in geometry to the target receptors (Qm and Qn size b-factors of 0.8, and Qv direction b-factors of 0.5). Finally, a list of the bound ligands corresponding to the protein sites matching at least one of the pharmacophore queries was retrieved.
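The scoring used by ICM's find pharmacophore command, including how the Qm/Qn size and Qv direction b-factors enter it, is not reproduced here. As a simplified stand-in, the Python sketch below treats a site as a hit when its features can be assigned one-to-one to the query features with matching types and with every pairwise distance agreeing within a single tolerance `tol`, set to 0.8 Å only by analogy with the size b-factor; feature directions are ignored. The names `matches` and `ligands_hitting_any` are hypothetical.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Feature = Tuple[str, Vec]  # (feature type, position)


def matches(query: Sequence[Feature], site: Sequence[Feature],
            tol: float = 0.8) -> bool:
    """Distance-geometry match: try every ordered choice of distinct site
    features; accept if types agree and all pairwise distances agree within tol."""
    n = len(query)
    for chosen in permutations(site, n):
        # Feature types must agree position by position.
        if any(q[0] != s[0] for q, s in zip(query, chosen)):
            continue
        # Every inter-feature distance must agree within the tolerance.
        if all(abs(math.dist(query[i][1], query[j][1])
                   - math.dist(chosen[i][1], chosen[j][1])) <= tol
               for i in range(n) for j in range(i + 1, n)):
            return True
    return False


def ligands_hitting_any(queries: Sequence[Sequence[Feature]],
                        sites: Sequence[Tuple[str, Sequence[Feature]]]) -> List[str]:
    """Ligands whose site matches at least one query, first-seen order, no repeats."""
    hits: List[str] = []
    for ligand, features in sites:
        if ligand not in hits and any(matches(q, features) for q in queries):
            hits.append(ligand)
    return hits
```

Because pairwise distances are unchanged by reflection, this test cannot distinguish a site from its mirror image, and the brute-force assignment grows factorially with the number of site features. Both limitations are tolerable for small sites, but an alignment-based score such as the one ICM applies is preferable in practice.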

Acknowledgements

We thank G. Senisterra and M. Vedadi for testing the binding of S-adenosyl methionine to L3MBTL, and David Smil for his comments. V.C.-S. acknowledges the Natural Sciences and Engineering Research Council of Canada for funding. This work was supported by the Structural Genomics Consortium. The SGC is a registered charity (number 1097737) that receives funds from the Canadian Institutes for Health Research, the Canadian Foundation for Innovation, Genome Canada through the Ontario Genomics Institute, GlaxoSmithKline, Karolinska Institutet, the Knut and Alice Wallenberg Foundation, the Ontario Innovation Trust, the Ontario Ministry for Research and Innovation, Merck & Co., Inc., the Novartis Research Foundation, the Swedish Agency for Innovation Systems, the Swedish Foundation for Strategic Research and the Wellcome Trust.

References

[1] a) W. S. Xu, R. B. Parmigiani, P. A. Marks, Oncogene 2007, 26, 5541 – 5552; b) H. M. Prince, M. J. Bishton, S. J. Harrison, Clin. Cancer Res. 2009, 15, 3958 – 3969.

[2] a) B. R. Keppler, T. K. Archer, Expert Opin. Ther. Targets 2008, 12, 1301 – 1312; b) Y. G. Zheng, J. Wu, Z. Chen, M. Goodman, Med. Res. Rev. 2008, 28, 645 – 687.

[3] T. Kouzarides, Cell 2007, 128, 693 – 705.

[4] S. D. Taverna, H. Li, A. J. Ruthenburg, C. D. Allis, D. J. Patel, Nat. Struct. Mol. Biol. 2007, 14, 1025 – 1040.

[5] P. A. Cole, Nat. Chem. Biol. 2008, 4, 590 – 597.

[6] a) A. M. Quinn, M. T. Bedford, A. Espejo, A. Spannhoff, C. P. Austin, U. Oppermann, A. Simeonov, Nucleic Acids Res. 2010, 38(2), e11; b) T. J. Wigle, J. M. Herold, G. A. Senisterra, M. Vedadi, D. B. Kireev, C. H. Arrowsmith, S. V. Frye, W. P. Janzen, J. Biomol. Screen. 2010, 15, 62 – 71.

[7] M. A. Adams-Cioaba, J. Min, Biochem. Cell Biol. 2009, 87, 93 – 105.

[8] a) R. M. Hughes, K. R. Wiggins, S. Khorasanizadeh, M. L. Waters, PNAS 2007, 104, 11184 – 11188; b) Z. Lu, J. Lai, Y. Zhang, J. Am. Chem. Soc. 2009, 131, 14928 – 14931.

[9] M. A. Koch, L.-O. Wittenberg, S. Basu, D. A. Jeyaraj, E. Gourzoulidou, K. Reinecke, A. Odermatt, H. Waldmann, PNAS 2004, 101, 16721 – 16726.


Full Paper V. Campagna-Slater, M. Schapira


[10] A. Weber, A. Casini, A. Heine, D. Kuhn, C. T. Supuran, A. Scozzafava, G. Klebe, J. Med. Chem. 2004, 47, 550 – 557.

[11] K. Oguievetskaia, L. Martin-Chanas, A. Vorotyntsev, O. Doppelt-Azeroual, X. Brotel, S. A. Adcock, A. G. de Brevern, F. Delfaud, F. Moriaud, J. Comput. Aided Mol. Des. 2009, 23, 571 – 582.

[12] V. Campagna-Slater, A. G. Arrowsmith, Y. Zhao, M. Schapira, J. Chem. Inf. Model. 2010, 50, 358 – 367.

[13] M. Brown, M. A. Schumacher, G. D. Wiens, R. G. Brennan, M. B. Rittenberg, J. Exp. Med. 2000, 191, 2101 – 2112.

[14] S. H. J. Smits, M. Höing, J. Lecher, M. Jebbar, L. Schmitt, E. Bremer, J. Bacteriol. 2008, 190, 5663 – 5671.

[15] W. K. Wang, V. Tereshko, P. Boccuni, D. MacGrogan, S. D. Nimer, D. J. Patel, Structure 2003, 11, 775 – 789.

[16] a) D. N. Marti, J. Schaller, M. Llinás, Biochemistry 1999, 38, 15741 – 15755; b) I. Mochalkin, B. Cheng, O. Klezovitch, A. M. Scanu, A. Tulinsky, Biochemistry 1999, 38, 1990 – 1998.

[17] F. J. Ekström, C. Astot, Y. P. Pang, Clin. Pharmacol. Ther. 2007, 82, 282 – 293.

[18] G. C. Paesen, P. L. Adams, K. Harlos, P. A. Nuttall, D. I. Stuart, Mol. Cell 1999, 3, 661 – 671.

[19] a) P. K. C. Paul, Cryst. Eng. 2002, 5, 3 – 8; b) Z. Ciunik, S. Berski, Z. Latajka, J. Leszczynski, J. Mol. Struct. 1998, 442, 125 – 134.

[20] a) S. R. Bhaumik, E. Smith, A. Shilatifard, Nat. Struct. Mol. Biol. 2007, 14, 1008 – 1016; b) B. D. Strahl, C. D. Allis, Nature 2000, 403, 41 – 45.

[21] ICM 3.6-1; Molsoft LLC, San Diego, CA.

Received: February 18, 2010
Accepted: March 2, 2010

Published online: April 9, 2010

