Ligand search and data mining of Structural Genomics structures


Abhinav Kumar, Herbert Axelrod, Ashley Deacon

Structure Determination Core, Joint Center for Structural Genomics (JCSG), Stanford Synchrotron Radiation Laboratory, Menlo Park, CA, USA

JCSG Ligand Search

Unique PSI Ligands

PDB | Ligand Name | Ligand | PSI
2A3L | Coformycin 5'-Phosphate | CF5 | CESG
2OU3 | 1H-Indole-3-Carbaldehyde | I3A | JCSG
1VR0 | (2R)-3-Sulfolactic Acid | 3SL | JCSG
2OD6 | 10-Oxohexadecanoic Acid | OHA | JCSG
1X92 | D-Glycero-D-Mannopyranose-7-Phosphate | M7P | MCSG
1O8B | Beta-D-Arabinofuranose-5'-Phosphate | ABF | MCSG
2OSU | 6-Diazenyl-5-Oxo-L-Norleucine | DON | MCSG
1M33 | 3-Hydroxy-Propanoic Acid | 3OH | MCSG
1RTW | (4-Amino-2-Methylpyrimidin-5-Yl)Methyl Dihydrogen Phosphate | MP5 | NESG
2NW9 | 6-Fluoro-L-Tryptophan | FT6 | NESG
1XKL | 2-Amino-4H-1,3-Benzoxathiin-4-Ol | STH | NESG
1LW4 | 3-Hydroxy-2-[(3-Hydroxy-2-Methyl-5-Phosphonooxymethyl-Pyridin-4-Ylmethyl)-Amino]-Butyric Acid | TLP | NYSGXRC
2B4B | N-Ethyl-N-[3-(Propylamino)Propyl]Propane-1,3-Diamine | B33 | NYSGXRC
1TUF | Azelaic Acid | AZ1 | NYSGXRC
2PUZ | N-(Iminomethyl)-L-Glutamic Acid | NIG | NYSGXRC
2Q09 | 3-[(4S)-2,5-Dioxoimidazolidin-4-Yl]Propanoic Acid | DI6 | NYSGXRC
2GVC | 1-Methyl-1,3-Dihydro-2H-Imidazole-2-Thione | MMZ | NYSGXRC
1Y0G | 2-[(2E,6E,10E,14E,18E,22E,26E)-3,7,11,15,19,23,27,31-Octamethyldotriaconta-2,6,10,14,18,22,26,30-Octaenyl]Phenol | 8PP | NYSGXRC
1Z2L | Allantoate Ion | 1AL | NYSGXRC
1Y80 | Co-5-Methoxybenzimidazolylcobamide | B1M | SECSG
1KPH | Didecyl-Dimethyl-Ammonium | 10A | TBSGC
1KPI | Didecyl-Dimethyl-Ammonium | 10A | TBSGC
1N2H | Pantoyl Adenylate | PAJ | TBSGC
1N2I | Pantoyl Adenylate | PAJ | TBSGC
1BVR | Trans-2-Hexadecenoyl-(N-Acetyl-Cysteamine)-Thioester | THT | TBSGC
1QPR | 5-Phosphoribosyl-1-(Beta-Methylene) Pyrophosphate | PPC | TBSGC
1P44 | 5-{[4-(9H-Fluoren-9-Yl)Piperazin-1-Yl]Carbonyl}-1H-Indole | GEQ | TBSGC

Unique Ligands

(R)-2-Hydroxy-3-sulfopropanoic acid (3SL) bound to the structure of a putative 2-phosphosulfolactate phosphatase from Clostridium acetobutylicum (1VR0)

Indole-3-carboxaldehyde (I3A) bound to the structure of a tellurite resistance protein of COG3793 (ZP_00109916.1) from Nostoc punctiforme PCC 73102 (2OU3)

10-Oxohexadecanoic acid (OHA) bound to the structure of a ferredoxin-like protein (JCVI_PEP_1096682647733) from an environmental metagenome (unidentified marine microbe) (2OD6)

FK9436A (2OH1): Acetyltransferase, GNAT family

FB8805A (2Q9K): Unknown protein

Unknown Ligands (UNL)

1. Screen Crystals and Collect Data

2. Automatically Process Data: Autoindex, Integrate, Scale, Solve, Trace

3. Refine and Evaluate Structures

4. Disseminate Information*: Publish, Web-based Tools

Web-based tools: TOPSAN (www.topsan.org) and Ligand Search (smb.slac.stanford.edu/public/jcsg/cgi/jcsg_ligand_check.pl)

* in collaboration with BIC

The Role of the Structure Determination Core in the JCSG

The JCSG (www.jcsg.org) is one of the four large-scale structural genomics centers funded by NIGMS as part of the production phase of the Protein Structure Initiative (PSI). As of 2007, the PSI centers have deposited more than 2600 structures into the PDB, of which the JCSG has contributed over 500. Although most of the JCSG's resources are dedicated to protein structure determination, we are also working to disseminate the information gained from these structures to the wider research community. Here we report the development of a web-based data mining engine (smb.slac.stanford.edu/public/jcsg/cgi/jcsg_ligand_check.pl) that queries all of the PSI structures using a variety of search criteria. The main objective is to extract the ligands, biological or otherwise, bound to the structures and to explore them further through a number of associated links. The structures can also be queried by target name, PDB ID, PFAM family name, structure description, organism, and PSI center. Preliminary analysis indicates that 1515 of these PSI structures contain some type of bound ligand, metal, or solvent molecule, and that 262 of them contain 136 unique biological ligands. Interestingly, several of these ligands had not previously been identified in any PDB structure. In addition, 21 different co-factors have been observed in 210 structures.
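The multi-criteria query described above can be illustrated with a small sketch. This is not the code behind the JCSG server: the `Structure` record, the `search` function, and the in-memory list are hypothetical names chosen for illustration, and the five records are copied from the FMN search result shown elsewhere on this poster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    """Hypothetical record holding a few of the searchable fields."""
    pdb_id: str
    psi_center: str
    pfam: str
    organism: str
    ligands: tuple


# Illustrative records only; values are taken from the poster's search result.
STRUCTURES = [
    Structure("2r6v", "JCSG", "PF01613", "Pyrococcus Horikoshii Ot3", ("EDO", "FMN", "NCA")),
    Structure("2ig6", "JCSG", "PF01243", "Clostridium Acetobutylicum", ("EDO", "FMN", "SO4", "UNL")),
    Structure("2ou5", "JCSG", "PF01243", "Jannaschia Sp. Ccs1", ("FMN", "GOL", "SO4")),
    Structure("1q45", "CESG", "PF00724", "Arabidopsis Thaliana", ("FMN",)),
    Structure("1vp8", "JCSG", "PF08981", "Archaeoglobus Fulgidus Dsm 4304", ("FMN", "UNL")),
]


def search(structures, ligand=None, psi_center=None, pfam=None, organism=None):
    """Return the structures that satisfy every criterion that was supplied."""
    hits = []
    for s in structures:
        if ligand is not None and ligand.upper() not in s.ligands:
            continue
        if psi_center is not None and s.psi_center != psi_center.upper():
            continue
        if pfam is not None and s.pfam != pfam.upper():
            continue
        # Organism is matched as a case-insensitive substring.
        if organism is not None and organism.lower() not in s.organism.lower():
            continue
        hits.append(s)
    return hits
```

Calling `search(STRUCTURES, ligand="FMN", psi_center="JCSG", pfam="PF01243")` combines three criteria with a logical AND, which is the simplest reading of "a variety of search criteria"; the real engine may combine them differently.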

The Joint Center for Structural Genomics (JCSG)

Summary of Ligands (1606 structures)

Ligands (269 structures; 140 different ligands): UNL(70), UNX(22), LLP(6), SIN(6), NDP(6), MA7(6), NAG(5), PLM(4), UNK(4), GUN(3), APC(3), SUC(3), BAL(3), GLC(3), PAF(3), APR(2), GAL(2), NCN(2), CSD(2), SAI(2), CEI(2), BIO(2), HMH(2), SAP(2), GNP(2), 144(2), NCA(2), G4P(2), MPO(2), SRT(2), ANP(2), PCP(2), BGC(2), PAJ(2), NIG(1), PRP(1), NIO(1), ABF(1), IPR(1), MTA(1), CP(1), MLT(1), DI6(1), MED(1), MLZ(1), 5GP(1), CSO(1), CDP(1), I3A(1), 2PL(1), HED(1), G1P(1), NBZ(1), CSY(1), FRU(1), PLG(1), THF(1), B1M(1), ACP(1), DU(1), MMZ(1), OHA(1), 16A(1), THT(1), M7P(1), 3GC(1), CF5(1), PEO(1), CTZ(1), ADE(1), FT6(1), KEG(1), LUM(1), XLS(1), BAM(1), ADN(1), PMP(1), ADQ(1), B33(1), DGI(1), G3H(1), OXG(1), NDS(1), SAL(1), 3SL(1), SIB(1), STH(1), FEO(1), G3P(1), OXN(1), FES(1), TYD(1), DGT(1), 8PP(1), CO2(1), MP5(1), NTM(1), PNS(1), AES(1), APK(1), UVW(1), TRE(1), PYR(1), NAI(1), TCL(1), NMN(1), MAN(1), BFD(1), HHP(1), RIP(1), RBF(1), ORO(1), SNN(1), DTP(1), ZID(1), DEP(1), UPG(1), HXA(1), AAT(1), DTY(1), DON(1), NPO(1), C2E(1), AGC(1), BDF(1), PHT(1), OSB(1), NVA(1), CRO(1), BDN(1), TNE(1), SOG(1), AGS(1), TLP(1), 1PS(1), DUT(1), CXS(1), GEQ(1), MRD(1), G6P(1)

Co-factors (211 structures; 21 different co-factors): FMN(36), NAD(29), COA(18), NAP(17), PLP(15), ADP(15), FAD(15), SAM(14), ATP(9), SAH(9), AMP(9), HEM(8), ACO(7), GDP(4), FS4(3), U5P(2), MLC(1), COD(1), CNC(1), UTP(1), CTP(1)

Metal Ions (647 structures; 30 different metal ions): MG(177), ZN(174), NA(102), CA(83), NI(40), MN(31), FE(26), K(16), FE2(9), CD(8), PT(8), HG(7), CO(5), SM(2), WO4(2), PR(2), AU(2), BA(1), CS(1), MW2(1), SE(1), ARS(1), ZN3(1), O4M(1), YT3(1), LI(1), MO2(1), MO3(1), VO4(1), MO6(1)

Non-metal Ions (692 structures; 22 different non-metal ions): SO4(324), CL(243), PO4(118), NO3(11), IOD(10), BR(10), SCN(8), CO3(4), CAC(4), POP(3), AZI(3), SUL(2), BCT(2), ALF(2), OXL(2), PER(1), SO3(1), MLI(1), PO3(1), THJ(1), 1AL(1), NH4(1)

Organics (90 structures; 26 different organics): IPA(14), EOH(13), BME(9), BEZ(5), TLA(5), SEO(5), AKG(5), ETX(4), TAR(4), PGO(4), DTT(4), OAA(2), ACE(2), DMS(2), MLA(1), DOX(1), XYL(1), MOH(1), 3OH(1), AZ1(1), PPI(1), IOH(1), FOR(1), MYR(1), GTT(1), LMT(1)

Buffers (240 structures; 15 different buffers): ACT(86), ACY(47), FMT(37), CIT(27), TRS(16), EPE(15), MES(12), IMD(8), TMN(2), 10A(2), BTB(2), ICT(1), CPS(1), FLC(1), NHE(1)

Precipitants (98 structures; 13 different precipitants): PEG(38), PG4(28), PGE(16), 1PE(8), P6G(7), 2PE(3), PE4(3), P33(3), PE5(2), PEF(1), BU3(1), 1PG(1), PE8(1)

Salts (3 structures; 3 different salts): DPO(1), AF3(1), PPC(1)

Detergents (2 structures; 1 detergent): BOG(2)

Cryos (502 structures; 5 different cryos): GOL(244), EDO(241), MPD(32), EGL(3), CRY(2)
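Each summary line above follows the same pattern: a category, the number of structures, the number of distinct codes, and a list of `CODE(count)` entries. A minimal sketch of how such a line could be parsed for further analysis is shown below; the function name `parse_summary` and the regular expressions are assumptions made for illustration, not part of the JCSG tool.

```python
import re

# One summary line, copied from the list above.
SUMMARY_LINE = (
    "Co-factors (211 structures; 21 different co-factors): FMN(36), NAD(29), "
    "COA(18), NAP(17), PLP(15), ADP(15), FAD(15), SAM(14), ATP(9), SAH(9), "
    "AMP(9), HEM(8), ACO(7), GDP(4), FS4(3), U5P(2), MLC(1), COD(1), CNC(1), "
    "UTP(1), CTP(1)"
)

HEADER = re.compile(
    r"^(?P<category>[^(]+?)\s*\((?P<structures>\d+) structures; (?P<kinds>\d+) different"
)
ENTRY = re.compile(r"([A-Z0-9]+)\((\d+)\)")


def parse_summary(line):
    """Split a summary line into (category, n_structures, n_kinds, {code: count})."""
    header = HEADER.match(line)
    if header is None:
        raise ValueError("not a summary line")
    # Everything after the first '):' is the list of CODE(count) entries.
    _, _, body = line.partition("):")
    counts = {code: int(n) for code, n in ENTRY.findall(body)}
    return header["category"], int(header["structures"]), int(header["kinds"]), counts


category, n_structures, n_kinds, counts = parse_summary(SUMMARY_LINE)
```

For this line the per-code counts add up to more than the 211 structures reported in the header, presumably because a single structure can contain more than one co-factor.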


Search Results (35 hits)

Ligand Visualization Links

Ligand Depot: ACY ADP AMP BR CA CL EDO FMN GLC GOL IOD MG NCA NI ORO P33 PO4 SO4

HIC-Up: ACY ADP AMP BR CA CL EDO FMN GLC GOL IOD MG NCA NI ORO P33 PO4 SO4

N | Target | PDB | PFAM | Accession | Description | Organism | Ligands | PSI
1 | FB10607B | 2r6v | PF01613 | NP_142786.1 | Crystal Structure of FMN-binding Protein (NP_142786.1) from Pyrococcus Horikoshii at 1.35 Å resolution | Pyrococcus Horikoshii Ot3 | EDO FMN NCA | JCSG
2 | FH7614A | 2ig6 | PF01243 | NP_349178.1 | Crystal Structure of NIMC/NIMA Family Protein (NP_349178.1) from Clostridium Acetobutylicum at 1.80 Å resolution | Clostridium Acetobutylicum | EDO FMN SO4 UNL | JCSG
3 | FJ9446A | 2ou5 | PF01243 | YP_508196.1 | Crystal Structure of Pyridoxamine 5'-phosphate Oxidase-Related FMN-binding (YP_508196.1) from Jannaschia Sp. Ccs1 at 1.60 Å resolution | Jannaschia Sp. Ccs1 | FMN GOL SO4 | JCSG
…
34 | SGT98480 | 1q45 | PF00724 | NP_178662.1 | 12-Oxo-Phytodienoate Reductase Isoform 3 | Arabidopsis Thaliana | FMN | CESG
35 | TB0885A | 1vp8 | PF08981 | NP_068944.1 | Crystal Structure of Hypothetical Protein (NP_068944.1) from Archaeoglobus Fulgidus at 1.30 Å resolution | Archaeoglobus Fulgidus Dsm 4304 | FMN UNL | JCSG

A typical Search Result

Examples of Search Queries

Ligands bound to JCSG new folds

Target | PDB | Description | Organism | Ligand
CL6107A | 2ICH | Putative ATTH (NP_841447.1) at 2.00 Å | Nitrosomonas Europaea | NHE
TB0797A | 1VR0 | Putative 2-phosphosulfolactate Phosphatase at 2.6 Å | Clostridium Acetobutylicum | 3SL
TM0160 | 1VJL | Predicted Protein related to Wound Inducive Proteins in Plants at 1.90 Å | Thermotoga Maritima | UNL
TM0449 | 1KQ4 | Thy1-complementing Protein at 2.25 Å | Thermotoga Maritima | FAD
TM0574 | 1VKY | S-adenosylmethionine tRNA Ribosyltransferase at 2.00 Å | Thermotoga Maritima | UNL
TM1394 | 1VQ0 | 33 kDa Chaperonin (Heat Shock Protein 33 Homolog) at 2.20 Å | Thermotoga Maritima | UNL
TM1464 | 1VKM | Conserved Hypothetical Protein Possibly Involved in Carbohydrate Metabolism at 1.90 Å | Thermotoga Maritima Msb8 | UNL
TM1506 | 1VK9 | Hypothetical Protein at 2.70 Å | Thermotoga Maritima | UNL
TM1553 | 1VRM | Hypothetical Protein at 1.58 Å | Thermotoga Maritima Msb8 | UNL

[Figure: the nine new-fold structures with bound ligands, labeled 2ICH, 1VQ0, 1VR0, 1VJL, 1KQ4, 1VKY, 1VRM, 1VK9, and 1VKM]

Nine of the 26 new-fold structures from the JCSG have bound ligands, which help locate their active sites and give some clues to function. The ligands are often modeled as UNL because their precise identity is unknown.
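The proportions behind this statement can be checked directly from the new-fold table. The sketch below copies the ligand codes from that table; the variable names are illustrative only.

```python
from collections import Counter

# PDB ID -> ligand code, copied from the "Ligands bound to JCSG new folds" table.
NEW_FOLD_LIGANDS = {
    "2ICH": "NHE", "1VR0": "3SL", "1VJL": "UNL", "1KQ4": "FAD", "1VKY": "UNL",
    "1VQ0": "UNL", "1VKM": "UNL", "1VK9": "UNL", "1VRM": "UNL",
}
TOTAL_NEW_FOLDS = 26  # total number of JCSG new-fold structures stated in the text

by_ligand = Counter(NEW_FOLD_LIGANDS.values())

# Fraction of new folds that have any bound ligand (9 of 26).
fraction_with_ligand = len(NEW_FOLD_LIGANDS) / TOTAL_NEW_FOLDS

# Share of those ligand-bound new folds whose ligand is modeled as UNL.
unl_share = by_ligand["UNL"] / len(NEW_FOLD_LIGANDS)
```

About a third of the new folds have a bound ligand, and six of those nine ligands are unidentified (UNL), which is why the text describes UNL as a frequent modeling choice.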

Distribution of Ligands

[Figure: bar charts of the number of structures per code, with one panel each for Ligands, Co-factors, Metal Ions, Non-metal Ions, Buffers, and Precipitants; the underlying counts are listed in the Summary of Ligands. Additional panels are labeled NDP, GAL, MPO, FMN, PLP, and FS4.]

Exploring Binding Modes of Ligands

Over 340 structures in the PDB have the co-factor flavin mononucleotide (FMN) bound to the protein.

The binding poses of FMN vary considerably because of the torsional flexibility of the molecule.

However, unique binding poses can be observed in proteins belonging to specific PFAM families.
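Torsional flexibility is commonly quantified with dihedral (torsion) angles, each defined by four consecutive atoms. The sketch below computes one signed torsion angle from Cartesian coordinates. It is a generic geometric calculation, not the analysis used for this poster, and the function names and example coordinates are made up for illustration.

```python
import math


def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle, in degrees, about the bond p1-p2."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: an eclipsed (cis) and an anti (trans) arrangement.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
```

Comparing the set of torsion angles along the ribityl phosphate chain of FMN across structures would be one way to make "different binding poses" quantitative.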

Number of Structures

PFAM | PSI | Non-PSI | Total
PF01243 | 7 | 14 | 21
PF00881 | 9 | 8 | 17
PF00258 | 3 | 13 | 16
PF00724 | 2 | 8 | 10
PF01613 | 2 | 7 | 9
PF01180 | 1 | 8 | 9
PF01070 | 0 | 8 | 8
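The PSI contribution to each FMN-binding family can be read off the table of structure counts above. As a worked example, the sketch below computes the PSI share per PFAM family; the counts are copied from that table and the function name `psi_share` is hypothetical.

```python
# (PFAM family, PSI structures, non-PSI structures), copied from the table.
FMN_BY_PFAM = [
    ("PF01243", 7, 14),
    ("PF00881", 9, 8),
    ("PF00258", 3, 13),
    ("PF00724", 2, 8),
    ("PF01613", 2, 7),
    ("PF01180", 1, 8),
    ("PF01070", 0, 8),
]


def psi_share(rows):
    """Map each PFAM family to the fraction of its structures that came from PSI."""
    return {pfam: psi / (psi + non_psi) for pfam, psi, non_psi in rows}


shares = psi_share(FMN_BY_PFAM)
total_psi = sum(psi for _, psi, _ in FMN_BY_PFAM)
total_all = sum(psi + non_psi for _, psi, non_psi in FMN_BY_PFAM)
```

On these numbers PF00881 (Nitroreductase) is the only family in which PSI structures outnumber non-PSI ones (9 of 17), and across the seven families PSI accounts for 24 of 90 structures.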

[Figure: FMN binding poses by PFAM family, with one panel each for PF01243 (Pyridox_oxidase), PF01180 (DHOdehase), PF00881 (Nitroreductase), PF00258 (Flavodoxin_1), PF00724 (Oxidored_FMN), PF01613 (Flavin reductase like), and PF01070 (FMN-dependent dehydrogenase)]

UCSD & Burnham (Bioinformatics Core)

John Wooley, Adam Godzik, Slawomir Grzechnik, Lukasz Jaroszewski, Dana Weekes, Lian Duan, Sri Krishna Subramanian, Natasha Sefcovic, Piotr Kozbial, Andrew Morse, Prasad Burra, Tamara Astakhova, Josie Alaoen, Cindy Cook

TSRI (NMR Core)

Kurt Wüthrich, Reto Horst, Maggie Johnson, Amaranth Chatterjee, Michael Geralt, Wojtek Augustyniak, Pedro Serrano, Bill Pedrini, William Placzek

Stanford/SSRL (Structure Determination Core)

Keith Hodgson, Ashley Deacon, Mitchell Miller, Debanu Das, Hsiu-Ju (Jessica) Chiu, Kevin Jin, Christopher Rife, Qingping Xu, Silvya Oommachen, Scott Talafuse, Henry van den Bedem, Ronald Reyes, Christine Trame

Scientific Advisory Board

Sir Tom Blundell, Univ. Cambridge
Robert Stroud, Center for Structure of Membrane Proteins, Membrane Protein Expression Center, UC San Francisco
Homme Hellinga, Duke University Medical Center
James Naismith, The Scottish Structural Proteomics Facility, Univ. St. Andrews
James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute
Soichi Wakatsuki, Photon Factory, KEK, Japan
Todd Yeates, UCLA-DOE Inst. for Genomics and Proteomics
James Wells, UC San Francisco

The JCSG is supported by the NIH Protein Structure Initiative (PSI) Grant U54 GM074898 from NIGMS (www.nigms.nih.gov). Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL). The SSRL is a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the NIH.

GNF & TSRI (Crystallomics Core)

Scott Lesley, Mark Knuth, Dennis Carlton, Thomas Clayton, Kevin D. Murphy, Christina Trout, Marc Deller, Daniel McMullan, Heath Klock, Polat Abdubek, Claire Acosta, Linda M. Columbus, Julie Feuerhelm, Joanna C. Hale, Thamara Janaratne, Hope Johnson, Linda Okach, Edward Nigoghossian, Sebastian Sudek, Aprilfawn White, Bernhard Geierstanger, Glen Spraggon, Ylva Elias, Sanjay Agarwalla, Charlene Cho, Bi-Ying Yeh, Anna Grzechnik, Jessica Canseco, Mimmi Brown

TSRI (Admin Core)

Ian Wilson, Marc Elsliger, Gye Won Han, David Marciano, Henry Tien, Xiaoping Dai, Lisa van Veen
