Ligand search and data mining of Structural Genomics structures


Abhinav Kumar, Herbert Axelrod, Ashley Deacon

Structure Determination Core, Joint Center for Structural Genomics (JCSG), Stanford Synchrotron Radiation Laboratory, Menlo Park, CA, USA

Panel 3: JCSG Ligand Search

Panel 8: Unique PSI Ligands

PDB | Ligand Name | Ligand | PSI
2A3L | Coformycin 5'-Phosphate | CF5 | CESG
2OU3 | 1H-Indole-3-Carbaldehyde | I3A | JCSG
1VR0 | (2R)-3-Sulfolactic Acid | 3SL | JCSG
2OD6 | 10-Oxohexadecanoic Acid | OHA | JCSG
1X92 | D-Glycero-D-Mannopyranose-7-Phosphate | M7P | MCSG
1O8B | Beta-D-Arabinofuranose-5'-Phosphate | ABF | MCSG
2OSU | 6-Diazenyl-5-Oxo-L-Norleucine | DON | MCSG
1M33 | 3-Hydroxy-Propanoic Acid | 3OH | MCSG
1RTW | (4-Amino-2-Methylpyrimidin-5-Yl)Methyl Dihydrogen Phosphate | MP5 | NESG
2NW9 | 6-Fluoro-L-Tryptophan | FT6 | NESG
1XKL | 2-Amino-4H-1,3-Benzoxathiin-4-Ol | STH | NESG
1LW4 | 3-Hydroxy-2-[(3-Hydroxy-2-Methyl-5-Phosphonooxymethyl-Pyridin-4-Ylmethyl)-Amino]-Butyric Acid | TLP | NYSGXRC
2B4B | N-Ethyl-N-[3-(Propylamino)Propyl]Propane-1,3-Diamine | B33 | NYSGXRC
1TUF | Azelaic Acid | AZ1 | NYSGXRC
2PUZ | N-(Iminomethyl)-L-Glutamic Acid | NIG | NYSGXRC
2Q09 | 3-[(4S)-2,5-Dioxoimidazolidin-4-Yl]Propanoic Acid | DI6 | NYSGXRC
2GVC | 1-Methyl-1,3-Dihydro-2H-Imidazole-2-Thione | MMZ | NYSGXRC
1Y0G | 2-[(2E,6E,10E,14E,18E,22E,26E)-3,7,11,15,19,23,27,31-Octamethyldotriaconta-2,6,10,14,18,22,26,30-Octaenyl]Phenol | 8PP | NYSGXRC
1Z2L | Allantoate Ion | 1AL | NYSGXRC
1Y80 | Co-5-Methoxybenzimidazolylcobamide | B1M | SECSG
1KPH | Didecyl-Dimethyl-Ammonium | 10A | TBSGC
1KPI | Didecyl-Dimethyl-Ammonium | 10A | TBSGC
1N2H | Pantoyl Adenylate | PAJ | TBSGC
1N2I | Pantoyl Adenylate | PAJ | TBSGC
1BVR | Trans-2-Hexadecenoyl-(N-Acetyl-Cysteamine)-Thioester | THT | TBSGC
1QPR | 5-Phosphoribosyl-1-(Beta-Methylene) Pyrophosphate | PPC | TBSGC
1P44 | 5-{[4-(9H-Fluoren-9-Yl)Piperazin-1-Yl]Carbonyl}-1H-Indole | GEQ | TBSGC

Panel 9: Unique Ligands

(R)-2-Hydroxy-3-sulfopropanoic acid (3SL) bound to the structure of putative 2-phosphosulfolactate phosphatase from Clostridium acetobutylicum (1VR0)

Indole-3-carboxaldehyde (I3A) bound to the structure of tellurite resistance protein of COG3793 (ZP_00109916.1) from Nostoc punctiforme PCC 73102 (2OU3)

10-Oxohexadecanoic acid (OHA) bound to the structure of ferredoxin-like protein (JCVI_PEP_1096682647733) from an environmental metagenome (unidentified marine microbe) (2OD6)

Unknown Ligands (UNL)

FK9436A (2OH1): Acetyltransferase, GNAT family

FB8805A (2Q9K): Unknown protein

Panel 2: The Role of the Structure Determination Core in the JCSG

1. Screen Crystals and Collect Data

2. Automatically Process Data: Autoindex, Integrate, Scale, Solve, Trace

3. Refine and Evaluate Structures

4. Disseminate Information*: Publish; Web-based Tools

Web-based tools: TOPSAN (www.topsan.org) and Ligand Search (smb.slac.stanford.edu/public/jcsg/cgi/jcsg_ligand_check.pl)

* in collaboration with BIC

Panel 1: The Joint Center for Structural Genomics (JCSG)

The JCSG (www.jcsg.org) is one of the four large-scale structural genomics centers funded by NIGMS as part of the production phase of the Protein Structure Initiative (PSI). As of 2007, the PSI centers have deposited more than 2600 structures into the PDB, of which the JCSG has contributed over 500. Although most of the JCSG's resources are dedicated to protein structure determination, we are also working to disseminate the information gained from these structures to a larger community of researchers. Here we report the development of a web-based data mining engine (smb.slac.stanford.edu/public/jcsg/cgi/jcsg_ligand_check.pl) that queries all of the PSI structures using a variety of search criteria. The main objective is to extract the ligands, biological or otherwise, bound to the structures and to explore them further through a number of associated links. The structures can also be queried by other criteria, such as target names, PDB IDs, PFAM family names, structure descriptions, organisms, and PSI centers. Preliminary analysis indicates that 1515 of these PSI structures have some type of bound ligand, metal, or solvent molecule, and that 262 of them contain 136 unique biological ligands. Interestingly, several of these ligands had not previously been identified in any structure in the PDB. In addition, 21 different co-factors have been observed in 210 structures.
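The multi-criteria query described in the abstract can be illustrated with a small in-memory filter. This is only a sketch, not the engine's actual implementation (the poster does not describe it): the `Structure` record, the `search` function, and its parameters are hypothetical names, and the four records are copied from the search-result panel of this poster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    """Hypothetical record holding the fields the poster lists as searchable."""
    target: str
    pdb_id: str
    pfam: str
    organism: str
    center: str
    ligands: frozenset


def search(records, ligand=None, pfam=None, center=None, organism=None):
    """Return the records that satisfy every criterion that was supplied."""
    hits = []
    for rec in records:
        if ligand is not None and ligand.upper() not in rec.ligands:
            continue
        if pfam is not None and pfam.upper() != rec.pfam:
            continue
        if center is not None and center.upper() != rec.center:
            continue
        # Organism is matched as a case-insensitive substring.
        if organism is not None and organism.lower() not in rec.organism.lower():
            continue
        hits.append(rec)
    return hits


# Rows taken from the "typical search result" panel of the poster.
RECORDS = [
    Structure("TB0885A", "1vp8", "PF08981", "Archaeoglobus fulgidus DSM 4304",
              "JCSG", frozenset({"FMN", "UNL"})),
    Structure("FJ9446A", "2ou5", "PF01243", "Jannaschia sp. CCS1",
              "JCSG", frozenset({"FMN", "GOL", "SO4"})),
    Structure("FH7614A", "2ig6", "PF01243", "Clostridium acetobutylicum",
              "JCSG", frozenset({"EDO", "FMN", "SO4", "UNL"})),
    Structure("FB10607B", "2r6v", "PF01613", "Pyrococcus horikoshii OT3",
              "JCSG", frozenset({"EDO", "FMN", "NCA"})),
]

if __name__ == "__main__":
    for hit in search(RECORDS, ligand="FMN", pfam="PF01243"):
        print(hit.pdb_id, hit.target, sorted(hit.ligands))
```

Criteria left as `None` are ignored, so the same function covers single-field and combined queries.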

Panel 6: Summary of Ligands (1606 structures)

Ligands (269 structures; 140 different ligands): UNL(70), UNX(22), LLP(6), SIN(6), NDP(6), MA7(6), NAG(5), PLM(4), UNK(4), GUN(3), APC(3), SUC(3), BAL(3), GLC(3), PAF(3), APR(2), GAL(2), NCN(2), CSD(2), SAI(2), CEI(2), BIO(2), HMH(2), SAP(2), GNP(2), 144(2), NCA(2), G4P(2), MPO(2), SRT(2), ANP(2), PCP(2), BGC(2), PAJ(2), NIG(1), PRP(1), NIO(1), ABF(1), IPR(1), MTA(1), CP(1), MLT(1), DI6(1), MED(1), MLZ(1), 5GP(1), CSO(1), CDP(1), I3A(1), 2PL(1), HED(1), G1P(1), NBZ(1), CSY(1), FRU(1), PLG(1), THF(1), B1M(1), ACP(1), DU(1), MMZ(1), OHA(1), 16A(1), THT(1), M7P(1), 3GC(1), CF5(1), PEO(1), CTZ(1), ADE(1), FT6(1), KEG(1), LUM(1), XLS(1), BAM(1), ADN(1), PMP(1), ADQ(1), B33(1), DGI(1), G3H(1), OXG(1), NDS(1), SAL(1), 3SL(1), SIB(1), STH(1), FEO(1), G3P(1), OXN(1), FES(1), TYD(1), DGT(1), 8PP(1), CO2(1), MP5(1), NTM(1), PNS(1), AES(1), APK(1), UVW(1), TRE(1), PYR(1), NAI(1), TCL(1), NMN(1), MAN(1), BFD(1), HHP(1), RIP(1), RBF(1), ORO(1), SNN(1), DTP(1), ZID(1), DEP(1), UPG(1), HXA(1), AAT(1), DTY(1), DON(1), NPO(1), C2E(1), AGC(1), BDF(1), PHT(1), OSB(1), NVA(1), CRO(1), BDN(1), TNE(1), SOG(1), AGS(1), TLP(1), 1PS(1), DUT(1), CXS(1), GEQ(1), MRD(1), G6P(1)

Co-factors (211 structures; 21 different co-factors): FMN(36), NAD(29), COA(18), NAP(17), PLP(15), ADP(15), FAD(15), SAM(14), ATP(9), SAH(9), AMP(9), HEM(8), ACO(7), GDP(4), FS4(3), U5P(2), MLC(1), COD(1), CNC(1), UTP(1), CTP(1)

Metal Ions (647 structures; 30 different metal ions): MG(177), ZN(174), NA(102), CA(83), NI(40), MN(31), FE(26), K(16), FE2(9), CD(8), PT(8), HG(7), CO(5), SM(2), WO4(2), PR(2), AU(2), BA(1), CS(1), MW2(1), SE(1), ARS(1), ZN3(1), O4M(1), YT3(1), LI(1), MO2(1), MO3(1), VO4(1), MO6(1)

Non-metal Ions (692 structures; 22 different non-metal ions): SO4(324), CL(243), PO4(118), NO3(11), IOD(10), BR(10), SCN(8), CO3(4), CAC(4), POP(3), AZI(3), SUL(2), BCT(2), ALF(2), OXL(2), PER(1), SO3(1), MLI(1), PO3(1), THJ(1), 1AL(1), NH4(1)

Organics (90 structures; 26 different organics): IPA(14), EOH(13), BME(9), BEZ(5), TLA(5), SEO(5), AKG(5), ETX(4), TAR(4), PGO(4), DTT(4), OAA(2), ACE(2), DMS(2), MLA(1), DOX(1), XYL(1), MOH(1), 3OH(1), AZ1(1), PPI(1), IOH(1), FOR(1), MYR(1), GTT(1), LMT(1)

Buffers (240 structures; 15 different buffers): ACT(86), ACY(47), FMT(37), CIT(27), TRS(16), EPE(15), MES(12), IMD(8), TMN(2), 10A(2), BTB(2), ICT(1), CPS(1), FLC(1), NHE(1)

Precipitants (98 structures; 13 different precipitants): PEG(38), PG4(28), PGE(16), 1PE(8), P6G(7), 2PE(3), PE4(3), P33(3), PE5(2), PEF(1), BU3(1), 1PG(1), PE8(1)

Salts (3 structures; 3 different salts): DPO(1), AF3(1), PPC(1)

Detergents (2 structures; 1 different detergent): BOG(2)

Cryos (502 structures; 5 different cryos): GOL(244), EDO(241), MPD(32), EGL(3), CRY(2)
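The category summaries above all use the same `CODE(count)` notation, so they are easy to turn into a lookup table for further analysis. The following is a minimal sketch using the co-factor line; the function name `parse_counts` and the regular expression are illustrative choices, not part of the JCSG tool.

```python
import re

# One entry looks like "FMN(36)": a 1-3 character ligand code, then a count.
ENTRY = re.compile(r"\b([A-Z0-9]{1,3})\s*\((\d+)\)")


def parse_counts(text):
    """Map each ligand code in a 'CODE(n), CODE(n), ...' list to its count."""
    return {code: int(n) for code, n in ENTRY.findall(text)}


# Co-factor list copied from the summary above.
COFACTORS = (
    "FMN(36), NAD(29), COA(18), NAP(17), PLP(15), ADP(15), FAD(15), "
    "SAM(14), ATP(9), SAH(9), AMP(9), HEM(8), ACO(7), GDP(4), FS4(3), "
    "U5P(2), MLC(1), COD(1), CNC(1), UTP(1), CTP(1)"
)

if __name__ == "__main__":
    counts = parse_counts(COFACTORS)
    print(len(counts), "different co-factors")
    print("most frequent:", max(counts, key=counts.get))
```

The per-code counts can add up to more than the number of structures given for a category, because a single structure may contain more than one ligand of that category.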

Panel 5: A typical Search Result

Search Results (35 hits)

Ligand Visualization Links

Ligand Depot: ACY ADP AMP BR CA CL EDO FMN GLC GOL IOD MG NCA NI ORO P33 PO4 SO4

HIC-Up: ACY ADP AMP BR CA CL EDO FMN GLC GOL IOD MG NCA NI ORO P33 PO4 SO4

N | Target | PDB | PFAM | Accession | Description | Organism | Ligands | PSI
35 | TB0885A | 1vp8 | PF08981 | NP_068944.1 | Crystal Structure of Hypothetical Protein (NP_068944.1) from Archaeoglobus fulgidus at 1.30 Å resolution | Archaeoglobus fulgidus DSM 4304 | FMN UNL | JCSG
34 | SGT98480 | 1q45 | PF00724 | NP_178662.1 | 12-Oxo-Phytodienoate Reductase Isoform 3 | Arabidopsis thaliana | FMN | CESG
…
3 | FJ9446A | 2ou5 | PF01243 | YP_508196.1 | Crystal Structure of Pyridoxamine 5'-phosphate Oxidase-Related FMN-binding (YP_508196.1) from Jannaschia sp. CCS1 at 1.60 Å resolution | Jannaschia sp. CCS1 | FMN GOL SO4 | JCSG
2 | FH7614A | 2ig6 | PF01243 | NP_349178.1 | Crystal Structure of NIMC/NIMA Family Protein (NP_349178.1) from Clostridium acetobutylicum at 1.80 Å resolution | Clostridium acetobutylicum | EDO FMN SO4 UNL | JCSG
1 | FB10607B | 2r6v | PF01613 | NP_142786.1 | Crystal Structure of FMN-binding Protein (NP_142786.1) from Pyrococcus horikoshii at 1.35 Å resolution | Pyrococcus horikoshii OT3 | EDO FMN NCA | JCSG

Panel 4: Examples of Search Queries

Panel 10: Ligands bound to JCSG new folds

Target | PDB | Description | Organism | Ligand
CL6107A | 2ICH | Putative ATTH (NP_841447.1) at 2.00 Å | Nitrosomonas europaea | NHE
TB0797A | 1VR0 | Putative 2-phosphosulfolactate Phosphatase at 2.6 Å | Clostridium acetobutylicum | 3SL
TM0160 | 1VJL | Predicted Protein related to Wound Inducive Proteins in Plants at 1.90 Å | Thermotoga maritima | UNL
TM0449 | 1KQ4 | Thy1-complementing Protein at 2.25 Å | Thermotoga maritima | FAD
TM0574 | 1VKY | S-adenosylmethionine tRNA Ribosyltransferase at 2.00 Å | Thermotoga maritima | UNL
TM1394 | 1VQ0 | 33 kDa Chaperonin (Heat Shock Protein 33 Homolog) at 2.20 Å | Thermotoga maritima | UNL
TM1464 | 1VKM | Conserved Hypothetical Protein Possibly Involved in Carbohydrate Metabolism at 1.90 Å | Thermotoga maritima MSB8 | UNL
TM1506 | 1VK9 | Hypothetical Protein at 2.70 Å | Thermotoga maritima | UNL
TM1553 | 1VRM | Hypothetical Protein at 1.58 Å | Thermotoga maritima MSB8 | UNL

[Figure: structure images labeled 2ICH, 1VQ0, 1VR0, 1VJL, 1KQ4, 1VKY, 1VRM, 1VK9, and 1VKM]

9 out of 26 new fold structures from JCSG have bound ligands, which identify their active sites and give some clues to function. Often the ligands are modeled as UNL, because their precise identity is unknown.

Panel 7: Distribution of Ligands

[Figure: six bar charts of counts per ligand code, one chart per category: Ligands, Co-factors, Metal Ions, Non-metal Ions, Buffers, and Precipitants. Additional labels shown in the panel: NDP, GAL, MPO, FMN, PLP, FS4.]

Panel 11: Exploring Binding Modes of Ligands

There are over 340 structures in the PDB with the co-factor flavin mononucleotide (FMN) bound to the protein.

The binding poses of FMN vary considerably because of the torsional flexibility of the molecule.

However, distinct binding poses can be observed in proteins belonging to specific PFAM families.
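Torsional flexibility of a ligand is commonly described by the dihedral angles about its rotatable bonds, so comparing poses often starts with computing those angles from atomic coordinates. The sketch below shows the standard four-point dihedral calculation in plain Python. It is not taken from the poster: the function name is hypothetical and the coordinates in the usage lines are made-up test points, not FMN atoms.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond for atoms p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: a cis arrangement and a trans arrangement.
    print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
    print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Applied to the same four atoms of FMN in several structures, such angles give a simple numerical way to group binding poses, for example by PFAM family.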

Number of Structures

PFAM | Family | PSI | Non-PSI | Total
PF01243 | Pyridox_oxidase | 7 | 14 | 21
PF00881 | Nitroreductase | 9 | 8 | 17
PF00258 | Flavodoxin_1 | 3 | 13 | 16
PF00724 | Oxidored_FMN | 2 | 8 | 10
PF01613 | Flavin reductase like | 2 | 7 | 9
PF01180 | DHOdehase | 1 | 8 | 9
PF01070 | FMN-dependent dehydrogenase | 0 | 8 | 8
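The per-family counts above can be checked and summarized with a few lines of arithmetic. This is an illustrative sketch only; the dictionary layout and function names are assumptions, and the numbers are copied from the table on this poster.

```python
# (PSI, non-PSI, total) FMN-bound structures per PFAM family, from the table above.
FMN_COUNTS = {
    "PF01243": (7, 14, 21),
    "PF00881": (9, 8, 17),
    "PF00258": (3, 13, 16),
    "PF00724": (2, 8, 10),
    "PF01613": (2, 7, 9),
    "PF01180": (1, 8, 9),
    "PF01070": (0, 8, 8),
}


def inconsistent_rows(counts):
    """Return the families whose PSI and non-PSI counts do not add up to the total."""
    return [fam for fam, (psi, non_psi, total) in counts.items()
            if psi + non_psi != total]


def psi_share(counts):
    """Fraction of the listed FMN-bound structures that come from PSI centers."""
    psi = sum(row[0] for row in counts.values())
    total = sum(row[2] for row in counts.values())
    return psi / total


if __name__ == "__main__":
    print("inconsistent rows:", inconsistent_rows(FMN_COUNTS))
    print("PSI share of listed structures: {:.1%}".format(psi_share(FMN_COUNTS)))
```

The share is computed only over the seven families listed here, not over all FMN-bound structures in the PDB.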

UCSD & Burnham (Bioinformatics Core)

John Wooley, Adam Godzik, Slawomir Grzechnik, Lukasz Jaroszewski, Dana Weekes, Lian Duan, Sri Krishna Subramanian, Natasha Sefcovic, Piotr Kozbial, Andrew Morse, Prasad Burra, Tamara Astakhova, Josie Alaoen, Cindy Cook

TSRI (NMR Core)

Kurt Wüthrich, Reto Horst, Maggie Johnson, Amaranth Chatterjee, Michael Geralt, Wojtek Augustyniak, Pedro Serrano, Bill Pedrini, William Placzek

Stanford/SSRL (Structure Determination Core)

Keith Hodgson, Ashley Deacon, Mitchell Miller, Debanu Das, Hsiu-Ju (Jessica) Chiu, Kevin Jin, Christopher Rife, Qingping Xu, Silvya Oommachen, Scott Talafuse, Henry van den Bedem, Ronald Reyes, Christine Trame

Scientific Advisory Board

Sir Tom Blundell (Univ. Cambridge)
Robert Stroud (Center for Structure of Membrane Proteins; Membrane Protein Expression Center; UC San Francisco)
Homme Hellinga (Duke University Medical Center)
James Naismith (The Scottish Structural Proteomics Facility; Univ. St. Andrews)
James Paulson (Consortium for Functional Glycomics; The Scripps Research Institute)
Soichi Wakatsuki (Photon Factory, KEK, Japan)
Todd Yeates (UCLA-DOE Inst. for Genomics and Proteomics)
James Wells (UC San Francisco)

The JCSG is supported by the NIH Protein Structure Initiative (PSI) Grant U54 GM074898 from NIGMS (www.nigms.nih.gov). Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL). The SSRL is a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the NIH.

GNF & TSRI (Crystallomics Core)

Scott Lesley, Mark Knuth, Dennis Carlton, Thomas Clayton, Kevin D. Murphy, Christina Trout, Marc Deller, Daniel McMullan, Heath Klock, Polat Abdubek, Claire Acosta, Linda M. Columbus, Julie Feuerhelm, Joanna C. Hale, Thamara Janaratne, Hope Johnson, Linda Okach, Edward Nigoghossian, Sebastian Sudek, Aprilfawn White, Bernhard Geierstanger, Glen Spraggon, Ylva Elias, Sanjay Agarwalla, Charlene Cho, Bi-Ying Yeh, Anna Grzechnik, Jessica Canseco, Mimmi Brown

TSRI (Admin Core)

Ian Wilson, Marc Elsliger, Gye Won Han, David Marciano, Henry Tien, Xiaoping Dai, Lisa van Veen