Ligand search and data mining of Structural Genomics structures
Abhinav Kumar, Herbert Axelrod, Ashley Deacon
Structure Determination Core, Joint Center for Structural Genomics (JCSG), Stanford Synchrotron Radiation Laboratory, Menlo Park, CA, USA
3 JCSG Ligand Search
8 Unique PSI Ligands
PDB | Ligand Name | Ligand | PSI
2A3L | Coformycin 5'-Phosphate | CF5 | CESG
2OU3 | 1H-Indole-3-Carbaldehyde | I3A | JCSG
1VR0 | (2R)-3-Sulfolactic Acid | 3SL | JCSG
2OD6 | 10-Oxohexadecanoic Acid | OHA | JCSG
1X92 | D-Glycero-D-Mannopyranose-7-Phosphate | M7P | MCSG
1O8B | Beta-D-Arabinofuranose-5'-Phosphate | ABF | MCSG
2OSU | 6-Diazenyl-5-Oxo-L-Norleucine | DON | MCSG
1M33 | 3-Hydroxy-Propanoic Acid | 3OH | MCSG
1RTW | (4-Amino-2-Methylpyrimidin-5-Yl)Methyl Dihydrogen Phosphate | MP5 | NESG
2NW9 | 6-Fluoro-L-Tryptophan | FT6 | NESG
1XKL | 2-Amino-4H-1,3-Benzoxathiin-4-Ol | STH | NESG
1LW4 | 3-Hydroxy-2-[(3-Hydroxy-2-Methyl-5-Phosphonooxymethyl-Pyridin-4-Ylmethyl)-Amino]-Butyric Acid | TLP | NYSGXRC
2B4B | N-Ethyl-N-[3-(Propylamino)Propyl]Propane-1,3-Diamine | B33 | NYSGXRC
1TUF | Azelaic Acid | AZ1 | NYSGXRC
2PUZ | N-(Iminomethyl)-L-Glutamic Acid | NIG | NYSGXRC
2Q09 | 3-[(4S)-2,5-Dioxoimidazolidin-4-Yl]Propanoic Acid | DI6 | NYSGXRC
2GVC | 1-Methyl-1,3-Dihydro-2H-Imidazole-2-Thione | MMZ | NYSGXRC
1Y0G | 2-[(2E,6E,10E,14E,18E,22E,26E)-3,7,11,15,19,23,27,31-Octamethyldotriaconta-2,6,10,14,18,22,26,30-Octaenyl]Phenol | 8PP | NYSGXRC
1Z2L | Allantoate Ion | 1AL | NYSGXRC
1Y80 | Co-5-Methoxybenzimidazolylcobamide | B1M | SECSG
1KPH | Didecyl-Dimethyl-Ammonium | 10A | TBSGC
1KPI | Didecyl-Dimethyl-Ammonium | 10A | TBSGC
1N2H | Pantoyl Adenylate | PAJ | TBSGC
1N2I | Pantoyl Adenylate | PAJ | TBSGC
1BVR | Trans-2-Hexadecenoyl-(N-Acetyl-Cysteamine)-Thioester | THT | TBSGC
1QPR | 5-Phosphoribosyl-1-(Beta-Methylene) Pyrophosphate | PPC | TBSGC
1P44 | 5-{[4-(9H-Fluoren-9-Yl)Piperazin-1-Yl]Carbonyl}-1H-Indole | GEQ | TBSGC
9 Unique Ligands
(R)-2-Hydroxy-3-sulfopropanoic acid (3SL) bound to the structure of a putative 2-phosphosulfolactate phosphatase from Clostridium acetobutylicum (1VR0)
Indole-3-carboxaldehyde (I3A) bound to the structure of a tellurite resistance protein of COG3793 (ZP_00109916.1) from Nostoc punctiforme PCC 73102 (2OU3)
10-Oxohexadecanoic acid (OHA) bound to the structure of a ferredoxin-like protein (JCVI_PEP_1096682647733) from an environmental metagenome (unidentified marine microbe) (2OD6)
Unknown Ligands (UNL)
FK9436A (2OH1): Acetyltransferase, GNAT family
FB8805A (2Q9K): Unknown protein
2 The Role of the Structure Determination Core in the JCSG
1. Screen Crystals and Collect Data
2. Automatically Process Data: Autoindex, Integrate, Scale, Solve, Trace
3. Refine and Evaluate Structures
4. Disseminate Information*: Publish, Web-based Tools
TOPSAN (www.topsan.org)
Ligand Search (smb.slac.stanford.edu/public/jcsg/cgi/jcsg_ligand_check.pl)
* in collaboration with BIC
1 The Joint Center for Structural Genomics (JCSG)
The JCSG (www.jcsg.org) is one of the four large-scale structural genomics centers funded by NIGMS as part of the production phase of the Protein Structure Initiative (PSI). More than 2600 structures have been deposited into the PDB by the PSI centers as of 2007, of which the JCSG has contributed over 500 structures. Although the major part of JCSG's resources is dedicated to protein structure determination, we are also making efforts to disseminate information gained from these structures to a larger community of researchers.
Here we report the development of a web-based data mining engine (smb.slac.stanford.edu/public/jcsg/cgi/jcsg_ligand_check.pl) that queries all of the PSI structures based on a variety of search criteria. The main objective is to extract ligands, biological or otherwise, bound to the structures, and to explore them further with a number of associated links. In addition, the structures can be queried by a host of other criteria, such as target names, PDB IDs, PFAM family names, structure descriptions, organisms, and PSI centers. Preliminary analysis indicates that 1515 of these PSI structures have some type of bound ligand, metal or solvent molecules, and 262 of these structures contain 136 unique biological ligands. Interestingly, several of these ligands had not been previously identified in structures in the PDB. In addition, 21 different co-factors have been observed in 210 structures.
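The multi-criteria query described in this panel can be illustrated with a small in-memory filter. This is only a sketch under stated assumptions: the `Structure` record, its field names, and the `search` function are hypothetical and are not the implementation behind jcsg_ligand_check.pl. The five sample records are transcribed from the search result listing on this poster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    """Hypothetical record holding a few of the searchable fields."""
    pdb_id: str
    pfam: str
    organism: str
    center: str
    ligands: frozenset


# Sample records transcribed from the poster's search result listing.
RECORDS = [
    Structure("1vp8", "PF08981", "Archaeoglobus fulgidus DSM 4304", "JCSG",
              frozenset({"FMN", "UNL"})),
    Structure("1q45", "PF00724", "Arabidopsis thaliana", "CESG",
              frozenset({"FMN"})),
    Structure("2ou5", "PF01243", "Jannaschia sp. CCS1", "JCSG",
              frozenset({"FMN", "GOL", "SO4"})),
    Structure("2ig6", "PF01243", "Clostridium acetobutylicum", "JCSG",
              frozenset({"EDO", "FMN", "SO4", "UNL"})),
    Structure("2r6v", "PF01613", "Pyrococcus horikoshii OT3", "JCSG",
              frozenset({"EDO", "FMN", "NCA"})),
]


def search(records, ligand=None, pfam=None, center=None, organism=None):
    """Return the records that satisfy every criterion that was supplied.

    Criteria left as None are ignored; organism is a case-insensitive
    substring match, the other criteria are exact (case-insensitive) matches.
    """
    hits = []
    for rec in records:
        if ligand is not None and ligand.upper() not in rec.ligands:
            continue
        if pfam is not None and rec.pfam != pfam.upper():
            continue
        if center is not None and rec.center != center.upper():
            continue
        if organism is not None and organism.lower() not in rec.organism.lower():
            continue
        hits.append(rec)
    return hits


if __name__ == "__main__":
    for rec in search(RECORDS, ligand="FMN", pfam="PF01243"):
        print(rec.pdb_id, rec.organism, sorted(rec.ligands))
```

Combining criteria with a logical AND, as here, is one plausible reading of "a variety of search criteria"; the actual tool may combine them differently.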
6 Summary of Ligands (1606 structures)
Ligands (269 structures; 140 different ligands): UNL(70), UNX(22), LLP(6), SIN(6), NDP(6), MA7(6), NAG(5), PLM(4), UNK(4), GUN(3), APC(3), SUC(3), BAL(3), GLC(3), PAF(3), APR(2), GAL(2), NCN(2), CSD(2), SAI(2), CEI(2), BIO(2), HMH(2), SAP(2), GNP(2), 144(2), NCA(2), G4P(2), MPO(2), SRT(2), ANP(2), PCP(2), BGC(2), PAJ(2), NIG(1), PRP(1), NIO(1), ABF(1), IPR(1), MTA(1), CP(1), MLT(1), DI6(1), MED(1), MLZ(1), 5GP(1), CSO(1), CDP(1), I3A(1), 2PL(1), HED(1), G1P(1), NBZ(1), CSY(1), FRU(1), PLG(1), THF(1), B1M(1), ACP(1), DU(1), MMZ(1), OHA(1), 16A(1), THT(1), M7P(1), 3GC(1), CF5(1), PEO(1), CTZ(1), ADE(1), FT6(1), KEG(1), LUM(1), XLS(1), BAM(1), ADN(1), PMP(1), ADQ(1), B33(1), DGI(1), G3H(1), OXG(1), NDS(1), SAL(1), 3SL(1), SIB(1), STH(1), FEO(1), G3P(1), OXN(1), FES(1), TYD(1), DGT(1), 8PP(1), CO2(1), MP5(1), NTM(1), PNS(1), AES(1), APK(1), UVW(1), TRE(1), PYR(1), NAI(1), TCL(1), NMN(1), MAN(1), BFD(1), HHP(1), RIP(1), RBF(1), ORO(1), SNN(1), DTP(1), ZID(1), DEP(1), UPG(1), HXA(1), AAT(1), DTY(1), DON(1), NPO(1), C2E(1), AGC(1), BDF(1), PHT(1), OSB(1), NVA(1), CRO(1), BDN(1), TNE(1), SOG(1), AGS(1), TLP(1), 1PS(1), DUT(1), CXS(1), GEQ(1), MRD(1), G6P(1)
Co-factors (211 structures; 21 different co-factors): FMN(36), NAD(29), COA(18), NAP(17), PLP(15), ADP(15), FAD(15), SAM(14), ATP(9), SAH(9), AMP(9), HEM(8), ACO(7), GDP(4), FS4(3), U5P(2), MLC(1), COD(1), CNC(1), UTP(1), CTP(1)
Metal Ions (647 structures; 30 different metal ions): MG(177), ZN(174), NA(102), CA(83), NI(40), MN(31), FE(26), K(16), FE2(9), CD(8), PT(8), HG(7), CO(5), SM(2), WO4(2), PR(2), AU(2), BA(1), CS(1), MW2(1), SE(1), ARS(1), ZN3(1), O4M(1), YT3(1), LI(1), MO2(1), MO3(1), VO4(1), MO6(1)
Non-metal Ions (692 structures; 22 different non-metal ions): SO4(324), CL(243), PO4(118), NO3(11), IOD(10), BR(10), SCN(8), CO3(4), CAC(4), POP(3), AZI(3), SUL(2), BCT(2), ALF(2), OXL(2), PER(1), SO3(1), MLI(1), PO3(1), THJ(1), 1AL(1), NH4(1)
Organics (90 structures; 26 different organics): IPA(14), EOH(13), BME(9), BEZ(5), TLA(5), SEO(5), AKG(5), ETX(4), TAR(4), PGO(4), DTT(4), OAA(2), ACE(2), DMS(2), MLA(1), DOX(1), XYL(1), MOH(1), 3OH(1), AZ1(1), PPI(1), IOH(1), FOR(1), MYR(1), GTT(1), LMT(1)
Buffers (240 structures; 15 different buffers): ACT(86), ACY(47), FMT(37), CIT(27), TRS(16), EPE(15), MES(12), IMD(8), TMN(2), 10A(2), BTB(2), ICT(1), CPS(1), FLC(1), NHE(1)
Precipitants (98 structures; 13 different precipitants): PEG(38), PG4(28), PGE(16), 1PE(8), P6G(7), 2PE(3), PE4(3), P33(3), PE5(2), PEF(1), BU3(1), 1PG(1), PE8(1)
Salts (3 structures; 3 different salts): DPO(1), AF3(1), PPC(1)
Detergents (2 structures; 1 different detergent): BOG(2)
Cryos (502 structures; 5 different cryos): GOL(244), EDO(241), MPD(32), EGL(3), CRY(2)
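Each summary line above follows the same pattern: a category name, a structure count, a count of distinct codes, and a list of `CODE(n)` entries. A minimal sketch of parsing such a line is given below; the function name `parse_summary` and the returned dictionary layout are illustrative assumptions, not part of the JCSG tool.

```python
import re

# "Category (N structures; M different things): CODE(n), CODE(n), ..."
SUMMARY_LINE = re.compile(
    r"^(?P<category>[^(]+)\((?P<structures>\d+) structures; "
    r"(?P<distinct>\d+) different [^)]*\):\s*(?P<items>.*)$"
)
ITEM = re.compile(r"(\w+)\((\d+)\)")


def parse_summary(line):
    """Split one summary line into its category, totals, and per-code counts."""
    match = SUMMARY_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    counts = {code: int(n) for code, n in ITEM.findall(match["items"])}
    return {
        "category": match["category"].strip(),
        "structures": int(match["structures"]),
        "distinct": int(match["distinct"]),
        "counts": counts,
    }


if __name__ == "__main__":
    cryos = parse_summary(
        "Cryos (502 structures; 5 different cryos): "
        "GOL(244), EDO(241), MPD(32), EGL(3), CRY(2)"
    )
    print(cryos["category"], len(cryos["counts"]), sum(cryos["counts"].values()))
```

For the Cryos line, the per-code counts sum to 522, more than the 502 structures reported, which is consistent with some structures containing more than one cryoprotectant.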
5 Search Results (35 hits)
Ligand Visualization Links
Ligand Depot: ACY ADP AMP BR CA CL EDO FMN GLC GOL IOD MG NCA NI ORO P33 PO4 SO4
HIC-Up: ACY ADP AMP BR CA CL EDO FMN GLC GOL IOD MG NCA NI ORO P33 PO4 SO4
A typical Search Result
N | Target | PDB | PFAM | Accession | Description | Organism | Ligands | PSI
35 | TB0885A | 1vp8 | PF08981 | NP_068944.1 | Crystal Structure of Hypothetical Protein (NP_068944.1) from Archaeoglobus fulgidus at 1.30 Å resolution | Archaeoglobus fulgidus DSM 4304 | FMN UNL | JCSG
34 | SGT98480 | 1q45 | PF00724 | NP_178662.1 | 12-Oxo-Phytodienoate Reductase Isoform 3 | Arabidopsis thaliana | FMN | CESG
…
3 | FJ9446A | 2ou5 | PF01243 | YP_508196.1 | Crystal Structure of Pyridoxamine 5'-phosphate Oxidase-Related FMN-binding (YP_508196.1) from Jannaschia sp. CCS1 at 1.60 Å resolution | Jannaschia sp. CCS1 | FMN GOL SO4 | JCSG
2 | FH7614A | 2ig6 | PF01243 | NP_349178.1 | Crystal Structure of NIMC/NIMA Family Protein (NP_349178.1) from Clostridium acetobutylicum at 1.80 Å resolution | Clostridium acetobutylicum | EDO FMN SO4 UNL | JCSG
1 | FB10607B | 2r6v | PF01613 | NP_142786.1 | Crystal Structure of FMN-binding Protein (NP_142786.1) from Pyrococcus horikoshii at 1.35 Å resolution | Pyrococcus horikoshii OT3 | EDO FMN NCA | JCSG
4 Examples of Search Queries
10 Ligands bound to JCSG new folds
Target | PDB | Description | Organism | Ligand
CL6107A | 2ICH | Putative ATTH (NP_841447.1) at 2.00 Å | Nitrosomonas europaea | NHE
TB0797A | 1VR0 | Putative 2-phosphosulfolactate Phosphatase at 2.6 Å | Clostridium acetobutylicum | 3SL
TM0160 | 1VJL | Predicted Protein related to Wound Inducive Proteins in Plants at 1.90 Å | Thermotoga maritima | UNL
TM0449 | 1KQ4 | Thy1-complementing Protein at 2.25 Å | Thermotoga maritima | FAD
TM0574 | 1VKY | S-adenosylmethionine tRNA Ribosyltransferase at 2.00 Å | Thermotoga maritima | UNL
TM1394 | 1VQ0 | 33 kDa Chaperonin (Heat Shock Protein 33 Homolog) at 2.20 Å | Thermotoga maritima | UNL
TM1464 | 1VKM | Conserved Hypothetical Protein Possibly Involved in Carbohydrate Metabolism at 1.90 Å | Thermotoga maritima MSB8 | UNL
TM1506 | 1VK9 | Hypothetical Protein at 2.70 Å | Thermotoga maritima | UNL
TM1553 | 1VRM | Hypothetical Protein at 1.58 Å | Thermotoga maritima MSB8 | UNL
[Figure: structure images labeled 2ICH, 1VQ0, 1VR0, 1VJL, 1KQ4, 1VKY, 1VRM, 1VK9, and 1VKM]
9 out of 26 new fold structures from JCSG have bound ligands, which identify their active sites and give some clues to function. Often the ligands are modeled as UNL, because their precise identity is unknown.
7 Distribution of Ligands
[Figure: bar charts of counts per component code, one panel each for Ligands, Co-factors, Metal Ions, Non-metal Ions, Buffers, and Precipitants; additional image labels: NDP, GAL, MPO, FMN, PLP, FS4]
11 Exploring Binding Modes of Ligands
There are over 340 structures in the PDB with the co-factor flavin mononucleotide (FMN) bound to the protein.
The binding poses of FMN vary considerably because of the torsional flexibility of the molecule.
However, distinct binding poses can be observed in proteins belonging to specific PFAM families.
Number of Structures
PFAM | Family | PSI | Non-PSI | Total
PF01243 | Pyridox_oxidase | 7 | 14 | 21
PF00881 | Nitroreductase | 9 | 8 | 17
PF00258 | Flavodoxin_1 | 3 | 13 | 16
PF00724 | Oxidored_FMN | 2 | 8 | 10
PF01613 | Flavin reductase like | 2 | 7 | 9
PF01180 | DHOdehase | 1 | 8 | 9
PF01070 | FMN-dependent dehydrogenase | 0 | 8 | 8
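The per-family counts of FMN-bound structures can be checked for internal consistency and turned into PSI shares with a few lines of code. This is a sketch only: the assignment of the three numbers in each row to PSI, Non-PSI, and Total is a reading of the flattened table on this poster, and the names `FMN_BY_PFAM` and `psi_share` are illustrative.

```python
# PFAM family -> (PSI, non-PSI, total) counts of FMN-bound structures,
# as read from the "Number of Structures" table.
FMN_BY_PFAM = {
    "PF01243": (7, 14, 21),
    "PF00881": (9, 8, 17),
    "PF00258": (3, 13, 16),
    "PF00724": (2, 8, 10),
    "PF01613": (2, 7, 9),
    "PF01180": (1, 8, 9),
    "PF01070": (0, 8, 8),
}


def psi_share(table):
    """Return the fraction of each family's structures contributed by PSI.

    Raises ValueError if a row's PSI and non-PSI counts do not add up
    to its stated total.
    """
    shares = {}
    for pfam, (psi, non_psi, total) in table.items():
        if psi + non_psi != total:
            raise ValueError(f"{pfam}: {psi} + {non_psi} != {total}")
        shares[pfam] = psi / total
    return shares


if __name__ == "__main__":
    for pfam, share in sorted(psi_share(FMN_BY_PFAM).items(),
                              key=lambda kv: kv[1], reverse=True):
        print(f"{pfam}: {share:.0%} of FMN-bound structures from PSI centers")
```

Under this reading, the seven families account for 90 FMN-bound structures, 24 of them from PSI centers, and PF00881 (Nitroreductase) is the only family in which PSI structures form the majority.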
UCSD & Burnham (Bioinformatics Core)
John Wooley, Adam Godzik, Slawomir Grzechnik, Lukasz Jaroszewski, Dana Weekes, Lian Duan, Sri Krishna Subramanian, Natasha Sefcovic, Piotr Kozbial, Andrew Morse, Prasad Burra, Tamara Astakhova, Josie Alaoen, Cindy Cook
TSRI (NMR Core)
Kurt Wüthrich, Reto Horst, Maggie Johnson, Amaranth Chatterjee, Michael Geralt, Wojtek Augustyniak, Pedro Serrano, Bill Pedrini, William Placzek
Stanford/SSRL (Structure Determination Core)
Keith Hodgson, Ashley Deacon, Mitchell Miller, Debanu Das, Hsiu-Ju (Jessica) Chiu, Kevin Jin, Christopher Rife, Qingping Xu, Silvya Oommachen, Scott Talafuse, Henry van den Bedem, Ronald Reyes, Christine Trame
Scientific Advisory Board
Sir Tom Blundell (Univ. Cambridge)
Robert Stroud (Center for Structure of Membrane Proteins, Membrane Protein Expression Center, UC San Francisco)
Homme Hellinga (Duke University Medical Center)
James Naismith (The Scottish Structural Proteomics Facility, Univ. St. Andrews)
James Paulson (Consortium for Functional Glycomics, The Scripps Research Institute)
Soichi Wakatsuki (Photon Factory, KEK, Japan)
Todd Yeates (UCLA-DOE Inst. for Genomics and Proteomics)
James Wells (UC San Francisco)
The JCSG is supported by the NIH Protein Structure Initiative (PSI) Grant U54 GM074898 from NIGMS (www.nigms.nih.gov). Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL). The SSRL is a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the NIH.
GNF & TSRI (Crystallomics Core)
Scott Lesley, Mark Knuth, Dennis Carlton, Thomas Clayton, Kevin D. Murphy, Christina Trout, Marc Deller, Daniel McMullan, Heath Klock, Polat Abdubek, Claire Acosta, Linda M. Columbus, Julie Feuerhelm, Joanna C. Hale, Thamara Janaratne, Hope Johnson, Linda Okach, Edward Nigoghossian, Sebastian Sudek, Aprilfawn White, Bernhard Geierstanger, Glen Spraggon, Ylva Elias, Sanjay Agarwalla, Charlene Cho, Bi-Ying Yeh, Anna Grzechnik, Jessica Canseco, Mimmi Brown
TSRI (Admin Core)
Ian Wilson, Marc Elsliger, Gye Won Han, David Marciano, Henry Tien, Xiaoping Dai, Lisa van Veen