Overview of Chemical Informatics and Cyberinfrastructure Collaboratory

Aug 16 2006, Geoffrey Fox, Computer Science, Informatics, Physics, Pervasive Technology Laboratories, Indiana University, Bloomington IN 47401, [email protected], http://www.infomall.org, http://www.chembiogrid.org



Page 1: Overview of Chemical Informatics and Cyberinfrastructure Collaboratory

Overview of Chemical Informatics and Cyberinfrastructure Collaboratory

Aug 16 2006
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
[email protected]
http://www.infomall.org
http://www.chembiogrid.org

Page 2: Overview of Chemical Informatics and Cyberinfrastructure Collaboratory

Capabilities

• Local Teams, successful Prototypes and International Collaboration set up in 3 initial major focus areas
  • Chemical Informatics Cyberinfrastructure/Grids with services, workflows and demonstration uses building on success in other applications (LEAD) and showing distributed integration of academic and commercial tools
  • Computational Chemistry Cyberinfrastructure/Grids with simulation, databases and TeraGrid use
  • Education with courses and degrees
• Review of activities suggests we also formalize work in two further areas
  • Chemical Informatics Research - model applicability
  • Interfacing with the User - bench chemist-friendly portal

Page 3: Overview of Chemical Informatics and Cyberinfrastructure Collaboratory

Current Status

• Web site http://www.chembiogrid.org
• Wiki chosen to support project as a shared editable web space
• Building Collaboratory involving PubChem: Global Information System accessible anywhere and at any time; enhance PubChem with distributed tools (clustering, simulation, annotation etc.) and data
• Adopted Taverna as workflow system because it is popular in Bioinformatics, but we will evaluate other systems such as GPEL from LEAD
• Preparing large set of runs on local Big Red 23 Teraflop supercomputer (OSCAR3 CDK Mopac)
• Initial results discussed at conferences/workshops/papers
  • Gordon Conferences, ACS, SDSC tutorial
• First new Cheminformatics courses offered
• Advisory board set up and met
• Videoconferencing-based meetings with Peter Murray-Rust and group at Cambridge roughly every 2-3 weeks
• Good or potentially good interactions with NIH DTP, Scripps, Lilly and Michigan ECCR

Page 4: Overview of Chemical Informatics and Cyberinfrastructure Collaboratory

CICC Senior Personnel

• Geoffrey C. Fox
• Mu-Hyun (Mookie) Baik
• Dennis B. Gannon
• Marlon Pierce
• Beth A. Plale
• Gary D. Wiggins
• David J. Wild
• Yuqing (Melanie) Wu

• Peter T. Cherbas
• Mehmet M. Dalkilic
• Charles H. Davis
• A. Keith Dunker
• Kelsey M. Forsythe
• Kevin E. Gilbert
• John C. Huffman
• Malika Mahoui
• Daniel J. Mindiola
• Santiago D. Schnell
• William Scott
• Craig A. Stewart
• David R. Williams

From Biology, Chemistry, Computer Science, Informatics at IU Bloomington and IUPUI (Indianapolis)

Page 5: Overview of Chemical Informatics and Cyberinfrastructure Collaboratory

CICC Advisory Board

• Alan D. Palkowitz (Eli Lilly)
• Chris Peterson (Kalypsys)
• David Spellmeyer (IBM)
• Dimitris K. Agrafiotis (Johnson & Johnson)
• Horst Hemmerle (Eli Lilly)
• James M. Caruthers (Purdue University)
• Jeremy G. Frey (University of Southampton)
• Joel Saltz (Ohio State University/University of Maryland/Johns Hopkins University)
• John M. Barnard (Digital Chemistry)
• John Reynders (Eli Lilly)
• Peter Murray-Rust (University of Cambridge)
• Peter Willett (University of Sheffield)
• Thompson Doman (Eli Lilly)
• Val Gillet (University of Sheffield)

Industry and Academia. Met October 2005; will meet this fall.

Page 6: Overview of Chemical Informatics and Cyberinfrastructure Collaboratory

CICC Combines Grid Computing with Chemical Informatics

CICC: Chemical Informatics and Cyberinfrastructure Collaboratory
Funded by the National Institutes of Health
www.chembiogrid.org
Indiana University Department of Chemistry, School of Informatics, and Pervasive Technology Laboratories

Science and Cyberinfrastructure

Large Scale Computing Challenges

Chemical Informatics is a non-traditional area of high performance computing, but it offers many new, challenging problems to investigate.

CICC is an NIH-funded project to support the chemical informatics needs of High Throughput Cancer Screening Centers. The NIH is creating a data deluge of publicly available data on potential new drugs.

CICC supports the NIH mission by combining state of the art chemical informatics techniques with
• World class high performance computing
• National-scale computing resources (TeraGrid)
• Internet-standard web services
• International activities for service orchestration
• Open distributed computing infrastructure for scientists world wide

[Figure: workflow diagram with boxes labeled NIH PubMed DataBase; OSCAR Text Analysis; POVRay Parallel Rendering; Initial 3D Structure Calculation; Toxicity Filtering; Cluster Grouping; Docking; Molecular Mechanics Calculations; Quantum Mechanics Calculations; IU’s Varuna DataBase; NIH PubChem DataBase]

Chemical informatics text analysis programs can process hundreds of thousands of abstracts of online journal articles to extract chemical signatures of potential drugs.

OSCAR-mined molecular signatures can be clustered, filtered for toxicity, and docked onto larger proteins. These are classic “pleasingly parallel” tasks. Top-ranking docked molecules can be further examined for drug potential.

Big Red (and the TeraGrid) will also enable us to perform time consuming, multi-stepped Quantum Chemistry calculations on all of PubMed. Results go back to public databases that are freely accessible by the scientific community.
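The “pleasingly parallel” pattern mentioned above can be illustrated with a minimal Python sketch. It is not CICC code: `score_molecule` is a hypothetical stand-in for an expensive per-molecule step such as docking, and its toy score (fraction of carbon characters in a SMILES string) is an assumption made only so the example runs. A thread pool is used for simplicity; a real CPU-bound run would more likely use separate processes or batch jobs on a machine such as Big Red.

```python
from concurrent.futures import ThreadPoolExecutor


def score_molecule(smiles: str) -> float:
    """Hypothetical per-molecule task; a real one would call a docking code."""
    if not smiles:
        return 0.0
    # Toy score: fraction of characters that are carbon symbols.
    return sum(1 for ch in smiles if ch in "Cc") / len(smiles)


def score_all(molecules, max_workers=4):
    """Score every molecule independently; results keep the input order."""
    # No molecule depends on another, so the work can be split freely.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_molecule, molecules))


def top_ranked(molecules, n=2):
    """Return the n best (molecule, score) pairs, highest score first."""
    scores = score_all(molecules)
    ranked = sorted(zip(molecules, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    print(top_ranked(["CCO", "O", "c1ccccc1"], n=2))
```

Because each molecule is scored without reference to any other, the same structure scales from a few threads to many nodes by changing only how the work is distributed.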

Page 7: Overview of Chemical Informatics and Cyberinfrastructure Collaboratory

CICC Prototype Web Services

Basic cheminformatics
• Molecular weights
• Molecular formulae
• Tanimoto similarity
• 2D Structure diagrams
• Molecular descriptors
• 3D structures
• InChI generation/search
• CMLRSS

Application based services
• Compare (NIH)
• Toxicity predictions (ToxTree)
• Literature extraction (OSCAR3)
• Clustering (BCI Toolkit)
• Docking, filtering, ... (OpenEye)
• Varuna simulation
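Two of the basic services listed above, molecular weight and Tanimoto similarity, are simple enough to sketch in a few lines of Python. This is an illustration under stated assumptions, not the CICC implementation (which the slides associate with CDK): fingerprints are represented as sets of "on" bit positions, the formula parser handles only simple formulas without brackets or charges, and the function names are invented for this example.

```python
import re

# Standard atomic masses (g/mol) for a few common elements; extend as needed.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(formula: str) -> float:
    """Sum atomic masses for a simple formula such as 'C2H6O'.

    Raises KeyError for elements missing from ATOMIC_MASS.
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not fp_a and not fp_b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


if __name__ == "__main__":
    print(round(molecular_weight("C2H6O"), 3))
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

A value of 1.0 means the two fingerprints set exactly the same bits, and 0.0 means they share none.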

Next steps?
• Define WSDL interfaces to enable global production of compatible Web services; refine CML
• Look at Pipeline Pilot
• Extend Computational Chemistry (Varuna) Services
• Routine TeraGrid Big Red use
• Ready to try “Prototype Production” on OSCAR3 CDK Mopac
• Develop more training material
• Link to screening center via Scripps

Key Ideas
• Add value to PubChem with additional distributed services and databases
• Wrapping existing code in web services is not difficult
• Provide “core” (CDK) services and exemplars of typical tools
• Provide access to key databases via a web service interface
• Provide access to major Compute Grids
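The claim that wrapping existing code in a web service is not difficult can be illustrated with a short Python sketch built only on the standard library. It deliberately swaps in plain HTTP with a JSON reply rather than the WSDL-described services the slides refer to. The `/count` path, the `smiles` parameter and the `count_letters` function are all hypothetical names chosen for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def count_letters(smiles: str) -> int:
    """Stand-in for existing code: counts letters in a SMILES string (toy metric)."""
    return sum(1 for ch in smiles if ch.isalpha())


def answer(path: str):
    """Translate a request path into an (HTTP status, JSON body) pair."""
    url = urlparse(path)
    params = parse_qs(url.query)
    if url.path != "/count" or "smiles" not in params:
        return 400, json.dumps({"error": "use /count?smiles=..."})
    smiles = params["smiles"][0]
    return 200, json.dumps({"smiles": smiles, "count": count_letters(smiles)})


class Handler(BaseHTTPRequestHandler):
    """Thin HTTP layer: all real work stays in the wrapped function."""

    def do_GET(self):
        status, body = answer(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080):
    """Run the service on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and requesting `/count?smiles=CCO` returns a small JSON document. Keeping the request handling in `answer` means the wrapped function can be tested without starting a server.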

Page 8: Overview of Chemical Informatics and Cyberinfrastructure Collaboratory

Varuna environment for molecular modeling (Baik, IU)

[Figure: architecture diagram with components labeled Researcher; Chemical Concepts; Experiments; Papers etc.; Simulation Service (FORTRAN Code, Scripts); QM Database; QM/MM Database; Reaction DB; DB Service (Queries, Clustering, Curation, etc.); PubChem, PDB, NCI, etc.; ChemBioGrid; Condor; TeraGrid Supercomputers; “Flocks”]