GridChem: A Computational Chemistry Cyber-infrastructure Using Web Services
Sanibel Symposium, 23 Feb 07
Sudhakar Pamidighantam, NCSA, University of Illinois at Urbana-Champaign


Acknowledgements

Outline

• Historical Background Grid Chemistry

• Current Status Web Services Usage

• Brief Demo

• Future

Motivation

• Software: reasonably mature and easy to use for addressing chemists' questions of interest
• Community of users: need the software and are capable of using it; some are non-traditional computational chemists
• Resources: varied in capacity and capability

Background

• Quantum Chemistry Remote Job Monitor (Quantum Chemistry Workbench), 1998, NCSA
• Chemviz, 1999-2001, NSF
• Technologies: web-based client-server models, visual interfaces, distributed computing

GridChem

• NCSA Alliance was commissioned in 1998
• Diverse HPC systems were deployed both at NCSA and at Alliance partner sites
• Batch schedulers differed between sites
• Policies favored different classes and modes of use at different sites/HPC systems

Extended TeraGrid Facility

www.teragrid.org

Grid and Gridlock

• The Alliance led to a physical Grid
• The Grid led to TeraGrid
• A homogeneous grid was planned, but it was difficult to keep it homogeneous
• Things got more complicated, and we have heterogeneous grids now!

Interoperability, standards, and openness are critical

Current Grid Status

Grid Hardware

Middleware

Scientific Applications

Interfaces

User Community

Chemistry and Computational Biology

User Base, Sep 03 – Oct 04

        NRAC        AAB         Small Allocations
-------------------------------------------------------------
#PIs    26          23          64
#SUs    5,953,100   1,374,100   640,000

User Issues

• New systems meant learning new commands
• Porting codes
• Learning new job submission and monitoring protocols
• New proposals for time
• Computational modeling became more popular and users increased
• Batch queues are longer / waiting increased
• Finding resources where to compute, probably at multiple distributed sites
• Multiple proposals/allocations/logins
• Authentication and data security
• Data management

Computational Chemistry Grid

• Integrated cyber-infrastructure for computational chemistry
• Integrates applications, middleware, HPC resources, scheduling, and data management
• Allocations, user services, and training

Resources

System (Site)                      Procs Avail   Total CPU Hours/Year   Status
Intel Cluster (OSC)                36            315,000                SMP and cluster nodes
HP Integrity Superdome (UKy)       33            290,000                TB replaced with SMP/cluster nodes
IA32 Linux Cluster (NCSA)          64            560,000
Intel Cluster (LSU)                1024          1,000,000
IBM Power4 (TACC)                  16            140,000
TeraGrid (Multiple Institutions)                 250,000                New allocation expected
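One plausible reading of the "Total CPU Hours/Year" column, not stated on the slide, is that it is roughly the processor count multiplied by the 8,760 hours in a year. The sketch below checks that assumption against the listed figures. For four of the systems the listed figure is within 1% of the estimate; the LSU and TeraGrid rows do not follow this pattern.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 wall-clock hours in a non-leap year

# (processors available, CPU hours/year as listed on the slide)
LISTED = {
    "Intel Cluster (OSC)": (36, 315_000),
    "HP Integrity Superdome (UKy)": (33, 290_000),
    "IA32 Linux Cluster (NCSA)": (64, 560_000),
    "IBM Power4 (TACC)": (16, 140_000),
}


def full_year_cpu_hours(procs: int) -> int:
    """CPU hours delivered if every processor ran for the whole year."""
    return procs * HOURS_PER_YEAR


def relative_gap(procs: int, listed: int) -> float:
    """Fractional difference between the listed figure and the estimate."""
    return abs(full_year_cpu_hours(procs) - listed) / listed


if __name__ == "__main__":
    for name, (procs, listed) in LISTED.items():
        estimate = full_year_cpu_hours(procs)
        print(f"{name}: estimate {estimate:,} vs listed {listed:,}")
```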

Other Resources

• Extant HPC resources at various supercomputer centers (interoperable)
• Optionally, other grids and hubs/local/personal resources
• These may require existing allocations/authorization

GridChem System

[Figure: GridChem System. Components shown: users, Portal Client, Grid Middleware Proxy Server, Grid Services, Grid, applications, Mass Storage]

http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0438312

Applications

• GridChem already supports some applications
  – Gaussian 98/03, GAMESS, NWChem, Molpro, QMCPack, Amber
• Schedule of integration of additional software
  – ACES-2
  – Crystal
  – Q-Chem
  – WIEN2k
  – MCCCS Towhee
  – More …

GridChem Middleware

Web Services Oriented

WS

XML is used to tag the data, SOAP is used to transfer the data, WSDL is used for describing the services available and UDDI is used for listing what services are available.

Web Services are different from web page systems or web servers:
• There is no GUI
• Web Services share business logic, data, and processes with each other through an API (not with the user)
• Web Services describe a standard way of interacting with “web based” applications

A client program connecting to a web service can read the WSDL to determine what functions are available on the server. Any special datatypes used are embedded in the WSDL file in the form of XML Schema. UDDI stands for Universal Description, Discovery, and Integration. The middleware is WSRF standards compliant.
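As a minimal sketch of how a client might read a WSDL to discover the available functions, the Python below lists the operations under each port type. The sample WSDL, its service name, and its operation names are hypothetical and are not taken from the GridChem service; only the WSDL 1.1 namespace is standard.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # standard WSDL 1.1 namespace

# Hypothetical WSDL fragment, for illustration only.
SAMPLE_WSDL = """<?xml version="1.0" encoding="UTF-8"?>
<definitions name="ExampleService"
    targetNamespace="http://example.org/ExampleService"
    xmlns="http://schemas.xmlsoap.org/wsdl/">
  <portType name="ExamplePortType">
    <operation name="login"/>
    <operation name="submitJob"/>
  </portType>
</definitions>
"""


def list_operations(wsdl_text: str) -> dict:
    """Return {portType name: [operation names]} found in a WSDL document."""
    root = ET.fromstring(wsdl_text.encode("utf-8"))
    operations = {}
    for port_type in root.findall(f"{{{WSDL_NS}}}portType"):
        names = [op.get("name") for op in port_type.findall(f"{{{WSDL_NS}}}operation")]
        operations[port_type.get("name")] = names
    return operations


if __name__ == "__main__":
    for port_type, ops in list_operations(SAMPLE_WSDL).items():
        print(port_type, "->", ", ".join(ops))
```

A real client toolkit would also read the message and type definitions; this sketch stops at the operation names.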

Client Objects Database Interaction

[Diagram components: Client, DTO, WS Resources, Business Model, Objects, DAO, Hibernate, hb.xml, Database]

• DTO (Data Transfer Object): serialized for transfer through XML
• DAO (Data Access Object): how to get the DB objects
• hb.xml (Hibernate data map): describes the object/column data mapping
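The DTO and DAO roles described above can be sketched as follows. This is an illustration under stated assumptions, not the GridChem implementation: the real middleware is Java with Hibernate mappings, whereas this sketch uses a Python dataclass and hand-written SQL on an in-memory SQLite database as a stand-in. The class names, field names, and XML layout are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class JobDTO:
    """Plain data carrier that can be serialized to and from XML."""
    job_id: int
    job_name: str
    cost: float

    def to_xml(self) -> str:
        root = ET.Element("job")
        for key, value in asdict(self).items():
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "JobDTO":
        root = ET.fromstring(text)
        return cls(
            int(root.findtext("job_id")),
            root.findtext("job_name"),
            float(root.findtext("cost")),
        )


class JobDAO:
    """Hides the SQL; callers only see JobDTO objects."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs "
            "(job_id INTEGER PRIMARY KEY, job_name TEXT, cost REAL)"
        )

    def save(self, dto: JobDTO) -> None:
        self.conn.execute(
            "INSERT INTO jobs VALUES (?, ?, ?)", (dto.job_id, dto.job_name, dto.cost)
        )

    def find(self, job_id: int):
        row = self.conn.execute(
            "SELECT job_id, job_name, cost FROM jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
        return JobDTO(*row) if row else None


if __name__ == "__main__":
    dao = JobDAO(sqlite3.connect(":memory:"))
    dao.save(JobDTO(1, "water_opt", 2.5))
    print(dao.find(1).to_xml())
```

Keeping the DTO free of database code is what lets the same object travel to the client as XML while the DAO alone knows how rows are stored.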

Database Table Relationships

[Diagram: relationships among the tables listed below; labels also shown: Users, Resources, Computational Chemistry Resource]

• Users, Projects, Resources
• UserProjectResource: userID, projectID, resourceID, loginName, SUsLocalUserUsed
• Resources: resourceID, Type, hostName, IPAddress, siteID
  – SoftwareResources
  – ComputeResources
  – NetworkResources
  – StorageResources
• Jobs: jobID, jobName, userID, projID, softID, cost
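The table relationships above can be sketched as a small schema, with UserProjectResource acting as the join table between users, projects, and resources. This is a sketch only: the key columns come from the slide, while the column types, the `name` columns, the primary-key choice, and the sample host name are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Users (userID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Projects (projectID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Resources (
    resourceID INTEGER PRIMARY KEY,
    Type TEXT, hostName TEXT, IPAddress TEXT, siteID INTEGER
);
CREATE TABLE UserProjectResource (
    userID INTEGER REFERENCES Users(userID),
    projectID INTEGER REFERENCES Projects(projectID),
    resourceID INTEGER REFERENCES Resources(resourceID),
    loginName TEXT,
    PRIMARY KEY (userID, projectID, resourceID)
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def resources_for(conn, user_id: int, project_id: int):
    """Hosts (and local login names) a user may use under a given project."""
    return conn.execute(
        "SELECT r.hostName, upr.loginName "
        "FROM UserProjectResource upr "
        "JOIN Resources r ON r.resourceID = upr.resourceID "
        "WHERE upr.userID = ? AND upr.projectID = ? "
        "ORDER BY r.hostName",
        (user_id, project_id),
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    conn.execute("INSERT INTO Users VALUES (1, 'alice')")
    conn.execute("INSERT INTO Projects VALUES (10, 'demo')")
    conn.execute(
        "INSERT INTO Resources VALUES (100, 'compute', 'hpc1.example.org', '192.0.2.1', 1)"
    )
    conn.execute("INSERT INTO UserProjectResource VALUES (1, 10, 100, 'alice_l')")
    print(resources_for(conn, 1, 10))
```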

GMS_WS Use Cases

• Authentication

• Job Submission

• Resource Monitoring

• File Retrieval

http://www.gridchem.org:8668/space/GMS/usecase

GMS_WS Authentication

• WSDL (Web Services Description Language) is a language for describing how to interface with XML-based services. It describes network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information.

• The service interface is called the port type.

• WSDL file:

<?xml version="1.0" encoding="UTF-8"?>
<definitions name="MathService"
    targetNamespace="http://www.globus.org/namespaces/examples/core/MathService_instance"
    xmlns="http://schemas.xmlsoap.org/wsdl/" …


Participants: GC Client, GMS

• GC Client: contacts GMS
• GMS: creates Session, Session RP and EPR; sends EPR
• GC Client: login request (username:passwd)
• GMS: validates, loads UserProjects; sends acknowledgement
• GC Client: retrieves UserProjects (GetResourceProperty port type, PT)

GMS_WS Authentication (continued)

Participants: GC Client, GMS

• GC Client: selects project; LoadVO port type (with MAC address)
• GMS: verifies user/project/MAC address; loads UserResources RP; sends acknowledgement
• GC Client: retrieves UserResources [as userVO/Profile] (GetResourceProperty port type, PT)
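The authentication exchange above can be sketched as a toy in-memory service. Everything here is hypothetical: the class and method names, the data layout, and the use of a random token in place of a WSRF endpoint reference (EPR). Passwords are compared in plain text only to keep the sketch short; a real service would not do this.

```python
import secrets


class GMS:
    """Toy stand-in for the GMS service; all names and data are hypothetical."""

    def __init__(self, users: dict):
        # users: {username: {"password": str,
        #                    "projects": {name: {"macs": set, "resources": list}}}}
        self.users = users
        self.sessions = {}

    def create_session(self) -> str:
        """Step 1: create a session and hand back a token standing in for the EPR."""
        epr = secrets.token_hex(8)
        self.sessions[epr] = {"user": None, "project": None}
        return epr

    def login(self, epr: str, username: str, password: str) -> bool:
        """Step 2: validate the credentials and bind the user to the session."""
        user = self.users.get(username)
        if epr not in self.sessions or user is None or user["password"] != password:
            return False
        self.sessions[epr]["user"] = username
        return True

    def get_user_projects(self, epr: str) -> list:
        """Step 3: the client retrieves the user's projects."""
        user = self.sessions[epr]["user"]
        if user is None:
            raise PermissionError("not logged in")
        return sorted(self.users[user]["projects"])

    def load_vo(self, epr: str, project: str, mac: str) -> bool:
        """Step 4: verify user/project/MAC address before loading resources."""
        user = self.sessions[epr]["user"]
        if user is None:
            return False
        entry = self.users[user]["projects"].get(project)
        if entry is None or mac not in entry["macs"]:
            return False
        self.sessions[epr]["project"] = project
        return True

    def get_user_resources(self, epr: str) -> list:
        """Step 5: the client retrieves the resources for the selected project."""
        session = self.sessions[epr]
        if session["project"] is None:
            raise PermissionError("no project selected")
        return list(self.users[session["user"]]["projects"][session["project"]]["resources"])
```

A client would call these in order: `create_session`, `login`, `get_user_projects`, `load_vo`, then `get_user_resources`.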

GMS_WS Job Submission

Participants: GC Client, GMS

• GC Client: creates Job object; PredictJobStartTime PT + JobDTO
• GMS: JobStart Prediction RP
• GC Client: if decision OK, SubmitJob PT + JobDTO
• GMS: creates Job object; API submit; stores Job object; sends acknowledgement
• Submission: CoGKit, GAT, or “gsi-ssh”
• Completion: email from batch system to GMS server; cron@GMS updates DB
• Note: need to check that allocation time is available

PT = portType, RP = Resource Properties, DTO = Data Transfer Object
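The submission steps above, including the note about checking the allocation, might look like the following sketch. The class, the field names, and the fixed queue-wait table are hypothetical placeholders; the actual prediction and submission go through the port types and toolkits named on the slide.

```python
from dataclasses import dataclass


@dataclass
class JobDTO:
    """Hypothetical job description sent by the client."""
    name: str
    resource: str
    requested_sus: float


class JobService:
    """Toy service: predicts a start time, checks the allocation, stores the job."""

    def __init__(self, balance_sus: float, queue_wait_hours: dict):
        self.balance_sus = balance_sus
        # Assumed per-resource wait estimate, standing in for a real predictor.
        self.queue_wait_hours = queue_wait_hours
        self.jobs = {}
        self._next_id = 1

    def predict_job_start(self, job: JobDTO) -> float:
        """Return the estimated queue wait (hours) so the client can decide."""
        return self.queue_wait_hours[job.resource]

    def submit_job(self, job: JobDTO) -> int:
        """Reject the job if the allocation cannot cover it, else store it."""
        if job.requested_sus > self.balance_sus:
            raise ValueError("insufficient allocation")
        job_id = self._next_id
        self._next_id += 1
        self.jobs[job_id] = job  # the cost is recorded later, at completion
        return job_id


if __name__ == "__main__":
    service = JobService(balance_sus=100.0, queue_wait_hours={"hpc1": 2.5})
    job = JobDTO("water_opt", "hpc1", 10.0)
    if service.predict_job_start(job) < 24:  # client-side decision
        print("submitted as job", service.submit_job(job))
```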

GMS_WS Monitoring

Participants: GC Client, GMS, Resources/Kits/DB

• GC Client: request for job status, resource status, and allocation balance
• GMS: UserResource RP updated from DB; sends info
• GC Client: parses XML, displays
• Background updates: cron@GMS server, cron@HPC servers, job launcher notifications, VO admin email; parses email and updates DB (status + cost)

PT = portType, RP = Resource Properties, DTO = Data Transfer Object, DB = Database
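The client-side "parse XML, display" step can be sketched as below. The XML layout is an assumption made for illustration; the slides do not show the actual message format.

```python
import xml.etree.ElementTree as ET

# Hypothetical status message, for illustration only.
SAMPLE_STATUS = (
    "<jobs>"
    "<job id='1'><status>RUNNING</status><cost>0.0</cost></job>"
    "<job id='2'><status>FINISHED</status><cost>12.5</cost></job>"
    "</jobs>"
)


def parse_status(xml_text: str) -> list:
    """Turn the status XML into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": int(job.get("id")),
            "status": job.findtext("status"),
            "cost": float(job.findtext("cost")),
        }
        for job in root.findall("job")
    ]


def format_rows(jobs: list) -> list:
    """Render one display line per job."""
    return [f"job {j['id']}: {j['status']} (cost {j['cost']} SUs)" for j in jobs]


if __name__ == "__main__":
    print("\n".join(format_rows(parse_status(SAMPLE_STATUS))))
```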

GMS_WS File Retrieval

Participants: GC Client, GMS, Resources/Kits/DB

• GetResourceProperty PT; FileDTO (?)
• LoadFile PT (project folder + job)
• Validates that the project folder is owned by the user; sends new listing
• Job completion: send output to MSS
• LoadFile PT; MSS query; UserFiles RP + FileDTO object
• Retrieve root directory listing on MSS with CoGKit or GAT or “gsi-ssh”
• API file request; store locally; create FileDTO; load into UserData RP
• RetrieveFiles PT (+ file relative path)
• Retrieve file: CoGKit or GAT or “gsi-ssh”
• GetResourceProperty PT
• Open question: should the whole directory be evaluated (it may be large)? Why not just the files owned by the user?

PT = portType, RP = Resource Properties, DTO = Data Transfer Object, MSS = Mass Storage System

GMS_WS File Retrieval (continued)

Participants: GC Client, GMS, Resources/Kits/DB

• RetrieveJobOutput PT (+ JobDTO)
• Job record from DB. Running: from resource. Complete: from MSS
• Retrieve file: CoGKit or GAT or “gsiftp”
• Create FileDTO (?); load into UserData RP
• GetResourceProperty PT
• Open question: should the whole directory be evaluated (it may be large)? Why not just the files owned by the user?

PT = portType, RP = Resource Properties, DTO = Data Transfer Object, MSS = Mass Storage System
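The ownership check and listing steps in the file retrieval flow can be sketched with an in-memory stand-in for the mass storage system. The class, method names, and path layout are hypothetical; the real transfers use CoGKit, GAT, or GSI tools as noted on the slides. Listing only the requested project folder is one possible answer to the open question above.

```python
class FileService:
    """Toy file service; `mss` is a dict standing in for the mass storage system."""

    def __init__(self, mss: dict, owners: dict):
        self.mss = mss        # {"project/job/file": bytes}
        self.owners = owners  # {"project": username}

    def _check_owner(self, user: str, folder: str) -> None:
        """Refuse access unless the project folder belongs to the user."""
        if self.owners.get(folder) != user:
            raise PermissionError(f"{folder} is not owned by {user}")

    def list_files(self, user: str, folder: str) -> list:
        """List only the files under the user's own project folder."""
        self._check_owner(user, folder)
        prefix = folder.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.mss if p.startswith(prefix))

    def retrieve(self, user: str, folder: str, rel_path: str) -> bytes:
        """Fetch one file by its path relative to the project folder."""
        self._check_owner(user, folder)
        return self.mss[folder.rstrip("/") + "/" + rel_path]


if __name__ == "__main__":
    service = FileService(
        mss={"proj1/job1/out.log": b"done", "proj2/job9/out.log": b"x"},
        owners={"proj1": "alice", "proj2": "bob"},
    )
    print(service.list_files("alice", "proj1"))
```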

Web Services

WSRF (Web Services Resource Framework) compliant

WSRF specifications:
• WS-ResourceProperties (WSRF-RP)
• WS-ResourceLifetime (WSRF-RL)
• WS-ServiceGroup (WSRF-SG)
• WS-BaseFaults (WSRF-BF)

% ps -aux | grep ws
/usr/java/jdk1.5.0_05/bin/java \
    -Dlog4j.configuration=container-log4j.properties \
    -DGLOBUS_LOCATION=/usr/local/globus \
    -Djava.endorsed.dirs=/usr/local/globus/endorsed \
    -DGLOBUS_HOSTNAME=derrick.tacc.utexas.edu \
    -DGLOBUS_TCP_PORT_RANGE=62500,64500 \
    -Djava.security.egd=/dev/urandom \
    -classpath /usr/local/globus/lib/bootstrap.jar:/usr/local/globus/lib/cog-url.jar:/usr/local/globus/lib/axis-url.jar \
    org.globus.bootstrap.Bootstrap org.globus.wsrf.container.ServiceContainer -nosec

• -Dlog4j.configuration: logging configuration
• -DGLOBUS_LOCATION: where to find Globus
• -Djava.security.egd: where to get the random seed for encryption key generation
• -classpath: required jars

Software Organization

• CVS for GridChem

• Package: org.gridchem.service.gms

GMS_WS packages (note on the slide: should these each be a separate package?):

model, dto, credential, job, job.task, notification, file, file.task, user, exceptions, resource, persistence, synch, query, test, util, dao, gpir, crypt, enumerators, gat, proxy, client, audit

Package descriptions, in slide order:

• gms: classes for the WSRF service implementation (PT)
• Command-line tests to mimic client requests
• Data Access Objects: query the DB via persistent classes (Hibernate)
• Data Transfer Objects: (Job, File, Hardware, Software, User) XML
• How to handle errors (exceptions)
• CCG Service business model (how to interact)
• Contains the user's credentials for job submission, file browsing, …
• “Oversees correct” handling of user data (get/put file)
• Defines Job, utilities, and enumerations (SubmitTask, KillTask, …)
• CCGResource and utilities, synched by GPIR; abstract classes NetworkRes., ComputeRes., SoftwareRes., StorageRes., VisualizationRes.
• User (has attributes: Preference/Address)
• DB operations (CRUD), OR maps, pool management, DB session
• Classes that communicate with other web services
• Periodically updates the DB with GPIR info (GPIR calls)
• JUnit service tests (gms.properties): authentication, VO retrieval, resource query, synch, job management, file management, notification
• Contains utility and singleton classes for the service
• Encryption of login password
• Mapping from GMS_WS enumeration classes to the DB
• GAT utility classes: GATContext and GAT Preferences generation
• Classes that deal with CoGKit configuration
• Autonomous notification via email, IM, text message

GMS_WS external jars

• Testing

• For XML Parsing

• “Java” Document Object Model
  – Lightweight
  – Reading/writing XML docs
  – Complements SAX (parser) & DOM
  – Uses Collections

Authentication

Resource Status

Job Editor

Job Submission

Job Monitoring

Gradient Monitoring

Energy Monitoring

Post Processing

Visualization

Molecular Visualization

Electronic Properties

Spectra

Vibrational Modes

Molecular Visualization

• Better molecule representations (Ball and Stick/VDW/MS) in Nanocad Molecular Editor
• Third-party visualizer integration: Chime/VMD
• Export possibilities to other interfaces
• Deliver standard file formats (XML, SDF, MSF, SMILES, etc.)

Eigen Function Visualization

• Molecular Orbital/Fragment Orbital

• MO Density Visualization

• MO Density Properties

• Other functions

Radial distribution functions

Some example visuals

• Arginine, GAMESS/6-31G*, total electronic density
• 2D slices
• Electron density in 3D, interactive (VRML)
• Orbital 2D displays: N2, 6-31G*, GAMESS
• Orbital 3D (VRML)

Spectra

• IR/Raman Vibrational Spectra

• UV Visible Spectra

• Spectra to Normal Modes

• Spectra to Orbitals

GridChem Use

• Allocation

Community and External Registration

• Consulting/User Services

Ticket tracking, Allocation Management

• Documentation Training and Outreach

FAQ Extraction, Tutorials, Dissemination

Users and Usage

• 170 Users

Includes academic PIs, two graduate classes, and about 15 training users

• NCSA: 57,000 SUs + a 7-node dedicated system
• UKy: around 106,766 SUs
• OSC: 13,820 SUs + a 14-node dedicated system
• Usage at LSU and TACC as well

More than 335,000 CPU wall hours since Jan 06.

Science Enabled

• Chemical Reactivity of the Biradicaloid (HO...ONO) Singlet States of Peroxynitrous Acid. The Oxidation of Hydrocarbons, Sulfides, and Selenides. Bach, R. D.; Dmitrenko, O.; Estévez, C. M. J. Am. Chem. Soc. 2005, 127, 3140-3155.

• The "Somersault" Mechanism for the P-450 Hydroxylation of Hydrocarbons. The Intervention of Transient Inverted Metastable Hydroperoxides. Bach, R. D.; Dmitrenko, O. J. Am. Chem. Soc. 2006, 128(5), 1474-1488.

• The Effect of Carbonyl Substitution on the Strain Energy of Small Ring Compounds and their Six-member Ring Reference Compounds. Bach, R. D.; Dmitrenko, O. J. Am. Chem. Soc. 2006, 128(14), 4598.

• Azide Reactions for Controlling Clean Silicon Surface Chemistry: Benzylazide on Si(100)-2×1. Semyon Bocharov, Olga Dmitrenko, Lucila P. Mendez De Leo, and Andrew V. Teplyakov. Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware 19716. Received April 13, 2006.

http://pubs.acs.org.proxy2.library.uiuc.edu/cgi-bin/asap.cgi/jacsat/asap/pdf/ja0623663.pdf [May require ACS access]

Third Year Plans

• Post Processing

• New Application Support

• Expansion of Resources

• Extension Plan

Acknowledgments

• Rion Dooley, TACC Middleware Infrastructure

• Stelios Kyriacou, OSC Middleware Scripts

• Chona Guiang, TACC Databases and Applications

• Kent Milfeld, TACC Database Integration

• Kailash Kotwani, NCSA, Applications and Middleware

• Scott Brozell, OSC, Applications and Testing

• Michael Sheetz, UKy, Application Interfaces

• Vikram Gazula, UKy, Server Administration

• Tom Roney, NCSA, Server and Database Maintenance