
Multiple Ligand Trajectory Docking Study — Semiautomatic Analysis of Molecular Dynamics Simulations using EGEE gLite Services

A. Křenek, J. Kmuníček, J. Filipovič, Z. Šustr, F. Dvořák, J. Sitera and L. Matyska
CESNET, z. s. p. o.
Zikova 4, Prague, Czech Republic
[email protected]

Martin Petřek and Jiří Wiesner
NCBR
Kamenice 5, Brno, Czech Republic
[email protected]

Abstract

Interactions between large biomolecules and smaller bio-active ligands are usually studied through a process called docking. Its aim is to find an energetically favorable orientation of a ligand within an active site of a biomolecule. Chemical reactions take place in the active site, and the role of the ligand is to speed up, slow down, or change the reaction (e.g., an enzyme-catalyzed hydrolysis), which is why docking can have a huge pharmaceutical or other commercial impact.

We present a tool that supports effective management and control of the typical workflow of a docking parametric study. Selected subsets of ligands and protein trajectory snapshots can be displayed in three different views and analyzed further. Finally, the application supports spawning and steering the underlying computations running on the Grid.

1 Studied Problem

The docking search for the biomolecular complex structure is done on snapshots taken from the molecular dynamics trajectory describing the dynamic behavior of the biomolecule. Each snapshot is a specific structure the biomolecule has at a specific time (a frozen structure). For each snapshot, we calculate the best position of the ligand, i.e., the orientation in which the whole system has the lowest energy. This is repeated for each ligand we investigate and, in the end, we obtain a matrix containing the energies of snapshot/ligand interactions. The lowest energy in this matrix is the primary result of such a study, but the whole matrix is of importance as it provides insight into the dynamics of the interaction.

The creation of such a matrix is, from the computing standpoint, a very demanding task. Furthermore, the researchers need assistance in managing its complexity (to be sure the whole matrix is computed and no element is forgotten). A sophisticated job submission system coupled with an archive (a provenance record) of jobs already run is a necessary prerequisite for such studies.
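As an illustration of this bookkeeping, the snapshot/ligand energy matrix and the notion of a missing element can be sketched in a few lines of Python; the snapshot and ligand names are invented, and the actual tool keeps this information in the Grid job repository described in Sect. 3.4.2.

# Minimal sketch of the snapshot/ligand energy matrix (illustrative only).
snapshots = ["snap_%04d" % i for i in range(0, 2000, 10)]   # hypothetical snapshot IDs
ligands = ["HI-6", "obidoxime", "pralidoxime"]              # hypothetical ligand names

# energies[(snapshot, ligand)] -> best docking energy; absent if not yet computed
energies = {}

def missing_cells(energies, snapshots, ligands):
    """Return the (snapshot, ligand) pairs that still need a docking job."""
    return [(s, l) for s in snapshots for l in ligands if (s, l) not in energies]

def best_result(energies):
    """The primary result: the lowest interaction energy over the whole matrix."""
    return min(energies.items(), key=lambda item: item[1]) if energies else None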

The acetylcholinesterase [13] (AChE, Fig. 1) enzyme plays a key role in nerve signal transmission, and its inhibition can lead to a very fast death of an organism. Therefore, it is targeted for inhibition by many nerve paralytic substances (sarin, tabun, etc.). There is a reactivation process whereby the catalytic potential of the inhibited enzyme can be restored. However, there is no known universal reactivator able to reactivate AChE inhibited by at least a majority of commonly used nerve agents. Therefore, we attempt to study the structural and energetic aspects of the reactivation process by means of computational chemistry to find a suitable generic reactivator able to liberate AChE poisoned by nerve paralytic compounds.

Figure 1. Three-dimensional view of the secondary structure elements of human acetylcholinesterase as obtained from the RCSB Protein Data Bank (code 1B41, colored by chains).

We have investigated the interaction between acetylcholinesterase, participating in nerve signal transmission, and a set of organic aromatic compounds that could serve as reactivators. The actual problem shown here deals with a 2-ns acetylcholinesterase trajectory and 3 ligands, requiring some 6000 CPU hours on an average computing server. Realistic studies use more and longer trajectories (tens of ns) as well as a higher number of potential ligands (tens or hundreds). Such a computation is unmanageable without semi-automatic support tools.

The molecular dynamics trajectory of acetylcholinesterase was calculated using molecular dynamics programs from the AMBER package; the docking procedure itself was carried out by DOCK (http://dock.compbio.ucsf.edu/). VMD [5] was used for visual inspection of the resulting biomolecular complexes using our in-house visualization plugin.

2 Analysis Automation

The whole docking study is run from a custom graphical desktop application (or workbench, or dashboard) shown in Fig. 2. It was designed in tight cooperation with researchers at NCBR (National Centre for Biomolecular Research, http://ncbr.chemi.muni.cz) to suit their typical workflow for day-to-day analytical work, and — gradually — to take care of all common tasks performed manually up to now.

As the first step, the application supports the selection of the working domain for the current session, i.e., subsets of both trajectory snapshots and specific ligands. Then it queries the underlying Grid services and the local job repository (Sect. 3.4.2), and displays a 2D array of Grid jobs (including prepared, running, and finished jobs) matching the criteria. Three viewing modes (left pane in Fig. 2) of the array are provided; a sketch of the corresponding color mapping is given after the list:

• Middleware oriented: shows the number and status of Grid jobs falling into each array cell (green — finished successfully, red — failed, yellow — being processed).

• Application metrics: a significant numeric value (binding energy in this case) computed by each of the jobs is mapped to a color-scale representation, giving immediate insight into its distribution over the working domain.

• User rank: similar to the previous one, but the value is assigned manually by users as a result of their expert assessment of the outcome of the computations.
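As an illustration of the two color-based views, the following Python sketch maps job states and binding energies to colors; the state names, energy range, and color choices are assumptions made for this sketch, not the actual implementation.

def status_color(job_states):
    """'Middleware' view: one color per array cell (illustrative rules)."""
    if any(s == "FAILED" for s in job_states):
        return "red"
    if job_states and all(s == "DONE" for s in job_states):
        return "green"
    return "yellow"

def energy_color(energy, e_min=-12.0, e_max=0.0):
    """'Application metrics' view: map a binding energy (assumed range) linearly
    onto a blue-to-red scale; lower (better) energies come out bluer."""
    t = min(max((energy - e_min) / (e_max - e_min), 0.0), 1.0)
    return (int(255 * t), 0, int(255 * (1 - t)))   # (R, G, B)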

The use of Grid services, rather than a local and private user job repository, adds an important feature — sharing results and assessments among all users of a collaborating team. Jobs (and their results) submitted by one user are immediately visible to all members of the team (obviously respecting security restrictions; the users have to set appropriate permissions). In this way, results are shared easily and duplicate computations can be avoided.

A typical user session starts with selecting the working domain (as described above). Then, the results of finished jobs can be examined in detail, including three-dimensional visualization of the emerging complex structures. Batches of jobs can be prepared in order to fill empty cells of the array and complete the overall docking analysis. Also, existing jobs can be used as templates, cloned, and re-run with modified input parameters.

Finally, the user can mark each job with a numeric rank (which is visualized with a color scale in the "user rank" view described above) as well as a free-form text annotation recorded and visible to the others. For example, a job which achieves a good (low) binding energy (the application metric) can be annotated "good energy but due to an apparently faulty computation" and assigned a bad user ranking in order to exclude it from further consideration.

Figure 2. Basic layout of the graphical application. The left pane is a view on the snapshot vs. ligand array (both "middleware" and "application" views are shown), the top-right pane shows individual jobs falling into the selected array cell, and the bottom-right pane displays details of the selected job.

The description confirms that the application is designed specifically for solving this class of scientific problems. This was done intentionally. In this case we do not believe in the "one size fits all" paradigm; the requirements of different user communities can be rather diverse, and trying to design a generic application would yield an over-complex, cumbersome implementation. On the other hand, with the use of the simpler generic Grid services in the background, the application is extremely lightweight. After the design was agreed upon, the actual coding required only approximately two person-weeks of a skilled programmer.

3 Underlying Grid Services

3.1 Charon Extension Layer — Managing Job Submission

The Charon Extension Layer (CEL) toolkit [7, 8] is a universal framework creating a layer on top of the basic Grid middleware environment and making access to the complex Grid infrastructure much easier compared to the native middleware. It provides a command-line oriented interface and is intended for users who require full control over their running computational jobs. CEL provides a uniform and modular approach to complex computational job submission and management, and forms a generic system for the use of application programs in the Grid environment, independently of the Grid middleware present at the specific fabric infrastructure. CEL can be easily used for powerful application management, enabling single or parallel execution of computational jobs without job script modification. At the same time, standard job management involving easy job submission, monitoring, and result retrieval can be performed without any additional hassle or requirements put on users.

The complete Charon Extension Layer combines two distinct subsystems — the Module System and the Charon System. The Module System is used to manage the available application portfolio. It solves problems related to the execution of applications on machines with different hardware and/or operating systems, and it is also capable of simplifying the execution of applications in parallel environments. The Charon System is a specific application managed by the Module System that introduces a complete solution for job submission and subsequent management.


Figure 3. Data flow into gLite Job Provenance

3.2 gLite Job Processing

The only way the user can access computational resources in the gLite middleware [1] is through a job. Although not restricted exclusively to them, gLite is designed to support large numbers of traditional batch, non-interactive jobs.

Upon creation, the job is assigned a unique, immutable job identifier (jobid). The jobid is used to refer to the job not only during its active life but also at any time afterwards.

The user describes the job (executable, parameters, input files, etc.) using the Job Description Language (JDL) [11], which is based on the extensible Classified Advertisement (ClassAd) [12] syntax. The description may grow fairly complex and include information on execution environment requirements, proximity of input and output storage, etc.
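For illustration, a description of one docking job of this study could be generated roughly as follows; the JDL attributes used (Executable, Arguments, InputSandbox, OutputSandbox, Requirements) are standard, but the concrete script, file names, and requirement expression are assumptions made for this sketch.

def docking_jdl(snapshot, ligand):
    """Build a JDL description for one docking job (file names are illustrative)."""
    return f"""[
  Executable    = "run_dock.sh";
  Arguments     = "{snapshot} {ligand}";
  StdOutput     = "dock.out";
  StdError      = "dock.err";
  InputSandbox  = {{"run_dock.sh", "{snapshot}.pdb", "{ligand}.mol2"}};
  OutputSandbox = {{"dock.out", "dock.err", "protocol.xml"}};
  Requirements  = other.GlueCEStateStatus == "Production";
]"""

print(docking_jdl("snap_0420", "HI-6"))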

Job processing can be summarized as follows (gLite components are set in italics); a command-line sketch of this life cycle is given after the list:

• the job is submitted via the User Interface (simple command-line tools)

• the Workload Manager (WM) [1] queues the job and starts looking for a suitable Computing Element (CE) to execute it

• the job is passed to the chosen CE and runs there

• as a part of its processing, the job may download inputs from the Data Management services [1] as well as upload its main results there to be stored permanently

• after completion, the user can retrieve miscellaneous (volatile) job output directly

• all the time, the job is being tracked by the Logging and Bookkeeping (L&B) service [10], which provides the user with information on the job state and further details of job processing

• after the user retrieves the job output, the middleware data on the job (namely the job trace in L&B) are passed to the Job Provenance (Sect. 3.3) and purged from their original locations

• annotations can be added to the job during its active lifetime via L&B (even from inside the running application), or at any time afterwards via JP.
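This life cycle can be driven from the command-line User Interface. The following Python sketch wraps the gLite 3 WMS user commands (glite-wms-job-submit, glite-wms-job-status, glite-wms-job-output) with subprocess; the option set and output parsing are simplified and may differ between gLite releases, so treat it as an illustration rather than production code.

import subprocess, time

def submit(jdl_file):
    """Submit a job and return its jobid (output parsing is simplified)."""
    out = subprocess.run(["glite-wms-job-submit", "-a", jdl_file],
                         capture_output=True, text=True, check=True).stdout
    # The jobid is printed as an https://... line in the command output.
    return next(line.strip() for line in out.splitlines()
                if line.strip().startswith("https://"))

def wait_until_done(jobid, poll=60):
    """Poll the job state (tracked by L&B) until the job leaves its active states."""
    while True:
        status = subprocess.run(["glite-wms-job-status", jobid],
                                capture_output=True, text=True).stdout
        if any(s in status for s in ("Done", "Aborted", "Cancelled")):
            return status
        time.sleep(poll)

def fetch_output(jobid, target_dir):
    """Retrieve the (volatile) output sandbox after completion."""
    subprocess.run(["glite-wms-job-output", "--dir", target_dir, jobid], check=True)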

3.3 Job Provenance — Archiving the Data

The need for a Grid middleware service that would help users track their jobs, store the information for a long term, allow adding further annotations, and, finally, provide efficient querying capabilities was the primary motivation for developing the Job Provenance (JP) service.

Pragmatic implementation requirements on JP, given its main purpose, are rather contradictory. Information on each job should be sufficiently detailed to allow job re-execution, while the data gathered should be stored for a long time. This implies ever-growing storage space requirements that must be kept reasonable by making job records as compact as possible. The EGEE project aims at 1 million jobs per day; quantitative assessments of the implications for JP, as well as deployment considerations, are given in [10]. At the same time, efficient queries are required, which is virtually impossible with such a huge number of compact records. Finally, JP has to be able to cope consistently with the long-term evolution of various data formats.

The overall JP design tries to keep these requirements in a reasonable balance. This section provides an overview; further details can be found in [10, 4].

3.3.1 Data in JP and Their Organization

In JP, data are organized primarily on a per-job basis, a concept following the L&B model. Every data item stored in JP is associated with an actual Grid job. The following data are gathered from the Grid middleware:

• job inputs, directly required for job re-running: the complete job description (the JDL record) as submitted to the WM system (WMS), and miscellaneous input files (the gLite WMS input sandbox) provided by the user (however, job input files kept on reliable remote storage are not copied to JP; this would not be feasible on a large scale)

• job execution trace, documenting the job execution environment — the complete L&B data, i.e., when and where the job was planned and executed, how many times and for what reasons it was resubmitted, etc. This also includes the results of "measurements" taken on computing elements, for example versions of installed software, environment settings, etc.

• job annotations — the JP service allows users to add arbitrary annotations to jobs in the form of "name = value" pairs. These annotations can be recorded either during job execution or at any time afterwards. Besides providing information on the job (for example, that it is a production-phase job of a particular experiment), these annotations may carry information on relationships between jobs and other entities such as external datasets, forming the desired data provenance record.

Fig. 3 shows the data flow channels from the Grid middleware components (gLite) into JP.

Figure 4. Job Provenance components.

At the logical level, all data in JP are expressed by attributes. A job attribute has a unique name and may have multiple values for a single job. The attribute name must be fully qualified with a namespace (its schema can be specified and enforced) in order to ensure extensibility and uniqueness.
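The attribute model can be pictured with a small Python sketch; the namespace URI, the attribute names, and the jobid below are invented for illustration.

from collections import defaultdict

NS = "http://example.org/mltdock#"    # hypothetical application namespace

# attributes[jobid][qualified name] -> list of values (attributes are multi-valued)
attributes = defaultdict(lambda: defaultdict(list))

def annotate(jobid, name, value, ns=NS):
    """Record a "name = value" annotation for a job, qualified by a namespace."""
    attributes[jobid][ns + name].append(value)

# E.g., tagging a docking job with its inputs and a user assessment:
jobid = "https://lb.example.org:9000/AbCdEf"    # hypothetical jobid
annotate(jobid, "snapshot", "snap_0420")
annotate(jobid, "ligand", "HI-6")
annotate(jobid, "user_rank", "2")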

3.3.2 JP Components

JP provides two classes of services: a permanent Primary Storage (JPPS) accepts and stores job data, while the possibly volatile and configurable Index Servers (JPIS) provide an optimized querying and data-mining interface for the end users (see Fig. 4). The relationship between JPPS and JPIS is many-to-many: a single JPIS can query multiple JPPSs and, vice versa, a single JPPS is ready to feed multiple JPISs.

Primary Storage A single instance of JPPS, shown in Fig. 4, is formed by a front-end exposing its operations via a web-service interface [3], and a back-end responsible for the actual data storage and providing a bulk file transfer interface using arbitrary file-oriented transfer protocol(s). Both the front-end and the back-end share a filesystem so that the file-type plugins linked into the front-end access their files via POSIX I/O.

Primary Storage covers the first set of requirements specified for the Job Provenance — storing compact job records, allowing users to append annotations, and providing elementary access to the data.

The current implementation uses the MySQL relational database to store basic job metadata and a Globus GridFTP server as the back-end.

Index Server The role of the Index Servers (JPIS) is to process and re-arrange the data from the Primary Storage(s) into a form suitable for frequent and complex user queries. A typical interaction is shown in Fig. 4, consisting of the following steps:

1. The user queries one or more JPISs, receiving a list of IDs of jobs matching the query.

2. The JPPS is directly queried for additional job attributes or URLs of stored files.

3. The required files are retrieved.

The querying language is intentionally restricted in order to allow an efficient implementation of the query engine. The current format of a query is a list of lists of conditions. A condition is a comparison (less, greater, equal) of an attribute value with a constant. The items of an inner list must refer to the same attribute and are logically OR-ed; the inner lists are then logically AND-ed. According to our experience with the L&B service, this query language is powerful enough to satisfy user needs while simple enough to allow efficient processing.

For example, to query all jobs named "dock", executed with a "flexible" parameter, and run on Monday or Tuesday:

(program="dock") AND(param="flexible") AND(day="Monday" OR day="Tuesday")

Index Servers are created, configured, and populated semi-dynamically according to the needs of a particular user community. The configuration consists of:

• one or more Primary Storages to contact,

• conditions (expressed in terms of JP attributes) on jobs that should be retrieved,

• list of attributes to be retrieved,

• list of attributes to be indexed — a user query must refer to at least one of these for performance reasons.

The set of attributes and the conditions specify the data to be retrieved from the JPPS, and they reflect the assumed pattern of user queries. The amount of data fed into a single JPIS instance is assumed to be only a fraction of the data in the JPPS, regarding both the number of jobs and the number of distinct attributes. An illustrative sketch of such a configuration is given below.
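The shape of such a configuration can be sketched as a plain data structure; the endpoint, namespace, and attribute names below are invented, and the real JPIS is configured through its own service configuration files rather than Python.

# Illustrative JPIS feed configuration for the docking study (invented syntax).
NS = "http://example.org/mltdock#"                            # hypothetical namespace
jpis_config = {
    "primary_storages": ["https://jpps.example.org:8901"],    # hypothetical endpoint
    "feed_conditions": [[(NS + "application", "=", "mltd")]], # which jobs to pull
    "attributes": [                                           # what to retrieve
        NS + "snapshot", NS + "ligand",
        NS + "binding_energy", NS + "user_rank",
    ],
    "indexed": [NS + "snapshot", NS + "ligand"],              # queries must start here
}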

3.4 Application-Specific Configuration and Issues

3.4.1 Job Types

The analysis described in Sect. 2 requires running computational jobs of three classes:

• Snapshot preparation. The job extracts a given snapshot from the MD trajectory, stores the data file in permanent storage, and records information on the snapshot (namely the trajectory name, the snapshot number, the data file location in the permanent storage, and several characteristic numbers such as its internal energy) in JP.

• Ligand preparation. The job performs ligand optimization, stores the data file in permanent storage, and records information on the ligand (its name, data file location, and several characteristics) in JP.

• Docking. These jobs do the actual docking computation. First, information on the job inputs is recorded in JP in order to make the job visible in the array immediately after submission. Then, the required input files are downloaded, and the dock executable is invoked with the specified parameters. Finally, the results of the docking computation are uploaded, and metadata describing the result (again, data file locations and characteristics of the solutions) are stored in JP.

The complete list of JP attributes attached to these job types can be found in [9]. A sketch of the docking job's control flow is given below.
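The control flow of a docking job could be scripted roughly as follows; register_in_jp, download, upload, and parse_energy are placeholder stubs for the real storage and JP interactions, and the dock command line and file names are illustrative.

import subprocess

# Placeholder helpers, standing in for the real storage and JP interactions.
def register_in_jp(attrs):  print("JP <-", attrs)
def download(name):         print("fetching", name, "from permanent storage")
def upload(name):           return "lfn:/grid/voce/mltd/" + name   # hypothetical location
def parse_energy(path):     return -9.99                           # dummy value

def run_docking_job(snapshot, ligand, params_file):
    """Control flow of one docking job, following the steps listed above (sketch)."""
    # 1. Record the inputs in JP so the job appears in the array right after submission.
    register_in_jp({"snapshot": snapshot, "ligand": ligand})
    # 2. Download the prepared snapshot and ligand files.
    download(snapshot + ".pdb")
    download(ligand + ".mol2")
    # 3. Invoke the dock executable with the specified parameters.
    subprocess.run(["dock6", "-i", params_file], check=True)
    # 4. Upload the results and store metadata describing them in JP.
    result = upload("dock_result.mol2")
    register_in_jp({"result_location": result, "binding_energy": parse_energy(result)})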

The first two classes of jobs are expected to be run once, in a batch, as an initial step of the whole analysis. They determine the possible working domain. However, the domain can be extended later by adding more snapshots and/or ligands (by running more of these preparatory jobs).

In contrast, the docking jobs are run routinely during the analysis. These are the jobs that are submitted by the graphical front-end and managed in the local job repository.

3.4.2 Local Job Repository

Grid services exhibit certain intrinsic but rather user-unfriendly properties, namely asynchronous behaviour, unexpected failures, and slow response to operations. Therefore, it is desirable to shield the user from this behaviour by wrapping the services suitably. On the other hand, it makes little sense to provide such wrapping with an additional service; the same problem would occur, just in a different location. Instead, we address the issue locally, with data stored and auxiliary programs running solely on the machine where the graphical front-end is run.

The complexity and eventual failures of job submission and job output retrieval are handled by the Charon system (Sect. 3.1). The system uses a dedicated per-job working directory where various job metadata are stored. We extend this approach by adding further metadata files that control the interaction with JP. Within this metadata storage, a job is handled in the following steps:

1. On job submission, the graphical user interface calls a library function that creates a dedicated job directory, stores all job parameters there, and creates the initial metadata. The semantics of the library call is completely local; it returns immediately and is not affected by eventual unavailability of Grid services.

2. A local daemon checks the directory periodically. The job is submitted (via Charon).

3. After successful submission, the job's JP data known at that time (e.g., the snapshot number and ligand name) are stored in JP to be available for queries immediately.

4. After successful job completion, its miscellaneous output is retrieved, and the job protocol [9] is uploaded to JP.

Steps 3 and 4 are prototype implementations only. In the planned full JP integration, the job input data for JP (step 3) will be included in the job description and stored in JP automatically by the middleware services. Similarly, the job description will specify the protocol file name, and the protocol will be uploaded to JP as a part of the job clean-up procedure. A sketch of the local daemon's handling of a job directory is given below.
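The local daemon's handling of a single job directory can be sketched as a small state machine; the directory layout, metadata file name, and helper functions below are placeholders rather than the actual Charon or JP interfaces.

import json, os, time

JOB_ROOT = os.path.expanduser("~/mltd-jobs")            # hypothetical repository root

# Placeholder helpers, standing in for the Charon and JP interactions.
def charon_submit(job_dir):              return "https://lb.example.org:9000/XyZ"
def upload_initial_jp_data(jobid, meta): pass
def job_finished(jobid):                 return False
def retrieve_output(jobid, job_dir):     pass
def upload_protocol_to_jp(jobid, path):  pass

def process_job_dir(job_dir):
    """Advance one job directory through steps 2-4 described above (sketch)."""
    meta_path = os.path.join(job_dir, "metadata.json")  # assumed file name
    with open(meta_path) as f:
        meta = json.load(f)

    if meta["state"] == "created":                      # step 2: submit via Charon
        meta["jobid"] = charon_submit(job_dir)
        upload_initial_jp_data(meta["jobid"], meta)     # step 3: early JP record
        meta["state"] = "submitted"
    elif meta["state"] == "submitted" and job_finished(meta["jobid"]):
        retrieve_output(meta["jobid"], job_dir)         # step 4: outputs and protocol
        upload_protocol_to_jp(meta["jobid"], os.path.join(job_dir, "protocol.xml"))
        meta["state"] = "done"

    with open(meta_path, "w") as f:
        json.dump(meta, f)

def daemon_loop(poll=120):
    """Periodically sweep all job directories in the local repository."""
    while True:
        for name in os.listdir(JOB_ROOT):
            process_job_dir(os.path.join(JOB_ROOT, name))
        time.sleep(poll)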

Besides handling the job processing, the local repository serves as a backup backend for the queries issued by the graphical front-end. Therefore, the user can see job-related data immediately after submission through the GUI (even if the job has not yet been submitted to the Grid). This also means that the operation of the GUI is not critically affected by the unavailability of Grid services (in such a case, only information on other users' jobs is not visible; the system is not blocked).

3.4.3 Service Configurations

Besides the standard configuration of the gLite services, the following specific items have to be addressed:

• L&B interface to JP: The L&B server usedfor the analysis must be configured to propa-gate job registrations and upload finished jobdata to JP Primary Storage. (This is supportedby standard configuration but not enabled bydefault).

• JP Primary Storage plugin: the jobs uploaddocking protocol to JP. It may be used to ex-tract many of the job attributes. A specificplugin3 performing this extraction must be in-stalled.

• JP index server configuration must supportall the queries of the graphical front-end. De-tails are given in [9].

3.5 Production Infrastructure

The complete set of application services, together with the graphical interface, was developed and deployed on a subset of the EGEE Grid infrastructure — the Virtual Organization for Central Europe (VOCE). VOCE [6] is a dynamic, multi-institutional community established by resource providers within the Central European region (the CE Federation in EGEE terms). It directly supports CE researchers by providing storage and computing services. VOCE uses the gLite Grid middleware as provided by EGEE to support data sharing and computational resources within the region. It also provides a platform on which other Grid and application software can be installed and used to solve various types of computational or data-intensive jobs.

Unlike the majority of other virtual organizations, VOCE tends to be a generic virtual organization (VO) providing an application-neutral environment. It is especially suitable for Grid newcomers, allowing them to quickly acquire initial Grid computing experience and to test and evaluate the Grid environment against their specific application needs. The VOCE environment is also suitable for small groups for whom creating and maintaining their own VOs would represent too much overhead.

The primary goal of VOCE is to provide an environment where new and "small" end-user groups can use a production-level Grid, adapt existing or develop new applications, and prepare themselves to eventually start their own VOs. Its secondary goal is to provide an environment where new middleware services can be introduced quickly, including services developed by the EGEE CE partners independently of the mainstream gLite development.

4 Conclusion

Scientific experiments, even those carried out in a purely computational environment, require very precise recording of both their setup and their results. This is even more critical for parametric studies, where similar experiments or computations are carried out over a large multi-dimensional domain of possible inputs.

We present a semi-automated approach to managing such records using the available Grid services of the gLite middleware and the Charon Extension Layer toolkit. Besides keeping the records for an individual user, the approach also strongly supports collaboration within a user group. At the same time, the user is shielded from the complexity of the Grid where desirable.

To demonstrate the feasibility of these ideas, we have developed a specialized graphical interface for solving generic biomolecular parametric jobs. It allows the evaluation of user application metrics based on targeted parameters, with a potential extension towards extensive biomedical screening. The following features are available as standard services: manipulation of computational jobs (input modification, job resubmission) and targeted search and selection of desired jobs (finished, unfinished, aborted). Moreover, the whole application is based on a modular approach allowing the incorporation of application-specific plugins for the presentation of results (e.g., visualization).

Currently, the prototype is being used by researchers at the National Centre for Biomolecular Research (NCBR), and it is being further extended according to their requests. We have also successfully demonstrated its use at the EGEE User Forum conference [2].

This study demonstrates the usability of the gLite software stack for complex computational studies in the computational chemistry area, with potential for many other application domains.

References

[1] gLite Middleware. http://glite.web.cern.ch/glite/.

[2] Multiple Ligand Trajectory Docking Wiki Pages. http://egee.cesnet.cz/mediawiki/index.php/Job_Provenance_Demo.

[3] F. Dvořák et al., EGEE JRA1 team. EGEE Middleware Design — Release 1.

[4] F. Dvořák, D. Kouřil, A. Křenek, L. Matyska, M. Mulač, J. Pospíšil, M. Ruda, Z. Salvet, J. Sitera, and M. Voců. gLite Job Provenance. In: Proc. IPAW'06, LNCS 4145, CERN EDMS #590869, pages 246–253, 2006.

[5] W. Humphrey, A. Dalke, and K. Schulten. VMD – Visual Molecular Dynamics. Journal of Molecular Graphics, 14:33–38, 1996.

[6] J. Kmuníček and D. Kouřil. Central European Grid Infrastructure for Generic Applications. In: Proceedings of the 2nd Austrian Grid Symposium, Österreichische Computer Gesellschaft, pages 131–142, 2007.

[7] J. Kmuníček, P. Kulhánek, and M. Petřek. Charon System – Framework for Applications and Jobs Management in Grid Environment. In: Proceedings of Cracow Grid Workshop 2005, Academic Computer Center CYFRONET AGH, ISBN 83-915141-5-3, pages 332–340, 2006.

[8] J. Kmuníček, M. Petřek, and P. Kulhánek. Charon Extension Layer – Universal Toolkit for Grid Applications and Computational Jobs Maintenance. In: Proceedings of Cracow Grid Workshop 2006, Academic Computer Center CYFRONET AGH, ISBN 83-915141-7-X, pages 353–359, 2007.

[9] A. Křenek, M. Petřek, J. Kmuníček, J. Filipovič, Z. Šustr, F. Dvořák, J. Sitera, J. Wiesner, and L. Matyska. Multiple Ligand Trajectory Docking Study – Semiautomatic Analysis of Molecular Dynamics Simulations using EGEE gLite Services, extended version. CESNET technical report, 2007.

[10] L. Matyska, A. Křenek, M. Ruda, J. Sitera, D. Kouřil, M. Voců, J. Pospíšil, M. Mulač, and Z. Salvet. Job Tracking on a Grid — the Logging and Bookkeeping and Job Provenance Services. CESNET technical report, 2007.

[11] F. Pacini et al. Job Description Language Attributes Specification. https://edms.cern.ch/file/590869/1/EGEE-JRA1-TEC-590869-JDL-Attributes-v0-8.pdf, 2007.

[12] R. Raman, M. Livny, and M. Solomon. Matchmaking: Distributed Resource Management for High Throughput Computing. In: Proc. HPDC, 1998.

[13] J. Wiesner, Z. Kříž, K. Kuča, D. Jun, and J. Koča. Computer Modeling and Simulation – New Technologies in Development of Means against Combat Chemical Substances. Voj. zdrav. listy, 46(2):1144–1145, 2005.
