Globus Toolkit® v3.0 Alpha Tutorial
Resource Management
The Globus Project™
Argonne National Laboratory
USC Information Sciences Institute
http://www.globus.org/
Copyright (c) 2002 University of Chicago and The University of Southern California. All Rights Reserved. This presentation is licensed for use under the terms of the Globus Toolkit Public License.
See http://www.globus.org/toolkit/download/license.html for the full text of this license.
GlobusWORLD 2003 GT3 Tutorial - Resource Management 2
Resource Management Overview
Resource Specification Language (RSL-2) is used to communicate requirements
A set of WSDL/OGSI client interfaces allows programs to be started on remote resources, despite local heterogeneity
GRAM Components
[Figure: GRAM component diagram. Labels shown: Client, Grid Service Registry, Site boundary, Master Host Env, Master, Redirector, RIPS, User Host Env, MJFS, MJS, FSFS, FSS, Local Resource Manager, Process]
Resource Management WSDLs
Resource Specification Language
Much of the power of GRAM is in the RSL
XML schema defined language for specifying job requests
– Managed Job Service translates this common language into scheduler specific language
GRAM service understands a well defined set of elements
– executable, arguments, directory, …
RSL-2 Schema
Use standard XML parsing tools to parse and validate an RSL specification
– xmlns:gram="http://gram.base.ogsa.globus.org/rsl/gram"
– Functions to process the DOM representation of RSL specification (for RSL substitutions)
Can be used to assist in writing brokers or filters which refine an RSL specification
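As a rough illustration of the "standard XML parsing tools" point, the sketch below reads a job description with Python's standard library. It only parses; it does not validate against the GRAM schema. The helper name summarize_job and the sample document are assumptions made for this example, modelled on the /bin/ls example later in this tutorial.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the tutorial's RSL examples.
NS = {
    "rsl": "http://gram.base.ogsa.globus.org/rsl",
    "gram": "http://gram.base.ogsa.globus.org/rsl/gram",
}

# Sample document (assumption: modelled on the tutorial's /bin/ls example).
SAMPLE_RSL = b"""<?xml version="1.0" encoding="UTF-8"?>
<rsl:rsl xmlns:rsl="http://gram.base.ogsa.globus.org/rsl"
         xmlns:gram="http://gram.base.ogsa.globus.org/rsl/gram">
  <gram:job>
    <gram:executable><rsl:pathElement path="/bin/ls"/></gram:executable>
    <gram:directory><rsl:pathElement path="/tmp"/></gram:directory>
    <gram:arguments>
      <gram:argument>-l</gram:argument>
      <gram:argument>-a</gram:argument>
    </gram:arguments>
  </gram:job>
</rsl:rsl>
"""


def summarize_job(rsl_bytes):
    """Hypothetical helper: pull a few well-known elements out of an RSL job."""
    root = ET.fromstring(rsl_bytes)
    job = root.find("gram:job", NS)
    if job is None:
        raise ValueError("no gram:job element found")

    def first_path(tag):
        # Return the path attribute of the first rsl:pathElement, if any.
        node = job.find(f"gram:{tag}/rsl:pathElement", NS)
        return node.get("path") if node is not None else None

    return {
        "executable": first_path("executable"),
        "directory": first_path("directory"),
        "arguments": [a.text for a in job.findall("gram:arguments/gram:argument", NS)],
    }


print(summarize_job(SAMPLE_RSL))
```

A broker or filter could start from a summary like this, adjust values, and write the refined document back out.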
Adding GRAM RSL Elements
Add additional elements to the GRAM RSL schema
Elements and values will get propagated to the managed job scheduler Perl modules
– Currently only limited use
RSL Elements For GRAM
<gram:executable> (type = rsl:pathType)
– Program to run
– A file path (absolute or relative) or URL
<directory> (type = rsl:pathType)
– Directory in which to run (default is HOME)
<arguments> (type = rsl:argumentListType)
– List of string arguments to program
<environment> (type = gram:env_type)
– List of environment variable name/value pairs
RSL Attributes For GRAM
<stdin> (type = rsl:pathType)
– Stdin for program
– A file path (absolute or relative) or URL
– If remote, entire file is pre-staged before execution
<stdout> (type = rsl:pathListType)
– Stdout for program
– Multiple file paths (absolute or relative) or URLs
– If remote, file is incrementally transferred
<stderr> (type = rsl:pathListType)
– Stderr for program
– Multiple file paths (absolute or relative) or URLs
– If remote, file is incrementally transferred
RSL Attributes For GRAM
<count> (type = rsl:integerType)
– Number of processes to run (default is 1)
<hostCount> (type = rsl:integerType)
– On SMP multi-computers, number of nodes to distribute the “count” processes across
– count/hostCount = number of processes per host
<project> (type = rsl:stringType)
– Project (account) against which to charge
<queue> (type = rsl:stringType)
– Queue into which to submit job
– Queue properties reflected in the MDS resource description
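The count/hostCount relationship can be written out as a one-line calculation. This is only a sketch: the function name is made up for illustration, and rejecting a count that does not divide evenly is an assumption, since the slides do not say how an uneven split is handled.

```python
def processes_per_host(count=1, host_count=1):
    """Hypothetical helper: number of processes placed on each host."""
    if count < 1 or host_count < 1:
        raise ValueError("count and host_count must be positive")
    # Assumption: an uneven split is treated as an error in this sketch.
    if count % host_count != 0:
        raise ValueError("count is not a multiple of host_count")
    return count // host_count


print(processes_per_host(count=8, host_count=4))
```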
RSL Attributes For GRAM
<maxWallTime> (type = rsl:longType)
– Maximum wall clock runtime in minutes
<maxCpuTime> (type = rsl:longType)
– Maximum CPU runtime in minutes
<maxTime> (type = rsl:longType)
– Only applies if above are not used
– Maximum wall clock or CPU runtime (scheduler's choice) in minutes
> CPU runtime makes sense on a time shared machine
> Wall clock runtime makes sense on a space shared machine
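The precedence between the three time limits can be sketched as follows. The function name and the space_shared flag are assumptions for illustration: the flag stands in for the "scheduler's choice", which in practice is made by the scheduler, not the client.

```python
def effective_limits(max_wall_time=None, max_cpu_time=None,
                     max_time=None, space_shared=True):
    """Hypothetical helper: return (wall_minutes, cpu_minutes) to enforce."""
    # maxTime only applies if neither specific limit was given.
    if max_wall_time is not None or max_cpu_time is not None:
        return max_wall_time, max_cpu_time
    if max_time is None:
        return None, None
    # Assumption: a space shared machine limits wall clock time,
    # a time shared machine limits CPU time.
    return (max_time, None) if space_shared else (None, max_time)


print(effective_limits(max_time=30, space_shared=False))
```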
RSL Attributes For GRAM
<maxMemory> (type = rsl:integerType)
– Maximum amount of memory for each process in megabytes
<minMemory> (type = rsl:integerType)
– Minimum amount of memory for each process in megabytes
RSL Attributes For GRAM
<jobType> (type = rsl:jobRunType)
– Value is one of “mpi”, “single”, “multiple”, or “condor”
> mpi: Run the program using “mpirun -np <count>”
> single: Only run a single instance of the program, and let the program start the other count-1 processes/threads
Good for scripts, and for multi-threaded programs
> multiple: Start <count> instances of the program using the appropriate scheduler mechanism
myjob can be used to coordinate these processes
> condor: Start <count> Condor processes running in “standard universe” (i.e. linked with Condor libraries for remote I/O, checkpoint/restart, etc.)
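To make the jobType values concrete, here is a small sketch that turns a job type into a list of command lines. It is an illustration only: launch_plan is a made-up name, a real job manager delegates "multiple" to the scheduler rather than starting processes itself, and "condor" is left out because it depends on Condor submission details not covered here.

```python
def launch_plan(job_type, executable, count=1):
    """Hypothetical helper: command lines a launcher might run for a jobType."""
    if job_type == "mpi":
        # One mpirun invocation that starts <count> processes.
        return [["mpirun", "-np", str(count), executable]]
    if job_type == "single":
        # One instance; the program starts any further processes itself.
        return [[executable]]
    if job_type == "multiple":
        # <count> independent instances of the same program.
        return [[executable] for _ in range(count)]
    if job_type == "condor":
        raise NotImplementedError("Condor submission is outside this sketch")
    raise ValueError(f"unknown jobType: {job_type}")


for command in launch_plan("multiple", "/bin/hostname", count=3):
    print(" ".join(command))
```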
RSL Attributes for GRAM
<scratchDir> (type = rsl:pathType)
– A unique subdir under <path> is created for job
– If path is relative, it is relative to:
> First: a site configured scratch directory
> Second: user's HOME directory on JM host
– The job may use SCRATCH_DIRECTORY in RSL substitutions
<gassCache> (type = rsl:pathType)
– Overrides the default GASS cache directory
– Default is site configurable, or ~/.globus/.gasscache if not configured
<libraryPath> (type = rsl:pathListType)
– Set job environment so apps built to use shared libraries will run properly
RSL Attributes for GRAM
<fileStageIn> (type = gram:fileStageInType)
– List of remote URL to local file pairs to be staged to host where job will run
<fileStageInShared> (type = gram:fileStageInType)
– List of files to be staged to the GASS cache
– Links from cache to local file will be made
<fileStageOut> (type = gram:fileStageOutType)
– List of files to be staged out after job completes
<fileCleanUp> (type = rsl:pathListType)
– List of files to be removed after job completes
Hint: Use RSL substitution SCRATCH_DIRECTORY
RSL Attributes for GRAM
gramMyjob
– Value is one of “collective”, “independent”
– Defines how the globus_gram_myjob library will operate on the <count> processes
> collective: Treat all <count> processes as part of a single job
> independent: Treat each of the <count> processes as an independent uniprocessor job
dryRun=true
– Do not actually run job
RSL Attributes for GRAM
saveState = yes/no
– Always saves state
– Causes the jobmanager to save job state/information to a persistent file on disk
– Allows recovery from a jobmanager crash
twoPhase
– Implemented in Managed Job port type
> Allows reliable job submission
> Allows client to reliably determine completion vs failure of a job
RSL Attributes for GRAM
restart = old jm contact
– Automatically recovers/restarts (soon)
(stdoutPosition=<int> <int>)
(stderrPosition=…)
– Implemented in File Stream port type
RSL Substitutions
RSL supports variable substitutions
– Definition example
> <rsl:substitutionDef name="MY HOME">/home/user1</rsl:substitutionDef>
– Reference example
> <gram:executable>
    <rsl:substitutionRef name="MY HOME"/>
    <rsl:pathElement path="/a.out"/>
  </gram:executable>
Allows for late binding of values
– Can refer to something that is not yet defined
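One way to picture substitution processing on the DOM is sketched below. It is not the GRAM implementation: the function names and the substitution name APP_DIR are hypothetical, and concatenating the resolved parts in document order is an assumption. Definitions are collected for the whole document before any reference is resolved, which is what lets a reference appear ahead of its definition.

```python
import xml.etree.ElementTree as ET

RSL_NS = "http://gram.base.ogsa.globus.org/rsl"
GRAM_NS = "http://gram.base.ogsa.globus.org/rsl/gram"

# Assumption: sample document in which the reference precedes the definition.
SAMPLE = b"""<rsl:rsl xmlns:rsl="http://gram.base.ogsa.globus.org/rsl"
                      xmlns:gram="http://gram.base.ogsa.globus.org/rsl/gram">
  <gram:job>
    <gram:executable>
      <rsl:substitutionRef name="APP_DIR"/>
      <rsl:pathElement path="/a.out"/>
    </gram:executable>
  </gram:job>
  <rsl:substitutionDef name="APP_DIR">/home/user1</rsl:substitutionDef>
</rsl:rsl>
"""


def collect_definitions(root, predefined=None):
    """Gather every substitutionDef; predefined models GRAM defined values."""
    defs = dict(predefined or {})
    for node in root.iter(f"{{{RSL_NS}}}substitutionDef"):
        defs[node.get("name")] = (node.text or "").strip()
    return defs


def resolve_path(element, defs):
    """Concatenate substitutionRef values and pathElement paths in order."""
    parts = []
    for child in element:
        if child.tag == f"{{{RSL_NS}}}substitutionRef":
            parts.append(defs[child.get("name")])
        elif child.tag == f"{{{RSL_NS}}}pathElement":
            parts.append(child.get("path"))
    return "".join(parts)


def resolve_executable(rsl_bytes, predefined=None):
    root = ET.fromstring(rsl_bytes)
    defs = collect_definitions(root, predefined)
    executable = root.find(f".//{{{GRAM_NS}}}executable")
    return resolve_path(executable, defs)


print(resolve_executable(SAMPLE))
```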
GRAM Defined RSL Substitutions
GRAM defines a set of RSL substitutions before processing the job request
– Client submitted RSL can assume these substitutions are defined and refer to them
Allows for generic RSL expressions to adapt to site and resource configurations
– Goal: Clients should not have to do manual configuration of resources before they submit jobs to them
– GRAM defined RSL substitutions define minimal information necessary to bootstrap
GRAM Defined RSL Substitutions
Machine Information
– GLOBUS_HOST_MANUFACTURER
– GLOBUS_HOST_CPUTYPE
– GLOBUS_HOST_OSNAME
– GLOBUS_HOST_OSVERSION
GRAM Defined RSL Substitutions
Paths to Globus
– GLOBUS_LOCATION
Miscellaneous
– HOME
– LOGNAME
– GLOBUS_ID
– SCRATCH_DIRECTORY
GRAM RSL Examples
<?xml version="1.0" encoding="UTF-8"?>
<!-- GRAM RSL Namespace -->
<rsl:rsl
    xmlns:rsl="http://gram.base.ogsa.globus.org/rsl"
    xmlns:gram="http://gram.base.ogsa.globus.org/rsl/gram"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation=
        "http://gram.base.ogsa.globus.org/rsl/schema/base/gram/rsl.xsd
         http://gram.base.ogsa.globus.org/rsl/gram/schema/base/gram/gram_rsl.xsd">
GRAM RSL Examples
<rsl:rsl> <!-- insert the GRAM RSL namespace attributes on this element -->
  <gram:job>
    <gram:executable>
      <rsl:pathElement path="/bin/ls"/>
    </gram:executable>
    <gram:directory>
      <rsl:pathElement path="/tmp"/>
    </gram:directory>
    <gram:arguments>
      <gram:argument>-l</gram:argument>
      <gram:argument>-a</gram:argument>
    </gram:arguments>
  </gram:job>
</rsl:rsl>
GRAM RSL Examples
<rsl:rsl> <!-- insert the GRAM RSL namespace attributes on this element -->
  <rsl:substitutionDef name="APP">cool_app</rsl:substitutionDef>
  <gram:job>
    <gram:executable>
      <rsl:substitutionRef name="HOME"/>
      <rsl:substitutionRef name="APP"/>
    </gram:executable>
  </gram:job>
</rsl:rsl>
GRAM grid services
We know how to specify a job using RSL
Now how do we submit and manage that job?
– Managed Job Factory Service
> Defines an OGSI/WSDL interface for submitting, monitoring and controlling a job
> MJS uses the File Stream Factory Service to manage the job’s stdout and stderr file streaming
> MJS exposes the stdout and stderr File Stream Factory Grid Service Handles (GSH) as Service Data Elements
– Used by GramClient
GramClient
A simple Java command line client that takes 2 arguments
% java GramClient <master URL> <RSL>
GramClient can be useful for simple scripting and debugging
We anticipate the community will contribute robust clients
– E.g. Condor-G Grid Manager for GT2
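For the "simple scripting" use, a wrapper along these lines may be enough. It only assembles the two-argument command line shown above and hands it to the operating system. Several things here are assumptions: the function names, the example URL and file name, that the Java classpath is already set up so that "java GramClient" resolves, and whether the <RSL> argument is a file name or the RSL text itself, which the slide does not specify.

```python
import subprocess


def gram_client_command(master_url, rsl, java="java"):
    """Hypothetical helper: argv for 'java GramClient <master URL> <RSL>'."""
    return [java, "GramClient", master_url, rsl]


def submit(master_url, rsl):
    """Run GramClient and return the completed process (not called here)."""
    return subprocess.run(
        gram_client_command(master_url, rsl),
        capture_output=True,
        text=True,
        check=False,
    )


# The URL and file name below are placeholders, not real endpoints.
print(gram_client_command("http://host.example.org:8080/master", "job.xml"))
```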
Managed Job Factory port type
CreateService
– Prepare a job for submission on a remote resource
– Input:
> RSL specifying the job to be run
– Output:
> Grid Service Reference (GSR) to MJS
WSDL definition of the MJS instance
Managed Job Factory port type
Service Data Element
– List of GSHs of MJS instances
Managed Job port type
Start
– Start/submit job to the compute resource
– Input:
> none
– Output:
> Initial job state – typically, unsubmitted
Managed Job port type
On destroy, or soft state termination
– The MJS will clean up everything
> Cancel the job
> Destroy File Stream Factories/Services
> Clean up directories/files
Scratch dir
GASS cache
Managed Job port type
Service Data Elements
– Job status
> UNSUBMITTED, PENDING, ACTIVE, FAILED, DONE, SUSPENDED, STAGEIN, STAGEOUT
– GSH to File Stream Factory Service for job’s Stdout
– GSH to File Stream Factory Service for job’s Stderr
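The factory and job port types described on the preceding slides can be mimicked with a toy in-memory model. This is not the OGSI interface or any Globus API: the class and method names are invented, the handle format is made up, and the move to PENDING after start is an assumption. It only shows the shape of the interaction: the factory creates an instance and tracks its handle, start returns the initial state, and destroy removes the instance.

```python
from enum import Enum


class JobState(Enum):
    # State names as listed in the Job status Service Data Element.
    UNSUBMITTED = "UNSUBMITTED"
    PENDING = "PENDING"
    ACTIVE = "ACTIVE"
    FAILED = "FAILED"
    DONE = "DONE"
    SUSPENDED = "SUSPENDED"
    STAGEIN = "STAGEIN"
    STAGEOUT = "STAGEOUT"


class ManagedJob:
    """Toy stand-in for an MJS instance."""

    def __init__(self, handle, rsl, factory):
        self.handle = handle
        self.rsl = rsl
        self.state = JobState.UNSUBMITTED
        self._factory = factory

    def start(self):
        """Submit the job and return the initial job state."""
        initial = self.state
        self.state = JobState.PENDING  # assumption: next state after submit
        return initial

    def destroy(self):
        """Model cleanup by dropping the instance from the factory."""
        self._factory.instances.pop(self.handle, None)


class ManagedJobFactory:
    """Toy stand-in for the factory; 'instances' models its list of GSHs."""

    def __init__(self):
        self.instances = {}
        self._next_id = 1

    def create_service(self, rsl):
        handle = f"mjs-{self._next_id}"  # made-up handle format
        self._next_id += 1
        job = ManagedJob(handle, rsl, self)
        self.instances[handle] = job
        return job


factory = ManagedJobFactory()
job = factory.create_service("<rsl:rsl>...</rsl:rsl>")
print(job.handle, job.start().value, sorted(factory.instances))
```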
File Stream Factory port type
CreateService
– Prepare to stream job’s stdout or stderr to a destination URL
– Input:
> Destination URL
– Output:
> GSH
StartStreaming
– Start the streaming to the destination URL
File Stream port type
Service Data Element
– DestinationUrl
GT3 GRAM Client Interfaces
– Java stubs for MJS
– C-bindings API for MJS
> More info in session 8
– C-bindings GT2-3 Translator API for MJS
> Accepts a GT2 RSL and translates to GT3 RSL (XML)
> GT2 backwards compatibility
> Ease transition to GT3
– Java CoG GT2-3 Translator API for MJS
> GT2 backwards compatibility
– Python bindings will follow
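To give a feel for what a GT2 to GT3 RSL translation involves, here is a deliberately tiny sketch. It is not the Translator API: the function name is invented, it understands only flat (attribute=value) relations for executable, directory and arguments, and it maps them onto the element shapes used in this tutorial's XML examples. Real GT2 RSL has quoting, nesting and substitution syntax that this ignores.

```python
import re
import shlex
import xml.etree.ElementTree as ET

RSL_NS = "http://gram.base.ogsa.globus.org/rsl"
GRAM_NS = "http://gram.base.ogsa.globus.org/rsl/gram"
ET.register_namespace("rsl", RSL_NS)
ET.register_namespace("gram", GRAM_NS)

# Matches simple "(name=value)" relations without nested parentheses.
RELATION = re.compile(r"\(\s*(\w+)\s*=\s*([^()]*?)\s*\)")


def translate(gt2_rsl):
    """Hypothetical toy translator from flat GT2 RSL to an XML element tree."""
    root = ET.Element(f"{{{RSL_NS}}}rsl")
    job = ET.SubElement(root, f"{{{GRAM_NS}}}job")
    for name, value in RELATION.findall(gt2_rsl):
        if name in ("executable", "directory"):
            holder = ET.SubElement(job, f"{{{GRAM_NS}}}{name}")
            ET.SubElement(holder, f"{{{RSL_NS}}}pathElement", path=value)
        elif name == "arguments":
            holder = ET.SubElement(job, f"{{{GRAM_NS}}}arguments")
            for arg in shlex.split(value):
                ET.SubElement(holder, f"{{{GRAM_NS}}}argument").text = arg
        else:
            raise ValueError(f"attribute not handled by this sketch: {name}")
    return root


tree = translate("&(executable=/bin/ls)(directory=/tmp)(arguments=-l -a)")
print(ET.tostring(tree, encoding="unicode"))
```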
Important Notice!!
Our goals are:
– Highly functional interface
> grid service WSDLs
> C API
> Java API
– Expressive RSL
– Only basic command line clients
– Collaborate with others to create more capable and complete clients
> E.g. Condor-G grid manager
Higher level Resource Management Services
To date, no GT3 co-allocators (DUROC)
– simultaneous allocation of a resource set
– mpich-g2 is DUROC’s only user
MJS to Resource Interface
Resource Information Provider Service (RIPS)
– a specialized notification service
– maintains job information from the scheduler
– Scheduler info provider is essentially the GT2 queue script used by the gram-reporter, but it outputs XML instead of LDIF
The MJS instances will subscribe to RIPS for notification on job state changes
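The subscribe-and-notify relationship between MJS instances and RIPS resembles a plain observer pattern, sketched below. This is an analogy in ordinary Python, not OGSI notification: the class name, method names and job identifiers are all invented for illustration, and suppressing repeated reports of the same state is an assumption.

```python
from collections import defaultdict


class ResourceInfoProvider:
    """Toy stand-in for RIPS: tracks job states and notifies subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._states = {}

    def subscribe(self, job_id, callback):
        """Register a callback(job_id, state) for one job's state changes."""
        self._subscribers[job_id].append(callback)

    def report(self, job_id, state):
        """Record a state seen from the scheduler; notify only on change."""
        if self._states.get(job_id) == state:
            return
        self._states[job_id] = state
        for callback in self._subscribers[job_id]:
            callback(job_id, state)


events = []
rips = ResourceInfoProvider()
rips.subscribe("job-1", lambda job_id, state: events.append((job_id, state)))
rips.report("job-1", "PENDING")
rips.report("job-1", "PENDING")  # no change, so no notification
rips.report("job-1", "ACTIVE")
print(events)
```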
MJS to Resource Interface
Interactions with file system and scheduler are done by MJS calling scheduler Perl module
– Same GT2 scheduler Perl modules are used in GT3 without modification!
– This allows the JM host to be different from the scheduler host
File system interactions
– gass_cache, scratch_dir, file_staging, proxy relocate/refresh
Scheduler interactions
– Job: submit, cancel, poll
MJS Files
[Figure: MJS files diagram. Labels shown: Client, Master, MJFS, MJS, RSL (Exe=x, Args=y, Env=z), JOB, Job restart, UP, GASS_CACHE (stdout, stderr, staged EXE, staged stdin), scratch dir (staged files), UHE_OGSA, serverConfig.wsdd, clientConfig.wsdd]
MJS to Resource Interface
Your scheduler is not supported?
– No problem. See the “JM scheduler tutorial” at www.globus.org/gram for step by step instructions on writing an interface for an unsupported scheduler
– JM scheduler setup package
> The scheduler interface is implemented as a Perl module which is a subclass of the Globus::GRAM::JobManager module. Only submit, poll and cancel are required.
> Autoconf script to locate the scheduler commands (e.g. qsub) and substitute values in the scheduler Perl module
> Scheduler info provider for RIPS
– And consider contributing it back for inclusion in future releases of the Globus Toolkit
> Or add to Grid Technology Repository
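The three required operations can be pictured with the Python analogue below. The real interface is a Perl subclass of Globus::GRAM::JobManager, so none of this is the actual API: the class names, the dictionary job description, the job identifiers and the state recorded on cancel are assumptions chosen to show only the submit/poll/cancel shape.

```python
from abc import ABC, abstractmethod


class SchedulerInterface(ABC):
    """Python analogue of the three operations a scheduler module provides."""

    @abstractmethod
    def submit(self, description):
        """Hand the job to the scheduler and return its job identifier."""

    @abstractmethod
    def poll(self, job_id):
        """Return the current state of the job."""

    @abstractmethod
    def cancel(self, job_id):
        """Ask the scheduler to stop the job."""


class InMemoryScheduler(SchedulerInterface):
    """Fake scheduler for experimentation; a real one would call e.g. qsub."""

    def __init__(self):
        self._jobs = {}

    def submit(self, description):
        job_id = str(len(self._jobs) + 1)
        self._jobs[job_id] = "PENDING"
        return job_id

    def poll(self, job_id):
        return self._jobs[job_id]

    def cancel(self, job_id):
        # Assumption: a cancelled job is reported as FAILED in this sketch.
        self._jobs[job_id] = "FAILED"


scheduler = InMemoryScheduler()
job_id = scheduler.submit({"executable": "/bin/ls"})
print(job_id, scheduler.poll(job_id))
```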
Changes: GT2 → 3.0
New Grid Service interface
– Master MJFS, MJFS, MJS, FSFS, FSS
RSL-2
RIPS
GRAM exercise
Use GramClient to submit a job
Documentation
– http://www.globus.org/gram
> GT2 centric, but still good information for GRAM RSL element descriptions
> Better GT3 documentation will come
Reliable File Transfer Service
Performs a third party transfer between two GridFTP servers reliably
– Non-user based
> Delegate User Proxy on startTransfer
– Stores the transfer state in a database (PostgreSQL)
– Restarts the transfer from the last checkpoint
> Checkpoint = GridFTP server restart marker
– Reliably recovers from crashes of service container, source host and destination host, temporary network outages and file system failures
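The checkpoint-and-restart idea can be demonstrated in miniature. The sketch below is not RFT: it copies bytes in memory instead of driving GridFTP, uses SQLite from the Python standard library in place of PostgreSQL, and treats a byte offset as the restart marker. The function names, table layout and fail_after parameter are assumptions for illustration.

```python
import sqlite3


def open_state(path=":memory:"):
    """Open the transfer state store (SQLite here, swapped in for PostgreSQL)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS transfer "
        "(job_id INTEGER PRIMARY KEY, offset INTEGER NOT NULL)"
    )
    return db


def transfer(db, job_id, source, dest, chunk=4, fail_after=None):
    """Copy source into dest, resuming from the last saved checkpoint."""
    row = db.execute(
        "SELECT offset FROM transfer WHERE job_id = ?", (job_id,)
    ).fetchone()
    offset = row[0] if row else 0
    sent = 0
    while offset < len(source):
        if fail_after is not None and sent >= fail_after:
            raise ConnectionError("simulated network outage")
        block = source[offset:offset + chunk]
        dest[offset:offset + len(block)] = block
        offset += len(block)
        sent += 1
        # Persist the checkpoint after every block that arrives.
        db.execute(
            "INSERT OR REPLACE INTO transfer (job_id, offset) VALUES (?, ?)",
            (job_id, offset),
        )
        db.commit()
    return offset


db = open_state()
data = b"0123456789"
target = bytearray()
try:
    transfer(db, 1, data, target, fail_after=1)  # interrupted after one block
except ConnectionError:
    pass
transfer(db, 1, data, target)  # resumes at the saved offset
print(bytes(target))
```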
RFT port type
SubmitTransferJob()
– Input message: fromURL and toURL (strings), transferOptions
> transferOptions: tcpBufferSize (int), parallelStreams (int), dcau (boolean)
– Output message: transferJobID (integer)
getStatus()
– Input message: transferJobID
– Output message: status (integer)
cancelTransfer
– Input message: transferJobID
– Output message: N/A
RFT Service Data Elements
FileTransferRestartMarker (int)
– Checkpoints given out by GridFTP servers which represent how much of the file has already been transferred
FileTransferProgressType (int)
– Performance markers given out by a GridFTP server which can be used to get the performance measurements of a particular transfer