Co-allocation Using HARC IV. ResourceManagers

HARC Workshop
University of Manchester

Philosophy

• New types of RMs can be written by others
• Existing RMs can be customized
• Interfaces can be enhanced or changed
• None of this means changing the acceptor code
• API is extensible too

• Good community contribution model
• CCT keeps control of the acceptor code
• The acceptor code will become very stable (already less than one commit per month)
• The community evolves the system

Are RMs Easy to Install?

• Harder than client software
• Much easier than Acceptors
• Complexity is in the right place:
  – Only a few people install and configure Acceptors (infrastructure), which is hard
  – Some people modify/write RMs, which is not too hard
  – More people install and configure RMs, which is easy
  – Many people install and configure the Client software, which is trivial

Pre-installation - Perl

• RMs are written in Perl, to make installation trivial
• However, they need a large number of CPAN modules to be installed
• Some of these, e.g. Net::SSLeay and Crypt::SSLeay, are not trivial
• There is a document which contains things to watch out for
  – Lists previously seen problems, with solutions
  – Basically a list of exceptions
  – Now 7 pages of text!
  – There’s a lot of AIX content...
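
A quick pre-flight check can show which modules are still missing before the installer is run. The sketch below is not part of HARC. It relies only on the standard `perl -M<Module> -e 1` idiom, and the two modules listed are just the ones named on this slide, not the full set the RM needs.

```shell
#!/bin/sh
# Pre-flight sketch: report which Perl modules fail to load.
# The module list is an example only; the complete list of CPAN
# dependencies is not given on the slide.

have_module() {
    # perl -M<Module> -e 1 exits non-zero if the module cannot be loaded.
    perl -M"$1" -e 1 >/dev/null 2>&1
}

for mod in Net::SSLeay Crypt::SSLeay; do
    if have_module "$mod"; then
        echo "ok       $mod"
    else
        echo "MISSING  $mod"
    fi
done
```

Any module reported as MISSING would then be installed from CPAN, consulting the troubleshooting document for platform-specific problems.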

Pre-installation - Certificate

• HARC RM needs a certificate
• We don’t recommend re-using the host certificate
• Get a service certificate
• UK e-Science CA now supports:
  – harccrm for Compute RMs (CRMs)
    • /C=UK/O=eScience/OU=Manchester/L=MC/CN=harccrm/man2.nw-grid.ac.uk/emailAddress=...
  – harcacceptor for Acceptors
    • /C=UK/O=eScience/OU=Manchester/L=MC/CN=harcacceptor/man4.nw-grid.ac.uk/emailAddress=....

Installation Procedure

• There’s an installer which installs stuff from the CVS tree - this may change
• HARC environment variable points to the root of the repo (“negotiation” directory)
• You have a subdirectory in
  – $HARC/rm-service/config
• For example
  – $HARC/rm-service/config/nw-grid/man2

Installation Procedure

1. Create Contents
  – install.config - more shortly
  – grid-mapfile - GT-style mapfile for cert to username mapping (usually a sym-link to /etc/grid-security/grid-mapfile)
  – acceptor_mapfile - a list of the Acceptor DNs, and also their CA cert DNs
  – cacerts directory, containing CA Certs for your cert and the Acceptor certs, in PEM format, suffix .crt
2. Then a trivial Install
  – install-rm nw-grid/man2 /usr/local/man2-rm
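
Step 1 above could be laid out roughly as follows. This is a sketch under assumptions: HARC is pointed at a scratch directory so the script is safe to run anywhere, and the two hand-edited files are created empty. The file and directory names are the ones listed on the slide.

```shell
#!/bin/sh
# Sketch only: create the per-resource config directory described above.
# HARC points at a scratch directory here; in a real setup it would be
# the root of the checked-out repository.
set -eu

HARC=$(mktemp -d)
site=nw-grid/man2
cfg="$HARC/rm-service/config/$site"

mkdir -p "$cfg/cacerts"

# install.config and acceptor_mapfile start empty and are filled in by hand.
: > "$cfg/install.config"
: > "$cfg/acceptor_mapfile"

# The grid-mapfile is usually a symlink to the system-wide one.
ln -s /etc/grid-security/grid-mapfile "$cfg/grid-mapfile"

# CA certificates go into cacerts/ in PEM format with a .crt suffix, e.g.
#   cp uk-escience-ca.crt "$cfg/cacerts/"

ls -A "$cfg"
```

Once the files are filled in, step 2 is the single install-rm command shown above.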

install.config

RM_INNER_TYPE=SimpleCompute
RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
RM_COMPUTE_BATCH_TYPE=TorqueMaui
RM_COMPUTE_MEMORY_MB_PER_CPU=4096
RM_COMPUTE_CPUS=8
RM_MAUI_COMMAND_DIR=/usr/local/maui/bin
RM_RESOURCE_DESCRIPTION='The Manchester NW-Grid node, a Dual AMD Opteron Linux cluster'

RM_HOST=130.88.200.242
RM_URL=man2-rm
RM_PORT=9393

<Resource>
  <Compute>man2.nw-grid.ac.uk</Compute>
  <Endpoint type="REST">
    <RESTEndpoint>https://man2.nw-grid.ac.uk:9393/man2-rm/</RESTEndpoint>
  </Endpoint>
</Resource>
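
The REST endpoint in the Resource element appears to be assembled from the config values: the node name, RM_PORT and RM_URL. The sketch below illustrates that relationship. It assumes install.config is plain KEY=VALUE shell syntax that can be sourced, and that the endpoint uses the DNS name rather than RM_HOST. Neither assumption is confirmed by the slides.

```shell
#!/bin/sh
# Hypothetical helper, not part of HARC: derive the REST endpoint from an
# install.config-style file. Assumes plain KEY=VALUE shell syntax.
set -eu

workdir=$(mktemp -d)
cat > "$workdir/install.config" <<'EOF'
RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
RM_HOST=130.88.200.242
RM_URL=man2-rm
RM_PORT=9393
EOF

# Read the settings into the current shell.
. "$workdir/install.config"

# Assumption: the endpoint is built from the node name, port and URL path.
endpoint="https://${RM_COMPUTE_NODENAME}:${RM_PORT}/${RM_URL}/"
echo "$endpoint"
```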

Installation Step

• Before Installing
  – Need PERL5LIB and LD_LIBRARY_PATH to be defined in your environment when you install
  – Or can add these to the config file
  – Don’t have to set these if you don’t need to
• Then a trivial Install
  – install-rm nw-grid/man2 /usr/local/man2-rm
  – Script is in $HARC/rm-service/scripts
• What does this do?

What happens?

• Installs source files
• Creates a crontab & scripts for restarting the RM
• Customizes some scripts for stopping/starting the RM
• Installs and hashes CA certificates
• Output:

rm-service $ scripts/install-rm nw-grid/man2 /Users/jonmaclaren/man2-rm
Makefile.crt ... Skipped
cct-ca.crt ... 5fb2fc80.0
old-uk-escience-ca.crt ... 01621954.0
uk-escience-ca.crt ... adcbc9ef.0
uk-escience-root.crt ... 8175c1cd.0
Notice: Don't forget to place your certificate and key files at:
/Users/jonmaclaren/man2-rm/x509/server_cert.pem
/Users/jonmaclaren/man2-rm/x509/server_key.pem

What’s in /usr/local/man2-rm?

• Some Perl Modules
• And OuterRM.pl which gets run
• commands - which configures and runs the RM (based on install.config, etc.)
• rerun - runs “commands” in the background from crontab
• crontab - crontab line which can be added directly to your crontab (don’t cut and paste!)
• start-rm, stop-rm - control whether rerun will actually start the RM, using a control file (.do_not_restart)
  – ./stop-rm
  – ./start-rm [ -w ]
• x509 - subdirectory containing all the CA certs, mapfiles, etc.
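
The control-file mechanism can be pictured with the sketch below. It is illustrative only: the real start-rm, stop-rm and rerun scripts are generated by the installer and may work differently. The function names are hypothetical, and it is an assumption that stopping creates .do_not_restart and starting removes it.

```shell
#!/bin/sh
# Illustrative only: mimics a cron-driven restart guarded by a control file.
# Function names are hypothetical; the generated HARC scripts may differ.
set -eu

RM_DIR=$(mktemp -d)                      # stands in for /usr/local/man2-rm
CONTROL_FILE="$RM_DIR/.do_not_restart"

stop_rm()  { : > "$CONTROL_FILE"; }      # assumed: stopping creates the file
start_rm() { rm -f "$CONTROL_FILE"; }    # assumed: starting removes it

# What a cron-driven rerun might do: skip the launch while the file exists.
rerun() {
    if [ -e "$CONTROL_FILE" ]; then
        echo "restart suppressed"
    else
        echo "would launch ./commands in the background"
    fi
}

stop_rm;  rerun
start_rm; rerun
```

The point of the design is that cron can keep calling rerun unconditionally, while an administrator controls the outcome by running stop-rm or start-rm.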

Perl Modules

• Just an overview here...
• There is a doc online which has some details on these

Key Modules

• OuterRM - just does the HTTP listening and Acceptor Cert authN/authZ
• MainLoop - handles each request
• TransactionManager - remembers what transactions (by TID) are running, and what their states are
• InnerRM - the main class for different types of RM
  – SimpleComputeRM
  – SimpleNetworkRM
  – Both inherit from InnerRM

SimpleComputeRM

• Handles batch queue systems
• Deals only with processors/memory
• To talk to the scheduler, a subclass of SCBatch is used
  – SCBatchTorqueMaui.pm
  – SCBatchTorqueMoab.pm
  – SCBatchLoadLeveler.pm - not in CVS yet...
• Chosen at runtime - RM_COMPUTE_BATCH_TYPE
• Simple modules
  – Less than 200 lines
  – Override
    • initialize
    • makeReservation
    • cancelReservation
    • getStatus
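
The module names above suggest that the RM_COMPUTE_BATCH_TYPE value is appended to "SCBatch" to find the scheduler module. That mapping is inferred from the naming pattern, not stated on the slide. A hypothetical pre-install check built on the inference might look like this:

```shell
#!/bin/sh
# Hypothetical check, not part of HARC: map a batch type to the module file
# name it presumably selects. The "SCBatch<type>.pm" pattern is inferred
# from the module names listed on the slide.

batch_module() {
    case "$1" in
        # LoadLeveler is listed on the slide as not yet in CVS.
        TorqueMaui|TorqueMoab|LoadLeveler)
            echo "SCBatch$1.pm" ;;
        *)
            echo "unknown batch type: $1" >&2
            return 1 ;;
    esac
}

batch_module TorqueMaui
```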

Customizing InnerRM

• Startup/shutdown
  – initialize/remove
• Parsing (validating) the XML
  – parseResourceElement
  – parseWorkElement
  – maybe parseScheduleElement
• Co-allocation
  – tryMakeAction
  – tryCancelAction
  – addResourceBookings
  – completeTransactionBookings
• Others for getTimetable/getStatus

Steps for creating a new RM

1. Design your XML
  • Resource element
  • Work element
2. Create a new subclass of InnerRM.pm
  • Use the utility classes where possible
3. To extend the API, create subclasses of
  • Resource.java
  • Work.java

Caveats for RMs

• Need to restart to re-read grid-mapfile
• When restarted, they forget the bookings
  – Want to add persistence so that it’s trivial for RM developers to utilize
• Thread handling needs work (soon!)

What’s next?

• Discussion on MPIg...
• Beer?

But first...

...Any Questions?
