Co-allocation Using HARC IV. ResourceManagers HARC Workshop University of Manchester


Page 1: Co-allocation Using HARC IV. ResourceManagers

Co-allocation Using HARC
IV. ResourceManagers

HARC Workshop
University of Manchester

Page 2: Co-allocation Using HARC IV. ResourceManagers

Philosophy

• New types of RMs can be written by others
• Existing RMs can be customized
• Interfaces can be enhanced or changed
• None of this means changing the Acceptor code
• API is extensible too

• Good community contribution model
• CCT keeps control of the Acceptor code
• The Acceptor code will become very stable (already less than one commit per month)
• The community evolves the system

Page 3: Co-allocation Using HARC IV. ResourceManagers

Are RMs Easy to Install?

• Harder than client software
• Much easier than Acceptors
• Complexity is in the right place:
– Only a few people install and configure Acceptors (infrastructure), which is hard
– Some people modify/write RMs, which is not too hard
– More people install and configure RMs, which is easy
– Many people install and configure the Client software, which is trivial

Page 4: Co-allocation Using HARC IV. ResourceManagers

Pre-installation - Perl

• RMs are written in Perl, to make installation trivial
• However, they need a large number of CPAN modules to be installed
• Some of these, e.g. Net::SSLeay and Crypt::SSLeay, are not trivial
• There is a document which lists things to watch out for
– Lists previously seen problems, with solutions
– Basically a list of exceptions
– Now 7 pages of text!
– There's a lot of AIX content...
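Before installing, it is worth confirming the CPAN modules actually load. A minimal sketch (the two modules named are the ones called out on this slide; take the full list from the pre-installation document):

```shell
#!/bin/sh
# Sanity-check that required CPAN modules are importable before installing an RM.
check_mod() {
    # perl -MFoo -e 1 exits 0 iff module Foo loads cleanly
    if perl -M"$1" -e 1 2>/dev/null; then
        echo "ok $1"
    else
        echo "MISSING $1"
    fi
}

for mod in Net::SSLeay Crypt::SSLeay; do
    check_mod "$mod"
done
```

Running this on each target machine first avoids discovering a missing (or broken) SSL module halfway through an install.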

Page 5: Co-allocation Using HARC IV. ResourceManagers

Pre-installation - Certificate

• HARC RM needs a certificate
• We don't recommend re-using the host certificate
• Get a service certificate
• UK e-Science CA now supports:
– harccrm for Compute RMs (CRMs)
• /C=UK/O=eScience/OU=Manchester/L=MC/CN=harccrm/man2.nw-grid.ac.uk/emailAddress=...
– harcacceptor for Acceptors
• /C=UK/O=eScience/OU=Manchester/L=MC/CN=harcacceptor/man4.nw-grid.ac.uk/emailAddress=...
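Once the service certificate arrives, checking its subject DN and validity window with openssl before installing it saves debugging later. A self-contained sketch (a throwaway self-signed certificate with a simplified DN stands in for the real CA-issued one):

```shell
#!/bin/sh
# Generate a throwaway self-signed cert so the sketch runs anywhere;
# in practice, point openssl at the server_cert.pem your CA issued.
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout server_key.pem -out server_cert.pem \
    -days 30 -subj "/C=UK/O=eScience/CN=harccrm" 2>/dev/null

# Confirm the subject DN and validity dates match what you requested
openssl x509 -in server_cert.pem -noout -subject -dates
```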

Page 6: Co-allocation Using HARC IV. ResourceManagers

Installation Procedure

• There's an installer which installs stuff from the CVS tree - this may change
• HARC environment variable points to the root of the repo ("negotiation" directory)
• You have a subdirectory in
– $HARC/rm-service/config
• For example
– $HARC/rm-service/config/nw-grid/man2

Page 7: Co-allocation Using HARC IV. ResourceManagers

Installation Procedure

1. Create Contents
– install.config - more shortly
– grid-mapfile - GT-style mapfile for cert to username mapping (usually a sym-link to /etc/grid-security/grid-mapfile)
– acceptor_mapfile - a list of the Acceptor DNs, and also their CA cert DNs
– cacerts directory, containing CA Certs for your cert and the Acceptor certs, in PEM format, suffix .crt

2. Then a trivial Install
– install-rm nw-grid/man2 /usr/local/man2-rm
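Step 1 can be sketched as a few shell commands. The paths are the example ones from this slide; a scratch HARC root is used so the sketch is self-contained, and grid-mapfile is a plain file here where it would usually be a sym-link:

```shell
#!/bin/sh
# Lay out the per-resource config directory under $HARC/rm-service/config.
HARC=${HARC:-$(mktemp -d)}
cfg="$HARC/rm-service/config/nw-grid/man2"
mkdir -p "$cfg/cacerts"          # CA certs in PEM format, suffix .crt

touch "$cfg/install.config"      # RM settings (see the install.config slide)
touch "$cfg/acceptor_mapfile"    # Acceptor DNs plus their CA cert DNs
touch "$cfg/grid-mapfile"        # usually: ln -s /etc/grid-security/grid-mapfile

ls "$cfg"
```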

Page 8: Co-allocation Using HARC IV. ResourceManagers

install.config

RM_INNER_TYPE=SimpleCompute
RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
RM_COMPUTE_BATCH_TYPE=TorqueMaui
RM_COMPUTE_MEMORY_MB_PER_CPU=4096
RM_COMPUTE_CPUS=8
RM_MAUI_COMMAND_DIR=/usr/local/maui/bin
RM_RESOURCE_DESCRIPTION='The Manchester NW-Grid node, a Dual AMD Opteron Linux cluster'
RM_HOST=130.88.200.242
RM_URL=man2-rm
RM_PORT=9393

Page 9: Co-allocation Using HARC IV. ResourceManagers

install.config

RM_INNER_TYPE=SimpleCompute
RM_COMPUTE_NODENAME=man2.nw-grid.ac.uk
RM_COMPUTE_BATCH_TYPE=TorqueMaui
RM_COMPUTE_MEMORY_MB_PER_CPU=4096
RM_COMPUTE_CPUS=8
RM_MAUI_COMMAND_DIR=/usr/local/maui/bin
RM_RESOURCE_DESCRIPTION='The Manchester NW-Grid node, a Dual AMD Opteron Linux cluster'
RM_HOST=130.88.200.242
RM_URL=man2-rm
RM_PORT=9393

<Resource>
  <Compute>man2.nw-grid.ac.uk</Compute>
  <Endpoint type="REST">
    <RESTEndpoint>https://man2.nw-grid.ac.uk:9393/man2-rm/</RESTEndpoint>
  </Endpoint>
</Resource>

Page 10: Co-allocation Using HARC IV. ResourceManagers

Installation Step

• Before Installing
– Need PERL5LIB and LD_LIBRARY_PATH to be defined in your environment when you install
– Or can add these to the config file
– Don't have to set these if you don't need to
• Then a trivial Install
– install-rm nw-grid/man2 /usr/local/man2-rm
– Script is in $HARC/rm-service/scripts
• What does this do?
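The pre-install environment can be sketched as below; the two directory values are illustrative guesses (a local CPAN install tree and an OpenSSL library directory), not values from the slides, and per the bullets they can equally go in the config file or be omitted if unneeded:

```shell
#!/bin/sh
# Point PERL5LIB at locally installed CPAN modules, and LD_LIBRARY_PATH at
# the SSL libraries that Net::SSLeay / Crypt::SSLeay link against.
export PERL5LIB=$HOME/perl5/lib/perl5
export LD_LIBRARY_PATH=/usr/local/ssl/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

# With the environment set, the install itself is one command:
#   $HARC/rm-service/scripts/install-rm nw-grid/man2 /usr/local/man2-rm
echo "PERL5LIB=$PERL5LIB"
```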

Page 11: Co-allocation Using HARC IV. ResourceManagers

What happens?

• Installs Source files
• Creates a crontab & scripts for restarting the RM
• Customizes some scripts for stopping/starting the RM
• Installs and hashes CA certificates

• Output:

rm-service $ scripts/install-rm nw-grid/man2 /Users/jonmaclaren/man2-rm
Makefile.crt ... Skipped
cct-ca.crt ... 5fb2fc80.0
old-uk-escience-ca.crt ... 01621954.0
uk-escience-ca.crt ... adcbc9ef.0
uk-escience-root.crt ... 8175c1cd.0
Notice: Don't forget to place your certificate and key files at:
/Users/jonmaclaren/man2-rm/x509/server_cert.pem
/Users/jonmaclaren/man2-rm/x509/server_key.pem
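The `5fb2fc80.0`-style names in the installer output are OpenSSL subject-hash filenames: each CA cert is copied to `<subject_hash>.0` so OpenSSL can locate it by subject DN. A minimal sketch of that hashing step (a throwaway self-signed certificate stands in for a real CA cert):

```shell
#!/bin/sh
# Generate a stand-in "CA" certificate, then install it under its
# OpenSSL subject-hash name, as the HARC installer does for each .crt.
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out demo-ca.crt \
    -days 30 -subj "/O=Demo/CN=demo-ca" 2>/dev/null

hash=$(openssl x509 -in demo-ca.crt -noout -hash)   # 8 hex digits
cp demo-ca.crt "$hash.0"
echo "demo-ca.crt ... $hash.0"
```

Note that the hash value depends on the OpenSSL version (the subject-hash algorithm changed at OpenSSL 1.0), which is why the installer recomputes it rather than shipping fixed filenames.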

Page 12: Co-allocation Using HARC IV. ResourceManagers

What's in /usr/local/man2-rm ?

• Some Perl Modules
• And OuterRM.pl which gets run
• commands - which configures and runs the RM (based on install.config, etc.)
• rerun - runs "commands" in the background from crontab
• crontab - crontab line which can be added directly to your crontab (don't cut and paste!)
• start-rm, stop-rm - control whether rerun will actually start the RM, using a control file (.do_not_restart)
– ./stop-rm
– ./start-rm [ -w ]
• x509 - subdirectory containing all the CA certs, mapfiles, etc.
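The start-rm/stop-rm mechanism can be sketched as below. This is a guess at the mechanics implied by the slide (a `.do_not_restart` control file consulted by the cron-driven rerun script), not the actual HARC scripts, and the `-w` option is omitted:

```shell
#!/bin/sh
# Sketch: rerun (invoked from cron) only starts the RM when the
# .do_not_restart control file is absent; stop-rm/start-rm toggle it.
RMDIR=${RMDIR:-$(mktemp -d)}

stop_rm()  { touch "$RMDIR/.do_not_restart"; }
start_rm() { rm -f "$RMDIR/.do_not_restart"; }

rerun() {
    if [ -e "$RMDIR/.do_not_restart" ]; then
        echo "RM stopped; not restarting"
    else
        echo "starting RM"   # the real rerun would launch 'commands' here
    fi
}

stop_rm;  rerun
start_rm; rerun
```

Driving the restart from cron plus a control file means a crashed RM comes back automatically, while an operator can still hold it down across cron ticks.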

Page 13: Co-allocation Using HARC IV. ResourceManagers

Perl Modules

• Just an overview here...
• There is a doc online which has some details on these

Page 14: Co-allocation Using HARC IV. ResourceManagers

Key Modules

• OuterRM - just does the HTTP listening and Acceptor Cert authN/authZ
• MainLoop - handles each request
• TransactionManager - remembers what transactions (by TID) are running, and what their states are
• InnerRM - the main class for different types of RM
– SimpleComputeRM
– SimpleNetworkRM
– Both inherit from InnerRM

Page 15: Co-allocation Using HARC IV. ResourceManagers

SimpleComputeRM

• Handles batch queue systems
• Deals only with processors/memory
• To talk to the scheduler, a subclass of SCBatch is used
– SCBatchTorqueMaui.pm
– SCBatchTorqueMoab.pm
– SCBatchLoadLeveler.pm - not in CVS yet...
• Chosen at runtime - RM_COMPUTE_BATCH_TYPE
• Simple modules
– Less than 200 lines
– Override
• initialize
• makeReservation
• cancelReservation
• getStatus

Page 16: Co-allocation Using HARC IV. ResourceManagers

Customizing InnerRM

• Startup/shutdown– initialize/remove

• Parsing (validating) the XML– parseResourceElement– parseWorkElement– maybe parseScheduleElement

• Co-allocation– tryMakeAction– tryCancelAction– addResourceBookings– completeTransactionBookings

• Others for getTimetable/getStatus

Page 17: Co-allocation Using HARC IV. ResourceManagers

Steps for creating a new RM

1. Design your XML• Resource element• Work element

2. Create a new subclass of InnerRM.pm• Use the utility classes where possible

3. To extend the API, create subclasses of• Resource.java• Work.java

Page 18: Co-allocation Using HARC IV. ResourceManagers

Caveats for RMs

• Need to restart to re-read grid-mapfile• When restarted, they forget the

bookings– Want to add persistence so that it’s trivial

for RM developers to utilize

• Thread handling needs work (soon!)

Page 19: Co-allocation Using HARC IV. ResourceManagers

What’s next?

• Discussion on MPIg...
• Beer?

Page 20: Co-allocation Using HARC IV. ResourceManagers

But first...

...Any Questions?