Sun Grid Engine

Uploaded by michon

Page 1: Sun Grid Engine

Sun Grid Engine

Page 2: Sun Grid Engine

Grids

Grids are collections of resources made available to customers.

Compute grids make CPU cycles available to customers from an access point, much like plugging into an electrical grid

• Cluster grids: resources in one room
• Campus grids: multiple clusters on one campus
• Global grids: cross administrative domains

Page 3: Sun Grid Engine

Grids

Potentially (ideally?) you could completely outsource your HPC needs by buying time on a commercial grid. Running a big data center is tricky and takes expensive people. If you are, say, a small computer animation group working on an animated short, it might not make sense to set up a data center for six months of work.

OTOH, if you’re Pixar or Lucasfilm, running a data center is a core competency.

Page 4: Sun Grid Engine

Sun Grid Engine

SGE is a piece of software that matches jobs to compute resources

BTW, SGE runs on OS X. This would be another fine project for someone to investigate

Page 5: Sun Grid Engine

SGE

As we’ve seen, Sun Grid Engine can accept a batch job and give it to a compute node.

SGE (base level) is open source; see http://gridengine.sunsource.net/

There are some other issues:
• Multiple queues
• Giving jobs only to nodes with the necessary resources
• Queue manipulation

Page 6: Sun Grid Engine

SGE

Users submit jobs; they’re kept by SGE in a holding area until resources become available, then sent to an execution device. The results are reported back.

Types of hosts: master, execution, administration, and submit

The master host runs the master daemon and the scheduling daemon

Execution hosts are where jobs run; administration hosts can manipulate the queues; submit hosts are where users submit jobs (with qsub)

There are a lot of knobs to twiddle on SGE

Page 7: Sun Grid Engine

SGE

Imagine a bank that has five customers walk in. Four just want to deposit a check, and the fifth wants to set up a home loan.

If the home loan guy happens to be first, and there is only one queue, the four with short transactions wait for a long time.

What’s more, the home loan guy must have manager approval at some point in the process

So: set up two queues, one for long transactions, one an express lane. The home loan queue specifies that the manager must be available.

This reduces the median time spent in queue for the short transaction customers, and reduces the variance of the waiting time
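The bank analogy can be checked with a few lines of arithmetic. The sketch below is a toy simulation, not SGE code. It assumes, for illustration only, that everyone arrives at once in the order given, that a deposit takes 2 minutes and the loan takes 60, and that each queue in the two-queue setup has its own teller. The customer list and service times are hypothetical.

```python
from statistics import median, pvariance

# Hypothetical service times in minutes (assumed for illustration):
# the loan customer is first in line, followed by four check deposits.
customers = [("loan", 60), ("deposit", 2), ("deposit", 2), ("deposit", 2), ("deposit", 2)]

def fifo_waits(jobs):
    """Waiting time of each job when one teller serves jobs in arrival order."""
    waits, clock = [], 0
    for _, service in jobs:
        waits.append(clock)   # a job waits until the teller is free
        clock += service
    return waits

def simulate(customers, split):
    """Return {kind: [waits]} for one shared queue, or one queue (and teller) per kind."""
    result = {}
    if not split:
        for (kind, _), wait in zip(customers, fifo_waits(customers)):
            result.setdefault(kind, []).append(wait)
        return result
    # Assumption: each lane has its own teller, so lanes do not delay each other.
    for kind in dict.fromkeys(k for k, _ in customers):
        lane = [c for c in customers if c[0] == kind]
        result[kind] = fifo_waits(lane)
    return result

for split in (False, True):
    waits = simulate(customers, split)
    everyone = [w for ws in waits.values() for w in ws]
    print("two queues" if split else "one queue",
          "| median deposit wait:", median(waits["deposit"]),
          "| variance of all waits:", pvariance(everyone))
```

With one queue the deposit customers wait 60 to 66 minutes behind the loan. With an express lane their median wait drops to a few minutes, and the spread of waiting times across all five customers shrinks sharply.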

Page 8: Sun Grid Engine

SGE Queues

There may be more than one queue; jobs are associated with queues

qconf -sql shows the list of defined queues

Why multiple queues? Some types of jobs may be very long or require specific resources, so users may submit jobs to queues optimized for those types of jobs

[Figure: SGE master with queues Q1 and Q2, the SGE scheduler, and three execution hosts]

Page 9: Sun Grid Engine

Scheduler

The scheduler (which assigns jobs to execution hosts) looks at several factors:

• Load parameters: how busy the execution hosts are, by some measure

• Consumable resources: memory, disk space, licenses, etc. SGE keeps track of these and dispatches a job only if the resources it needs are available

• Attributes, such as 64-bit or G5: these aren’t necessarily consumed, but may simply describe a state of the host

The scheduler may look at all these factors before assigning a job from the holding pool to an execution host
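The three factors can be illustrated with a toy host-selection function. It first filters on a load threshold, then on free consumables, then on attributes, and picks the least-loaded survivor. This is a sketch under stated assumptions, not SGE's actual scheduling algorithm. The `Host`/`Job` classes, the `load_threshold` default, the `mem_gb` resource, and the `darwin` arch value are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    load: float                                  # hypothetical normalized load average
    free: dict                                   # consumable -> amount still available
    attrs: dict = field(default_factory=dict)    # fixed attributes, e.g. {"arch": "glinux"}

@dataclass
class Job:
    needs: dict = field(default_factory=dict)    # consumable -> amount required
    attrs: dict = field(default_factory=dict)    # attribute -> required value

def pick_host(job, hosts, load_threshold=1.0):
    """Return the least-loaded host that satisfies the job, or None if the job must wait."""
    candidates = [
        h for h in hosts
        if h.load < load_threshold                                          # load parameter
        and all(h.free.get(r, 0) >= amt for r, amt in job.needs.items())   # consumables
        and all(h.attrs.get(a) == v for a, v in job.attrs.items())          # attributes
    ]
    return min(candidates, key=lambda h: h.load, default=None)

hosts = [
    Host("compute-0-0", load=0.2, free={"mem_gb": 2}, attrs={"arch": "glinux"}),
    Host("compute-0-1", load=0.5, free={"mem_gb": 8}, attrs={"arch": "glinux"}),
    Host("compute-0-2", load=0.1, free={"mem_gb": 16}, attrs={"arch": "darwin"}),
]
job = Job(needs={"mem_gb": 4}, attrs={"arch": "glinux"})
print(pick_host(job, hosts).name)   # compute-0-1: enough memory and the right arch
```

A `None` result corresponds to the job staying in the holding pool until some host frees up.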

Page 10: Sun Grid Engine

Consumable Resources

There are some finite resources in the cluster: CPU time, disk space, licenses, bandwidth

Available capacity for these is defined by the administrator; the scheduler examines available consumables when deciding what to run
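Bookkeeping for consumables amounts to capacity minus what running jobs have reserved. The class below is a hypothetical sketch of that idea. It is not an SGE interface, and the resource names `licenses` and `disk_gb` are made up. A job's request is reserved all at once or not at all, and it is released when the job finishes.

```python
class ConsumableLedger:
    """Track administrator-defined capacity for finite resources (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = dict(capacity)            # resource -> total amount
        self.in_use = {r: 0 for r in capacity}    # resource -> amount currently reserved

    def available(self, resource):
        # Unknown resources count as zero available, so requests for them are refused.
        return self.capacity.get(resource, 0) - self.in_use.get(resource, 0)

    def try_reserve(self, request):
        """Reserve every resource in `request` together; return False if any is short."""
        if any(self.available(r) < amt for r, amt in request.items()):
            return False
        for r, amt in request.items():
            self.in_use[r] += amt
        return True

    def release(self, request):
        """Give back a finished job's resources."""
        for r, amt in request.items():
            self.in_use[r] -= amt

ledger = ConsumableLedger({"licenses": 2, "disk_gb": 100})
job = {"licenses": 1, "disk_gb": 40}
print(ledger.try_reserve(job), ledger.try_reserve(job), ledger.try_reserve(job))
```

The third reservation fails because both licenses are taken, so a scheduler following this rule would leave that job pending.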

Page 11: Sun Grid Engine

Requestable Attributes

On job submission you can request attributes or characteristics: at least X amount of memory, a license for software package Y, a 64 bit host, etc.

In a production environment licenses can be a big deal. Circuit design software may cost thousands of dollars per node, so not every node in the cluster may have a license.

The attributes can be related to the hosts or the queues

Attributes that are “requestable” can be named on the qsub command line, so a job can require that attribute in order to run

Page 12: Sun Grid Engine

SGE

You don’t need to submit a job to a specific queue; instead you can simply ask for certain resources, and SGE will pick a queue based on the requirement profile
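Choosing a queue from a requirement profile can be sketched as finding a queue whose advertised attributes cover everything the job asks for. The queue names, the `cad_license` attribute, and the first-match tie-breaking are all assumptions made for illustration. SGE has its own queue-sorting rules, which this sketch does not model.

```python
# Hypothetical queues and the attributes each one offers (names are illustrative).
QUEUES = {
    "linux.q": {"arch": "glinux", "cad_license": False},
    "cad.q":   {"arch": "glinux", "cad_license": True},
    "mac.q":   {"arch": "darwin", "cad_license": False},
}

def choose_queue(requirements, queues=QUEUES):
    """Return the first queue offering every requested attribute value, else None."""
    for name, offered in queues.items():
        if all(offered.get(key) == value for key, value in requirements.items()):
            return name
    return None

print(choose_queue({"cad_license": True}))     # cad.q
print(choose_queue({"arch": "darwin"}))        # mac.q
print(choose_queue({"arch": "sol-sparc64"}))   # None: no queue fits, the job waits
```

Note that the user never names a queue here. The requirements alone determine where the job can go.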

Page 13: Sun Grid Engine

Environment Variables

When a job runs on a host some environment variables are set:

• ARC
• SGE_ROOT
• SGE_STDOUT_PATH
• HOME
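A job script can read these variables like any others. The sketch below is a hypothetical Python job script that reports the variables listed on the slide. The `<unset>` fallback exists only so the script also runs outside SGE, and the `environ` parameter is an addition that makes the function easy to test.

```python
import os

# Variables named on the slide; SGE sets them for a running job.
SGE_VARS = ("ARC", "SGE_ROOT", "SGE_STDOUT_PATH", "HOME")

def job_environment(environ=None):
    """Collect the listed variables, marking any that are missing."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, "<unset>") for name in SGE_VARS}

if __name__ == "__main__":
    for name, value in job_environment().items():
        print(f"{name}={value}")
```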

Page 14: Sun Grid Engine

Dependencies

Suppose you divide a task into several subtasks. This can require sequencing: some subtasks may need to finish before others can run. You can specify a list of jobs that must finish before this job runs
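In SGE the list of prerequisite jobs is given with qsub's `-hold_jid` option. The release order that such holds imply is a topological sort. The sketch below uses Python's `graphlib` (3.9+) to group a hypothetical set of subtasks into "waves": each wave can run once everything before it has finished. The job names are made up, and this only models the ordering, not SGE itself.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks: each key lists the jobs that must finish before it may run.
depends_on = {
    "preprocess": [],
    "render_a": ["preprocess"],
    "render_b": ["preprocess"],
    "composite": ["render_a", "render_b"],
}

def release_waves(deps):
    """Group jobs into waves; every job in a wave has all its prerequisites done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                        # raises graphlib.CycleError on circular holds
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose holds are all satisfied
        waves.append(ready)
        sorter.done(*ready)                 # pretend they ran and finished
    return waves

print(release_waves(depends_on))
# [['preprocess'], ['render_a', 'render_b'], ['composite']]
```

Jobs in the same wave (here the two renders) have no ordering between them, so a scheduler is free to run them in parallel.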

Page 15: Sun Grid Engine

Listing Attributes

qconf -scl lists “complexes” of attributes. Typically this includes a complex for the queues, and one for the hosts

qconf -sc host|queue lists the attributes in a complex

#name      shortcut  type    value  relop  requestable  consumable  default
#--------------------------------------------------------------------------
arch       a         STRING  none   ==     YES          NO          none
num_proc   p         INT     1      ==     YES          NO          0
load_avg   la        DOUBLE  99.99  >=     NO           NO          0
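Listings like this are easy to process in scripts. The parser below is a sketch that assumes exactly the eight-column layout shown above. Other Grid Engine versions print different columns, and the unpacking would then fail loudly. The `ComplexAttr` record type is a hypothetical name.

```python
from typing import NamedTuple

class ComplexAttr(NamedTuple):
    name: str
    shortcut: str
    type: str
    value: str
    relop: str
    requestable: bool
    consumable: bool
    default: str

def parse_complex(text):
    """Parse the eight-column listing: skip '#' lines, split the rest on whitespace."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, shortcut, typ, value, relop, req, cons, default = line.split()
        attrs[name] = ComplexAttr(name, shortcut, typ, value, relop,
                                  req == "YES", cons == "YES", default)
    return attrs

SAMPLE = """\
#name      shortcut  type    value  relop  requestable  consumable  default
#--------------------------------------------------------------------------
arch       a         STRING  none   ==     YES          NO          none
num_proc   p         INT     1      ==     YES          NO          0
load_avg   la        DOUBLE  99.99  >=     NO           NO          0
"""

table = parse_complex(SAMPLE)
print([a.name for a in table.values() if a.requestable])   # ['arch', 'num_proc']
```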

Page 16: Sun Grid Engine

Modifying Attributes

qconf -mc [complex name] opens an editor that lets you modify the complex settings

Page 17: Sun Grid Engine

Attributes

Note that some attributes are “requestable”. This means that you can specify that your job requires that attribute from the qsub command line.

qsub -l arch=glinux says the job requires a “glinux” host to run

qconf -se compute-0-0 shows the resources of an execution host (here, compute-0-0)
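A request such as `-l arch=glinux` can be modeled as parsing `name=value` pairs and comparing each against a host's value with the attribute's relop from the complex table. This is a sketch under assumptions. The `COMPLEX` dictionary is a hand-copied subset of the table above. The comparison order "requested relop host value" is assumed. Only requestable attributes are accepted, and shortcuts such as `a` and `p` resolve to their full names.

```python
import operator

# Relops from the complex table mapped to Python comparisons.
RELOPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
          ">": operator.gt, "<=": operator.le, "<": operator.lt}

# Hypothetical subset of a complex: name -> (shortcut, type, relop, requestable)
COMPLEX = {
    "arch":     ("a",  "STRING", "==", True),
    "num_proc": ("p",  "INT",    "==", True),
    "load_avg": ("la", "DOUBLE", ">=", False),
}
SHORTCUTS = {spec[0]: name for name, spec in COMPLEX.items()}
CASTS = {"INT": int, "DOUBLE": float}

def parse_request(spec):
    """Turn 'a=glinux,num_proc=2' into {'arch': 'glinux', 'num_proc': '2'}."""
    request = {}
    for item in spec.split(","):
        key, value = item.split("=", 1)
        name = SHORTCUTS.get(key, key)            # accept full names or shortcuts
        if name not in COMPLEX or not COMPLEX[name][3]:
            raise ValueError(f"{key!r} is not a requestable attribute")
        request[name] = value
    return request

def host_satisfies(host_values, request):
    """Assumed rule: requested value <relop> host value must hold for every attribute."""
    for name, wanted in request.items():
        if name not in host_values:
            return False
        _, typ, relop, _ = COMPLEX[name]
        cast = CASTS.get(typ, str)
        if not RELOPS[relop](cast(wanted), cast(host_values[name])):
            return False
    return True

req = parse_request("a=glinux,num_proc=2")
print(req, host_satisfies({"arch": "glinux", "num_proc": 2}, req))
```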

Page 18: Sun Grid Engine

Priorities

By default jobs are handled in a FIFO manner. As they come in they are assigned to a compatible queue for processing by the scheduler.

qsub -p can give the job a priority that overrides FIFO behavior.

Use qstat to find jobs and qdel to delete them from the holding area
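FIFO with priority overrides is a priority queue whose ties are broken by submission order. The `HoldingArea` class below is a hypothetical model of that ordering, built on `heapq`. Its `pending` and `delete` methods only loosely mirror what qstat and qdel do for waiting jobs; they do not call SGE, and the priority numbers are illustrative.

```python
import heapq
import itertools

class HoldingArea:
    """Pending jobs ordered by priority, then by submission order (FIFO among equals)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # submission counter keeps FIFO order for ties

    def submit(self, job_name, priority=0):
        # heapq is a min-heap, so store -priority: higher priority pops first.
        heapq.heappush(self._heap, (-priority, next(self._seq), job_name))

    def pending(self):
        """Pending jobs in dispatch order (loosely like listing them with qstat)."""
        return [name for _, _, name in sorted(self._heap)]

    def delete(self, job_name):
        """Drop a waiting job (loosely like qdel on a pending job)."""
        self._heap = [entry for entry in self._heap if entry[2] != job_name]
        heapq.heapify(self._heap)

    def dispatch(self):
        """Hand the next job to the scheduler."""
        return heapq.heappop(self._heap)[2]

area = HoldingArea()
for name in ("a", "b"):
    area.submit(name)
area.submit("c", priority=10)   # jumps ahead of earlier default-priority jobs
area.submit("d")
print(area.pending())           # ['c', 'a', 'b', 'd']
```

The sequence counter matters: without it, jobs with equal priority would be compared by name instead of by arrival time.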

Page 19: Sun Grid Engine

Checkpointing

Sometimes on very long jobs it is worthwhile to be able to stop the job and restart it later.

• What are the issues involved here?
• Why use it?
• Starter, suspend, resume, and terminate methods
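One issue is where the saved state lives and how to avoid a half-written checkpoint. The sketch below is an application-level checkpoint, not SGE's checkpointing interface. It periodically saves progress to a JSON file, writes to a temporary file and renames it so an interrupted write cannot corrupt the last good checkpoint, and resumes from the file on restart. The `stop_after` parameter is a hypothetical stand-in for the job being killed.

```python
import json
import os
import tempfile

def run_with_checkpoint(path, total_steps, stop_after=None):
    """Sum 0..total_steps-1, saving progress to `path` so a restart can resume.

    `stop_after` simulates the job being terminated after that many steps.
    """
    state = {"step": 0, "total": 0}
    if os.path.exists(path):                 # resume from the last checkpoint
        with open(path) as f:
            state = json.load(f)
    steps_this_run = 0
    while state["step"] < total_steps:
        state["total"] += state["step"]      # the "work" done in this step
        state["step"] += 1
        steps_this_run += 1
        tmp = path + ".tmp"
        with open(tmp, "w") as f:            # write then rename, so an interrupted
            json.dump(state, f)              # write never replaces the good checkpoint
        os.replace(tmp, path)
        if stop_after is not None and steps_this_run >= stop_after:
            return None                      # "killed" before finishing
    return state["total"]

with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "job.ckpt")
    first = run_with_checkpoint(ckpt, 10, stop_after=4)   # interrupted run
    second = run_with_checkpoint(ckpt, 10)                # restarted run resumes at step 4
print(first, second)
```

Checkpointing after every step is simple but costly for real workloads. How often to checkpoint is a trade-off between I/O overhead and how much work is lost when a job stops.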

Page 20: Sun Grid Engine

Hard & Soft Requirements

A hard requirement must be present before the job is scheduled
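In SGE, qsub's `-hard` and `-soft` options mark which `-l` requests are mandatory and which are preferences the scheduler tries to honor but may drop. The ranking below is a hypothetical sketch of that distinction, not SGE's algorithm. Hosts missing any hard requirement are excluded, and the rest are ordered by how many soft requirements they meet. The host names reuse the slides' naming style; the attribute values are made up.

```python
def rank_hosts(hosts, hard, soft):
    """Drop hosts missing any hard requirement; order the rest by soft matches."""
    def satisfied(attrs, reqs):
        return sum(attrs.get(k) == v for k, v in reqs.items())
    eligible = {name: attrs for name, attrs in hosts.items()
                if satisfied(attrs, hard) == len(hard)}
    # sorted() is stable, so hosts with equal soft scores keep their listed order.
    return sorted(eligible, key=lambda name: -satisfied(eligible[name], soft))

hosts = {
    "compute-0-0": {"arch": "glinux", "cad_license": False},
    "compute-0-1": {"arch": "glinux", "cad_license": True},
    "compute-0-2": {"arch": "darwin", "cad_license": True},
}
print(rank_hosts(hosts, hard={"arch": "glinux"}, soft={"cad_license": True}))
# ['compute-0-1', 'compute-0-0']
```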