17/09/2004
John Kewley
Grid Technology Group
Introduction to Condor
John Kewley
Grid Technology
17th September 2004
Outline
o What is Condor?
o What can it be used for?
o Status of DL Condor Pool(s)
What is Condor?
o A job submission framework which utilises spare computing power within a heterogeneous computer network (Condor pool)
o It supports High-Throughput Computing (HTC), maximising the amount of processing capacity that is utilised over long periods of time.
o Developed over many years at the University of Wisconsin-Madison
Basic Features
o A Condor pool is a set of resources (clusters, servers and networked workstations), managed by a Central Manager
o The Central Manager matches requests for resources with those resources available within the pool
o Users do not need an account on the machine where a job runs; they can submit jobs to the pool from their own workstation.
o A highly extensible language (ClassAds) for describing resources and job requirements, used to classify and advertise the resources in the pool.
o Available on multiple platforms.
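The Central Manager's matching role can be illustrated with a toy model. This is a minimal Python sketch, not Condor's implementation: in Condor, jobs and machines advertise ClassAds whose Requirements and Rank expressions are evaluated by the Negotiator, whereas here they are plain Python functions, and the greedy loop in the hypothetical `negotiate()` function is a simplification of a real negotiation cycle. The machine names, owners and attribute values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Ad = Dict[str, object]  # attribute name -> value, loosely modelled on a ClassAd

@dataclass
class Job:
    name: str
    attrs: Ad
    requirements: Callable[[Ad], bool]           # conditions the machine must meet
    rank: Callable[[Ad], float] = lambda m: 0.0  # higher value = preferred machine

@dataclass
class Machine:
    name: str
    attrs: Ad
    requirements: Callable[[Ad], bool] = lambda j: True  # owner's policy on jobs

def negotiate(jobs: List[Job], machines: List[Machine]) -> Dict[str, Optional[str]]:
    """Toy negotiator: give each job, in order, the highest-ranked free machine
    for which both the job's and the machine's requirements hold."""
    free = list(machines)
    matches: Dict[str, Optional[str]] = {}
    for job in jobs:
        candidates = [m for m in free
                      if job.requirements(m.attrs) and m.requirements(job.attrs)]
        if not candidates:
            matches[job.name] = None  # no match in this cycle
            continue
        best = max(candidates, key=lambda m: job.rank(m.attrs))
        matches[job.name] = best.name
        free.remove(best)             # in this model a machine runs one job at a time
    return matches

machines = [
    Machine("linux1", {"OpSys": "LINUX", "Arch": "INTEL", "Memory": 512}),
    Machine("winxp1", {"OpSys": "WINNT51", "Arch": "INTEL", "Memory": 1024}),
    Machine("linux2", {"OpSys": "LINUX", "Arch": "INTEL", "Memory": 1024},
            requirements=lambda j: j["Owner"] == "bob"),
]
jobs = [
    Job("sim", {"Owner": "alice"},
        requirements=lambda m: m["OpSys"] == "LINUX" and m["Memory"] >= 256,
        rank=lambda m: m["Memory"]),
    Job("build", {"Owner": "bob"},
        requirements=lambda m: m["OpSys"] == "LINUX"),
]
result = negotiate(jobs, machines)
print(result)  # {'sim': 'linux1', 'build': 'linux2'}
```

Matching is two-sided: by its own rank `sim` would prefer `linux2` (more memory), but that machine's owner only admits jobs from `bob`, so `sim` is placed on `linux1`.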
Supported platforms
Architecture                                  Operating System
Hewlett Packard PA-RISC (PA7000 + PA8000)     HPUX 10.20
Sun SPARC Sun4m, Sun4c, Sun UltraSPARC        Solaris 2.6, 2.7, 2.8, 2.9
Silicon Graphics MIPS (R5000, R8000, R10000)  IRIX 6.5 (clipped)
Intel x86                                     Red Hat Linux 7.1, 7.2, 7.3, 8.0
                                              Red Hat Linux 9
                                              Windows 2000 Prof + Server, 2003 Server (clipped)
                                              Windows XP Professional (clipped)
ALPHA                                         Digital Unix 4.0
                                              Red Hat Linux 7.1, 7.2, 7.3 (clipped)
                                              Tru64 5.1 (clipped)
PowerPC                                       Macintosh OS X (clipped)
                                              AIX 5.2L (clipped)
Itanium                                       Red Hat Linux 7.1, 7.2, 7.3 (clipped)
                                              SuSE Linux Enterprise 8.1 (clipped)
(clipped) = no checkpointing or remote system calls on this platform, so the standard universe is not available
Job Startup
[Figure: Submit Machine (Submit, Schedd, Shadow); Execute Machine (Startd, Starter, Job with Condor Syscall Lib); Central Manager (Collector, Negotiator)]
Slide courtesy of University of Wisconsin-Madison
Additional Features
o Checkpointing and migration of jobs
o Shared filestore is not required, but can be utilised
o Interworking with Globus
o Security: GSI, Kerberos
o Use of MPI and PVM
o Workflow using DAGMan (Directed Acyclic Graph Manager)
o Windows + Unix + Linux + …
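DAGMan reads an input file in which JOB lines name a node and its submit description file, and PARENT ... CHILD lines give the dependencies. The Python sketch below uses hypothetical helpers (not part of Condor) that parse only that subset of the syntax and apply Kahn's topological sort to show which jobs become eligible in successive waves; the `.sub` file names are placeholders. DAGMan itself submits each node as soon as all of its own parents have completed, rather than in strict waves.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A small DAGMan-style input; the .sub names are placeholders.
DAG_TEXT = """\
JOB A a.sub
JOB B b.sub
JOB C c.sub
JOB D d.sub
PARENT A CHILD B C
PARENT B C CHILD D
"""

def parse_dag(text: str) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Read JOB and PARENT ... CHILD lines; skip any other line."""
    jobs: List[str] = []
    children: Dict[str, Set[str]] = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "JOB":
            jobs.append(parts[1])
        elif parts[0] == "PARENT":
            i = parts.index("CHILD")
            for parent in parts[1:i]:
                children[parent].update(parts[i + 1:])
    return jobs, children

def release_waves(jobs: List[str], children: Dict[str, Set[str]]) -> List[List[str]]:
    """Kahn's algorithm, grouped into waves of jobs whose parents have all finished."""
    indegree = {j: 0 for j in jobs}
    for parent in children:
        for child in children[parent]:
            indegree[child] += 1
    ready = sorted(j for j in jobs if indegree[j] == 0)
    waves: List[List[str]] = []
    while ready:
        waves.append(ready)
        nxt = []
        for j in ready:
            for child in children.get(j, ()):
                indegree[child] -= 1
                if indegree[child] == 0:
                    nxt.append(child)
        ready = sorted(nxt)
    if sum(len(w) for w in waves) != len(jobs):
        raise ValueError("dependency cycle: not a DAG")
    return waves

print(release_waves(*parse_dag(DAG_TEXT)))  # [['A'], ['B', 'C'], ['D']]
```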
Execution Environments
Standard universe
o Programs must be relinked with the Condor libraries
o System calls are executed on the submitting resource
o Jobs can checkpoint, so they may be stopped and later restarted from their last checkpoint, and may migrate to another resource
o Not available on some platforms (e.g. Windows)
o Some restrictions on what can be run (e.g. no multi-process jobs using fork())
Vanilla universe
o Any executable or script; no relinking or access to object files is needed
o System calls are executed on the executing resource
o No checkpointing, so less suited to long-running jobs: if a job is stopped it is rescheduled from the beginning and the compute time already used is lost
o Works on all supported platforms (including Windows)
o Some file permissions may need to be opened up
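The cost of losing a vanilla job's progress can be estimated with a small model. This Python sketch rests on assumptions rather than on Condor's actual eviction behaviour: a job needing `work` hours of CPU runs in a series of availability windows (e.g. overnight slots on a workstation) and is evicted at the end of each one. With `checkpoint_every=None` (vanilla) all progress made in a window is lost; otherwise the job resumes from its last periodic checkpoint. Standard-universe jobs can also checkpoint when they are vacated, so the checkpointed case here is pessimistic. The function name and window lengths are hypothetical.

```python
from typing import List, Optional, Tuple

def run_job(work: float, windows: List[float],
            checkpoint_every: Optional[float] = None) -> Tuple[bool, float]:
    """Toy model: a job needing `work` hours of CPU runs in successive
    availability windows and is evicted at the end of each one.

    checkpoint_every=None models the vanilla universe (restart from scratch);
    otherwise the job resumes from its last periodic checkpoint.
    Returns (finished, hours_of_compute_lost)."""
    saved = 0.0  # progress that survives an eviction
    lost = 0.0
    for window in windows:
        if window >= work - saved:
            return True, lost      # job completes inside this window
        if checkpoint_every is None:
            kept = 0.0             # vanilla: all progress in the window is discarded
        else:
            kept = (window // checkpoint_every) * checkpoint_every
        lost += window - kept
        saved += kept
    return False, lost

nights = [8, 8, 8, 8]  # hypothetical: four 8-hour overnight windows
print(run_job(20, nights))                       # (False, 32.0)
print(run_job(20, nights, checkpoint_every=3))   # (True, 4.0)
```

A 20-hour job never fits into an 8-hour window, so in this model the vanilla run burns 32 hours without finishing, while the checkpointed run completes on the third night having lost only the work done since each last checkpoint.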
Possible Uses
o Use vanilla universe for jobs which comprise many comparatively small, independent tasks.
o Use standard universe for jobs which will run for long periods.
o Utilise the “odds and ends” of the pool for compilation and build tests.
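For the many-small-tasks case, one submit description file can queue a whole batch of vanilla jobs, with Condor's `$(Process)` macro giving each job a distinct index starting from 0. The Python helper below is a hypothetical convenience that builds the text of such a file; `universe`, `executable`, `arguments`, `output`, `error`, `log` and `queue` are standard submit commands, while the script and file names are placeholders. Depending on the pool, file-transfer settings may also be needed where there is no shared filestore.

```python
def vanilla_submit(executable: str, n_tasks: int, log: str = "tasks.log") -> str:
    """Build a submit description queuing n_tasks independent vanilla jobs.
    Condor expands $(Process) to 0, 1, ..., n_tasks - 1, one value per job."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "arguments = $(Process)",        # each task receives its own index
        "output = task_$(Process).out",  # per-task stdout
        "error = task_$(Process).err",   # per-task stderr
        f"log = {log}",                  # one shared user log for the batch
        f"queue {n_tasks}",
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(vanilla_submit("analyse.sh", 100))  # analyse.sh is a hypothetical script
```

Saved as, say, `tasks.sub`, the description would be submitted with `condor_submit tasks.sub`; a long-running job would instead use `universe = standard` with a relinked executable.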
Condor Pools at DL
o Internal Pool
  5 Windows
    • 3x Windows XP Professional
    • 2x Windows 2000 Professional
  18 Linux
    • 6x SuSE Linux 9.0
    • 2x SuSE Linux 8.0
    • 5x White Box Enterprise Linux 3.0
    • 3x Red Hat Linux 9
    • 1x Mandrake Linux 10.0
    • 1x Gentoo Linux 1.4
o External Pool
  6 Linux
    • 2x Red Hat Linux 7.3
    • 4x White Box Enterprise Linux 3.0
Build and Test
o Our External Pool is being used by the OMII (Open Middleware Infrastructure Institute) for building and testing their latest Grid middleware.
o We intend to extend this pool into a build and test pool for other institutions on the UK Grid.
o Our internal users are also keen to utilise this build technology to build release packages of their software for many different platforms.
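Building release packages for several platforms amounts to submitting the same build once per target, with a `requirements` expression steering each job to a suitable machine. In this Python sketch the platform table, the build scripts and the OpSys/Arch values (`"LINUX"`, `"WINNT51"`, `"INTEL"`) are assumptions that should be checked against what machines in the pool actually advertise (e.g. via `condor_status`).

```python
# Hypothetical build targets: name -> (ClassAd requirements, build script).
# OpSys/Arch values are assumptions; compare with your pool's condor_status output.
PLATFORMS = {
    "linux-x86": ('OpSys == "LINUX" && Arch == "INTEL"', "build.sh"),
    "winxp-x86": ('OpSys == "WINNT51" && Arch == "INTEL"', "build.bat"),
}

def build_submit(platform: str) -> str:
    """Return a vanilla-universe submit description that runs one platform's
    build script on a machine satisfying that platform's requirements."""
    requirements, script = PLATFORMS[platform]
    return "\n".join([
        "universe = vanilla",
        f"executable = {script}",
        f"arguments = {platform}",
        f"requirements = ({requirements})",
        f"output = build_{platform}.out",
        f"error = build_{platform}.err",
        "log = build.log",
        "queue",
    ]) + "\n"

submit_files = {name: build_submit(name) for name in PLATFORMS}
for name, text in submit_files.items():
    print(f"# {name}.sub\n{text}")
```

The vanilla universe is used here because it keeps the build portable to the Windows machines, where the standard universe is not available.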
User Status
We are currently at an early stage with our user community and are helping users set up their code so that it can be run conveniently under Condor.
These users are from the following computational science communities:
o CCP1 - The electronic structure of molecules
o CCP4 - Protein crystallography
Summary
o Condor can utilise otherwise unused resources (e.g. Windows workstations overnight)
o Use vanilla universe for jobs which comprise many comparatively small, independent tasks
o Use standard universe for jobs which will run for long periods (although not on Windows)
o Can be used for compilation and build tests