Condor and DRBL
Bruno Gonçalves & Stefan Boettcher
Emory University
Condor Week 2006
Motivation
Maximize computing power while minimizing costs
Optimize the use of the resources that are already available
Maximize resource availability
Permit peaceful coexistence with previously existing operating systems
Software
Fedora Core Linux http://fedora.redhat.com/
  Other distributions can be used as well
Diskless Remote Boot in Linux (DRBL) http://drbl.sourceforge.net
Condor clustering software http://www.cs.wisc.edu/condor/
Hardware
Server (complete machine)
  Large HDD
  Several network cards
Client (stripped down machine)
  CPU
  RAM
  Network card
DRBL
Uses PXE or Etherboot to let clients boot through the network
All files can be located at the server and accessed via NFS (clients don’t need hard drives!)
Server only provides file sharing and user authentication; all software runs on the clients’ own resources
DRBL Installation (I)
# drblsrv -i
  Updates the system (similarly to “up2date”, etc.)
  Makes sure relevant services (dhcpd, NFS, NIS, tftpboot, etc.) are installed
  Configures necessary services
  Selects the kernel to be used by clients
DRBL Installation (II)
# drblpush -i
  Which network interfaces to use
  Client booting options (text/gui)
  How many clients and hostnames
  MAC address to IP/hostname binding (if any)
  “Pushes” all the configurations to the clients (creating new clients if necessary)
  Needs to be run any time we want to change the structure of the cluster
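The MAC address to IP/hostname binding can be pictured with a small sketch. DRBL generates its own configuration when drblpush runs, so this is only an illustration of the idea, not DRBL's code. The MAC addresses, the "node" hostname prefix, and the /24 network are assumptions; the output uses the standard ISC dhcpd host entry syntax.

```python
import ipaddress

def bind_clients(macs, network="192.168.110.0/24", prefix="node"):
    """Give each MAC address a fixed IP and hostname, in list order.

    The network and hostname prefix are illustrative assumptions.
    """
    hosts = list(ipaddress.ip_network(network).hosts())  # .1 to .254 for a /24
    bindings = []
    for i, mac in enumerate(macs):
        hostname = f"{prefix}{i + 1:03d}"
        bindings.append((hostname, mac.lower(), str(hosts[i])))
    return bindings

def dhcpd_stanza(hostname, mac, ip):
    """Render one ISC dhcpd host entry for a fixed binding."""
    return (
        f"host {hostname} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"}}"
    )

if __name__ == "__main__":
    # Made-up MAC addresses, for illustration only.
    example_macs = ["00:11:22:33:44:55", "00:11:22:33:44:56"]
    for binding in bind_clients(example_macs):
        print(dhcpd_stanza(*binding))
```

Binding by MAC address means a given machine always receives the same address and hostname, which is what makes the binding usable as an access restriction.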
Structure
Internet
DRBL server/Firewall, Central Manager
Compute nodes
  192.168.110.x
  192.168.120.x
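The consequence of this layout for node-to-node traffic can be sketched as follows. The two address ranges come from the diagram; treating them as /24 networks and the example addresses are assumptions.

```python
import ipaddress

# The two client subnets from the diagram, assumed here to be /24 networks.
SUBNETS = [
    ipaddress.ip_network("192.168.110.0/24"),
    ipaddress.ip_network("192.168.120.0/24"),
]

def subnet_of(addr):
    """Return the client subnet containing addr, or None if it is outside both."""
    ip = ipaddress.ip_address(addr)
    for net in SUBNETS:
        if ip in net:
            return net
    return None

def routed_through_server(a, b):
    """True when traffic between a and b has to pass through the DRBL server.

    That is the case when the two hosts are on different client subnets,
    or when either of them is outside the cluster.
    """
    net_a, net_b = subnet_of(a), subnet_of(b)
    return net_a is None or net_b is None or net_a != net_b

if __name__ == "__main__":
    print(routed_through_server("192.168.110.5", "192.168.110.9"))
    print(routed_through_server("192.168.110.5", "192.168.120.9"))
```

Jobs that exchange a lot of data between nodes are therefore better placed on clients within a single subnet.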
Condor Installation
# ./condor_install
  All machines share the same password files
  All filesystems are NFS mounted and shared between all the machines
  Configure Condor for all DRBL clients, even nonexistent ones
Dedicated Cluster
Number of configured clients can be larger than number of machines (easily add more machines)
Clients boot to text mode
Condor configured for dedicated resources
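A dedicated-resource policy might look like the settings rendered by the sketch below. This is not the authors' actual configuration: the macro names (CONDOR_HOST, DAEMON_LIST, START, SUSPEND, CONTINUE, PREEMPT, KILL) are standard Condor configuration macros, but the values chosen and the central manager hostname "drbl-server" are assumptions.

```python
def dedicated_node_config(central_manager):
    """Render Condor settings for a compute node that always accepts jobs.

    central_manager is the hostname of the Condor central manager
    (a hypothetical name is used in the example below).
    """
    settings = [
        ("CONDOR_HOST", central_manager),
        ("DAEMON_LIST", "MASTER, STARTD"),  # execute-only node
        ("START", "TRUE"),       # always willing to start a job
        ("SUSPEND", "FALSE"),    # never suspend a running job
        ("CONTINUE", "TRUE"),
        ("PREEMPT", "FALSE"),    # never preempt for machine activity
        ("KILL", "FALSE"),
    ]
    return "\n".join(f"{name} = {value}" for name, value in settings) + "\n"

if __name__ == "__main__":
    print(dedicated_node_config("drbl-server"), end="")
```

Because all filesystems are shared over NFS, one such file could serve every client, including clients that are configured but do not exist yet.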
Windows Computer Lab
Number of nodes should correspond to number of machines
MAC address binding can be used for extra security
Nodes can PXE boot when they’re available for computation (evening / holidays / vacations) and go back to Windows when strictly necessary (morning)
Condor’s checkpointing (and flocking) utilities allow jobs to be run on whichever resources are available at a given time
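The rule for when a lab machine is free for computation can be sketched as a simple time check. The cut-off hours and the treatment of weekends are assumptions; the slides only mention evenings, holidays and vacations.

```python
from datetime import datetime, time

# Assumed boundaries; the talk only says "evening" and "morning".
EVENING_START = time(20, 0)
MORNING_END = time(7, 0)

def available_for_compute(now, holidays=frozenset()):
    """Return True when a lab machine may boot into the cluster.

    holidays is a set of datetime.date objects (holidays and vacations).
    Weekends are also treated as free time, which is an assumption.
    """
    if now.date() in holidays:
        return True
    if now.weekday() >= 5:  # Saturday or Sunday
        return True
    t = now.time()
    return t >= EVENING_START or t < MORNING_END

if __name__ == "__main__":
    print(available_for_compute(datetime(2006, 4, 25, 22, 0)))
    print(available_for_compute(datetime(2006, 4, 25, 10, 0)))
```

A job that is still running when the window closes relies on checkpointing to resume elsewhere rather than start over.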
Centralized Cluster management
drbl-doit
  Run a command on all clients
drbl-cp-host, drbl-rm-host
  Copy a file or directory to all clients, or remove it from them
drbl-useradd, drbl-userdel
  Add or delete user accounts
drbl-client-service
  Control services on clients (drbl-client-service condor start)
Advantages
Flexible
  Easily add and remove machines (plug and play)
  Usable for both dedicated and opportunistic clustering
Stable
  Running for months without problems even with nodes being added, removed and upgraded
  Both clients and server can be rebooted without (too much) harm
Efficient
  “Biggest bang for your buck”
Disadvantages
Not ideal for I/O-intensive applications (NFS overhead)
Communication between nodes on different subnets is routed through the server
All communication with the outside world has to go through the server
The End
Questions?
Suggestions?