Condor and DRBL Bruno Gonçalves & Stefan Boettcher Emory University


Condor Week 2006

Motivation

Maximize computing power while minimizing costs

Optimize the use of the resources that are already available

Maximize resource availability

Permit peaceful coexistence with previously existing operating systems


Software

Fedora Core Linux http://fedora.redhat.com/ (other distributions can be used as well)

Diskless Remote Boot in Linux (DRBL) http://drbl.sourceforge.net

Condor clustering software http://www.cs.wisc.edu/condor/


Hardware

Server (complete machine): large HDD, several network cards

Client (stripped-down machine): CPU, RAM, network card


DRBL

Uses PXE or Etherboot to let clients boot through the network

All files can be located on the server and accessed via NFS (clients don’t need hard drives!)

The server only provides file sharing and user authentication; all software runs on the clients’ own resources
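As a rough illustration of the file-sharing role, the following Python sketch renders /etc/exports lines that share server directories with the two client subnets from the cluster layout. The directory list, the export options and the /24 netmask are assumptions made for this example. DRBL sets up its own NFS exports when it configures services, and the paths and options it uses may differ.

```python
from ipaddress import IPv4Network

# Hypothetical directories shared with the diskless clients, with NFS export options.
# Read-only where clients should not modify the server's copy.
SHARED_DIRS = [
    ("/home", "rw,sync"),
    ("/usr", "ro,sync"),
    ("/opt", "ro,sync"),
]

# Client subnets from the cluster layout, assumed to be /24 networks.
CLIENT_SUBNETS = [IPv4Network("192.168.110.0/24"), IPv4Network("192.168.120.0/24")]


def exports_lines(dirs, subnets):
    """Return one /etc/exports line per directory, granting access to every subnet."""
    lines = []
    for path, options in dirs:
        # exports syntax: <path> <client>(<options>) <client>(<options>) ...
        clients = " ".join(f"{net}({options})" for net in subnets)
        lines.append(f"{path} {clients}")
    return lines


if __name__ == "__main__":
    print("\n".join(exports_lines(SHARED_DIRS, CLIENT_SUBNETS)))
```

With the assumed table, the first line reads "/home 192.168.110.0/24(rw,sync) 192.168.120.0/24(rw,sync)".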


DRBL Installation (I)

# drblsrv -i

Updates the system (similarly to “up2date”, etc.)

Makes sure the relevant services (dhcpd, NFS, NIS, tftpboot, etc.) are installed

Configures the necessary services

Selects the kernel to be used by the clients


DRBL Installation (II)

# drblpush -i

Which network interfaces to use

Client booting options (text/GUI)

How many clients, and their hostnames

MAC address to IP/hostname binding (if any)

“Pushes” all the configurations to the clients (creating new clients if necessary)

Needs to be run any time we want to change the structure of the cluster
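The MAC-to-IP binding that drblpush -i asks about amounts to fixed DHCP leases. The following minimal sketch assumes an ISC dhcpd server (dhcpd is among the services DRBL installs) and a hypothetical MAC/hostname/IP table. It renders one dhcpd host stanza per client. DRBL writes its own dhcpd configuration, so this only shows the shape of such a binding.

```python
import re

# Hypothetical MAC -> (hostname, fixed IP) table.
BINDINGS = {
    "00:0C:29:AA:BB:02": ("node002", "192.168.110.12"),
    "00:0c:29:aa:bb:01": ("node001", "192.168.110.11"),
}

MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$")


def dhcpd_host_block(mac, hostname, ip):
    """Render one ISC dhcpd host stanza pinning a MAC address to a fixed IP."""
    mac = mac.lower()  # normalize so upper- and lower-case MACs compare equal
    if not MAC_RE.match(mac):
        raise ValueError(f"malformed MAC address: {mac!r}")
    return (
        f"host {hostname} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        "}"
    )


def dhcpd_hosts(bindings):
    """Render stanzas for all bindings, sorted by hostname; reject duplicate IPs."""
    ips = [ip for _, ip in bindings.values()]
    if len(ips) != len(set(ips)):
        raise ValueError("two MAC addresses are bound to the same IP")
    items = sorted(bindings.items(), key=lambda kv: kv[1][0])
    return "\n".join(dhcpd_host_block(mac, host, ip) for mac, (host, ip) in items)


if __name__ == "__main__":
    print(dhcpd_hosts(BINDINGS))
```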


Structure

[Diagram: Internet, then the DRBL server / firewall and Central Manager, then the compute nodes on the 192.168.110.x and 192.168.120.x subnets]
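One way to picture this layout is to number the compute nodes across the two private subnets. The sketch below makes three assumptions: the subnets are /24 networks, .1 on each subnet is reserved for a server interface, and nodes get hypothetical node001-style hostnames. None of these conventions come from DRBL itself.

```python
from ipaddress import IPv4Network

# Client subnets from the diagram, assumed /24 (one per server network card).
SUBNETS = [IPv4Network("192.168.110.0/24"), IPv4Network("192.168.120.0/24")]


def plan_clients(subnets, clients_per_subnet):
    """Assign hypothetical hostnames and addresses, reserving .1 for the server."""
    plan = []
    n = 1
    for net in subnets:
        hosts = list(net.hosts())  # usable addresses, .1 through .254 for a /24
        if clients_per_subnet > len(hosts) - 1:
            raise ValueError(f"{net} cannot hold {clients_per_subnet} clients")
        for ip in hosts[1:clients_per_subnet + 1]:  # skip .1, the server's address
            plan.append((f"node{n:03d}", str(ip)))
            n += 1
    return plan


if __name__ == "__main__":
    for name, ip in plan_clients(SUBNETS, 3):
        print(name, ip)
```

Under these assumptions, each /24 subnet holds at most 253 clients.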


Condor Installation

# ./condor_install

All machines share the same password files

All filesystems are NFS-mounted and shared between all the machines

Configure Condor for all DRBL clients, even nonexistent ones


Dedicated Cluster

The number of configured clients can be larger than the number of machines (so more machines are easy to add)

Clients boot to text mode

Condor is configured for dedicated resources
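Every configured client uses the same NFS-mounted Condor installation, so a per-host local configuration can be generated for all configured hostnames in advance, including machines that have not been built yet. The sketch below assumes that "dedicated" means the nodes always accept jobs and never suspend or evict them. That policy is expressed with Condor's START, SUSPEND, CONTINUE, PREEMPT and KILL expressions. The central manager name and the <hostname>.local file naming (to be picked up through LOCAL_CONFIG_FILE) are hypothetical.

```python
# Policy expressions for a node that always runs jobs and never suspends,
# preempts or kills them (this reading of "dedicated" is an assumption).
DEDICATED_POLICY = {
    "START": "True",
    "SUSPEND": "False",
    "CONTINUE": "True",
    "PREEMPT": "False",
    "KILL": "False",
}


def local_config(central_manager, policy=DEDICATED_POLICY):
    """Return the text of a Condor local config file pointing at the central manager."""
    lines = [f"CONDOR_HOST = {central_manager}"]
    lines += [f"{name} = {value}" for name, value in policy.items()]
    return "\n".join(lines) + "\n"


def configs_for(hostnames, central_manager):
    """Map a hypothetical '<hostname>.local' file name to its contents for every
    configured client, whether or not that machine physically exists yet."""
    text = local_config(central_manager)
    return {f"{host}.local": text for host in hostnames}


if __name__ == "__main__":
    for name, text in configs_for(["node001", "node002"], "drbl-server").items():
        print(f"--- {name}\n{text}")
```

Condor's separate DedicatedScheduler setup for parallel jobs is a different mechanism and is not shown here.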


Windows Computer Lab

The number of nodes should correspond to the number of machines

MAC address binding can be used for extra security

Nodes can PXE-boot when they’re available for computation (evenings, holidays, vacations) and go back to Windows when strictly necessary (mornings)

Condor’s checkpointing (and flocking) utilities allow jobs to run on whichever resources are available at a given time
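The lab schedule can be expressed as a simple time-window test. In this sketch, the Windows hours (08:00 to 18:00 on weekdays) and the holiday list are hypothetical. Outside those hours a machine would PXE-boot into the DRBL/Condor image, and jobs interrupted by the morning switch would rely on Condor's checkpointing to continue on other available resources.

```python
from datetime import date, datetime, time

# Hypothetical hours during which the lab must run Windows (weekdays only).
WINDOWS_START = time(8, 0)
WINDOWS_END = time(18, 0)


def in_compute_window(when, holidays=frozenset()):
    """Return True if a lab machine at datetime `when` may PXE-boot into the cluster.

    Weekends and dates listed in `holidays` are always compute time; on other
    days, anything outside WINDOWS_START..WINDOWS_END is compute time.
    """
    if when.weekday() >= 5 or when.date() in holidays:  # 5 = Saturday, 6 = Sunday
        return True
    return not (WINDOWS_START <= when.time() < WINDOWS_END)


if __name__ == "__main__":
    holidays = {date(2006, 4, 14)}  # hypothetical holiday (a Friday)
    print(in_compute_window(datetime(2006, 4, 14, 10, 0), holidays))
```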


Centralized Cluster Management

drbl-doit: run a command on all clients

drbl-cp-host, drbl-rm-host: copy a file or directory to, or remove it from, all clients

drbl-useradd, drbl-userdel: add/delete user accounts

drbl-client-service: control services on the clients (e.g. drbl-client-service condor start)
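drbl-doit is the tool DRBL provides for running a command on all clients. Purely to show the fan-out idea, the sketch below runs one command on many clients in parallel. It is not how drbl-doit is implemented, and it assumes password-less ssh to each client and hypothetical hostnames. The argv builder is a parameter, so the transport can be swapped out.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(host, command):
    """Default transport (an assumption): run the command on the host over ssh."""
    return ["ssh", host] + list(command)


def run_on_all(hosts, command, make_argv=ssh_argv, max_workers=16):
    """Run `command` on every host in parallel; return {host: (exit_code, stdout)}."""
    def run_one(host):
        proc = subprocess.run(make_argv(host, command), capture_output=True, text=True)
        return host, (proc.returncode, proc.stdout)

    # Threads are enough here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, hosts))
```

For example, run_on_all(["node001", "node002"], ["uptime"]) would return each node's exit code and output, keyed by hostname.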


Advantages

Flexible: easily add and remove machines (plug and play); usable for both dedicated and opportunistic clustering

Stable: running for months without problems, even with nodes being added, removed and upgraded; both clients and server can be rebooted without (too much) harm

Efficient: “biggest bang for your buck”


Disadvantages

Not ideal for I/O-intensive applications (NFS overhead)

Communication between nodes on different subnets is routed through the server

All communication with the outside world has to go through the server
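The routing limitation follows from the topology: only nodes on the same subnet can talk directly. The sketch below classifies a connection using Python's ipaddress module. It assumes the two client subnets are /24 networks and that the server is the only gateway between them and the Internet.

```python
from ipaddress import IPv4Address, IPv4Network

# Client subnets from the diagram, assumed /24; the server routes between them.
SUBNETS = [IPv4Network("192.168.110.0/24"), IPv4Network("192.168.120.0/24")]


def subnet_of(ip):
    """Return the cluster subnet containing `ip`, or None for outside addresses."""
    addr = IPv4Address(ip)
    for net in SUBNETS:
        if addr in net:
            return net
    return None


def route(src, dst):
    """Classify traffic: same subnet goes direct, everything else crosses the server."""
    a, b = subnet_of(src), subnet_of(dst)
    if a is not None and a == b:
        return "direct"
    return "via server"
```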


The End

Questions?

Suggestions?