
Parallel Computing

• The Bad News

– Hardware is not getting faster fast enough

– Too many architectures

– Existing architectures are too specific

– Programs closely tied to architecture

– Software is being developed using a 1950s mentality

Computing Trends

• Centralized systems are a thing of the past

– Evolving towards cycle servers

• Each user has their own computer

• Workstations are networked

– Typical LAN speeds are 100 Mb/s

• For some users, a single workstation does not provide adequate computing power

A Solution

• A virtual computing environment (VCE)

– Utilize existing software to build a programming model that can be used to develop distributed and parallel applications

– Provide tools to create, debug, and execute applications on heterogeneous hardware

– Let the software map high level descriptions of the problems to available hardware

– Programmer will no longer need to be concerned with low-level issues

Other Names

• For many scientists, it is not uncommon to find problems that require weeks or months of computation to solve.

– Scientists involved in this type of research need a computing environment that delivers large amounts of computational power over a long period of time

– Such an environment is called a High Throughput Computing (HTC) environment

• In contrast, High Performance Computing (HPC) environments deliver a tremendous amount of power over a short period of time.

Workstation Users

• All VCE configurations include some workstations

• Workstations are chronically underutilized

• Workstation users can be classified as follows:

– Casual Users

– Sporadic Users

– Frustrated Users

• The VCE must help frustrated users without hurting casual and sporadic users

Other Considerations

• The VCE must be cost effective

– Use existing tools like NFS, ISIS, PVM, MPI whenever possible

– Must not require tremendous amounts of processor power

• The VCE must coexist with other software

– Non-VCE applications should not be impacted by the VCE

• The VCE must avoid kernel modes

User's View of the VCE

• The software development module (SDM) provides tools to build and annotate an application task graph

• The execution module (EXM) compiles the application and dispatches the tasks
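As a rough illustration of a task graph and of dispatching tasks in dependency order, here is a minimal Python sketch. It is not the SDM or EXM itself: the function `dispatch_order` and the task names are hypothetical, and the ordering relies on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def dispatch_order(task_graph):
    """Group tasks into batches that could be dispatched together.

    task_graph maps each task name to the list of tasks it depends on.
    Every task in a batch has all of its dependencies in earlier batches.
    """
    sorter = TopologicalSorter(task_graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # pretend the whole batch finished
    return batches


# Hypothetical application graph (task -> tasks it depends on).
EXAMPLE_GRAPH = {
    "load": [],
    "filter": ["load"],
    "fft": ["load"],
    "merge": ["filter", "fft"],
}
```

In this example "filter" and "fft" land in the same batch, so a dispatcher could run them on two different machines at once.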

The VCE

[Figure: VCE block diagram with the labels Problem Specification, Design Stage, Coding Level, Compilation Manager, Runtime Manager, SDM, and EXM]

Runtime Issues

• Compilation Issues

– Executables must be prepared to maximize scheduling flexibility

– Compilations must be scheduled to maximize application performance and hardware utilization

– Java?

Runtime Issues

• Task Placement

– The criteria for selecting machines to host tasks must consider both hardware utilization and application throughput

– Hints supplied by the programmer might improve task placement decisions
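One way to picture a placement decision that weighs machine utilization and a programmer hint is the sketch below. The scoring rule, the `Machine` fields, and the `hint_bonus` value are all assumptions made for illustration; the slides do not specify the actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    load: float   # fraction of CPU already in use, 0.0 to 1.0
    speed: float  # relative speed (1.0 = baseline workstation)


def place_task(machines, hint=None, hint_bonus=0.25):
    """Pick the machine with the most spare capacity.

    Spare capacity is speed * (1 - load). A programmer hint naming a
    machine adds a fixed bonus to that machine's score (hypothetical rule).
    """
    def score(machine):
        spare = machine.speed * (1.0 - machine.load)
        if hint is not None and machine.name == hint:
            spare += hint_bonus
        return spare

    return max(machines, key=score).name


# Hypothetical pool of three workstations.
EXAMPLE_MACHINES = [
    Machine("a", load=0.9, speed=2.0),  # fast but busy
    Machine("b", load=0.1, speed=1.0),  # nearly idle
    Machine("c", load=0.2, speed=1.0),
]
```

Without a hint the nearly idle machine "b" wins; a hint for "c" is enough to tip the choice because the two scores are close.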

Processor Utilization

• Free Parallelism

– Parallel applications with low efficiency benefit when run on idle machines

• Anticipatory Processing

– Use idle resources to perform work which may be useful if certain schedules are ultimately executed

Load Balancing

• Central issue in the execution module

• Good application throughput must be achieved without impacting interactive users

• Many systems provide the ability to migrate tasks

Task Migration

• Various migration strategies are possible

– Redundant execution

– Check-pointing

– Dump and migrate

– Recompilation

– Byte coded tasks

Systems

• Many systems are available which provide some form of a VCE

– PVM

– MPI

– Beowulf

– Condor

– …

The Berkeley NOW Project

Condor

• Condor is a software system that runs on a cluster of workstations to harness wasted CPU cycles.

– A Condor pool consists of any number of machines, of possibly different architectures and operating systems, that are connected by a network

• To monitor the status of the individual computers in the cluster, Condor "daemons" must run all the time.

– One daemon is called the "master". Its only job is to make sure that the rest of the Condor daemons are running.

Idle Machines Only

• Two other daemons run on every machine in the pool: startd and schedd

• Startd monitors information about the machine that is used to decide if it is available to run a Condor job

– keyboard and mouse activity

– load on the CPU

– startd also notices when a user returns to a machine that is currently running a Condor job and removes the job.
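The idle-machine policy described above can be sketched as a small decision function. This is an illustration only, not Condor's code or configuration: the function names and the thresholds (15 minutes without keyboard or mouse input, CPU load at or below 0.3) are assumptions.

```python
def machine_available(idle_seconds, cpu_load,
                      min_idle_seconds=900, max_load=0.3):
    """True if the owner appears to be away and the CPU is mostly free.

    idle_seconds: time since the last keyboard or mouse activity.
    cpu_load:     current CPU load caused by non-Condor work.
    Both thresholds are hypothetical defaults.
    """
    return idle_seconds >= min_idle_seconds and cpu_load <= max_load


def next_action(job_running, idle_seconds, cpu_load):
    """Decide what a startd-like monitor would do on this polling cycle."""
    available = machine_available(idle_seconds, cpu_load)
    if job_running and not available:
        return "vacate"     # the user returned: remove the job
    if not job_running and available:
        return "start"      # machine is idle: it may accept a job
    return "no_change"
```

For example, `next_action(True, 0, 0.1)` returns "vacate": a key press resets the idle time to zero, so the running job is removed even though the CPU load is low.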

Condor Architecture

Condor Executables

• Code does not have to be modified in any way to be used in Condor

– but it must be linked with the Condor libraries

• Once re-linked, jobs gain two crucial abilities:

– Checkpoint

– Perform remote system calls

• Condor also provides a mechanism to run binaries that have not been re-linked, which are called "vanilla" jobs
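To show what checkpointing buys a long-running job, here is a minimal Python sketch. Note the substitution: Condor checkpoints a re-linked process without source changes, whereas this example saves state explicitly at the application level with `pickle`. The function names, the file name, and the `stop_after` eviction switch are hypothetical.

```python
import os
import pickle
import tempfile


def run_sum(n, checkpoint_path, checkpoint_every=100, stop_after=None):
    """Sum 0..n-1, saving progress every `checkpoint_every` iterations.

    If a checkpoint file exists, the job resumes from it.
    `stop_after` simulates eviction after that many iterations.
    """
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"i": 0, "total": 0}

    steps = 0
    while state["i"] < n:
        state["total"] += state["i"]
        state["i"] += 1
        steps += 1
        if state["i"] % checkpoint_every == 0:
            # Write to a temporary file, then rename, so a crash during
            # the write cannot leave a half-written checkpoint behind.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp_path, checkpoint_path)
        if stop_after is not None and steps >= stop_after:
            return None  # evicted: work since the last checkpoint is lost
    return state["total"]


def demo():
    """Run once with an eviction, then resume on a 'new machine'."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "sum.ckpt")
        first = run_sum(1000, path, stop_after=250)  # evicted at step 250
        second = run_sum(1000, path)                 # resumes from step 200
        return first, second
```

In `demo()`, the first run is evicted after 250 iterations but the last checkpoint was written at iteration 200, so only 50 iterations are repeated when the job resumes.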

Condor Executables

Condor Tricks

• Match Making

– When a task is submitted to Condor, the system finds a machine that matches the resources required by the task

• Condor uses check-pointing to migrate jobs

– You only lose the computation that has been performed since the last checkpoint

• Condor tasks move around to find the underutilized workstations
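The matchmaking idea can be sketched as a search over machine descriptions. This is a simplified illustration, not Condor's actual matching language: the dictionary keys, the rule that numeric requirements are minimums, and the function names are all assumptions.

```python
def matches(machine, requirements):
    """True if the machine satisfies every requirement.

    Numeric requirements are treated as minimums (for example, memory);
    all other requirements must match exactly (for example, architecture).
    """
    for key, wanted in requirements.items():
        have = machine.get(key)
        if have is None:
            return False
        if isinstance(wanted, (int, float)) and not isinstance(wanted, bool):
            if have < wanted:
                return False
        elif have != wanted:
            return False
    return True


def find_machine(pool, requirements):
    """Return the name of the first idle machine that matches, or None."""
    for machine in pool:
        if machine.get("idle") and matches(machine, requirements):
            return machine["name"]
    return None


# Hypothetical pool of mixed architectures.
EXAMPLE_POOL = [
    {"name": "ws1", "arch": "sparc", "memory_mb": 128, "idle": True},
    {"name": "ws2", "arch": "x86", "memory_mb": 32, "idle": True},
    {"name": "ws3", "arch": "x86", "memory_mb": 64, "idle": False},
    {"name": "ws4", "arch": "x86", "memory_mb": 64, "idle": True},
]
```

A task that needs an x86 machine with at least 64 MB skips ws1 (wrong architecture), ws2 (too little memory), and ws3 (busy), and is matched to ws4.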

Beowulf

• The Beowulf parallel workstation is a single-user multiple computer with direct-access keyboard and monitors. Beowulf comprises:

– 16 motherboards with Intel x86 processors

– 256 MB of DRAM (16 MB per processor board)

– 16 hard disk drives and controllers

– 2 Ethernets and controllers per processor

– 2 high-resolution monitors with controllers and 1 keyboard

• The Beowulf architecture is a fully COTS (Commodity Off The Shelf) configured system.
