Using Parallel Computing Resources at Marquette

HPC Resources

• Local Resources
  – HPCL Cluster: hpcl.mscs.mu.edu
  – PARIO Cluster: pario.eng.mu.edu
  – PERE Cluster: pere.marquette.edu
  – MU Grid
• Regional Resources
  – Milwaukee Institute
  – SeWhip
• National Resources
  – NCSA: http://www.ncsa.illinois.edu/
  – ANL: http://www.anl.gov/
  – TeraGrid Resources: http://www.teragrid.org/
• Commercial Resources
  – Amazon EC2: http://aws.amazon.com/ec2/

Pere Cluster

Diagram: a head node and compute nodes #1 to #128, connected by a Gigabit Ethernet interconnection (10.1.0.0/16) and an InfiniBand interconnection (172.16.0.0/16); the head node also links to MARQNET (134.48.x.x).

• 128 HP ProLiant BL280c G6 server blades
• 1024 Intel Xeon 5550 cores (Nehalem)
• 50 TB raw storage
• 3 TB main memory
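
The per-blade figures come from simple division. This assumes the 1024 cores and the 3 TB of memory are spread evenly over the 128 blades, and that 1 TB = 1024 GB. The small C sketch below does the arithmetic. The macro and function names are illustrative only. The result of 8 cores per blade matches the ppn=8 requested in the sample Pere job script.

```c
#include <assert.h>

/* Figures from the Pere slide; 1 TB is assumed to be 1024 GB. */
#define PERE_BLADES 128
#define PERE_CORES  1024
#define PERE_MEM_GB (3 * 1024)

/* Cores per blade: the natural ppn value for a PBS request. */
int cores_per_blade(void) {
    assert(PERE_CORES % PERE_BLADES == 0); /* cores divide evenly over blades */
    return PERE_CORES / PERE_BLADES;
}

/* Memory per blade in GB, assuming the 3 TB is spread evenly. */
int mem_gb_per_blade(void) {
    return PERE_MEM_GB / PERE_BLADES;
}

/* Memory per core in GB if every core runs one MPI process. */
int mem_gb_per_core(void) {
    return mem_gb_per_blade() / cores_per_blade();
}
```

A job that runs 8 MPI processes per blade therefore has roughly 3 GB per process, before subtracting what the operating system itself uses.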

Steps to Run a Parallel Code

1. Get the source code.
   – Either write it on your local computer and then transfer it to hpcl.mscs.mu.edu, or
   – use vi to create it directly on hpcl.mscs.mu.edu.
2. Compile your source code using mpicc, mpicxx or mpif77.
3. Write a submission script for your job:
   – vi myscript.sh
4. Use qsub to submit the script:
   – qsub myscript.sh

Getting Parallel Code: hello.c

You can write the code on your development machine using an IDE and then transfer it to the cluster (recommended).

For small programs, you can also edit the code directly on the cluster.

Transfer Files to the Cluster

• Method 1: sftp (text or GUI)
    sftp <username>@<cluster-host>
    put simple.c
    bye
• Method 2: scp
    scp simple.c <username>@<cluster-host>:example/
• Method 3: rsync
    rsync --rsh=ssh -av example \
        <username>@<cluster-host>:

Compile MPI Programs

• Method 1: Using MPI compiler wrappers
  – mpicc: for C code
  – mpicxx/mpic++/mpiCC: for C++ code
  – mpif77, mpif90: for Fortran code
  Examples:
    mpicc -o hello hello.c
    mpif90 -o hello hello.f

Compile MPI Programs (cont.)

• Method 2: Using standard compilers with the MPI library
  Note: MPI is just a library, so you can compile your code with a standard compiler and link it against the MPI library to get the executable.
  Example:
    gcc -o ping ping.c \
        -I/usr/mpi/gcc/openmpi-1.2.8/include \
        -L/usr/mpi/gcc/openmpi-1.2.8/lib64 -lmpi

Compiling Parallel Code – Using Makefile

Job Scheduler

• A kind of software that provides:
  – Job submission and automatic execution
  – Job monitoring and control
  – Resource management
  – Priority management
  – Checkpointing
  – …
• Usually implemented as a master/slave architecture

Commonly used job schedulers:
  – PBS: PBS Pro/TORQUE
  – SGE (Sun Grid Engine, Oracle)
  – LSF (Platform Computing)
  – Condor (UW Madison)
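
The C fragment below is a toy dispatch rule that makes "priority management" concrete. Among the waiting jobs, it picks the one with the highest priority and breaks ties by submission order. It is only a hypothetical illustration. PBS/TORQUE, SGE, LSF and Condor each implement their own, much richer policies (fair share, for example), and none of the names here come from their APIs.

```c
#include <assert.h>

/* Toy job record: a larger priority runs first; a smaller id was submitted earlier. */
struct job {
    int id;
    int priority;
    int done;   /* nonzero once the job has been dispatched */
};

/*
 * Index of the next job to dispatch: highest priority first, earliest
 * submission among equal priorities. Returns -1 when nothing is waiting.
 */
int next_job(const struct job *jobs, int n) {
    int best = -1;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (jobs[i].done)
            continue;
        if (best < 0 ||
            jobs[i].priority > jobs[best].priority ||
            (jobs[i].priority == jobs[best].priority && jobs[i].id < jobs[best].id))
            best = i;
    }
    return best;
}
```

With jobs 1 (priority 0), 2 and 3 (both priority 5), the rule dispatches 2, then 3, then 1.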

Access the Pere Cluster

• Access
  – ssh <your-marquette-id>@pere.marquette.edu
• Account management
  – Based on Active Directory: you log in to Pere with the same username and password you use for your Marquette email.
  – You need your professor to help you sign up.
• Transfer files from/to Pere

Modules

• The Modules package is used to customize your environment settings.
  – It controls which version of a software package is used when you compile or run a program.
• Using modules
  – module avail: check which modules are available
  – module load <module>: set up shell variables to use a module
  – module unload <module>: remove a module
  – module list: show all loaded modules
  – module help: get help on using module

Using MPI on Pere

• Multiple MPI compilers are available, and each may need a different syntax:
  – OpenMPI compiler (/usr/mpi/gcc/openmpi-1.2.8)
    • mpicc -o prog prog.c
    • mpif90 -o prog prog.f
  – MVAPICH compiler (/usr/mpi/gcc/mvapich-1.1.0)
    • mpicc -o prog prog.c
    • mpif90 -o prog prog.f
  – PGI compiler (/cluster/pgi/linux86-64/10.2)
    • pgcc -Mmpi -o prog prog.c
    • pgf90 -Mmpi -o prog prog.f
  – Intel compiler
    • icc -o prog prog.c -lmpi
    • ifort -o prog prog.f -lmpi

Pere Batch Queues

• Pere currently runs PBS/TORQUE.
• TORQUE usage
  – qsub myjob.qsub: submit a job script
  – qstat: view job status
  – qdel <job-id>: delete a job
  – pbsnodes: show node status
  – pbstop: show queue status

Sample Job Scripts on Pere

    #!/bin/sh
    #PBS -N hpl
    #PBS -l nodes=64:ppn=8,walltime=01:00:00
    #PBS -q batch
    #PBS -j oe
    #PBS -o hpl-$PBS_JOBID.log

    cd $PBS_O_WORKDIR
    cat $PBS_NODEFILE

    mpirun -np 512 --hostfile $PBS_NODEFILE xhpl

• #PBS -N: assign a name to the job
• #PBS -l: request resources: 64 nodes, each with 8 processors, for 1 hour
• #PBS -q: submit to the batch queue
• #PBS -j oe: merge the stdout and stderr output
• #PBS -o: redirect the output to a file
• cd $PBS_O_WORKDIR: change to the directory the job was submitted from
• cat $PBS_NODEFILE: print the allocated nodes (not required)
• mpirun: run the MPI program with 64 × 8 = 512 processes
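
The resource request and the mpirun line must agree. nodes=64:ppn=8 yields 64 × 8 = 512 processor slots, which is why the script passes -np 512. The C helpers below are a hypothetical sketch of that bookkeeping. They also convert the walltime string to seconds and compute the core-hours the request reserves. The parser assumes the HH:MM:SS form used in the script. PBS also accepts other forms, such as a plain number of seconds, which this sketch rejects.

```c
#include <assert.h>
#include <stdio.h>

/* Total MPI processes for a "nodes=N:ppn=P" request: one per processor slot. */
int total_ranks(int nodes, int ppn) {
    assert(nodes > 0 && ppn > 0);
    return nodes * ppn;
}

/* Convert a walltime of the form "HH:MM:SS" to seconds; returns -1 if malformed. */
long walltime_seconds(const char *walltime) {
    int h, m, s;
    char extra;
    /* Exactly three colon-separated integers with nothing trailing. */
    if (sscanf(walltime, "%d:%d:%d%c", &h, &m, &s, &extra) != 3)
        return -1;
    if (h < 0 || m < 0 || m > 59 || s < 0 || s > 59)
        return -1;
    return (long)h * 3600 + (long)m * 60 + s;
}

/* Core-hours reserved if the job runs for its full walltime. */
double core_hours(int nodes, int ppn, const char *walltime) {
    long secs = walltime_seconds(walltime);
    assert(secs >= 0);
    return total_ranks(nodes, ppn) * (secs / 3600.0);
}
```

For the sample script, total_ranks(64, 8) gives the 512 passed to -np, and the one-hour walltime reserves 512 core-hours.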

Extra Help for Accessing Pere

• Contact me.
• User's guide for Pere

Using Condor

• Resources:
  – http://www.cs.wisc.edu/condor/tutorials/

Using Condor

1. Write a submit script, simple.job:

    Universe   = vanilla
    Executable = simple
    Arguments  = 4 10
    Log        = simple.log
    Output     = simple.out
    Error      = simple.error
    Queue

2. Submit the script to the Condor pool:
    condor_submit simple.job
3. Watch the job run:
    condor_q
    condor_q -sub <your-username>

Doing a Parameter Sweep

    Universe   = vanilla
    Executable = simple
    Arguments  = 4 10
    Log        = simple.log
    Output     = simple.$(Process).out
    Error      = simple.$(Process).error
    Queue

    Arguments  = 4 11
    Queue

    Arguments  = 4 12
    Queue

• You can put a collection of jobs in the same submit script to do a parameter sweep.
• $(Process) tells Condor to use a different output file for each job.
• Each Queue statement submits an individual job; the jobs can run independently.
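
Writing one Arguments/Queue pair per value by hand gets tedious for larger sweeps. The sketch below assumes you want a small C program to generate a submit description like the one above. write_sweep is a hypothetical helper, and the file names simply mirror the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a Condor submit description into buf that runs the "simple"
 * executable once per value first..last of its second argument, keeping
 * the first argument fixed. Returns the number of characters written,
 * or -1 if buf is too small.
 */
int write_sweep(char *buf, size_t size, int fixed, int first, int last) {
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size,
                     "Universe = vanilla\n"
                     "Executable = simple\n"
                     "Log = simple.log\n"
                     "Output = simple.$(Process).out\n"
                     "Error = simple.$(Process).error\n");
    if (n < 0 || (size_t)n >= size)
        return -1;
    size_t used = (size_t)n;
    for (int v = first; v <= last; v++) {
        /* Each Queue statement submits one job with the current Arguments. */
        n = snprintf(buf + used, size - used, "Arguments = %d %d\nQueue\n", fixed, v);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Calling write_sweep(buf, sizeof buf, 4, 10, 12), saving the result to simple.job with fputs, and running condor_submit simple.job would queue the same three jobs as the hand-written script.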

Condor DAGMan

• DAGMan lets you submit complex sequences of jobs, as long as they can be expressed as a directed acyclic graph (DAG).
• Commands:
    condor_submit_dag simple.dag
    ./watch_condor_q
• Each job in the DAG can have only one Queue statement.
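
DAGMan starts a job only after all of its parents have finished. In a DAGMan input file, each job is declared on a JOB line that names its submit file, and dependencies are declared on PARENT … CHILD lines. The C sketch below is not DAGMan code. It uses Kahn's algorithm to compute one order that respects the dependencies. It also shows why the graph must be acyclic: jobs on a cycle never become ready. DAGMan may run independent jobs at the same time; the sketch only lists one valid serial order. MAX_JOBS and dag_order are hypothetical names.

```c
#include <assert.h>

#define MAX_JOBS 16

/*
 * Order jobs 0..n-1 given (parent, child) edges using Kahn's algorithm:
 * a job becomes ready once all of its parents are done. Writes the order
 * into `order` and returns how many jobs were placed; a result smaller
 * than n means the graph has a cycle and is not a valid DAG.
 */
int dag_order(int n, int nedges, const int edges[][2], int order[]) {
    int indegree[MAX_JOBS] = {0};
    int ready[MAX_JOBS];
    int head = 0, tail = 0, placed = 0;

    assert(n > 0 && n <= MAX_JOBS);
    for (int e = 0; e < nedges; e++) {
        assert(edges[e][1] >= 0 && edges[e][1] < n);
        indegree[edges[e][1]]++;              /* count unfinished parents */
    }
    for (int j = 0; j < n; j++)
        if (indegree[j] == 0)
            ready[tail++] = j;                /* jobs with no parents */

    while (head < tail) {
        int job = ready[head++];
        order[placed++] = job;                /* "run" the job */
        for (int e = 0; e < nedges; e++)
            if (edges[e][0] == job && --indegree[edges[e][1]] == 0)
                ready[tail++] = edges[e][1];  /* all parents finished */
    }
    return placed;
}
```

For a hypothetical "diamond" in which job 0 feeds jobs 1 and 2, which both feed job 3, the edges {0,1}, {0,2}, {1,3}, {2,3} give the order 0, 1, 2, 3.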

Submit MPI Jobs to Condor

• Differences from serial jobs: use the MPI universe and set machine_count > 1.
• When there is no shared file system, transfer the executable and output to/from the local systems by specifying should_transfer_files and when_to_transfer_output.

Questions

• How can a parameter sweep be implemented with SGE/PBS?
• How can a DAG be implemented with SGE/PBS?
• Are there better ways to run a large number of jobs on the cluster?
• Which resource should I use, and where can I find help?

HPCL Cluster

Diagram: a head node and compute nodes #1 to #4, connected by a Gigabit Ethernet interconnection (10.1.0.0/16); the head node also links to MARQNET (134.48.x.x).

How to Access the HPCL Cluster

• On Windows: use SSH Secure Shell or PuTTY
• On Linux: use the ssh command

Developing & Running Parallel Code

1. Identify the problem and analyze the requirements
2. Analyze the performance bottleneck
3. Design the parallel algorithm
4. Write the parallel code
5. Build the binary code (compiling)
6. Test the code
7. Solve realistic problems (run the production release)

These steps span three phases: coding, compiling and running.

Steps to Run a Parallel Code

1. Get the source code.
   – Either write it on your local computer and then transfer it to hpcl.mscs.mu.edu, or
   – use vi to create it directly on hpcl.mscs.mu.edu.
2. Compile your source code using mpicc, mpicxx or mpif77.
   – They are located under /opt/openmpi/bin. Use the which command (e.g., which mpicc) to find their location.
   – If they are not in your path, add the following line to your shell initialization file (e.g., ~/.bash_profile):
       export PATH=/opt/openmpi/bin:$PATH
3. Write a submission script for your job:
   – vi myscript.sh
4. Use qsub to submit the script:
   – qsub myscript.sh

Transfer Files to the Cluster

• Method 1: sftp (text or GUI)
    sftp <username>@<cluster-host>
    put simple.c
    bye
• Method 2: scp
    scp simple.c <username>@<cluster-host>:example/
• Method 3: rsync
    rsync --rsh=ssh -av example \
        <username>@<cluster-host>:
• Method 4: svn or cvs
    svn co \
        svn+ssh://hpcl.mscs.mu.edu/mscs6060/example

Compile MPI Programs

• Method 1: Using MPI compiler wrappers
  – mpicc: for C code
  – mpicxx/mpic++/mpiCC: for C++ code
  – mpif77, mpif90: for Fortran code
  Examples:
    mpicc -o hello hello.c
    mpif90 -o hello hello.f
• Check the cluster documentation or consult the system administrators for the types of available compilers and their locations.

Using SGE to Manage Jobs

• The HPCL cluster uses SGE as its job scheduler.
• Basic commands
  – qsub: submit a job to the batch scheduler
  – qstat: examine the job queue
  – qdel: delete a job from the queue
• Other commands
  – qconf: SGE queue configuration
  – qmon: graphical user interface for SGE
  – qhost: show the status of SGE hosts, queues and jobs

Submit a Serial Job: simple.sh

Submit Parallel Jobs to the HPCL Cluster

• Force bash as the shell interpreter
• Request the parallel environment orte using 64 slots (processors)
• Run the job in the specified directory
• Merge the two output files (stdout, stderr)
• Redirect the output to a log file
• Run the MPI program

For your program, you may need to change the processor count, the program name on the last line, and the job name.

References

• Sun Grid Engine User's Guide: http://docs.sun.com/app/docs/doc/817-6117
• Commonly used commands
  – Submit a job: qsub
  – Check status: qstat
  – Delete a job: qdel
  – Check configuration: qconf
• Check the manual page of a command
  – man qsub