Upload
giona
View
57
Download
0
Embed Size (px)
DESCRIPTION
Using Parallel Computing Resources at Marquette. HPC Resources. Local Resources HPCL Cluster hpcl.mscs.mu.edu PARIO Clusterpario.eng.mu.edu PERE Cluster pere.marquette.edu MU Grid Regional Resources Milwaukee Institute SeWhip National Resources - PowerPoint PPT Presentation
Citation preview
Using Parallel Computing Resources at Marquette
HPC Resources• Local Resources
– HPCL Cluster hpcl.mscs.mu.edu– PARIO Cluster pario.eng.mu.edu– PERE Cluster pere.marquette.edu– MU Grid
• Regional Resources– Milwaukee Institute– SeWhip
• National Resources– NCSA http://www.ncsa.illinois.edu/– ANL http://www.anl.gov/– TeraGrid Resources http://www.teragrid.org/
• Commercial Resources– Amazon EC2 http://aws.amazon.com/ec2/
Pere Cluster
Head Node
Compute Node #1
Compute Node #2
Compute Node #3
Compute Node #128Giga
bit E
ther
net I
nter
conn
ectio
n
To MARQNET134.48.
10..1
.0.0
/16
Infin
iban
d In
terc
onne
ction
172.
16.0
.0/1
6
128 HP ProLiant BL280c G6 Server Blade1024 Intel Xeon 5550 Cores (Nehalem) 50 TB raw storage3 TB main memory
Steps to Run A Parallel Code
1. Get the source code– You can do it either on your local computer and
then transfer to hpcl.mscs.mu.edu, or– Use vi to edit a new one on hpcl.mscs.mu.edu
2. Compile your source code using mpicc, mpicxx or mpif77
3. Write a submission script for your job– vi myscript.sh
4. Use qsub to submit the script.– qsub myscript.sh
Getting Parallel Codehello.c
You can write the code on your development machine using IDE and then transfer the code to the cluster. (Recommended)
For small code, you can also directly edit it on the cluster.
Transfer File to Cluster
• Method 1: sftp (text or GUI)sftp [email protected] simple.cbye
• Method 2: scpscp simple.c [email protected]:example/
• Method 3: rsyncrsync -rsh=ssh -av example \ [email protected]:
Compile MPI Programs
• Method 1: Using MPI compiler wrappers– mpicc: for c code– mpicxx/mpic++/mpiCC: for c++ code– mpif77, mpif90: for FORTRAN codeExamples:
mpicc –o hello hello.c mpif90 –o hello hello.f
Compile MPI Programs (cont.)
• Method 2: Using standard compilers with mpi library
Note: MPI is just a library, so you can link the library to your code to get the executables.
Examples: gcc -o ping ping.c \ -I/usr/mpi/gcc/openmpi-1.2.8/include \ -L/usr/mpi/gcc/openmpi-1.2.8/lib64 -lmpi
Compiling Parallel Code – Using Makefile
Job Scheduler• A kind of software that
provide– Job submission and
automatic execution– Job monitoring and control– Resource management– Priority management– Checkpoint– ….
• Usually implemented as master/slave architecture
Commonly used Job Schedulers PBS: PBS Pro/TORQUE SGE (Sun Grid Engine, Oracle) LSF (Platform Computing) Condor (UW Madison)
Access the Pere Cluster
• Access– ssh <your-marquette-id>@pere.marquette.edu
• Account management– Based on Active Directory, you use the same
username and password to login Pere as the one you are using for your Marquette email.
– Need your professor to help you sign up.• Transfer files from/to Pere
Modules• The Modules package is used to customize your
environment settings. – control what versions of a software package will be used
when you compile or run a program.• Using modules– module avail check which modules are available – module load <module> set up shell variables to use a module– module unload remove a module– module list show all loaded modules– module help get help on using module
Using MPI on Pere• Multiple MPI compilers available, each may need different
syntax– OpenMPI compiler (/usr/mpi/gcc/openmpi-1.2.8)
• mpicc –o prog prog.c• mpif90 –o prog prog.f
– mvapich compiler (/usr/mpi/gcc/mvapich-1.1.0)• mpicc –o prog prog.c• mpif90 –o prog prog.f
– PGI compiler (/cluster/pgi/linux86-64/10.2)• pgcc –Mmpi –o prog prog.c• pgf90 –Mmpi –o prog prog.f
– Intel compiler• icc –o prog prog.c –lmpi• ifort –o prog prog.f -lmpi
Pere Batch Queues
• Pere current runs PBS/TORQUE• TORQUE usage– qsub myjob.qsub submit job scripts– qstat view job status– qdel job-id delete job– pbsnodes show nodes status– pbstop show queue status
Sample Job Scripts on Pere• #!/bin/sh
• #PBS -N hpl• #PBS -l nodes=64:ppn=8,walltime=01:00:00• #PBS -q batch• #PBS -j oe• #PBS -o hpl-$PBS_JOBID.log
• cd $PBS_O_WORKDIR• cat $PBS_NODEFILE
• mpirun -np 512 --hostfile `echo $PBS_NODEFILE` xhpl
Assign a name to the jobRequest resources: 64 nodes, each with 8 processors, 1 hour Submit to batch queue Merge stdout and stderr outputRedirect output to a file
Change work dir to current dirPrint allocated nodes (not required)
Run the mpi program
Extra Help For Accessing Pere
• Contact me.• User’s guide for pere
Using Condor
• Resources:– http://www.cs.wisc.edu/condor/tutorials/
Using Condor
Universe = vanilla Executable = simple Arguments = 4 10 Log = simple.log Output = simple.out Error = simple.error Queue
1. Write a submit script – simple.job
2. Submit the script to condor pool condor_submit simple.job
3. Watch the job run condor_q condor_q –sub <you-username>
Doing a Parameter Sweep
Universe = vanilla Executable = simple Arguments = 4 10 Log = simple.log Output = simple.$(Process).out Error = simple.$(Process).error Queue
Arguments = 4 11 Queue
Arguments = 4 12 Queue
Can put a collections of jobs in the same submit scripts to do a parameter sweep.
Tell condor to use different output for each job
Use queue to tell the individual jobs Can be run independently
Condor DAGManDAGMAn, lets you submit complex sequences of jobs as long as they can be expressed as a directed acylic graph
Commands: condor_submit_dag simple.dag./watch_condor_q
Each job in the DAG can only one queue.
Submit MPI Jobs to Condor
Difference from serial jobs: use MPI universe machine_count > 1
When there is no shared file system, transfer executables and output from/to local systems by specifying should_transfer_file and when_to_transfer_output
Questions
• How to implement parameter sweep using SGE/PBS?
• How to implement DAG on SGE/PBS?• Is there better ways to run the a large number
of jobs on the cluster?• Which resource I should use and where I can
find help?
HPCL Cluster
Head Node
Compute Node #1
Compute Node #2
Compute Node #3
Compute Node #4Giga
bit E
ther
net I
nter
conn
ectio
n
To MARQNET134.48.
10..1
.0.0
/16
How to Access HPCL ClusterOn Windows: Using SSH Secure Shell or PUTTY
On Linux: Using ssh command
Developing & Running Parallel Code Identify Problem & Analyze
Requirement
Analyze Performance Bottleneck
Designing Parallel Algorithm
Writing Parallel Code
Building Binary Code (Compiling)
Testing Code
Solving Realistic Problems(Running Production Release)
Coding
Compiling
Running
Steps to Run A Parallel Code1. Get the source code
– You can do it either on your local computer and then transfer to hpcl.mscs.mu.edu, or
– Use vi to edit a new one on hpcl.mscs.mu.edu2. Compile your source code using
mpicc, mpicxx or mpif77They are located under /opt/openmpi/bin. Use which command to find it
location; If not in your path, add the next line to your shell initialization file (e.g.,
~/.bash_profile)export PATH=/opt/openmpi/bin:$PATH
3. Write a submission script for your job– vi myscript.sh
4. Use qsub to submit the script.– qsub myscript.sh
Getting Parallel Codehello.c
You can write the code on your development machine using IDE and then transfer the code to the cluster. (Recommended)
For small code, you can also directly edit it on the cluster.
Transfer File to Cluster• Method 1: sftp (text or GUI)
sftp [email protected] simple.cbye
• Method 2: scpscp simple.c [email protected]:example/
• Method 3: rsyncrsync -rsh=ssh -av example \ [email protected]:
• Method 4: svn or cvssvn co \ svn+ssh://hpcl.mscs.mu.edu/mscs6060/example
Compile MPI Programs
• Method 1: Using MPI compiler wrappers– mpicc: for c code– mpicxx/mpic++/mpiCC: for c++ code– mpif77, mpif90: for FORTRAN codeExamples:
mpicc –o hello hello.c mpif90 –o hello hello.fLooking the cluster documentation or consulting system
administrators for the types of available compilers and their locations.
Compile MPI Programs (cont.)
• Method 2: Using standard compilers with mpi library
Note: MPI is just a library, so you can link the library to your code to get the executables.
Examples: gcc -o ping ping.c \ -I/usr/mpi/gcc/openmpi-1.2.8/include \ -L/usr/mpi/gcc/openmpi-1.2.8/lib64 -lmpi
Compiling Parallel Code – Using Makefile
Job Scheduler• A kind of software that
provide– Job submission and
automatic execution– Job monitoring and control– Resource management– Priority management– Checkpoint– ….
• Usually implemented as master/slave architecture
Commonly used Job Schedulers PBS: PBS Pro/TORQUE SGE (Sun Grid Engine, Oracle) LSF (Platform Computing) Condor (UW Madison)
Using SGE to Manage Jobs
• HPCL cluster using SGE as job scheduler• Basic commands– qsub submit a job to the batch scheduler – qstat examine the job queue – qdel delete a job from the queue
• Other commands– qconf SGE queue configuration– qmon graphical user's interface for SGE– qhost show the status of SGE hosts, queues, jobs
Submit a Serial Jobsimple.sh
Submit Parallel Jobs to HPCL Cluster
force to use bash for shell interpreterRequest Parallel Environment orte using 64 slots (or processors)Run the job in specified directorMerge two output files (stdout, stderr)Redirect output to a log file
Run mpi program
For your program, you may need to change the processor number, the program name at the last line, and the job names.
References
• SUN Grid Engine User’s Guide http://docs.sun.com/app/docs/doc/817-6117
• Command used commands– Submit job: qsub– Check status: qstat– Delete job: qdel– Check configuration: qconf
• Check the manual of a command– man qsub