CSIRO ESG WORKSHOP September 20/21 2011 Ben Evans Joseph Antony Muhammad Atif Margaret Kahn


Page 1: CSIRO  ESG WORKSHOP

CSIRO ESG WORKSHOP

September 20/21 2011

Ben Evans
Joseph Antony
Muhammad Atif
Margaret Kahn

Page 2: CSIRO  ESG WORKSHOP

Accessing the dcc

Logging on the dcc

• ssh -Y -l aaa777 dcc.nci.org.au
• use PuTTY from Windows
• use VNC for X forwarding

Web page: http://nf.nci.org.au
User Guide: http://nf.nci.org.au/facilities/userguide
FAQ: http://nf.nci.org.au/facilities/faq
Software web page: http://nf.nci.org.au/facilities/software

[email protected]

Page 3: CSIRO  ESG WORKSHOP

Applying for accounts

• National Merit Allocation Scheme (MAS)
• Partner allocations
• Startup allocation
• Flagship projects

Distribution of compute allocation across VU and XE:
• MAS 29% (Flagship projects 10%)
• ANU 24%
• CSIRO 24%
• Director's share 5%
• INTERSECT 4%
• Monash E-Research 0.5%
• Geoscience Australia 0.4%
• iVec 0.4%
• QCIF 0.4%

Page 4: CSIRO  ESG WORKSHOP

Project code

• To access the ESG data you need to be in the group ua6.
• This group has no compute time.
• For computation you will need to be connected to a compute grant, e.g. r87.
• r87 is a CSIRO grant.
• If you are new to the NCI NF, first fill out the registration form at http://nf.nci.org.au
• Then fill out the form to connect to an existing project.
• This has to be approved by the Lead Chief Investigator of the project.
• You will be notified by email when your account is set up, and the password will be sent as an SMS if you have provided a mobile number.

Page 5: CSIRO  ESG WORKSHOP

Project accounting

• Time is allocated per quarter.
• No transfer between quarters.
• Choose the project for accounting when you log in.
• Change the default project in the .rashrc file.
• PROJECT environment variable.
• quotasu -P project -h displays the usage in the current quarter and some recent history.
• dcc usage is being tracked but is not included in overall usage yet.
• dcc will be charged at 0.7 SU (service units).

Page 6: CSIRO  ESG WORKSHOP

Unix Environment

The working environment under UNIX is controlled by a shell (a command-line interpreter). The shell interprets the commands the user types in and carries them out.

• Default is the tcsh shell (bash is also popular)
• Shell can be changed by modifying .rashrc
• Shell commands can be grouped together into scripts

The shell provides environment variables that can be accessed by all the processes started from the original shell, e.g. the login environment:

- .cshrc and .login (csh/tcsh)
- .bashrc and .profile (sh/bash)

tcsh syntax: setenv VARIABLE value
bash syntax: export VARIABLE=value
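The difference between an exported variable and a plain shell variable can be seen with a minimal sketch like the one below. It uses sh/bash syntax only; PROJECT and the course code c23 come from these slides, while SCRATCH_NOTE is a hypothetical name used purely for illustration.

```shell
# PROJECT is exported, so child processes inherit it.
export PROJECT=c23

# SCRATCH_NOTE (hypothetical name) is NOT exported, so it stays in this shell only.
SCRATCH_NOTE=local-only

# Start child shells and ask each one what it can see.
child_project=$(sh -c 'echo "${PROJECT:-unset}"')
child_note=$(sh -c 'echo "${SCRATCH_NOTE:-unset}"')

echo "child sees PROJECT=$child_project"
echo "child sees SCRATCH_NOTE=$child_note"
```

In tcsh the same distinction exists between setenv (inherited by child processes) and set (local to the current shell).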

Page 7: CSIRO  ESG WORKSHOP

Environment Modules

Modules provide a great way to easily customize your shell environment, especially on the fly. The module command syntax is the same no matter which command shell you are using.

Various modules are loaded into your environment at login to provide a workable environment.

module list      # To see the modules loaded
module avail     # To see the list of software for which environments have been set up via modules
module load      # To load the environment settings required by a software package
module unload    # To remove extras added to the environment for a previously loaded software package.
                 # This is extremely useful in situations where different package settings clash.

Note: To automate environment customisation at login, module load commands can be added to the .login (tcsh) or .profile (bash) files. However, BEWARE: different applications can have incompatible environment requirements, so loading multiple application modules in your dot file is likely to lead to problems. We recommend that modules are loaded in scripts as needed at runtime and that modules loaded in the dot files are kept to a minimum.

Page 8: CSIRO  ESG WORKSHOP

Editors

Several editors are available

• vi
• emacs
• nano
• nedit

If you are not familiar with any of these you will find that nano has a simple interface. Just type nano.

Page 9: CSIRO  ESG WORKSHOP

Exercise: Getting started

Logging on to the dcc - for example for course account aaa777.

The project code is c23

ssh -Y dcc.nci.org.au -l aaa777

Remember to read the Message of the Day (MOTD) as you login.

Commands to try:

hostname       # to see the node you are logged into
quotasu        # to see the current state of the project
printenv       # to look at your environment settings
module list    # to check which modules are loaded on login
module avail   # to see which software packages are installed and accessible in this way

Load the Intel compilers

module load intel-fc intel-cc

Look at the environment settings again to see how they've changed.

Page 10: CSIRO  ESG WORKSHOP

Filesystems

The Filesystems section of the userguide has this table in greater detail:

Filesystem  Size Limit                Backup    Location                                 Time Limit
/home       1000MB default            Yes       Global                                   No
/short      80GB default per project  No        Global                                   120 days
/jobfs      20GB per cpu default      No        Local to node                            Duration of job
MDSS        20GB                      2 copies  External, access using special commands  No

Note that these limits can be changed on request if necessary.

dcc also has /projects – for persistent data

Page 11: CSIRO  ESG WORKSHOP

Batch Queueing System

Most work done as batch jobs (interactive process limits are small).

Queueing system:
• distributes work evenly over the system
• ensures that jobs cannot impact each other (e.g. exhaust memory or other resources)
• provides equitable access to the system

NF uses a modified version of OpenPBS

Scheduling policy is discussed in the User Guide.

Page 12: CSIRO  ESG WORKSHOP

Batch Queue Structure

normal
• Default queue designed for production use
• Charging rate of 1 SU per processor-hour (walltime)
• Largest allowed resources
• If your grant is exhausted you still get access at a lower priority

express
• High priority for testing, debugging etc.
• Charging rate of 3 SUs per processor-hour (walltime)
• Smaller limits to discourage "production use" by projects with too much grant left

copyq
• Used for file manipulation, e.g. copying files to MDSS
• Only queue to run on the file server node for /short

Job charging is based on wall clock time used, number of cpus requested, queue choice and machine choice.

One hour of time on the DCC is worth 70% of one hour on the VU.
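As a rough illustration of how these factors might combine, the sketch below multiplies walltime hours, cpus requested, the queue rate and a machine factor. The multiplicative formula and the function name su_charge are assumptions made for this example, not the documented NCI accounting rule; only the rates (normal = 1, express = 3) and the dcc factor of 0.7 relative to the VU come from the slides.

```shell
# su_charge HOURS NCPUS QUEUE MACHINE
# Hypothetical helper: estimates service units as
#   walltime hours x cpus requested x queue rate x machine factor.
# The multiplicative form is an assumption made for this sketch.
su_charge() {
    hours=$1
    ncpus=$2
    case "$3" in
        normal)  rate=1 ;;     # 1 SU per processor-hour (from the slides)
        express) rate=3 ;;     # 3 SUs per processor-hour (from the slides)
        *) echo "unknown queue: $3" >&2; return 1 ;;
    esac
    case "$4" in
        vu)  factor=1.0 ;;     # reference machine
        dcc) factor=0.7 ;;     # one dcc hour is worth 70% of a VU hour
        *) echo "unknown machine: $4" >&2; return 1 ;;
    esac
    awk -v h="$hours" -v n="$ncpus" -v r="$rate" -v f="$factor" \
        'BEGIN { printf "%.2f\n", h * n * r * f }'
}

su_charge 2 4 normal vu      # 2 hours on 4 cpus, normal queue, VU
su_charge 2 4 express dcc    # same job in express on the dcc
```

Use quotasu to see what a project has actually been charged.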

Page 13: CSIRO  ESG WORKSHOP

Using the Queueing System

Read the "PBS Batch Use" and "Queues and Scheduling" sections of the Userguide.

See: nf_limits

Request resources for your job (using qsub). See man pbs_resources:
• walltime
• (v)memory
• disk (jobfs)
• number of cpus
• software

PBS will then
• schedule the job when the resources become available
• prevent other jobs from infringing on the allocated resources
• if necessary delay starting the job until a software licence is available
• display progress of the jobs (nqstat)
• terminate the job when it exceeds its requested resources
• return stdout and stderr in batch output files

Page 14: CSIRO  ESG WORKSHOP

PBS Resources

Example batch script

#!/bin/csh
#PBS -l walltime=01:00:00
#PBS -l vmem=1Gb
#PBS -l jobfs=5Gb
#PBS -l ncpus=1
#PBS -wd
module load ncl/6.0
ncl prog.ncl

Page 15: CSIRO  ESG WORKSHOP

Scheduling and Job Suspension

Jobs won't be started until sufficient resources are free.

Resources allocated to a job are unavailable to other jobs.

Jobs can be suspended to run parallel jobs but the fraction of time suspended is limited (depends on how many jobs you have running, number of cpus, etc.)

Only ask for the resources your job really needs!

• Avoids your job being delayed in the queue or suspended unnecessarily
• Avoids other users' jobs being delayed unnecessarily by wasted resources
• Experiment in express and look at the bottom of the PBS stdout file to see what resources were used by jobs

Page 16: CSIRO  ESG WORKSHOP

Stdout and stderr files

The PBS queueing system returns the standard output and standard error arising from the script in .o and .e files respectively.

The .o file contains the output arising from the script (if not redirected in the script) and additional information from PBS.

cat batchscript.o70450

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
=================================================================
                        Resource usage:
  CPU time:             00:00:04    JobId:          70450.vu-pbs
  Elapsed time:         00:00:06    Project:        c23
  Requested time:       00:10:00    Service Units:  0.01
  Max physical memory:  7MB
  Max virtual memory:   8MB
  Requested memory:     50MB
  Max jobfs disk use:   0.0GB
  Requested jobfs:      0.1GB
=================================================================
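To compare what a job requested with what it actually used, the summary can be pulled out of the .o file with standard tools. The sketch below is illustrative only: it writes a sample summary to a temporary file using the values from this slide (the exact column layout of a real .o file is an assumption), and field_mb is a hypothetical helper name.

```shell
# Sample of the PBS resource summary; values are taken from the slide,
# the exact layout of a real .o file is an assumption.
outfile=$(mktemp)
cat > "$outfile" <<'EOF'
CPU time: 00:00:04          JobId: 70450.vu-pbs
Elapsed time: 00:00:06      Project: c23
Requested time: 00:10:00    Service Units: 0.01
Max physical memory: 7MB
Max virtual memory: 8MB
Requested memory: 50MB
EOF

# field_mb LABEL FILE (hypothetical helper):
# print the integer MB value that follows "LABEL:" in FILE.
field_mb() {
    grep -o "$1: *[0-9]*MB" "$2" | head -n 1 | sed 's/.*: *//; s/MB$//'
}

used=$(field_mb "Max virtual memory" "$outfile")
requested=$(field_mb "Requested memory" "$outfile")
echo "used ${used}MB of ${requested}MB requested"

rm -f "$outfile"
```

A large gap between the two numbers suggests the memory request in the batch script can be reduced for the next run.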

Page 17: CSIRO  ESG WORKSHOP

Stdout and stderr

The .e file contains any error output arising from the script (if not redirected in the script) and additional information from PBS. For a successful job it should be empty.

Common errors to look for in the .e file:
• Command not found.
• =>> PBS: job terminated: walltime 172818sec exceeded limit 172800sec
• =>> PBS: job terminated: per node vmem 2227620kb exceeded limit 2097152kb
• Segmentation fault.

See also:
man qsub
man nf_limits
man qdel
nf_limits
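PBS reports walltime limits in seconds, while batch scripts request walltime as HH:MM:SS, so a small conversion helps when reading these messages. The sketch below is an illustration only; hms_to_sec is a hypothetical helper, and the numbers are the ones from the walltime message quoted on this slide.

```shell
# hms_to_sec HH:MM:SS (hypothetical helper): convert a walltime string to seconds.
# awk treats "08" as the number 8, so leading zeros are safe.
hms_to_sec() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

limit=$(hms_to_sec 48:00:00)   # the 172800 sec limit in the message is 48 hours
used=172818                    # walltime reported in the PBS message

if [ "$used" -gt "$limit" ]; then
    over=$(( used - limit ))
    echo "job ran ${over} s past its ${limit} s walltime limit"
fi
```

The same arithmetic applies to the memory message: 2097152kb is 2 x 1024 x 1024 kb, i.e. a 2GB limit.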

Page 18: CSIRO  ESG WORKSHOP

Using the Mass Data Store

MDSS is used for long term storage of large datasets. If you have numerous small files to archive, bundle them into a tar file FIRST.

Every project has a directory on the MDSS at /massdata/$PROJECT

All members of the project group have read and write access to the top project directory. The mdss command can be used to "get" and "put" data between the interactive nodes of the vu or xe and the MDSS, as well as to list files and directories on the MDSS.

netcp and netmv can be used from within batch jobs to
• generate a batch script for copying/moving files to the MDSS
• submit the generated batch script to the special copyq, which runs the copy/move job on an interactive node.

netcp and netmv can also be used interactively to save you work creating tar files and generating mdss commands.
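The "bundle into a tar file first" advice can be sketched with standard tar commands. The example below works in a temporary directory with made-up file names (results, run_1.txt and so on are hypothetical); the final transfer to the MDSS is shown only as a comment, and its exact form is an assumption based on the "put" operation mentioned above.

```shell
# Create a few small files in a scratch area (hypothetical names).
workdir=$(mktemp -d)
mkdir "$workdir/results"
for i in 1 2 3; do
    echo "run $i" > "$workdir/results/run_$i.txt"
done

# Bundle the whole directory into ONE compressed archive.
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# List the archive to confirm its contents before archiving it.
file_count=$(tar -tzf "$workdir/results.tar.gz" | grep -c 'run_.*\.txt$')
echo "archive holds $file_count files"

# On the NCI systems the single archive would then be stored with something like
#   mdss put results.tar.gz
# (assumed invocation; check the mdss help for the exact syntax).

rm -rf "$workdir"
```

Storing one archive instead of many small files keeps the number of objects on the mass data store low, which is the point of the advice above.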

Page 19: CSIRO  ESG WORKSHOP

Compilers

Both GNU and Intel compilers are available. Note that the Intel C/C++ compiler is compatible with gcc/g++.

module list
module avail intel-fc
module avail intel-cc
module avail gcc
which gcc

Note that there is /usr/bin/gcc, but more recent versions of gcc are available if you load the relevant module.

In general Python applications build best with gcc, so you would do

echo $CC
module rm intel-fc intel-cc
echo $CC

Page 20: CSIRO  ESG WORKSHOP

NCL Example

cp -r /short/c23/NCL_EXAMPLE .
cd NCL_EXAMPLE
ls
more evans_2.ncl
more batchjob
qsub batchjob
nqstat
more batchjob.o****
more batchjob.e*****
ghostscript evans.ps

Page 21: CSIRO  ESG WORKSHOP

ESG data Example

cd
cp -r /short/c23/ESG_EXAMPLES .
cd ESG_EXAMPLES
ls
more fig1.ncl
more fig2.ncl
more batchjob
qsub batchjob
nqstat
qps ******
more batchjob.o****
more batchjob.e*****
ghostscript fig1.ps
ghostscript fig2.ps

Page 22: CSIRO  ESG WORKSHOP

ESG data Example

[mhk900@dcc ~]$ more /short/c23/ESG_EXAMPLES/fig1.ncl

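load "all_scripts.ncl"
begin
; monthly near-surface air temperature (tas) from the CSIRO-Mk3-6-0 piControl run
tf="/projects/ESG/authoritative/IPCC/CMIP5/CSIRO-QCCCE/CSIRO-Mk3-6-0/piControl/mon/atmos/Amon/r1i1p1/v20110518/tas/tas_Amon_CSIRO-Mk3-6-0_piControl_r1i1p1_000101-050012.nc"

tfh=addfile(tf,"r")

t=tfh->tas-273.15                      ; convert from Kelvin to degrees Celsius
lat=tfh->lat
rad = 4.0*atan(1.0)/180.0              ; pi/180, degrees to radians
clat = doubletofloat(cos(lat*rad))     ; cosine of latitude
clat!0 = "lat"                         ; name the dimension
clat&lat = lat                         ; attach the latitude coordinate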

……….