
Page 1:

Before We Start
• Sign in
• hpcXX account slips
• Laptops if you need them (Mac OS only)

Page 2:

Research Computing at Virginia Tech
Advanced Research Computing

Page 3:

Compute Resources

Page 4:

Blue Ridge Computing Cluster
• Resources for running jobs:
  – 408 dual-socket nodes with 16 cores/node
  – Intel Sandy Bridge-EP Xeon (8 cores) in each socket
  – 4 GB/core, 64 GB/node
  – Total: 6,528 cores, 27.3 TB memory
• Special nodes:
  – 130 nodes with 2 Intel Xeon Phi accelerators
  – 18 nodes with 128 GB
  – 4 nodes with 2 NVIDIA K40 GPUs
• Quad-data-rate (QDR) InfiniBand

Page 5:

HokieSpeed – CPU/GPU Cluster
• 206 nodes, each with:
  – Two 6-core 2.40 GHz Intel Xeon E5645 CPUs and 24 GB of RAM
  – Two NVIDIA M2050 Fermi GPUs (448 cores/socket)
• Total: 2,472 CPU cores, 412 GPUs, 5 TB of RAM
• Top500 #221, Green500 #43 (November 2012)
• 14-foot by 4-foot 3D visualization wall
• Recommended uses:
  – Large-scale GPU computing
  – Visualization

Page 6:

HokieOne - SGI UV SMP System
• 492 Intel Xeon X7542 (2.66 GHz) cores
  – Sockets: six-core Intel Xeon X7542 (Westmere)
  – 41 dual-socket blades for computation
  – One blade for system + login
• 2.6 TB of shared memory (NUMA)
  – 64 GB/blade, blades connected with NUMAlink
• Resources scheduled by blade (6 cores, 32 GB)
• Recommended uses:
  – Memory-heavy applications (up to 1 TB)
  – Shared-memory (e.g. OpenMP) applications

Page 7:

Ithaca – IBM iDataPlex
• 84 dual-socket quad-core Intel Nehalem 2.26 GHz nodes (672 cores in all)
  – 66 nodes available for general use
• Memory (2 TB total):
  – 56 nodes have 24 GB (3 GB/core)
  – 10 nodes have 48 GB (6 GB/core)
• Quad-data-rate (QDR) InfiniBand
• Operating system: CentOS 6
• Recommended uses:
  – Parallel MATLAB (28 nodes/224 cores)
  – Beginning users

Page 8:

Preview: Hardware Arriving Summer 2015

Node types (quantity, CPU, memory, local disk, other features, network):
• General: 100 nodes, 2 x E5-2680 v3 (Haswell, 24 cores), 128 GB, 1.8 TB local disk, EDR IB
• Large Direct Attached Storage: 16 nodes, 2 x E5-2680 v3 (Haswell, 24 cores), 512 GB, 43.2 TB local disk (24 x 1.8 TB), 2 x 200 GB SSD, EDR IB
• GPU: 8 nodes, 2 x E5-2680 v3 (Haswell, 24 cores), 512 GB, 3.6 TB local disk (2 x 1.8 TB), NVIDIA K80 GPU, EDR IB
• Large Memory: 2 nodes, 4 x E7-4890 v2 (Ivy Bridge, 60 cores), 3 TB, 10.8 TB local disk (6 x 1.8 TB), EDR IB

Page 9:

Storage Resources

Page 10:

Name / Intent / File System / Environment Variable / Per-User Maximum / Data Lifespan / Available On
• Home: long-term storage of files; NFS; $HOME; 100 GB; unlimited; login and compute nodes
• Work: fast I/O, temporary storage; Lustre (BlueRidge), GPFS (other clusters); $WORK; 14 TB, 3 million files; 120 days; login and compute nodes
• Archive: long-term storage for infrequently accessed files; CXFS; $ARCHIVE; no per-user maximum; unlimited; login nodes
• Local Scratch: local disk (hard drives); $TMPDIR; size of node hard drive; length of job; compute nodes
• Memory (tmpfs): very fast I/O; memory (RAM); $TMPFS; size of node memory; length of job; compute nodes
• Old Home: access to legacy files (read-only); NFS; lifespan TBD; login nodes
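In day-to-day use these locations are referenced through their environment variables rather than hard-coded paths. A small illustrative sketch (the file names here are hypothetical, not required commands):

cd $WORK                      # run jobs from the fast, temporary Work file system
cp input.dat $TMPDIR          # stage a file onto node-local scratch from inside a job
cp results.tar $ARCHIVE       # long-term storage; $ARCHIVE is only available on login nodes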

Page 11:

GETTING STARTED ON ARC’S SYSTEMS

Page 12:

Getting Started Steps

1. Apply for an account
2. Log in to the system via SSH
3. Work through the system examples:
   a. Compile
   b. Test (interactive job)
   c. Submit to the scheduler
4. Compile and submit your own programs

Page 13:

ARC Accounts

1. Review ARC’s system specifications and choose the right system(s) for you
   a. Specialty software
2. Apply for an account online
3. When your account is ready, you will receive confirmation from ARC’s system administrators within a few days

Page 14:

Log In

• Log in via SSH
  – Mac/Linux have a built-in client
  – Windows users need to download a client (e.g. PuTTY)

Login addresses (xxx.arc.vt.edu):
• BlueRidge: blueridge1 or blueridge2
• HokieSpeed: hokiespeed1 or hokiespeed2
• HokieOne: hokieone
• Ithaca: ithaca1 or ithaca2
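For example, connecting to BlueRidge from a Mac or Linux terminal looks like this (replace hpcXX with your own username; the same form works for the other login addresses above):

ssh [email protected]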

Page 15:

BLUERIDGE ALLOCATION SYSTEM

Page 16:

Blue Ridge Allocation System

Goals of the allocation system:
• Ensure that the needs of computationally intensive research projects are met
• Document hardware and software requirements for individual research groups
• Facilitate tracking of research

http://www.arc.vt.edu/userinfo/allocations.php

Page 17:

Allocation Eligibility

To qualify for an allocation, you must meet at least one of the following:
• Be a Ph.D.-level researcher (post-docs qualify)
• Be an employee of Virginia Tech and the PI for research computing
• Be an employee of Virginia Tech and the co-PI for a research project led by a non-VT PI

Page 18:

Allocation Application Process

1. Create a research project in the ARC database
2. Add grants and publications associated with the project
3. Create an allocation request using the web-based interface
4. Allocation review may take several days
5. Users may be added to run jobs against your allocation once it has been approved

Page 19:

Allocation Tiers

Research allocations fall into three tiers:
• Less than 200,000 system units (SUs)
  – 200-word abstract
• 200,000 to 1 million SUs
  – 1-2 page justification
• More than 1 million SUs
  – 3-5 page justification

Page 20:

Allocation Management

• Users can be added to the project at:
  https://portal.arc.vt.edu/am/research/my_research.php
• glsaccount: allocation name and membership
• gbalance -h -a <name>: allocation size and amount remaining
• gstatement -h -a <name>: usage (by job)
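For example, to see how much of an allocation remains and how it has been used (here "myproject" stands in for your allocation name, which glsaccount will list):

glsaccount
gbalance -h -a myproject
gstatement -h -a myproject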

Page 21:

USER ENVIRONMENT

Page 22:

Environment
• Consistent user environment across systems
  – Using the modules environment
  – Hierarchical module tree for system tools and applications

Page 23:

Modules
• Modules are used to set the PATH and other environment variables
• Modules provide the environment for building and running applications
  – Multiple compiler vendors (Intel vs. GCC) and versions
  – Multiple software stacks: MPI implementations and versions
  – Multiple applications and their versions
• An application is built with a certain compiler and a certain software stack (MPI, CUDA)
  – Modules exist for the software stack, compiler, and applications
• Users load the modules associated with an application, compiler, or software stack
  – Modules can be loaded in job scripts

Page 24:

Modules
• Modules are used to set up your PATH and other environment variables

       % module {lists options}

% module list {lists loaded modules}

% module avail {lists available modules}

% module load <module> {add a module}

% module unload <module> {remove a module}

% module swap <mod1> <mod2> {swap two modules}

% module help <mod1> {module-specific help}

Page 25:

Module Commands

module                       list options
module list                  list loaded modules
module avail                 list available modules
module load <module>         add a module
module unload <module>       remove a module
module swap <mod1> <mod2>    swap two modules
module help <module>         module-specific help
module show <module>         module description
module reset                 reset to the default modules
module purge                 unload all modules
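A short example session using these commands (module names are illustrative; module avail shows what is actually installed on the system you are logged into):

module list                  # see what is currently loaded
module avail                 # browse the available modules
module swap intel gcc        # replace the Intel compiler with GCC
module list                  # confirm the change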

Page 26:

Modules
• Available modules depend on:
  – The compiler (e.g. Intel, GCC), and
  – The MPI stack selected
• Defaults:
  – BlueRidge: Intel + MVAPICH2
  – HokieOne: Intel + MPT
  – HokieSpeed, Ithaca: Intel + OpenMPI

Page 27:

Hierarchical Module Structure

Page 28:

Modules
• The default modules are provided for minimum functionality.
• Module dependencies on the choice of compiler and MPI stack are taken care of automatically.

module purge
module load <compiler>
module load <mpi stack>
module load <high-level software, e.g. PETSc>
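A concrete version of this sequence, as it might look on BlueRidge (the exact module names, especially for PETSc, are assumptions; check module avail on your system):

module purge
module load intel            # compiler
module load mvapich2         # MPI stack
module load petsc            # high-level software built against the above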

Page 29:

JOB SUBMISSION & MONITORING

Page 30:

Job Submission
• Submission is via a shell script containing:
  – Job description: nodes, processes, run time
  – Modules & dependencies
  – Execution statements
• Submit the job script via the qsub command:
  qsub <job_script>

   

Page 31:

Batch Submission Process
• Queue: the job script waits for resources.
• Master: the compute node that executes the job script and launches all MPI processes.

Page 32:

Job Monitoring
• Determine job status, and if pending, when it will run:

checkjob -v JOBID      get the status and resources of a job
qstat -f JOBID         get the status of a job
showstart JOBID        get the expected job start time
qdel JOBID             delete a job
mdiag -n               show the status of cluster nodes
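For example, for the job submitted later in this tutorial (job ID 25770):

checkjob -v 25770      # detailed status and resource usage
showstart 25770        # estimated start time if it is still queued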

Page 33:

Job Execution
• The order of job execution depends on a variety of parameters:
  – Submission time
  – Queue priority
  – Backfill opportunities
  – Fairshare priority
  – Advanced reservations
  – Number of actively scheduled jobs per user

Page 34:

Examples: ARC Website
• See the Examples section of each system page for sample submission scripts and step-by-step examples:
  – http://www.arc.vt.edu/resources/hpc/blueridge.php
  – http://www.arc.vt.edu/resources/hpc/hokiespeed.php
  – http://www.arc.vt.edu/resources/hpc/hokieone.php
  – http://www.arc.vt.edu/resources/hpc/ithaca.php

Page 35:

Getting Started
• Find your training account (hpcXX)
• Log into BlueRidge
  – Mac: ssh [email protected]
  – Windows: use PuTTY
    • http://www.chiark.greenend.org.uk/~sgtatham/putty/
    • Host Name: ithaca2.arc.vt.edu

Page 36:

Example: Running MPI_Quad
• Source file:
  http://www.arc.vt.edu/resources/software/mpi/docs/mpi_quad.c
• Copy the file to Ithaca
  – wget command (see the example below)
  – Could also use scp or sftp
• Build the code
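A minimal way to fetch the source file once logged in (assuming wget is available on the login node and you are in the directory where you want to build):

wget http://www.arc.vt.edu/resources/software/mpi/docs/mpi_quad.c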

Page 37:

Compile the Code
• The Intel compiler is already loaded; check with:
  module list
• Compile command (the executable is mpiqd):
  mpicc -o mpiqd mpi_quad.c
• To use GCC instead, swap it in:
  module swap intel gcc
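After a swap, the MPI compiler wrapper should pick up the new compiler. On MPICH-based stacks such as MVAPICH2, mpicc -show prints the underlying compile command, which is a quick way to confirm (behavior may differ on other MPI stacks):

module swap intel gcc
mpicc -show                  # inspect which compiler the wrapper invokes
mpicc -o mpiqd mpi_quad.c    # rebuild with GCC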

Page 38:

Prepare a Submission Script
1. Copy the sample script:
   cp /home/TRAINING/ARC_Intro/it.qsub .
2. Edit the sample script:
   a. Walltime
   b. Resource request (nodes/ppn)
   c. Module commands (add Intel & mvapich2)
   d. Command to run your job
3. Save it (e.g., mpiqd.qsub)

Page 39:

Submission Script (Typical)

#!/bin/bash
#PBS -l walltime=00:10:00
#PBS -l nodes=2:ppn=8
#PBS -q normal_q
#PBS -W group_list=ithaca
#PBS -A AllocationName        <-- only for BlueRidge
module load intel mvapich2
cd $PBS_O_WORKDIR
echo "MPI Quadrature!"
mpirun -np $PBS_NP ./mpiqd
exit;

Page 40:

Submission Script (Today)

#!/bin/bash
#PBS -l walltime=00:10:00
#PBS -l nodes=2:ppn=8
#PBS -q normal_q
#PBS -W group_list=training
#PBS -A training
#PBS -l advres=NLI_ARC_Intro.11
module load intel mvapich2
cd $PBS_O_WORKDIR
echo "MPI Quadrature!"
mpirun -np $PBS_NP ./mpiqd
exit;

Page 41:

Submit the Job
1. Copy the files to $WORK:
   cp mpiqd $WORK
   cp mpiqd.qsub $WORK
2. Navigate to $WORK:
   cd $WORK
3. Submit the job:
   qsub mpiqd.qsub
4. The scheduler returns the job number, e.g.:
   25770.master.cluster

Page 42:

Wait for the Job to Complete
1. Check job status:
   qstat -f 25770   or   qstat -u hpcXX
   checkjob -v 25770
2. When complete:
   a. Job output: mpiqd.qsub.o25770
   b. Errors: mpiqd.qsub.e25770
3. Copy results back to $HOME:
   cp mpiqd.qsub.o25770 $HOME
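To take a quick look at the results before copying them back (the file names follow the pattern shown above):

cat mpiqd.qsub.o25770    # program output
cat mpiqd.qsub.e25770    # any error messages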

Page 43:

Resources
• ARC website: http://www.arc.vt.edu
• ARC compute resources & documentation: http://www.arc.vt.edu/resources/hpc/
• New Users Guide: http://www.arc.vt.edu/userinfo/newusers.php
• Frequently Asked Questions: http://www.arc.vt.edu/userinfo/faq.php
• Unix introduction: http://www.arc.vt.edu/resources/software/unix/