8/12/2019 Linux Cluster Job Management Systems Sge2197
http://slidepdf.com/reader/full/linux-cluster-job-management-systems-sge2197 1/22
Job Management Systems
SGE
v1.3 Author: Anand Vaidya
Why use SGE?
Maintain order in a shared resource, like queueing up at a movie ticket counter rather than mobbing the counter.
Apply different usage policies: PhDs and Profs get better treatment than first-year grads.
Everyone gets a fair share of the computing resource.
How does SGE work?
Users submit jobs to the Grid Engine.
Unless resources are immediately available, non-interactive jobs are kept in queues until resources to execute them become available.
Jobs are passed on to the available execution hosts.
Records of each job's progress through the system are kept and reported when requested.
SGE Components
Hosts
    Master (coordinates activities, holds queues)
    Execution (workers)
    Administration (sets up system, queues etc.)
    Submit (users can submit jobs from these)
    Usually the master and admin host are the same machine
Queues (defined by the administrator)
User and Administrator Commands
Daemons: sge_qmaster (Master Daemon), sge_schedd (Scheduler Daemon), sge_execd (Execution Daemon) and sge_commd (Communication Daemon)
SGE Commands - qhost
What is the state of the cluster? How many nodes, type, load? What is my chance of getting a node?
    [root@shark ~]# qhost
    HOSTNAME    ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
    ------------------------------------------------------------------
    global      -           -     -     -       -       -       -
    shark-c00   lx24-amd64  ?     ?.0?  3.7G    40.8M   4.0G    0.0
    shark-c0?   lx24-amd64  ?     ?.00  3.7G    14.7M   4.0G    0.0
    shark-c03   lx24-amd64  ?     1.96  3.7G    1?.7M   4.0G    0.0
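One way to turn the qhost listing into an answer to "what is my chance of getting a node" is to pick out hosts whose LOAD is below their NCPU. The sketch below is only an illustration: free_hosts and sample_qhost are hypothetical helper names, the host names and numbers in the sample are made up, and the column positions are assumed to match the qhost header shown above.

```shell
#!/bin/sh
# Print hosts whose load is below their CPU count.
# Reads qhost-style output on stdin; assumed columns:
# HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
free_hosts() {
    # Skip the header and separator lines, the "global" pseudo-host,
    # and any host whose NCPU or LOAD is not numeric (shown as "-").
    awk 'NR > 2 && $1 != "global" && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9.]+$/ && $4 + 0 < $3 + 0 { print $1 }'
}

# Hypothetical sample data, not taken from a real cluster.
sample_qhost() {
    cat <<'EOF'
HOSTNAME    ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
------------------------------------------------------------------
global      -           -     -     -       -       -       -
node-a      lx24-amd64  2     0.10  3.7G    40.0M   4.0G    0.0
node-b      lx24-amd64  2     2.00  3.7G    14.0M   4.0G    0.0
node-c      lx24-amd64  2     -     3.7G    -       4.0G    -
EOF
}

sample_qhost | free_hosts
```

On a live cluster the same filter would be fed from the real command: qhost | free_hosts. A low load does not guarantee a free queue slot, so treat the result as a hint only.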
SGE Commands - qsub
Create a jobscript (myjob.sh)
Submit for execution:
    $ qsub myjob.sh
    Your job 94 ("myjob.sh") has been submitted.
Simplest job:
    [vaidya@shark ~]$ cat myjob.sh
    #!/bin/sh
    sleep 10
    date > /tmp/test1.out.txt
Variations: qsub -cwd myjob.sh
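Scripts that submit jobs often need the job number for later qstat or qdel calls. A minimal sketch, assuming qsub prints the confirmation line in the form shown above; parse_job_id and submit_job are hypothetical helper names, and the job number 123 in the demonstration is made up.

```shell
#!/bin/sh
# Extract the job number from a qsub confirmation line such as
#   Your job 123 ("myjob.sh") has been submitted.
parse_job_id() {
    printf '%s\n' "$1" | sed -n 's/^Your job \([0-9][0-9]*\) .*$/\1/p'
}

# Submit a script from the current directory and print its job number.
# Returns non-zero when qsub is not installed on this machine.
submit_job() {
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; run this on a submit host" >&2
        return 1
    fi
    parse_job_id "$(qsub -cwd "$1")"
}

# Demonstration on a made-up confirmation line.
parse_job_id 'Your job 123 ("myjob.sh") has been submitted.'
```

On a submit host, job_id=$(submit_job myjob.sh) would then hold the number to pass to qstat -j or qdel.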
(C) Anand Vaidya anand@novaglobal.com.sg
SGE Commands - qstat
Check the status of your job:
    qstat
    qstat -f
    qstat -u username
    qstat -j job_id
    [root@shark ~]# qstat
    job-ID  prior    name     user   state  submit/start at      queue            slots  ja-task-ID
    -----------------------------------------------------------------------------------------------
    637     0.55500  HCPDIV9  test1  r      0?/19/2006 10:16:31  all.q@shark-c00  1
    6?8     0.55500  HCPDIV1  test1  r      0?/19/2006 13:39:3?  all.q@shark-c00  1
    674     0.55500  CCDVI    test1  r      0?/19/2006 23:??:17  all.q@shark-c01  ?
    67?     0.55500  CCDVI1   test1  r      0?/19/2006 23:??:17  all.q@shark-c0?  1
SGE Commands - qstat
Status of the job is indicated by letters as:
    qw   - waiting
    t    - transferring
    r    - running
    s, S - suspended
    R    - restarted
    T    - threshold
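The qstat state letters can be translated into words in monitoring scripts. The sketch below covers only the single-letter codes listed on the qstat slide; describe_state is a hypothetical helper name, and combined codes that qstat may print are simply reported as unknown here.

```shell
#!/bin/sh
# Map a qstat state code to the description given on the slide.
describe_state() {
    case "$1" in
        qw)  echo "waiting" ;;
        t)   echo "transferring" ;;
        r)   echo "running" ;;
        s|S) echo "suspended" ;;
        R)   echo "restarted" ;;
        T)   echo "threshold" ;;
        *)   echo "unknown state: $1" ;;
    esac
}

describe_state qw
describe_state r
```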
SGE Commands - qdel
Delete your job, if you wish:
    qdel 943
    vaidya has deleted job 943
SGE Commands - qmon
qmon is an X Windows GUI tool to submit/delete/view jobs and configure the SGE system.
Example: Submit a job using qmon
Click the Job Submission icon.
Click the Job Script file selection icon to open a file selection box and select your script file. Then, click OK.
Click the Submit button at the bottom of the Job Submission dialog.
After a couple of seconds, you should be able to monitor your job in the Job Control dialog. Click the Job Control icon in the QMON control panel.
You first see it under Pending Jobs, and it quickly moves to Running Jobs after it gets started.
SGE Commands – qsh, qtcsh
Submit an interactive session request:
    qlogin
    qrsh
Ensure you have a valid X server running on your desktop. Allow remote X clients to display on your desktop.
Submit an interactive session request:
    qsh
    qtcsh
Note: using this feature needs additional configuration; it may not work otherwise.
SGE Commands – jobscript
Sample job script:
    #!/bin/bash
    #
    #$ -cwd
    #$ -j y
    #$ -S /bin/bash
    #$ -V
    date
    sleep 10
    env
    date
SGE Commands – jobscript
Sample job script:
    #!/bin/bash
    #
    #$ -cwd
    #$ -j y
    #$ -S /bin/bash
    #
    $MPI_DIR/mpirun -np $NSLOTS -machinefile $TMPDIR/machines myparallelprog.exe infile.txt outfile.txt
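The mpirun line depends on variables that are only meaningful inside a parallel job: NSLOTS and the $TMPDIR/machines file are normally provided by SGE when a parallel environment is requested, and MPI_DIR is site-specific. The sketch below only assembles and prints the command line so it can be inspected outside the queue. build_mpirun_cmd is a hypothetical helper name, and the fallback values (1 slot, /tmp, /opt/mpich/bin) are placeholders, not SGE defaults.

```shell
#!/bin/sh
# Assemble the mpirun command line used in the job script above.
# Falls back to placeholder values when the variables are unset,
# e.g. when run by hand outside an SGE parallel job.
build_mpirun_cmd() {
    nslots="${NSLOTS:-1}"                  # slot count granted by SGE
    machines="${TMPDIR:-/tmp}/machines"    # machine file written for the job
    mpi_dir="${MPI_DIR:-/opt/mpich/bin}"   # hypothetical install location
    echo "$mpi_dir/mpirun -np $nslots -machinefile $machines $*"
}

# Dry run: print the command instead of executing it.
build_mpirun_cmd myparallelprog.exe infile.txt outfile.txt
```

Printing the command first makes it easy to spot an unset variable before the job is submitted with a -pe request.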
SGE Commands – jobscript
    -cwd              change to current dir before running job
    -j y              merge error with stdout
    -r y              code is re-runnable
    -N name           set the job name
    -l h_rt=00:30:00  run job for max of 30 mins
    -pe mpich         invoke parallel environment
    -pe mpich-ib      use InfiniBand parallel environment
    -pe mpich-eth     use Ethernet parallel env
    -V                carry all env variable settings
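Several of these options can be combined as #$ directives at the top of a job script. The sketch below writes such a script and runs it locally with bash as a dry run, which works because the #$ lines are ordinary comments to the shell. write_demo_job, demo_job.sh and the job name demo_job are made-up names for illustration, and the fallbacks for JOB_NAME and JOB_ID exist only so the script also runs outside the queue.

```shell
#!/bin/sh
# Write a small job script that combines several qsub options as
# embedded "#$" directives. The target path is given as $1.
write_demo_job() {
    cat > "$1" <<'EOF'
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -r y
#$ -N demo_job
#$ -l h_rt=00:30:00
#$ -S /bin/bash
#$ -V

# JOB_NAME and JOB_ID are set by SGE inside a running job.
echo "job ${JOB_NAME:-demo_job} (${JOB_ID:-no-id}) started on $(uname -n)"
sleep 1
echo "job finished"
EOF
}

write_demo_job demo_job.sh
bash demo_job.sh    # local dry run; on a submit host: qsub demo_job.sh
```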
Admin Commands
The next few slides show commands useful for SGE admins (not users/researchers).
SGE Commands – qconf
Show:
    complexes:       qconf -sc
    queues:          qconf -sql
    PE:              qconf -spl
    exec hosts:      qconf -sel, qconf -se c3
    submit hosts:    qconf -ss
    admin hosts:     qconf -sh
    list calendars:  qconf -scall
    configuration:   qconf -sconf
    user list:       qconf -suserl
    scheduler conf:  qconf -ssconf
SGE Commands – qping
    [anand@shark-c02 ~]$ qping -info shark-c01 537 execd 1
    05/24/2006 21:57:34:
    SIRM version:             0.1
    SIRM message id:          1
    start time:               05/24/2006 21:31:37 (1148477497)
    run time [s]:             17?8
    messages in read buffer:  0
    messages in write buffer: 0
    nr. of connected clients: 2
    status:                   0
    info:                     dispatcher: ? (0.04)
    Monitor:                  disabled
LSF Commands
    bsub     submit a job
    bstop    suspend a job
    bresume  resume a suspended task
    btop     move job to top
    bswitch  move jobs between queues
    lsgrun   run a task on a set of hosts
    bkill    kill a job
LSF Commands
    lsmon    monitor load, resource availability...
    lsid     show LSF details (version etc.)
    lshosts  show hosts & static info
    lsload   show load info for hosts
    lsinfo   show LSF config info
    busers   show user info
    bacct    show acct info on finished jobs
    bjobs    show info on jobs
    bpeek    show stdout/stderr of unfinished jobs
Acknowledgements & Copying
This material is based on my experience as well as material collected from SGE documentation.
This presentation can be redistributed as follows:
No commercial re-distribution: e.g., as part of a for-profit CD-ROM or as part of your sales pitch. Seek my permission first.
Must attribute the document creator.
Share alike: If you use this document and enhance or modify it, share the modifications or the modified document.
Which means I apply: Creative Commons License, http://creativecommons.org/licenses/by-nc-sa/2.5/
The End
Thanks for your time. If you have any feedback, corrections or questions, please contact me: Anand Vaidya.
This document was created with OpenOffice on Linux. Email me if you want the odp file instead of the pdf.