Compute Cluster Deployment Guide

Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute Cluster

White Paper

    Published: June 2007

For the latest information, please see http://www.microsoft.com/windowsserver2003/ccs


    Contents

Introduction

Before You Begin
    Plan Your Cluster
    Install Your Cluster Hardware
    Configure Your Cluster Hardware
    Obtain Required Software

Installation, Configuration, and Tuning Steps
    Step 1: Install and Configure the Service Node
    Step 2: Install and Configure ADS on the Service Node
    Step 3: Install and Configure the Head Node
    Step 4: Install the Compute Cluster Pack
    Step 5: Define the Cluster Topology
    Step 6: Create the Compute Node Image
    Step 7: Capture and Deploy Image to Compute Nodes
    Step 8: Configure and Manage the Cluster
    Step 9: Deploy the Client Utilities to Cluster Users

Appendix A: Tuning Your Cluster

Appendix B: Troubleshooting Your Cluster

Appendix C: Cluster Configuration and Deployment Scripts

Related Links


    Introduction

High-performance computing is now within reach for many businesses through clustering of industry-standard servers. These clusters can range from a few nodes to hundreds of nodes. In the past, wiring, provisioning, configuring, monitoring, and managing these nodes and providing appropriate, secure user access was a complex undertaking, often requiring dedicated support and administration resources. However, Microsoft Windows Compute Cluster Server 2003 simplifies the deployment and management of such clusters.


Figure 1: Supported cluster topology, similar to the NCSA deployment topology

Although every IT environment is different, this guide can serve as a basis for setting up your large-scale compute cluster. If you need additional guidance, see the Related Links section at the end of this guide for more resources.

Note: The intended audience for this document is network administrators who have at least two years' experience with network infrastructure, management, and configuration. The example deployment outlined in this document is targeted at clusters in excess of 100 nodes. Although the steps discussed here will work for smaller clusters, they represent steps modeled on large deployments for enterprise-scale and research-scale clusters.


Note: The skill level that is required to complete the steps in this document assumes knowledge of how to install, configure, and manage Microsoft Windows Server 2003 in an Active Directory environment, and experience in adding and managing computers and users within a domain.

Note: To download the latest updated version of this document, visit the Microsoft Web site (http://www.microsoft.com/hpc/). The update may contain critical information that was not available when this document was published.


    Before You Begin

Setting up a compute cluster with Windows Server 2003 Compute Cluster Edition begins with the following tasks:

1. Plan your cluster.
2. Install your cluster hardware.
3. Configure your cluster hardware.
4. Obtain required software.

When you have completed these tasks, use the steps in the Installation, Configuration, and Tuning Steps section to help you install, configure, and tune your cluster.

    Plan Your Cluster

This step-by-step guide provides basic instructions on how to deploy a Windows compute cluster. Your cluster planning should cover the types of nodes that are required for a cluster, and the networks that you will use to connect the nodes. Although the instructions in this guide are based on one specific deployment, you should also consider your environment and the number and types of hardware you have available.

Your cluster requires three types of nodes:

- Head node. A head node mediates all access to the cluster resources and acts as a single point for cluster deployment, management, and job scheduling. There is only one head node per cluster.

- Service node. A service node provides standard network services, such as directory, DNS, and DHCP services, and also maintains and deploys compute node images to new hardware in the cluster. Only one service node is needed for the cluster, although you can have more than one service node for different roles in the cluster (for example, moving the image deployment service to a separate node).

- Compute node. A compute node provides computational resources for the cluster. Compute nodes are given jobs and are managed by the head node.

Additional node types that can be used but are not required are remote administration nodes and application development nodes. For an overview of device roles in the cluster, see the Windows Compute Cluster Server 2003 Reviewers Guide (http://www.microsoft.com/windowsserver2003/ccs/reviewersguide.mspx).

Your cluster also depends on the number and types of networks used to connect the nodes. The Reviewers Guide discusses the topologies that you can use to connect your nodes, using combinations of private and public adapters for message passing between the nodes and system traffic among all of the nodes. For the cluster detailed in this guide, the head node and service node have public and private adapters for system traffic, and the compute nodes have private and message passing interface (MPI) adapters. (Note: This is not a supported topology, but it is very similar to one that is.) Consult the Reviewers Guide for the advantages of each network topology.

Lastly, you should consider the level of cluster expertise, networking knowledge, and amount of management time available on your staff to dedicate to your cluster. Although deployment and management are simplified with Windows Compute Cluster Server 2003, keep in mind that no matter what the circumstances, a large-scale compute cluster deployment should not be taken lightly. It is important to understand how management and deployment work when planning for the appropriate resources. Compute Cluster Server uses robust, enterprise-grade technologies for all aspects of network and device management. Its management tools and programs allow granular, role-based management of security for cluster administration and cluster users, and its network and system management tools can easily and quickly deploy applications and jobs using familiar, wizard-based interfaces. Additional compute nodes can be added automatically to the compute cluster by simply plugging the nodes in and connecting them to the cluster. Extensive (and expensive) daily hands-on tweaking, configuration, and management are not needed when using commodity hardware and a standards-based infrastructure.

    Install Your Cluster Hardware

For ease of management and configuration, all nodes in the deployment in this guide will use the same basic hardware platform. Hardware requirements for computers running Windows Compute Cluster Server 2003 are similar to those for Windows Server 2003, Standard x64 Edition. You can find the system requirements for your cluster at http://www.microsoft.com/windowsserver2003/ccs/sysreqs.mspx. Table 1 shows a list of hardware for all nodes. This list is based on the hardware used in the NCSA deployment.

Table 1: Hardware for All Nodes

CPU: Blade servers. Each blade has two single-core 3.2 GHz processors with 2 MB cache and an 800 MHz front-side bus. The motherboard includes PCI Express slots.

RAM: 400 MHz DIMMs. For compute nodes, you should plan on having 2 GB of RAM per core.

Storage: SCSI adapter with a 73 GB 10K RPM Ultra320 SCSI disk. RAID may be used on any node, but was not used in this deployment. For the head node, you should plan on having three disks: one for the OS, one for the database, and one for the transaction logs. This will provide improved performance and throughput.

Network Interface Cards: 1000 Mb Gigabit Ethernet adapter; InfiniBand PCI Express adapter.

Gigabit Network Hardware: One 48-port Gigabit switch per rack (40 ports for blades, the remainder for uplinks to the ring); 48-port Layer 2 Gigabit switches in a ring configuration.

InfiniBand Network Hardware: InfiniBand switches in each rack, plus InfiniBand switches for cross-rack connectivity.

Note: The head node and the network services node each use two Gigabit Ethernet network adapters; both the compute nodes and the head node use the private MPI network, though the head node's MPI interface was disabled for this specific deployment. Also, the service node requires a 32-bit operating system, since ADS will only work with 32-bit, but you can run the operating system on 32-bit or 64-bit hardware. (This is a custom configuration used on the cluster deployment at NCSA and is not supported for general use. However, it is very similar to a supported cluster topology. For more information on supported cluster topologies, please refer to the Windows Compute Cluster Server 2003 Reviewers Guide at http://www.microsoft.com/windowsserver2003/ccs/reviewersguide.mspx.)

    Configure Your Cluster Hardware

When you have added your switches and blades to the rack, you must configure the network connections and network hardware prior to installing the network software. To configure your hardware, follow the checklist in Table 2.

Table 2: Hardware Configuration Checklist

Complete each of the following configuration items, checking them off as you go:

- Connect all high-speed interconnect connections from the pass-through module on the chassis to the rack's high-speed interconnect switches.

- Connect all Gigabit Ethernet connections from the pass-through module on the chassis to the rack's 48-port Gigabit Ethernet switch.

- Connect all InfiniBand switches to the Layer 2 switches.

- Connect all Gigabit Ethernet switches to the Gigabit Ethernet Layer 2 switches.

- Disable the built-in subnet manager on all switches. The built-in subnet manager doesn't support OpenIB clients, and it conflicts with the subnet manager that does support such clients.

- Change the BIOS boot sequence on all nodes to network Pre-boot Execution Environment (PXE) first, CD-ROM second, and hard drive third. For platforms that dynamically remove missing devices at power-up, an efficient way to set the hard drives last in the boot order is to pull the hard drives, power up the devices once, power off the devices, put the drives back in, and then power up again. The boot order will be set correctly thereafter.

- Disable hyperthreading on all nodes and set each node's system clock to the correct time zone, if required.

- Obtain a list of all private Gigabit Ethernet adapter MAC addresses for the compute nodes. These addresses are used as input with a configuration script to identify your nodes and configure them with the proper image. In some cases you can use the blade chassis telnet interface to collect the MAC addresses. See Appendix C for a description of the input file and the file format.

Obtain Required Software

In addition to Windows Compute Cluster Server 2003, you will need to obtain operating systems, administration utilities, drivers, and Quick Fix (QFE) files to bring your systems up-to-date. Table 3 lists the software required for each node type, and the notes following the chart show you where to obtain the necessary software. The following list is based on the software used in the NCSA deployment.


Table 3: Software Required by Node Type

Software required by node type (head node, service node, or compute node):

- Windows Server 2003 R2 Standard x64 Edition
- Windows Server 2003 R2 Enterprise Edition (x86)
- Windows Server 2003 Compute Cluster Edition (x64)
- Microsoft Compute Cluster Pack
- SQL Server 2005 Standard Edition (x64)
- Automated Deployment Services (ADS) version 1.1
- Microsoft Management Console (MMC) 3.0
- .NET Framework 2.0
- Windows Preinstallation Environment (WinPE)
- QFE KB910481
- QFE KB914784
- Microsoft System Preparation tool (sysprep.exe)
- Cluster configuration and deployment scripts
- Latest network adapter drivers

Notes on the software required for the deployment described in this paper:

Microsoft SQL Server 2005 Standard Edition x64: By default, the Compute Cluster Pack will install MSDE on the head node for data and node tracking purposes. Because MSDE is limited to eight concurrent connections, SQL Server 2005 Standard Edition is recommended for clusters with more than 64 compute nodes.

ADS version 1.1: ADS requires a 32-bit version of Windows Server 2003 Enterprise Edition for image management and deployment. Future Microsoft imaging technology (Windows Deployment Services, available in the next release of Windows Server, code name "Longhorn") will support 64-bit software. You can download the latest version of ADS from the Microsoft Web site (http://www.microsoft.com/windowsserver2003/technologies/management/ads/default.mspx).

Because this paper is based on a previous large-scale compute cluster deployment at NCSA, it details using ADS to deploy compute node images as opposed to using Microsoft Windows Deployment Services (WDS). However, future updates to this paper will explain how to use WDS to deploy compute node images to your cluster.

MMC 3.0: MMC 3.0 is required for the administration node, which may or may not be the head node. It is automatically installed by the Compute Cluster Pack on the computer that is used to administer the cluster. You can also download and install the latest versions for Windows Server 2003 and Windows XP, in x86 and x64 versions, from the Microsoft Web site (http://support.microsoft.com/?kbid=907265).

.NET Framework 2.0: The .NET Framework is automatically installed by the Compute Cluster Pack. You can also download the latest version from the Microsoft Web site (http://msdn2.microsoft.com/en-us/netframework/aa731542.aspx).


WinPE: You will need a copy of Windows Preinstallation Environment for Windows Server 2003 SP1. If you need to add your Gigabit Ethernet drivers to the WinPE image, you will need to obtain a copy of the Windows Server 2003 SP1 OEM Preinstallation Kit (OPK), which contains the programs needed to update the WinPE image for your hardware. WinPE and the OPK are available only to customers with enterprise or volume license agreements; contact your Microsoft representative for more information.

QFE KB910481: This Quick Fix is for potential problems when deploying Winsock Direct in a fast Storage Area Network (SAN) environment. You can download the quick fix from the Microsoft Web site (http://support.microsoft.com/?kbid=910481).

QFE KB914784: This Quick Fix is in response to a Security Advisory and provides additional kernel protection in some environments. You can download the quick fix from the Microsoft Web site (http://support.microsoft.com/?kbid=914784).

Sysprep.exe: Sysprep.exe is used to help prepare the compute node image prior to deployment. Sysprep is included as part of Windows Server 2003 Compute Cluster Edition. Note: You must use the x64 version of Sysprep in order to capture and deploy your images.
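In this guide, the ADS capture template invokes Sysprep for you (see Step 7). If you ever need to reseal an image by hand, a minimal sketch of the standard Windows Server 2003 command line is shown below; the C:\Sysprep path is only a common placement, not a requirement of this guide.

    cd /d C:\Sysprep
    rem Reseal the image using mini-setup on next boot, suppressing prompts
    sysprep.exe -reseal -mini -quiet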

Cluster configuration and deployment scripts: These scripts are available to download at http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/default.mspx. They include hard-coded paths and require you to follow the installation and usage instructions exactly as described in this guide. If you must modify the scripts for your deployment, make sure that you verify that the scripts work in your environment before using them to deploy your cluster.

For the scripts to run properly, you will also need specific information about your cluster and its hardware. Appendix C contains a sample input file (AddComputeNodes.csv) that is used to automatically configure the compute cluster nodes and populate Active Directory with node information. Table 4 lists the specific items needed, with room for you to write down the values for your deployment. You can then use this information when building your cluster and when creating your compute node images. Follow the instructions in Appendix C for creating your own input file; a sample input line is also sketched after Table 4.

Note: Every item in Table 4 must have an entry or the input file will not work properly. If you do not have a value for a field, use a hyphen "-" for the field instead.

Latest network adapter drivers: Contact the manufacturer of your network adapters for the most recent drivers. You will need to install these drivers on your cluster nodes.

Table 4: Cluster Information Needed for Script Input File

For each input value below, record the value you will use in your deployment.

FullName: Populates the cluster node registry with the Registered Owner name.

Organisation name: Populates the cluster node registry with the Registered Organization name.

ProductKey: 25-digit alphanumeric product key used for all compute cluster nodes. Contact your Microsoft representative for your volume license key.

Server Name: Populates Active Directory with a compute cluster node name.

Srv Description: Populates the ADS Management console with a text description of the node. Can be used to list rack placement or other helpful information.

Server MAC: Gigabit Ethernet MAC address for each compute cluster node.

Machine Name: Used to configure the cluster node with a machine name. Must match the value in the Server Name field.

Admin Password: Local administrator password.

Domain: The cluster domain name (for example, "HPCCluster.local").

Domain Username: Account name with permission to add computers to a domain.

Domain Password: Password for the account with permission to add computers to a domain.

ImageName: The image name to be installed on the cluster node (for example, CCSImage).

HPC Cluster Name: The head node name must be used for the cluster name.

NetworkTopology: Must be "Single".

PartitionSize: Not used.

PublicIP, PublicSubnet, PublicGateway, PublicDNS, PublicNICName, PublicMAC: Not used.

PrivateIP, PrivateSubnet, PrivateGateway, PrivateDNS, PrivateNICName, PrivateMAC: Not used.

MPIIP: Assigns a static address to the MPI adapter.

MPISubnet: Assigns a subnet mask to the MPI adapter (for example, 255.255.0.0).

MPIGateway, MPIDNS, MPINICName, MPIMAC: Not used.

MachineOU: Populates Active Directory with machine OU information (for example, OU=Cluster Servers,DC=HPCCluster,DC=local).
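To make the field list concrete, the following is a hypothetical single input line. It assumes the fields appear in the order listed in Table 4, uses placeholder values (names, product key, MAC address, passwords, and the MPI address are all examples, not values from the original deployment), and assumes the OU value must be quoted because it contains commas. Check the sample AddComputeNodes.csv from the script download for the authoritative column order.

    Jane Admin,Contoso,XXXXX-XXXXX-XXXXX-XXXXX-XXXXX,NODE001,Rack1 Slot1,00-11-22-33-44-55,NODE001,P@ssw0rd,HPCCluster.local,HPCCluster\administrator,P@ssw0rd,CCSImage,HEADNODE,Single,-,-,-,-,-,-,-,-,-,-,-,-,-,11.0.0.1,255.255.0.0,-,-,-,-,"OU=Cluster Servers,DC=HPCCluster,DC=local"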


Installation, Configuration, and Tuning Steps

To install, configure, and tune a high-performance compute cluster, complete the following steps:

1. Install and configure the service node.
2. Install and configure ADS on the service node.
3. Install and configure the head node.
4. Install the Compute Cluster Pack.
5. Define the cluster topology.
6. Create the compute node image.
7. Capture and deploy the image to compute nodes.
8. Configure and manage the cluster.
9. Deploy the client utilities to cluster users.

Step 1: Install and Configure the Service Node

The service node provides all the back-end network services for the cluster, including authentication, name services, and image deployment. It uses standard Windows technology and services to manage your network infrastructure. The service node has two Gigabit Ethernet network adapters and no MPI adapters. One adapter connects to the public network; the other connects to the private network dedicated to the cluster.

There are five tasks that are required for installation and configuration:

1. Install and configure the base operating system.
2. Install Active Directory, Domain Name Services (DNS), and DHCP.
3. Configure DNS.
4. Configure DHCP.
5. Enable Remote Desktop for the cluster.

Install and configure the base operating system: Follow the normal setup procedure for Windows Server 2003 R2 Enterprise Edition, with the exceptions noted in the following procedure.

To install and configure the base operating system

1. Boot the computer to the Windows Server 2003 R2 Enterprise Edition CD.
2. Accept the license agreement.
3. On the Partition List screen, create two partitions: one partition for the operating system, and a second using the remainder of the space on the hard drive. Select the operating system partition as the install partition, and then press ENTER.
4. On the Format Partition screen, accept the default of NTFS, and then press ENTER. Proceed with the remainder of the text-mode setup. The computer then reboots into graphical setup mode.
5. On the Licensing Modes page, select the option for which you are licensed, and then configure the number of concurrent connections if needed. Click Next.
6. On the Computer Name and Administrator Password page, type a name for the service node (for example, SERVICENODE). Type your local administrator password twice, and then press ENTER.
7. On the Networking Settings page, select Custom settings, and then click Next.
8. On the Networking Components page for your private adapter, select Internet Protocol (TCP/IP) and assign a static IP address to the private adapter (for example, 10.0.0.1). Complete the remainder of setup.

Install Active Directory, DNS, and DHCP: Use the Configure Your Server Wizard to configure your server as a typical first server in a domain. The wizard configures your server as a root domain controller, installs and configures DNS, and then installs and configures DHCP.

To install Active Directory, DNS, and DHCP

1. Log on to your service node as Administrator. If the Manage Your Server page is not visible, click Start, and then click Manage Your Server.
2. Click Add or remove a role. The Configure Your Server Wizard starts. Click Next.
3. On the Configuration Options page, select Typical configuration for a first server, and then click Next.
4. On the Active Directory Domain Name page, type the domain name that will be used for your cluster and append the ".local" suffix (for example, HPCCluster.local). Click Next.
5. On the NetBIOS Domain Name page, accept the default NetBIOS name (for example, HPCCLUSTER) and click Next. At the Summary of Selections page, click Next. If the Configure Your Server Wizard prompts you to close any open programs, click OK.
6. On the NAT Internet Connection page, make sure the public adapter is selected. Deselect Enable security on the selected interface, and then click Next. If you have more than two network adapters in your computer, the Network Selection page appears. Select the private adapter and then click Next. Click Finish. After the files are copied, the server reboots.
7. After the server reboots, log on as Administrator. Review the actions listed in the Configure Your Server Wizard, and then click Next. Click Finish.

Configure DNS: DNS is required for the cluster and will be used by people who want to use the cluster. It is linked to Active Directory and manages the node names that are in use. DNS must be configured so that name resolution will function properly on your cluster. The following task helps you configure your DNS settings for your private and public networks; an equivalent command-line sketch follows the procedure.

To configure DNS

1. Click Start, and then click Manage Your Server. In the DNS Server section, click Manage this DNS server. You can also start the DNS Management console by clicking Start, Administrative Tools, and then DNS.
2. Right-click your server, and then click Properties.
3. Click the Interfaces tab. Select Only the following IP addresses. Select the public interface, and then click Remove. Only the private interface should be listed. If it is not, type the IP address of the private interface, and then click Add. This ensures that your service node will provide DNS services only to the private network and not to addresses on the rest of your network. Click Apply.
4. Click the Forwarders tab. If the public interface is using DHCP, confirm that the forwarder IP list has the IP address for a DNS server in your domain. If not, or if you are using a static IP address, type the IP address for a DNS server on your public network, and then click Add. This ensures that if the service node cannot resolve name queries, the request will be forwarded to another name server on your network. Click OK.
5. In the DNS Management console, select Reverse Lookup Zones. Right-click Reverse Lookup Zones, and then click New Zone. The New Zone Wizard starts. Click Next.
6. On the Zone Type page, select Primary zone, and then select Store the zone in Active Directory. Click Next.
7. On the Active Directory Zone Replication Scope page, select To all domain controllers in the Active Directory domain. Click Next.
8. On the Reverse Lookup Zone Name page, select Network ID, and then type the first three octets of your private network's IP address (for example, 10.0.0). A reverse lookup zone name is automatically created for you. Click Next.
9. On the Dynamic Update page, select Allow only secure dynamic updates. Click Next.
10. On the Completing the New Zone Wizard page, click Finish. The new reverse lookup zone is added to the DNS Management console. Close the DNS Management console.
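If you prefer to script these DNS settings, the dnscmd utility from the Windows Server 2003 Support Tools can apply roughly the same configuration. The sketch below is an optional equivalent, not part of the original procedure; it uses the example names and addresses from this guide (SERVICENODE, the 10.0.0.x private network) and a placeholder forwarder address.

    rem Listen only on the private interface (example address 10.0.0.1)
    dnscmd SERVICENODE /ResetListenAddresses 10.0.0.1
    rem Forward unresolved queries to a DNS server on the public network (placeholder address)
    dnscmd SERVICENODE /ResetForwarders 192.168.1.10
    rem Create the Active Directory-integrated reverse lookup zone for 10.0.0.x
    dnscmd SERVICENODE /ZoneAdd 0.0.10.in-addr.arpa /DsPrimary
    rem Allow only secure dynamic updates on the new zone
    dnscmd SERVICENODE /Config 0.0.10.in-addr.arpa /AllowUpdate 2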

Configure DHCP: Your cluster requires automated IP addressing services to keep node traffic to a minimum. Active Directory and DHCP work together so that network addressing and resource allocation will function smoothly on your cluster. DHCP has already been configured for your cluster network. However, if you want finer control over the number of IP addresses available and the information provided to DHCP clients, you must delete the current DHCP scope and create a new one, using settings that reflect your cluster deployment. A command-line sketch of the same scope configuration follows the procedure.

To configure DHCP

1. Click Start, and then click Manage Your Server. In the DHCP Server section, click Manage this DHCP server. You can also start the DHCP Management console by clicking Start, clicking Administrative Tools, and then clicking DHCP.
2. Right-click the existing scope name, and then click Deactivate. When prompted, click Yes. Right-click the scope again, and then click Delete. When prompted, click Yes. The old scope is deleted.
3. Right-click your server name and then click New Scope. The New Scope Wizard starts. Click Next.
4. On the Scope Name page, type a name for your scope (for example, "HPC Cluster") and a description for your scope. Click Next.
5. On the IP Address Range page, type the start and end ranges for your cluster. For example, the start address would be the same address used for the private adapter: 10.0.0.1. The end address depends on how many nodes you plan to have in your cluster. For up to 250 nodes, the end address would be 10.0.0.254. For 250 to 500 nodes, the end address would be 10.0.1.254. For the subnet mask, you can either increase the length to 16, or type in a subnet mask of 255.255.0.0. Click Next.
6. On the Add Exclusions page, you define a range of addresses that will not be handed to computers at boot time. The exclusion range should be large enough to include all devices that use static IP addresses. For this example, type a start address of 10.0.0.1 and an end address high enough to cover all of your statically addressed devices. Click Add, and then click Next.
7. On the Lease Duration page, accept the defaults, and then click Next.
8. On the Configure DHCP Options page, select Yes, I want to configure these options now, and then click Next.
9. On the Router (Default Gateway) page, add the default gateway for the private network if one is used, and then click Next.
10. On the Domain Name and DNS Servers page, in the Parent domain text box, type your domain name (for example, HPCCluster.local). In the Server name text box, type the server name (for example, SERVICENODE). In the IP address fields, type the private network adapter address (for example, 10.0.0.1). Click Add, and then click Next.
11. On the WINS Servers page, click Next.
12. On the Activate Scope page, select Yes, I want to activate this scope now, and then click Next.
13. On the Completing the New Scope Wizard page, click Finish. Close the DHCP Management console.
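The same scope can also be created from a command prompt with netsh. The following is a sketch using the example values above (10.0.0.0/16 private network, service node at 10.0.0.1) and an assumed exclusion range ending at 10.0.0.10; adjust the ranges to match your deployment before using it.

    rem Create and configure the cluster scope on the local DHCP server
    netsh dhcp server add scope 10.0.0.0 255.255.0.0 "HPC Cluster"
    netsh dhcp server scope 10.0.0.0 add iprange 10.0.0.1 10.0.0.254
    netsh dhcp server scope 10.0.0.0 add excluderange 10.0.0.1 10.0.0.10
    rem Option 006 = DNS server, option 015 = parent DNS domain name
    netsh dhcp server scope 10.0.0.0 set optionvalue 006 IPADDRESS 10.0.0.1
    netsh dhcp server scope 10.0.0.0 set optionvalue 015 STRING HPCCluster.local
    rem Activate the scope
    netsh dhcp server scope 10.0.0.0 set state 1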

Enable Remote Desktop for the cluster: You can enable Remote Desktop for nodes on your cluster so that you can log on remotely and manage services by using the node's desktop.

To disable Windows Firewall and enable remote management for the domain

1. Click Start, click Administrative Tools, and then click Active Directory Users and Computers.
2. Right-click your domain (for example, hpccluster.local), click New, and then click Organizational Unit.
3. Type the name of your new OU (for example, Cluster Servers) and then click OK. A new OU is created in your domain.
4. Right-click your OU and then click Properties. The OU Properties dialog appears. Click the Group Policy tab. Click New. Type the name for your new Group Policy (for example, Enable Remote Desktop) and then press ENTER.
5. Click Edit. The Group Policy Object Editor opens. Browse to Computer Configuration > Administrative Templates > Windows Components > Terminal Services.
6. Double-click Allow users to connect remotely using Terminal Services. Click Enabled and then click OK. Close the Group Policy Object Editor.
7. On the OU Properties page, on the Group Policy tab, select your new Group Policy and then click Options. Click No Override, and then click OK. You have created a new Group Policy for your OU that enables Remote Desktop. Click OK.
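The policy takes effect on computers placed in this OU at their next Group Policy refresh. As an aside (not part of the original procedure), you can force an immediate refresh on a domain member that is already running:

    gpupdate /force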

Step 2: Install and Configure ADS on the Service Node

ADS is used to install compute node images on new hardware with little or no input from the cluster administrator. This automated procedure makes it easy to set up and install new nodes on the cluster, or to replace failed nodes with new ones. To install and configure ADS, perform the following procedures:

1. Copy and update the WinPE binaries.
2. Copy and edit the script files.
3. Install and configure ADS.
4. Share the ADS certificate.


9. On the Device Destination page, select None, and then click Next. Click Finish. Your capture template is added to ADS.
10. Repeat the preceding steps. In step 6, use Deploy Compute Node and Run from WinPE as the name and description. In step 8, select the file Deploy-CCS-image-with-winpe.xml. When finished, you have added the deployment template to ADS.

Add devices to ADS: With the templates in place, populate the ADS database with your compute node devices by using the cluster configuration script and the input file that you prepared earlier; the command sequence is sketched after the procedure.

To add devices to ADS

1. Populate the ADS server with ADS devices. Click Start, click Run, type cmd.exe, and then click OK. Change the directory to C:\HPC-CCS\Scripts.
2. Type AddADSDevices.vbs AddComputeNodes-Sample.csv (use the name of your input file instead of the sample file name). The script will echo the nodes as they are added to the ADS server. When the script is finished, close the command window.
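Run end to end, the sequence looks like the following. The C:\HPC-CCS\Scripts path and sample file name are the ones used in this guide; cscript is used here only so that the script's output stays in the console window.

    cd /d C:\HPC-CCS\Scripts
    cscript //nologo AddADSDevices.vbs AddComputeNodes-Sample.csv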

If your company uses a proxy server to connect to the Internet, you should configure your server so that it can receive system and application updates from Microsoft.

1. To configure your proxy server settings, open Internet Explorer. Click Tools, and then click Internet Options.
2. Click the Connections tab, and then click LAN Settings.
3. On the Local Area Network (LAN) Settings page, select Use a proxy server for your LAN. Enter the URL or IP address for your proxy server.
4. If you need to configure secure HTTP settings, click Advanced, and then enter the URL and port information as needed.
5. Click OK three times, and then close Internet Explorer.

When you have finished configuring your server, click Start, click All Programs, and then click Windows Update. This will ensure that your server is up-to-date with service packs and software updates that may be needed to improve performance and security.

Step 3: Install and Configure the Head Node

The head node is responsible for managing the compute cluster nodes, performing job control, and acting as the gateway for submitted and completed jobs. It requires SQL Server 2005 Standard Edition as part of the underlying service and support structure. You should consider using three hard drives for your head node: one for the operating system, one for the SQL Server database, and one for the SQL Server transaction logs. This will provide reduced drive contention, better overall throughput, and some transactional redundancy should the database drive fail.

In some cases, enabling hyperthreading on the head node will also result in improved performance for heavily loaded SQL Server applications.

There are two tasks that are required for installing and configuring your head node:

1. Install and configure the base operating system.
2. Install and configure SQL Server 2005 Standard Edition.


To install and configure the base operating system

1. On the head node computer, boot to the Windows Server 2003 R2 Standard x64 Edition CD.
2. Accept the license agreement.
3. On the Partition List screen, create two partitions: one partition for the operating system, and a second that uses the remainder of the space on the hard drive. Select the operating system partition as the install partition, and then press ENTER.
4. On the Format Partition screen, accept the default of NTFS, and then press ENTER. Proceed with the remainder of the text-mode setup. The computer then reboots into graphical setup mode.
5. On the Licensing Modes page, select the option for which you are licensed, and then configure the number of concurrent connections, if needed. Click Next.
6. On the Computer Name and Administrator Password page, type a name for the head node (for example, HEADNODE). Type the account with permission to join a computer to the domain (for example, hpccluster\administrator), type the password twice, and then press ENTER.
7. On the Networking Settings page, select Typical settings, and then click Next. This will automatically assign addresses to your public and private adapters. If you want to use static IP addresses for either interface, select Custom Settings, and then click Next. Follow the steps that you used to configure your service node adapter settings.
8. On the Workgroup or Computer Domain page, select Yes, make this computer a member of a domain. Type the name of your cluster domain (for example, HPCCluster.local), and then click Next. When prompted, type the name and the password for an account that has permission to add computers to the domain (typically, the Administrator account), and then click OK. Note: If your network adapter drivers are not included on the Windows Server 2003 CD, you will not be able to join a domain at this time. Instead, make the computer a member of a workgroup, complete the rest of setup, install your network adapters, and then join your head node to the domain.

When you have configured the base operating system, you can install SQL Server 2005 Standard Edition on your head node.

To install and configure SQL Server 2005 Standard Edition

1. Log on to your server as Administrator. Insert the SQL Server 2005 Standard Edition x64 CD into the head node. If setup does not start automatically, browse to the CD drive and then run setup.exe.
2. On the End User License Agreement page, select I accept the licensing terms and conditions, and then click Next.
3. On the Installing Prerequisites page, click Install. When the installations are complete, click Next. The Welcome to the Microsoft SQL Server Installation Wizard page appears. Click Next.
4. On the System Configuration Check page, the installation program displays a report with potential installation problems. You do not need to install IIS or address any IIS-related warnings because IIS is not used in this deployment. Click Next.
5. On the Registration Information page, complete the Name and Company fields with the appropriate information, and then click Next.
6. On the Components to Install page, select all check boxes, and then click Next.
7. On the Instance Name page, select Named instance, and then type COMPUTECLUSTER in the text box. Your cluster must have this instance name, or Windows Compute Cluster will not work. Click Next.
8. On the Service Account page, select Use the built-in System account, and then select Local system in the drop-down list. In the Start services at the end of setup section, select all options except SQL Server Agent, and then click Next.
9. On the Authentication Mode page, select Windows Authentication Mode. Click Next.
10. On the Collation Settings page, select SQL collations, and then select Dictionary order, case-insensitive, for use with 1252 Character Set from the drop-down list. Click Next.
11. On the Error and Usage Report Settings page, click Next.
12. On the Ready to Install page, click Install. When the Setup Progress page appears, click Next.
13. On the Completing Microsoft SQL Server 2005 Setup page, click Finish.
14. Open the Disk Management console. Click Start, click Run, type diskmgmt.msc, and then click OK.
15. Right-click the second partition on your drive, and then click Format. In the Format dialog box, select Quick Format, and then click OK. When the format process finishes, close the Disk Management console.
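As a quick sanity check (an optional aside, not part of the original procedure), you can confirm that the named instance required by the Compute Cluster Pack is reachable by using the sqlcmd utility installed with SQL Server 2005. HEADNODE is the example head node name used in this guide.

    sqlcmd -S HEADNODE\COMPUTECLUSTER -E -Q "SELECT @@VERSION"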

If your company uses a proxy server to connect to the Internet, you should configure your head node so that it can receive system and application updates from Microsoft.

1. To configure your proxy server settings, open Internet Explorer. Click Tools, and then click Internet Options.
2. Click the Connections tab, and then click LAN Settings.
3. On the Local Area Network (LAN) Settings page, select Use a proxy server for your LAN. Enter the URL or IP address for your proxy server.
4. If you need to configure secure HTTP settings, click Advanced, and then enter the URL and port information as needed.
5. Click OK three times, and then close Internet Explorer.

When you have finished configuring your server, click Start, click All Programs, and then click Windows Update. This will ensure that your server is up-to-date with service packs and software updates that may be needed to improve performance and security. You should elect to install Microsoft Update from the Windows Update page. This service provides service packs and updates for all Microsoft applications, including SQL Server. Follow the instructions on the Windows Update page to install the Microsoft Update service.


Step 4: Install the Compute Cluster Pack

When the head node has been configured, you can install the Compute Cluster Pack, which contains the services, interfaces, and supporting software that is needed to create and configure cluster nodes. It also includes utilities and management infrastructure for your cluster.

To install the Compute Cluster Pack

1. Insert the Compute Cluster Pack CD into the head node. The Microsoft Compute Cluster Pack Installation Wizard appears. Click Next.
2. On the Microsoft Software License Terms page, select I accept the terms in the license agreement, and then click Next.
3. On the Select Installation Type page, select Create a new compute cluster with this server as the head node. Do not use the head node as a compute node. Click Next.
4. On the Select Installation Location page, accept the default. Click Next.
5. On the Install Required Components page, a list of required components for the installation appears. Each component that has been installed will appear with a check next to it. Select a component without a check, and then click Install.
6. Repeat the previous step for all uninstalled components. When all of the required components have been installed, click Next. The Microsoft Compute Cluster Pack Installation Wizard completes. Click Finish.

Step 5: Define the Cluster Topology

After the Compute Cluster Pack installation for the head node is complete, a Cluster Deployment Tasks window appears with a To Do List. In this procedure, you will configure the cluster to use a network topology that consists of a single private network for the compute nodes and a public interface from the head node to the rest of the network.

To define the cluster topology

1. On the To Do List page, in the Networking section, click Configure Cluster Network Topology. The Configure Cluster Network Topology Wizard starts. Click Next.
2. On the Select Setup Type page, select Compute nodes isolated on private network from the drop-down list. A graphic appears that shows you a representation of your network. You can learn more about the different network topologies by clicking the Learn more about this setup link. When you have reviewed the information, click Next.
3. On the Configure Public Network page, select the correct public (external) network adapter from the drop-down list. This network will be used for communication between the cluster and the rest of your network. Click Next.
4. On the Configure Private Network page, select the correct private (internal) adapter from the drop-down list. This network will be used for cluster management and node deployment. Click Next.
5. On the Enable NAT Using ICS page, select Disable Internet Connection Sharing for this cluster. Click Next.
6. Review the summary page to ensure that you have chosen an appropriate network configuration, and then click Finish. Click Close.


Step 6: Create the Compute Node Image

You can now create a compute node image. This is the image that will be captured and deployed to each of the compute nodes. There are three tasks that are required to create the compute node image:

1. Install and configure the base operating system.
2. Install and configure the ADS agent and Compute Cluster Pack.
3. Update the image and prepare it for deployment.

To install and configure the base operating system

1. Start the node that you want to use to create your compute node image. Insert the Microsoft Windows Server 2003 Compute Cluster Edition CD into the CD drive. Text-mode setup launches automatically.
2. Accept the license agreement.
3. On the Partition List screen, create a single partition, select it as the install partition, and then press ENTER.
4. On the Format Partition screen, accept the default of NTFS, and then press ENTER. Proceed with the remainder of the text-mode setup. The computer then reboots into graphical setup mode.
5. On the Licensing Modes page, select the option for which you are licensed, and then configure the number of concurrent connections, if needed. Click Next.
6. On the Computer Name and Administrator Password page, type a name for the compute node that has not been added to ADS (for example, NODE000). Type your local administrator password twice, and then press ENTER.
7. On the Networking Settings page, select Typical settings, and then click Next. This will automatically assign addresses to your public and private adapters. The adapter information for the deployed nodes will be automatically created when the image is deployed to a node.
8. On the Workgroup or Computer Domain page, select Yes, make this computer a member of a domain. Type the name of your cluster domain (for example, HPCCluster), and then click Next. When prompted, type the name and the password for an account that has permission to add computers to the domain (for example, hpccluster\administrator), and then click OK. The computer will copy files, and then reboot. Note: If your network adapter drivers are not included on the Windows Server 2003 Compute Cluster Edition CD, you will not be able to join a domain at this time. Instead, make the computer a member of a workgroup, complete the rest of setup, install your network adapters, and then join your compute node to the domain.
9. Log on to the node as administrator.
10. Copy the QFE files to your compute node. Run each executable and follow the instructions for installing the quick fix files on your server.
11. Open Regedit. Click Start, click Run, type regedit, and then click OK.


Repeat the previous step for all uninstalled components. When all of the required components have been installed, click Next. When the Microsoft Compute Cluster Pack installation completes, click Finish.

When you have installed and configured the ADS agent and Compute Cluster Pack, you can update your image with the latest service packs, and then prepare your image for deployment. (A command-line equivalent of the cleanup steps is sketched after the following procedure.)

To update the image and prepare it for deployment

1. Run the Windows Update service on your compute node. If your cluster lies behind a proxy server, configure Internet Explorer with your proxy server settings. For information on how to do this, see Step 1: Install and Configure the Service Node, earlier in this guide.
2. Run the Disk Cleanup utility. Click Start, click All Programs, click Accessories, click System Tools, and then click Disk Cleanup. Select the C: drive, and then click OK. Select all of the check boxes, and then click OK. When the cleanup utility is finished, close the utility.
3. Run the Disk Defragmenter utility. Click Start, click All Programs, click Accessories, click System Tools, and then click Disk Defragmenter. Select the C: drive, and then click Defragment. When the defragmentation utility is finished, close the utility.
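If you prefer to run the cleanup from a command prompt (an optional equivalent, not part of the original steps), the standard Windows Server 2003 utilities can be invoked directly:

    rem Disk Cleanup for the C: drive (category selection is still interactive)
    cleanmgr /d C:
    rem Defragment the C: drive with verbose output
    defrag C: -v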

Step 7: Capture and Deploy Image to Compute Nodes

You can now capture the compute node image that you just created. You can then deploy the image to compute nodes on your cluster.

To capture the compute node image

1. If the compute node is not running, turn on the computer and wait for the node to boot into Windows Server 2003 Compute Cluster Edition.
2. Log on to the service node as administrator. Click Start, and then click ADS Management. Right-click Devices, and then click Add Device.
3. In the Add Device dialog box, type a name in the Name text box (for example, Node000) and a description for your node (for example, Compute Node Image), and then type the MAC address for the node that is running the compute node image. Click OK. The status pane will indicate that the node was created successfully. Click Cancel to close the dialog box.
4. Right-click your compute node name. Click Properties, and then click the User Variables tab.
5. Click Add. In the Variables dialog box, in the Name text box, type Imagename. In the Value text box, type a name for your image (for example, CCSImage). Click OK twice.
6. Right-click the compute node device again, and then click Properties. In the WinPE repository name text box, type the name for the repository that you defined when you installed ADS (for example, NodeImages). Click Apply, and then click OK.
7. Right-click the compute node that you just added, and then click Take Control.
8. Right-click the compute node device again, and then click Run Job. The Run Job Wizard starts. Click Next.
9. On the Job Type page, select Use an existing job template, and then click Next.
10. On the Template Selection page, select Capture Compute Node. Click Next.
11. On the Completing the Run Job Wizard page, click Finish. A Created Jobs dialog box appears. Click OK. The ADS agent on your compute node runs the job, using Sysprep to prepare and configure the node image, and then using the ADS image capture functions to create and copy the image to ADS. When the image capture is complete, the node boots into WinPE.

Deploy the image to nodes on the cluster: When you have captured the compute node image to the service node, you can deploy the image to compute nodes on the cluster.

To deploy the image to nodes on the cluster

1. Log on to the service node as administrator. Click Start, click All Programs, click Microsoft ADS, and then click ADS Management.
2. Expand the Automated Deployment Services node, and then select Devices.
3. Select all devices that appear in the right pane, right-click the selected devices, and then select Take Control. The Control Status changes to Yes.
4. Right-click the devices, and then click Run Job.
5. The Run Job Wizard appears. Click Next.
6. On the Job Type page, select Use an existing job template. Click Next.
7. On the Template Selection page, select boot-to-winpe. Click Next.
8. On the Completing the Run Job Wizard page, click Finish.

    . "oot the computer nodes. -he networ3 adapters should alread5 be confi$ured to use

    PH= and obtain the WinP= ima$e from the service node. -o avoid overwhelmin$ the

    1 server durin$ unicast deplo5ment of WinP= ima$e, it is recommended that 5ou

    boot onl5 four nodes at a time. ubse+uent sets of four nodes should be booted uponl5 after all of the previous sets of four nodes are showin$ Connected to 5inP.

    status in the 1 6ana$ement window on the head node.

10. After all the nodes are connected to WinPE, you can deploy the compute node image to those nodes. Right-click the devices, and then click Run Job.
11. The Run Job Wizard appears. Click Next.
12. On the Job Type page, select Use an existing job template. Click Next.
13. On the Template Selection page, select Deploy CCS Image. Click Next.
14. On the Completing the Run Job Wizard page, click Finish. The nodes automatically download and run the image. This task will take a significant amount of time, especially when you are installing hundreds of nodes. Depending on your available staff, you may want to run this as an overnight task. When finished, your nodes are joined to the domain and ready to be managed by the head node.

Step 8: Configure and Manage the Cluster

The head node is used to manage and maintain your cluster once the node images have been deployed. The Compute Cluster Pack includes a Compute Cluster Administrator console that simplifies management tasks, including approving nodes on the cluster and adding users


and administrators to the cluster. The console includes a To Do List that shows you which tasks have been completed. Follow these steps to configure and manage your cluster:

1. Disable Windows Firewall on all nodes in the cluster.
2. Approve nodes that have joined the cluster.
3. Add users and administrators to the cluster.

Disable Windows Firewall on all nodes on the cluster. The Compute Cluster Administrator console enables you to define how the firewall is configured on all cluster node network adapters. For best performance on large-scale deployments, it is recommended that you disable Windows Firewall on all interfaces.

To disable Windows Firewall on all nodes on the cluster

1. Click Start, click All Programs, click Microsoft Compute Cluster Pack, and then click Compute Cluster Administrator.
2. Click the To Do List. In the Networking section of the results pane, click Manage Windows Firewall Settings. The Manage Windows Firewall Wizard starts. Click Next.

3. On the Configure Firewall page, select Disable Windows Firewall, and then click Next.
4. On the View Summary page, click Finish. On the Result page, click Close. When compute nodes are approved to join the cluster, the firewall will be disabled.

    %ppro"e nodes that ha"e Foined the cluster3 When 5ou deplo5 &ompute &luster =ditionnodes, the5 have ?oined the cluster but have not been approved to participate or process an5

    ?obs. #ou must approve them before the5 can receive and process ?obs from 5our users.

    To appro"e nodes that ha"e Foined the cluster

    . )pen the &ompute &luster dministrator console. &lic3 $ode 'anagement.

    2. !n the results pane, select one or more nodes that displa5 a status of Pendin$ pproval.

    . !n the tas3 pane, clic3 %ppro"e. #ou can also ri$ht;clic3 the selected nodes and then clic3

    %ppro"e.

4. When the nodes are approved, the status changes to Paused. You can leave the nodes in Paused status, or in the task pane you can click Resume to enable the nodes to receive jobs from your users.

Add users and administrators to your cluster. In order to use and maintain the cluster, you must add cluster users and administrators to your cluster domain. This makes it possible for others to submit jobs to the cluster, and to perform routine administration and maintenance on the cluster. If your organization uses Active Directory, you will need to create a trust relationship between your cluster domain and other domains in your organization. You will also need to create organizational units (OUs) in your cluster domain that act as containers for other OUs or users from your organization. You may need to work with other groups in your company to create the necessary security groups so that you can add users from other domains to your compute cluster domain. Because each organization is unique, it is not possible to provide step-by-step instructions on how to add users and administrators to the cluster domain. For help and information on how best to add users and administrators to your cluster, see Windows Server Help.
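As an illustration only, the following sketch uses the built-in directory tools to create an OU and a security group for cluster users in the cluster domain. The domain name hpccluster.local and the OU and group names are hypothetical; replace them with values appropriate to your organization:

    rem Create an OU to hold cluster users (hypothetical names)
    dsadd ou "OU=ClusterUsers,DC=hpccluster,DC=local"

    rem Create a global security group for people who are allowed to submit jobs
    dsadd group "CN=HPC Job Submitters,OU=ClusterUsers,DC=hpccluster,DC=local" -secgrp yes -scope g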

Step 9: Deploy the Client Utilities to Cluster Users

Cluster users and administrators interact with the cluster from their own computers by using the Compute Cluster Pack client utilities. On each user workstation or administration console, run the Compute Cluster Pack setup wizard, and then follow these steps:

3. On the Select Installation Type page, select Install only the Microsoft Compute Cluster Pack Client Utilities for the cluster users and administrators, and then click Next.
4. On the Select Installation Location page, accept the default location, and then click Next.
5. On the Install Required Components page, highlight any components that are not installed, and then click Install.
6. When the installation is finished, a window appears that says Microsoft Compute Cluster Pack has been successfully installed. Click Finish.

Please note that for an administration console, you should install only the client utilities. For a development workstation, you should install both the software development kit (SDK) utilities and the client utilities.
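After the client utilities are installed, a quick way to confirm that a workstation can reach the cluster is to submit a trivial job from the command line. This is a minimal sketch; it assumes that the job.exe utility installed with the client utilities accepts the /scheduler parameter (run job submit /? to confirm) and that HEADNODE is replaced with the name of your head node:

    rem Submit a one-processor job that simply reports which node it ran on
    job submit /scheduler:HEADNODE /numprocessors:1 hostname

    rem List the jobs on the cluster to verify that the job ran
    job list /scheduler:HEADNODE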


Appendix A: Tuning Your Cluster

Each cluster is created with a different goal in mind; therefore, there is a different way to tune each cluster for optimal performance. However, some basic guidelines can be established. To achieve performance improvements, you can do some planning, but testing will also be crucial. For testing, it is important to use applications and data that are as close as possible to the ones that the cluster will ultimately run. In addition to the specific use of the cluster, its projected size will be another basis for making decisions. After you deploy the applications, you can work on tuning the cluster appropriately.

The best networking solution will depend on the nature of your application. Although there are many different types of applications, they can be broadly categorized as message-intensive and embarrassingly parallel. In message-intensive applications, each node's job is dependent on other nodes. In some situations, data is passed between nodes in many small messages, meaning that latency is the limiting factor. With latency-sensitive applications, high-performance networking interfaces, such as Winsock Direct, are crucial. In addition, the use of high-quality routers and switches can improve performance with these applications.

In other messaging situations, large messages are passed infrequently, meaning that bandwidth is the limiting factor. A specialty network, such as InfiniBand or Myrinet, will meet these high-bandwidth requirements. If network latency is not an issue, a gigabit Ethernet network adapter might be the best choice.

In embarrassingly parallel applications, each node processes data independently with little message passing. In this case, the total number of nodes and the efficiency of each node are the limiting factors. It is important to be able to fit the entire dataset into RAM; this results in much faster performance because the data does not have to be paged in and out from disk during processing. The speed of the processors and the type and number of nodes are a prime concern. If the processors are dual-core or quad-core, this may not be as efficient as having separate processors, each with its own memory bus. In addition, if hyper-threading is available, it may be advantageous to turn this feature off. Hyper-threading helps when applications are not using all CPU cycles, because multiple threads can then share a single processor. Hyper-threading is generally bad for high-performance computing applications, but not necessarily for all of them. As long as the operating system kernel is hyperthread-aware, floating-point-intensive processes will be balanced across physical cores, and for multi-threaded applications that mix I/O-intensive threads with floating-point-intensive threads, hyper-threading could be a benefit. Hyper-threading was disabled at NCSA because none of the applications were floating-point intensive, and no specific thread balancing or kernel tuning was performed. Sharing a processor works for ordinary workloads, but in high-performance computing all CPU resources are in use, so placing multiple processes on a single processor has the opposite effect: they have to wait for resources. If an application were perfectly parallel, each extra node would increase performance linearly.

For each application, there is a maximum number of processors that will increase performance. Above that number, each additional processor adds no value and could even decrease performance. This is referred to as application scaling. Depending on the system architecture, all cores sometimes divide the available memory bandwidth, and they always divide the network bandwidth. One of these three (CPU, network, or memory bandwidth) is the performance bottleneck for any application. If the nature of the application(s) is known, you can determine in advance the optimal cluster specifications that will match the application. You should work with your application vendor to ensure that you have the optimal number of processors.
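One practical way to find the scaling limit is to run the same workload at increasing processor counts and compare the run times. The following batch file sketch assumes the job and mpiexec utilities from the Compute Cluster Pack and a hypothetical MPI application named myapp.exe; it simply submits the same job at several sizes so that you can compare run times in Job Manager:

    @echo off
    rem Submit the same MPI workload at 2, 4, 8, 16, and 32 processors and compare run times
    for %%p in (2 4 8 16 32) do job submit /numprocessors:%%p mpiexec myapp.exe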

In some applications, many jobs are run, each of short duration. In this scenario, the performance of the job scheduler is crucial. The CCS job scheduler was designed to handle this situation.
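For workloads that consist of many short tasks, it is usually better to group the tasks into a single job than to submit each one separately. The following batch file sketch assumes the standard job.exe verbs in the Compute Cluster Pack (job new, job add, and job submit /id:; run job /? to confirm) and a hypothetical program named task.exe:

    @echo off
    rem Create one job to hold the short tasks; note the job ID that the command prints
    job new /numprocessors:1
    rem Add 100 short tasks to that job (replace 42 with the ID printed above), then submit it once
    for /l %%i in (1,1,100) do job add 42 task.exe %%i
    job submit /id:42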


When evaluating cluster performance, it is important to be aware that benchmarks don't always tell the whole story. You must evaluate performance based on your own needs and expectations. Evaluation should take place by using the application along with the data that will actually run on the cluster. This will help to ensure a more accurate evaluation and a system that better meets your needs.

For more information on cluster tuning, you can download the "Performance Tuning a Compute Cluster" white paper from the Microsoft Web site at http://go.microsoft.com/fwlink/?LinkId=87828.

You can also find additional tips and new information on performance tuning at the HPC Team blog: http://windowshpc.net/blogs.

Table 5 deals with scalability and will help you make decisions based on the intended size of your cluster. The first part focuses on management scenarios, while the second part focuses on networking scenarios. For each scenario, there is an estimated number of nodes above which the scenario will manifest itself. If your cluster exceeds the specified number of nodes, you may need to use the Note column to plan accordingly, or to troubleshoot.

Table 5: Scalability Considerations

Management Scenario | Nodes | Note
MSDE on the head node supports 8 or fewer concurrent connections. | 64+ | Use SQL Server 2005 on the head node (hard coded). 5-7 tables for the scheduler; use 8 tables for SDM.
RIS on the service node supports only 80 machines simultaneously. | 64+ | Use ADS for CCS 2003 (ADS requires 32-bit).
ICS/NAT has an address range limit of 192.168.0.x. | 250+ | Use DHCP Server instead.
The file server on the head node supports only a limited number of simultaneous SMB/TFTP connections. | 32 | Place executables on the compute nodes. Increase the number of connections that the file server on the head node can support (see KB 317249).
The DC/DNS server on the head node is not optimal; it does not handle multiple NICs well. | 64+ | It is best to leverage a corporate IT domain controller. Put DC/DNS on a separate management node.
ADS loses contact with compute nodes after Winsock Direct has been enabled. | N/A | Use clusrun or jobs to control the machine. If IPMI is available, use IPMI to reboot the machine into WinPE. WDS for the next version of CCS works with Winsock Direct.
The Cisco IB switch subnet manager is incompatible with OpenIB drivers. | N/A | Use OpenSM: disable the Cisco IB switch subnet manager and enable OpenSM.
SDM update bottleneck exists. | 64+ | CCS v1 SP1.
Job scheduling bottlenecks exist. | 64+ | CCS v1 SP1.
Winsock Direct (large scale only). | 64+ | CCS v1 SP1, plus the Winsock Direct hotfixes available from Microsoft support.
InfiniBand drivers (large scale only). | 64+ | Fixed in a later OpenFabrics build, available at http://windows.openib.org/downloads/binaries/.

There is a bottleneck in the number of possible simultaneous connections in the code path used when SYN attack protection is on. | 64+ | Disable SYN attack protection in the registry: under HKLM\System\CurrentControlSet\Services\Tcpip\Parameters, set SynAttackProtect = 0.
There are TCP timeouts on calling nodes when the network is jammed (delay at the switch), for example during an MPI all-reduce. | 64+ | Set the TCP retransmission count to 0x20. Note that this is hard to diagnose, because one-to-all traffic makes different nodes fail.
Latency is too high. | N/A | Use mpiexec -env IBWSD_POLL 500 linpack.
Bandwidth is too low. | N/A | Use mpiexec -env MPICH_SOCKET_BUFFER_SIZE 0 to avoid a copy on send and improve bandwidth. Use this only when Winsock Direct is enabled; it can cause lockups with GigE and IPoIB.
Winsock Direct connection timeouts occur. | N/A | Use mpiexec -env IBWSD_SA_TIMEOUT 1000 to set the subnet manager timeout to a higher value during Winsock Direct connection establishment.
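Both of the TCP-related mitigations in the table are registry changes. The following sketch applies them to every node by using clusrun and reg add; it assumes that the retransmission count in the table corresponds to the TcpMaxDataRetransmissions value (verify this against the Windows Server 2003 TCP/IP registry documentation), and the nodes must be rebooted for the changes to take effect:

    rem Turn off SYN attack protection on every node
    clusrun /all reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v SynAttackProtect /t REG_DWORD /d 0 /f

    rem Raise the TCP retransmission count to 0x20 (32 decimal); the value name is an assumption
    clusrun /all reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpMaxDataRetransmissions /t REG_DWORD /d 32 /f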

Appendix B: Troubleshooting Your Cluster

Issue | Mitigation | Details
Network connectivity failure | Identify the node with the defect. Use Pallas ping-pong, one-to-all, and all-to-all tests. | A good set of tools for this are the Linux-based Intel MPI benchmarks (based on the Pallas test suite). These are available for download from http://windowshpc.net/files/4/porting_unix_code/entry373.aspx. Note: Because these tests are Linux-based, you will have to port them to CCS by using the Subsystem for UNIX-based Applications (SUA). Instructions on how to do this are included with the download.
Winsock Direct issues | Disable Winsock Direct (WSD) and use the IPoIB path instead of RDMA: clusrun /all \\headnode\IBDriverInstallation\net\amd64\installsp -r. If the problem goes away with WSD disabled, try to repair the IB connections: clusrun netsh interface set interface name=MPI admin=DISABLE, and then admin=ENABLE. Validate that your cluster has the latest Winsock Direct patches. | IB driver and Winsock Direct installation utility.

Application Performance Not Optimal

Application not optimized for memory or CPU utilization | Check whether nodes are paging instead of using RAM. Check CPU utilization. | Use perfmon counters. See http://go.microsoft.com/fwlink/?LinkId=86619.
Application does not scale to a large number of nodes | Decrease the number of nodes used by the application until application performance returns to the expected level. |
MSMPI does not balance well between same-node processor communication and node-to-node communication | Experiment with disabling the shared memory setting and see whether application performance improves. | Especially relevant for message-intensive applications. MPICH_DISABLE_SHM
Messages are not coming in fast enough | GigE: Experiment with turning off interrupt moderation to free up CPU. IB: Experiment with increasing the polling for messages. Polling causes high CPU usage, so if usage is too high, it will be detrimental to the CPU needs of the application's computation. | OpenIB driver; IBWSD_POLL



Connectivity to one or more nodes on the cluster is lost | Divide the cluster into subsets of nodes. Run Pallas ping-pong, one-to-all, or all-to-all on those subsets. | Intel MPI benchmarks. This strategy breaks the cluster into subclusters to try to find where the issue is. In each subcluster, run sanity tests such as the Pallas series in order to discover which subcluster contains the "bad" node.
Switch oversubscription is not optimal | Try a higher number of uplinks. | This strategy involves checking the number of uplinks and downlinks per switch to see whether oversubscription is the cause of poor application performance.
Send operation | Experiment with having no extra copy on the send operation. | MSMPI setting: set MPICH_SOCKET_BUFFER_SIZE to 0. Note: This is done on the command line with the command mpiexec -env VARIABLE SETTING -env OTHERVARIABLE OTHERSETTING. Note: This will lead to higher bandwidth but also to higher CPU utilization. Note: Use this only when the compute nodes are fitted with a WSD-enabled driver. Using a setting of 0 will cause compute nodes on non-WSD networks to stop responding.

Memory bus bottleneck | Experiment with setting the processor affinity (assign an MPI process to a specific CPU or CPU core). | An example of doing this from the command line:

    job submit /numprocessors:N mpiexec /cmd /c setAffinity.bat myapp.exe

where setAffinity.bat consists of:

    @echo off
    set /a AFFINITY=(%PMI_RANK% %% %NUMBER_OF_PROCESSORS%)