Ss Sys Admin



8/11/2019 Ss Sys Admin

    SeisSpace SystemAdministration

    Configuring SeisSpace

    Configuring some General properties

SeisSpace has a master configuration file, similar to the ProMAX config_file, where an administrator can set certain properties for the site installation.

The master configuration file for SeisSpace is the PROWESS_HOME/etc/prowess.properties file.

The administrator may want to edit this file to set some installation defaults and then make it writable only by that administrative user.

It may also be useful to move the PROWESS_HOME/etc directory to the install directory /apps/SSetc, similar to how you would copy the PROMAX_HOME/etc directory out of the install, so that your configuration settings do not get deleted if you were to reinstall the product.

You can point to the external etc directory using the environment variable PROWESS_ETC_HOME set in the client startup environment or script. See PROWESS_HOME/etc/SSclient for an example.
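For example, pointing at an external etc directory from the startup environment might look like this (a sketch; /apps/SSetc is the example location used above):

```shell
# Sketch of an SSclient startup-script line: use an etc directory that
# was copied outside the install tree (example path from this guide).
export PROWESS_ETC_HOME=/apps/SSetc
echo "$PROWESS_ETC_HOME"
```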

    Product lists

The administrator can set up the list of Products that are available. The list includes ProMAX 2D, ProMAX 3D, ProMAX 4D, ProMAX VSP, ProMAX Field, ProMAX DEV, and DepthCharge. There is a stanza in the /apps/SeisSpace/etc/prowess.properties file that you can use to control the list of available products that is presented to the users.

# Define the comma-separated list of products that are available from
# the Navigator. Whether the user is actually able to switch to a product
# depends upon whether a license for it is available. If the product name is
# preceded by the negation symbol (!), then that product will not be
# shown in the Navigator. You may use the Navigator preferences to
# change the displayed products on a per-user basis.
#
# ALL - all product levels
# 2D, 3D, 4D, VSP, FIELD, DEPTHCHARGE, DEV
#
# The default is to show all products, except for ProMAX Dev
#
Navigator.availableProductLevels=ALL,!DEV

The default is to show all products except ProMAX Dev, which is hidden in the Product pull-down list in the Navigator.
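As a hypothetical variation on the stanza above, using the documented comma-separated list and negation syntax to hide both ProMAX Dev and DepthCharge would look like:

```
Navigator.availableProductLevels=ALL,!DEV,!DEPTHCHARGE
```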

    Saving Settings

There is the concept of "shared" information that is the same for all users and managed by an administrator.

There is also the concept of "private" information that is only available to an individual user.

"Shared" information is stored in the /apps/logs/netdir.xml file and "private" information is stored by default in the user's /home/user/SeisSpace/.seisspace file. You can specify a different directory to store the .seisspace file in the SSclient startup script with the PROWESS_PREFS_DIR environment variable.
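For example, redirecting the preferences file location in the SSclient startup script might look like this (the directory shown is hypothetical):

```shell
# Sketch: store the user's .seisspace file under a site-managed prefs
# area instead of $HOME/SeisSpace (the path here is hypothetical).
export PROWESS_PREFS_DIR=/apps/SeisSpace/prefs/$USER
echo "$PROWESS_PREFS_DIR"
```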

There is a stanza in the /apps/SeisSpace/etc/prowess.properties file that you can use to control how much "private" information the users can have.

################ ADMINISTRATION ################
# These options allow a system administrator to restrict access
# of non-administrative users to administrative features. When set
# to true, an administrative feature can only be used if a user
# has logged in as admin. Note that these options are here in
# response to a customer request.
onlyAdminCanAddArchiveDataHomes=false
onlyAdminCanAddDataHomes=false
onlyAdminCanEditHostsList=false
onlyAdminCanDefineClusters=false
onlyAdminCanInitializeDatabase=true

If the administrator wants to restrict the users' ability to add their own data homes, hosts, or cluster lists, these options can be set to true. The options will then be grayed out in the users' pull-down menus and rendered inoperative. NOTE that the administrator will also need to restrict write access to this file. The sitemanager and all clients will need to be restarted for a change to this file to take effect.


    Job Submission - RMI port count - for more than 1000 simultaneous jobs

A property is available in the PROWESS_HOME/etc/prowess.properties file to increase the number of RMI ports to look for if you plan to run more than 1000 jobs simultaneously.

# Number of successive ports when creating an RMI registry for communicating
# between the Exec and SuperExec. Default is 1000. The minimum is 1000.
# Increase if you are running more than 1000 jobs simultaneously.
#com.lgc.prowess.exec.NumberRegistryPorts=1000

    Changing the default in JavaSeis Data Output to use primary storage

The default in the JavaSeis Data Output menu is to use secondary storage for the trace data. You must use the parameter default flow method to set this default. This method is documented in the SeisSpace User Guide - Working with the Navigator section.

Changing the default location to store the .user_headers files for user-defined header lists and setting options for user header hierarchy

You can set the default location where the .user_headers file is stored to be any of Data_Home, Area, or Line. The installation default is Data_Home, working under the thinking that, in general, as users add headers, they will tend to be the same ones used multiple times for all lines. You may select to set the default storage to be at the Line level instead.

# The default location of the .user_headers file. The .user_headers
# file contains user-defined headers. Set to
# "DataHome", "Area", or "Line" (case insensitive).
# Default is DataHome.
# TODO this should be set from the user preferences [dialog], and whether it
# can be changed should be controlled by another property:
# onlyAdminCanChangeDefaultUserHeaderLocation.
com.lgc.prodesk.navigator.defaultUserHeaderLocation=DataHome

You can also set a property to prevent users from having the option to select an alternate location.

# Switch to determine if users can store the user headers at a location
# other than the above default location. The default is to allow.
# If you are logged in as Admin, this will always be true.
com.lgc.prodesk.navigator.canChangeUserHeaderLocation=true


    Logging in as Administrator

1. Select Edit > Administration > Login As Administrator.

2. Click Set Password.

3. Leave the Old Password line blank and enter your new password twice. Then click OK.

All users will now need to use the new password to gain administrative privileges.


    Defining Hosts

There are two possible hosts lists: the shared host list set up by the administrator and the personal host list set up by the user.

The Hosts list is the list of the machines on your network that can be used to run remote jobs and define clusters for parallel processing.

If you define your hosts lists when you are logged on as Administrator, you will define hosts for all of the users (a shared hosts list). Otherwise, you will be defining a personal, or "private", list of hosts.

"Shared" information is stored in the /apps/logs/netdir.xml file and "private" information is stored in the user's /home/user/SeisSpace/.seisspace file (or PROWESS_PREFS_DIR/.seisspace file).

To begin, select Edit > Administrator > Define Hosts. One of the following dialog boxes will appear depending on whether you are logged in as the administrator or not:

    Administrator Shared host list dialog


    User personal host list dialog:

Enter the name of the host you'd like to add into the large text box. If you'd like to add a range of hosts that differ by a number prepended or appended to the name (for example: xyz1, xyz2, xyz3, xyz4, xyz5, etc.), enter the starting host name in the Generate hosts from text field (xyz1) and the ending host name in the Generate hosts to field (xyz5). When you click Add, all the host names within the range will be generated and added.

You can also define hosts with a number embedded in the name. For example: x1yz to x5yz.
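The range expansion described above can be sketched as a small shell loop (illustrative only; the Navigator does this internally when you click Add):

```shell
# Expand a host range with the number embedded in the name, x1yz..x5yz.
# A plain appended range (xyz1..xyz5) is the special case of an empty suffix.
prefix=x
suffix=yz
hosts=""
for n in 1 2 3 4 5; do
  hosts="$hosts ${prefix}${n}${suffix}"
done
echo "$hosts"
```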

    Remove hosts by deleting their names from the editable list.

Click Save and Close to update your hosts list, or Cancel to exit without saving. The /apps/logs/netdir.xml file for shared lists, or the user's homedir/SeisSpace/.seisspace file for a private host list, will be updated. This is also the list of hosts that will be shown in the job submit user interface.

    Defining Clusters

There are two possible cluster lists: the shared cluster list set up by the administrator and the personal cluster list set up by the user.


A cluster is a logical name for a group of hosts to which you can submit distributed jobs. If you define the clusters when you are logged on as Administrator, you will define cluster definitions for all of the users. Otherwise, you will be defining a personal, or "private", cluster definition.

"Shared" information is stored in the /apps/logs/netdir.xml file and "private" information is stored in the user's /home/user/SeisSpace/.seisspace file (or PROWESS_PREFS_DIR/.seisspace file).

Duplicate names are not managed by SeisSpace. The cluster list to choose from in the job submit user interface is a concatenation of the shared and the personal lists. The shared cluster definitions are indicated as shared by the check box.

To begin, select Edit > Administrator > Define Clusters. One of the following dialog boxes will appear depending on whether you are logged in as the administrator or not:

    The general steps for adding a cluster are:


1. Enter the cluster name in the New: text box.

2. Click Add.

3. Enter starting and ending hosts information and click Add to generate the list of hosts.

4. Click Save.

    Below is an example after defining clusters for a shared clusters list:


Below is an example after defining clusters for a personal clusters list:

Enter the name of the cluster you'd like to add in the New text field and click Add. It will be added to the pulldown lists of clusters. Then create a list of hosts for the cluster by editing directly in the large text box. If you'd like to add a range of hosts that differ by a number prepended or appended to the name (for example: xyz1, xyz2, xyz3, xyz4, xyz5, etc.), enter the starting host name in the Generate hosts from text field (xyz1) and the ending host name in the Generate hosts to field (xyz5). When you click Add, all the host names within the range will be added.

You can also define hosts with a number embedded in the name. For example: x1yz to x5yz.

    Remove hosts by deleting their names from the editable list.


To edit or remove an existing cluster, select it from the pulldown list of clusters.

Click Save and Close to update your cluster list, or Cancel to exit without saving.

    Adding a Data Home

A Data Home directory is the equivalent of a ProMAX primary storage directory.

There are two possible Data_Home lists: the shared list set up by the administrator, with details stored in the netdir.xml file, and the personal list set up by the user, with details stored in the user's .seisspace file.

CAUTION: avoid declaring the same DATA_HOME in both the shared (logs/netdir.xml) file and your personal .seisspace files. A DATA_HOME should only be specified in one location. If you do end up with duplicates, you will be prompted with some options for how to resolve the duplication.


1. Begin by selecting Edit > Administration > Add Data Home.

    The Add new Data Home dialog box appears.

Enter or select a pathname to the project you are adding. This path must be accessible from all nodes in the cluster by exactly the same pathname.

The pathname is equivalent to ProMAX primary storage, where the project/subproject hierarchy that is shown is equivalent to the ProMAX Area/Line hierarchy in PROMAX_DATA_HOME.


2. If you wish, enter a name which SeisSpace will use to label the Data Home. The idea here is that you may want to address a project by a logical name instead of by a directory name. Your actual disk directory may have a name similar to /filesystem/disk/primarydir1/secondarydir2/group1/person2/promax_data/marine_data. It may be easier to address this data as simply "marine data" from the navigator. (MB3 > Properties can be used to show the entire path to the aliased name for reference.)

3. If you wish, enter a character string that will be used as an additional directory prefix for JavaSeis datasets in secondary storage. DO NOT use blanks, slashes, or other special characters. If JavaSeis secondary storage is specified as /a/b/c, the datasets for this data_home will use directory /a/b/c/this_prefix/area/line/dataset. If you leave this entry blank, the datasets for this data_home will use directory /a/b/c/area/line/dataset. This feature is designed to prevent potential dataset overwriting in the case where you have the same area/line in multiple data_homes using the same secondary storage directories.

4. ProMAX Environment Variable Editor.

At a minimum, it is recommended that you specify values for PROMAX_SCRATCH_HOME and PROMAX_ETC_HOME. Select the variable and then click Edit to modify the settings.

You may add other variables here. Typical entries may be PROMAX_MAP_COMPRESSION, or extended scratch partitions. You can consult the ProMAX system administration guide for the list of environment variables.

It is generally recommended to avoid having the same data home specified both as a shared project and as a personal project by users. It is possible to do this, but you will get into situations where the projects are not updated concurrently, which leads to confusion. There are also some dialogs in place to help resolve duplicate entries.


5. Click the checkbox for This data home should be visible to all users if you'd like all users to be able to access this data home. Note: this option is only visible if you are logged in as the administrator.

A completed Data Home dialog should look similar to the following example:


    JavaSeis Secondary Storage

This option is used to set up a list of file systems to use for JavaSeis dataset secondary storage.

If you don't do anything, JavaSeis datasets will use the same file systems as ProMAX datasets for secondary storage, as defined in the etc/config_file. If this is the desired behavior, then you do not need to pursue the JavaSeis Secondary Storage configuration. You will need to make sure that you don't have a dir.dat file in any of the standard search paths. A dir.dat file with lines with the #SS#/directory syntax will take precedence over the etc/config_file.

When the JavaSeis Secondary Storage dialog is first started, all of the text boxes may be blank. For the top text box to be populated, you must have a dir.dat file, with #SS# lines in it, in one of the following possible locations:

    PROWESS_DDF (direct path to dir.dat file)

    OW_DDF (direct path to dir.dat file)

    OW_PMPATH/dir.dat

    OWHOME/conf/dir.dat

If you use the default for OWHOME, SeisSpace will use $PROMAX_HOME/port/OpenWorks in a standard non-OpenWorks ProMAX/SeisSpace installation. If you want to specify a different location, you can set either OW_PMPATH, OW_DDF, or PROWESS_DDF in your SSclient startup script.

An example of a dir.dat file can be found in your SeisSpace installation's $PROWESS_HOME/etc/conf directory.

    This file is shown below for reference:


# Example of lines in a dir.dat file that SeisSpace understands for specifying
# optional secondary storage for JavaSeis datasets.
###########################################################################
#SS#/d1/SeisSpace/js_virtual,READ_WRITE
#SS#/d2/SeisSpace/js_virtual,READ_WRITE
#SS#/d3/SeisSpace/js_virtual,READ_WRITE
#SS#/d4/SeisSpace/js_virtual,READ_WRITE
#SS#GlobalMinSpace=209715200
#SS#MinRequiredFreeSpace=209715200
#SS#MaxRequiredFreeSpace=107374182400
###########################################################################
########################## Documentation below ###########################
###########################################################################
#
# The SeisSpace navigator will optionally search for a file in the data_home
# directory defined by the environment variable:
#   JAVASEIS_DOT_SECONDARY_OVERRIDE
# This is a method where an administrator can change the secondary storage
# specification for a DATA_HOME for testing, to do things like test new disk
# partitions before putting them into production without affecting the users.
#
# In production mode, the SeisSpace navigator will first search for a
# .secondary file in the data_home directory:
#
#   first:  $DATA_HOME/.secondary (which is managed as part of the data_home
#           properties)
#
# If no .secondary file exists, the next search will be for a dir.dat file
# using the following hierarchy:
#
#   second: $PROWESS_DDF (direct path to dir.dat file)
#   third:  $OW_DDF (direct path to dir.dat file)
#   fourth: $OW_PMPATH/dir.dat
#   fifth:  $OWHOME/conf/dir.dat (Note that OWHOME=PROMAX_HOME/port/OpenWorks
#           in a standard non-OpenWorks ProMAX/SeisSpace installation)
#
# If no dir.dat file is found, JavaSeis secondary storage will use the
# secondary storage definition in the PROMAX_ETC_HOME/config_file:
#
#   sixth:  ProMAX secondary storage listed in the config_file for the project
#
# In the first dir.dat file that is found, the file is checked to see if
# SeisSpace secondary storage has been defined. The expected format is:
#
# #SS#/d1/SeisSpace/js_virtual,READ_WRITE
# #SS#/d2/SeisSpace/js_virtual,READ_WRITE
# #SS#/d3/SeisSpace/js_virtual,READ_WRITE
# #SS#/d4/SeisSpace/js_virtual,READ_WRITE


#
# GlobalMinSpace --> A global setting used by the Random option: do not use
#                    this folder if there is less than GlobalMinSpace available
#                    (value specified in bytes -- 209715200 bytes = 200Mb)
#
# #SS#GlobalMinSpace=209715200
#
# In this example 4 secondary storage locations are specified, all with RW
# status, with a global minimum disk space requirement of 200 Mb.
#
# Other attributes can be associated with the different directories:
#   READ_WRITE    --> available for reading existing data and writing new data
#   READ_ONLY     --> available for reading existing data
#   OVERFLOW_ONLY --> available as emergency backup disk space that is only
#                     used if all file systems with READ_WRITE status are full
#
# The data_home properties dialog can be used to make a .secondary file
# at the Data_Home level, which will be used first.
#
# There are two different policies that can be used to distribute the data
# over the file systems (folders) specified above: RANDOM and MIN_MAX.
#
# PolicyRandom
#   Retrieve the up-to-date list of potential folders for secondary.
#
#   From the list of potential folders, get those that have the READ_WRITE
#   attribute.
#
#   If the list contains more than 0, generate a random number from 1 to N
#   (where N = the number of folders) and return that folder index to be used.
#
#   If the list of READ_WRITE folders is 0, then get the list of
#   OVERFLOW_ONLY folders. If the list contains more than 0, generate a
#   random selection of the folder index and return that folder index to be
#   used.
#
#   If there are 0 READ_WRITE folders and 0 OVERFLOW_ONLY folders, then the
#   job will fail.
#
# PolicyMinMax
#   Uses the following values:
#
#   MinRequiredFreeSpace --> Do not use this folder in the MIN_MAX policy if
#                            there is less than MinRequiredFreeSpace available
#                            (value specified in bytes -- 209715200 bytes = 200Mb)
#   MaxRequiredFreeSpace --> Use this folder multiple times in the MIN_MAX
#                            policy if there is more than MaxRequiredFreeSpace
#                            available
#                            (value specified in bytes -- 107374182400 bytes = 100Gb)
#
#   #SS#MinRequiredFreeSpace=209715200
#   #SS#MaxRequiredFreeSpace=107374182400
#
#   Get the list of potential folders and compute the free space on each folder.
#
#   For each folder in the list that has a READ_WRITE attribute, check the
#   free space.
#   - If the free space is less than MinRequiredFreeSpace, exclude it.
#     (Not enough free space on this disk)


#   - If the free space is greater than MinRequiredFreeSpace and less than
#     MaxRequiredFreeSpace, add it to the list of candidates.
#   - If the free space is also greater than MaxRequiredFreeSpace, add it as
#     a candidate again. This will weight the allocation to disks with the
#     most free space.
#
#   From the list of candidates, use the same random number technique as above.
#
#   If there are no folders in the list of candidates, then check for any
#   possible overflow folders. If folders are found, use the random number
#   technique to return an overflow folder. If we don't have anything in
#   overflow, we fail.
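The MIN_MAX candidate selection described in the comments above can be sketched as follows (illustrative only; the free-space figures are made up, and the thresholds are the documented defaults):

```shell
# Build the MIN_MAX candidate list: skip folders below the minimum,
# list folders above the maximum twice to weight them higher.
min=209715200          # MinRequiredFreeSpace: 200 Mb in bytes
max=107374182400       # MaxRequiredFreeSpace: 100 Gb in bytes
candidates=""
while read -r folder free; do
  [ "$free" -lt "$min" ] && continue                            # exclude: too full
  candidates="$candidates $folder"                              # eligible once
  [ "$free" -gt "$max" ] && candidates="$candidates $folder"    # weight twice
done <<EOF
/d1/SeisSpace/js_virtual 100000000
/d2/SeisSpace/js_virtual 500000000
/d3/SeisSpace/js_virtual 200000000000
EOF
echo "$candidates"
```

A folder index would then be drawn at random from this weighted list, exactly as in the Random policy.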

If you use the example file located in the $PROMAX_HOME/port/OpenWorks/conf directory, and don't explicitly set OWHOME in the startup script, this dialog would look as shown below:


The top section shows the directories found in the dir.dat file. Note: the attributes (RO vs. RW) are not editable or shown here.

The second section shows the directory and attribute contents of a .secondary file in this Data_Home directory, if one exists.

The bottom section shows the Min/Max disk space usage policy settings in the .secondary file, if one exists. The defaults are shown if the .secondary file does not exist: 200 Mb for the minimums and 100 Gb for the maximum. (More detail on these is given above in the example dir.dat file.)

If you do nothing else, JavaSeis datasets will use all of the file systems listed in the dir.dat for secondary storage and distribution of the file extents.

You can also select a subset of these file systems on a "per Data Home" basis, so that different Data Homes can use different subsets of the file systems listed in the dir.dat file. To do this, click MB1 on the file systems you want to use for this Data Home in the top window, and they will show up in the lower window. You can choose to add attributes to the file systems. Multiple directories can be chosen with the standard MB1, CTRL-MB1, and SHIFT-MB1 mouse and keyboard bindings.

Read Write: The default configuration allows for datasets to be both read and written.

Read Only: Do not write any new data to the selected file system(s).

Overflow: Only write data as an emergency backup if all of the other file systems are full. This is designed to be used as an emergency backup so that jobs don't fail when the main secondary disks fill up.

Remove: Remove the selected file system(s) from the list.

You can choose to set the min and max disk usage policy settings in the .secondary file. The policy is chosen in the JavaSeis Data Output menu.

Min/Max Policy - Minimum (Mb): There must be at least this much disk space available before any extents are written to this folder.

Min/Max Policy - Maximum (Mb): If there is more disk space available than this value, this folder is added to the list of available folders twice.

Min free space required for Random policy (Mb): There must be at least this much disk space available before any extents are written to this folder.

Click on the Update button(s) to set the configuration for this Data Home. This configuration for the Data Home is stored in two files in the Data Home directory: the .properties file will store all the properties from the main properties dialog, and the .secondary file will store the list of file systems and the min/max policies to use for JavaSeis secondary storage.

If you delete all of the directories in the lower window and update, the .secondary file will be deleted.

    .secondary file OVERRIDE

For testing a new filer or secondary storage disk configuration, an administrator may want to temporarily override the production .secondary file and use a test version.

The administrator can do this by making a copy of the .secondary file in the data_home directory and pointing to the temporary copy with an environment variable. In a shell, cd to the data_home directory, copy the .secondary file, and manually edit it:

    cp .secondary .secondary_test

    vi .secondary_test

IF you have set the environment variable JAVASEIS_DOT_SECONDARY_OVERRIDE in the navigator startup script, or in your user environment, then the file it points to MUST EXIST in the data home that you are working in. If not, the IO will refer back to the original .secondary file or the highest-level dir.dat file that it finds.

IF the file defined by the environment variable JAVASEIS_DOT_SECONDARY_OVERRIDE exists,

THEN when you open the JavaSeis secondary folder configuration dialog for a data home, it will show the contents of that file and allow you to edit it by adding directories in from the dir.dat list. You cannot add other lines manually from the GUI.

    ELSE IF the file does not exist


THEN you will be shown a blank area in the .secondary edit part of the GUI, where you can repopulate it from the directories listed in the dir.dat file that have the #SS# prefix. When you update, the file will be created.

IF the variable is NOT set, the system will use the standard .secondary file preferentially.

    The IO code is updated so that if the variable is set it will use that file forthe secondary specification.

    In a data home you may see a .secondary plus a .secondary_test file as anexample.

IF the .secondary_test file does not exist, then the IO will use the standard .secondary file, even if the env variable is set to .secondary_test.

If you want to use the test file, you will need to set JAVASEIS_DOT_SECONDARY_OVERRIDE to .secondary_test.
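Putting this together, the override might be enabled in the navigator startup script like so (a sketch, using the .secondary_test name from the example above):

```shell
# Sketch: point the override at the test copy. The named file must exist
# in the data_home you are working in, or the IO falls back as described.
export JAVASEIS_DOT_SECONDARY_OVERRIDE=.secondary_test
echo "$JAVASEIS_DOT_SECONDARY_OVERRIDE"
```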

    Verifying Projects in the Navigator

In the Navigator, click on a data home folder and then navigate the tree to see projects (AREAS), subprojects (LINES), and Flows, Tables, and Datasets.


    Removing Data Homes

To remove a data home from your SeisSpace Navigator, first select the project folder in the tree view and then select Edit > Administration > Remove Data Home from the Navigator pulldown menu. Removing a data home does not delete the data; it only removes it from the list of data homes defined in SeisSpace.


Configuring the User's MPI Environment

A .mpd.conf file must exist in each user's home directory.

If one does not exist, the software will automatically generate it. If you want to set this up manually, you can do the following:

Create a $HOME/.mpd.conf file for each user. This file can be identical for all users, or each user can have their own "secret word". Note: this is "dot" mpd "dot" conf.

The requirements for the .mpd.conf file are that it:

exists in the user's home directory,

is owned by the user, and

has permissions of 600 (rw for user only).

    The file can be created with the following two commands:

    $ echo "MPD_SECRETWORD=xyzzy" > $HOME/.mpd.conf

    $ chmod 600 $HOME/.mpd.conf

After the file is created with the line of text and the permissions set, you should see the following in the user's home directory:

    [user1@a1 user1]$ ls -al .mpd.conf

    -rw------- 1 user1 users 19 Jun 23 13:38 .mpd.conf

    [user1@a1 user1]$ cat .mpd.conf

    MPD_SECRETWORD=xyzzy

Note: There are some cases where you may have rsh problems related to a mismatch between a Kerberos rsh and the normal system rsh. The system searches to see if /usr/kerberos/bin is in your path. If the /etc/profile.d/krb5.csh script does not find this in your path, it will prepend it to your path. To avoid this, add /usr/kerberos/bin to the end of your path.
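A minimal sketch of that workaround for a Bourne-style shell, appending rather than prepending the Kerberos directory:

```shell
# Append /usr/kerberos/bin so the normal system rsh is found first in $PATH
PATH="$PATH:/usr/kerberos/bin"
export PATH
echo "${PATH##*:}"
```

With the directory appended, krb5.csh finds it in the path and leaves the search order alone.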


    Routing Issues

A special routing problem can occur if a Linux cluster mayor or manager node has two ethernet interfaces: one for an external address and one for an internal cluster IP address. If the mayor's hostname corresponds to the external address, then the machine misidentifies itself to other cluster nodes. Those nodes will try to route through the external interface.

    Quick Fix

On each cluster node, you can route the external address of the mayor node through its internal address:

    % route add -net 146.27.172.254 netmask 255.255.255.255 gw 172.16.0.1

where 172.16.0.1 is the internal IP address of the mayor node and 146.27.172.254 is the external address of the mayor node.

    Better Fix

Set the route on all cluster nodes to use the internal address of the mayor for any unknown external address:

    % route add -net 0.0.0.0 netmask 0.0.0.0 gw 172.16.0.1

    This fix makes the previous fix unnecessary.

    Adding Routes

Outside machines might not have a route to the cluster nodes. To add a route on a PC that needs to reach a cluster node, set the route to use the external address of the mayor node for all cluster node addresses:

    % route add 172.16.0.0 mask 255.255.0.0 146.27.172.254

where 172.16.0.0 with a mask of 255.255.0.0 specifies the address range of the cluster nodes and 146.27.172.254 is the external address of the cluster mayor node.


    Diagnosing routing problems

To diagnose problems with routing on a cluster, check the following information on the mayor node and on a worker node. You must have direct routes to all other nodes:

    % route

    % route -n

    % netstat -r

    % netstat -rn

Make sure your node's IP address is associated with the ethernet interface:

    % ifconfig -a

    Hardwire the correct association of IP addresses with hostnames. Use thesame file for all nodes, including the mayor.

    % cat /etc/hosts

    See how hostnames are looked up:

    % cat /etc/nsswitch.conf

    % cat /etc/resolv.conf

Use the lookup order hosts: files nis dns.

If you are not using DNS, then /etc/resolv.conf must be empty. If you are using DNS, then the following lines must be present:

    nameserver

    search domain.com domain2.company.com

    Cluster configuration considerations

When you get ready to set up a cluster, you need to consider which application components will be running on which parts of the cluster. For a cluster that is meant primarily to run ProMAX and SeisSpace you can use the following recommendations. For other uses, you will have to adapt these recommendations appropriately.

The main consideration is to not overload any particular component of the cluster. For example, it is very easy to overload the Manager node with a variety of cluster administration daemons as well as a variety of user processes. For a ProMAX and SeisSpace installation you may want to segregate the work as follows:

    You may decide to run the following on the Manager:

    the PBS/Torque server and scheduler

    the FlexLM license manager

    the SeisSpace sitemanager

    You may decide to use a couple of the nodes as user "login" nodes to run:

    the SeisSpace User Interface / Flow Builders

    Interactive/direct submit ProMAX, Hybrid and SeisSpace jobs

    You should only run the following on the "invisible" cluster nodes:

    PBS - mom

ProMAX, Hybrid and SeisSpace jobs released from the queue or jobs directed to run on those nodes.

    Additional Considerations

In addition to the above, you will need to ensure that the manager node and the "login" nodes are set up with multiple IP addresses so that they are visible on both networks: the internal cluster network and the external user network.

Running jobs on the manager should generally be avoided so that this node can be available to do the system management work that it is intended to do.

You want to avoid having a PBS mom running on the "login" node(s) to prevent jobs from the queue from running on these nodes. The "login" node should be reserved for interactive display jobs and small direct submit test jobs.


    Managing Batch Jobs using Queues

Managing batch jobs for seismic data processing via queues provides the following benefits:

    sequential release of serially dependent jobs

    parallel release of groups of independent jobs

    optimized system performance by controlling resource allocation

    centralized management of system workload

    Introduction to Batch Job Queues

Seismic data processing using SeisSpace or ProMAX on an individual workstation or a Linux cluster can benefit from using a flexible batch queuing and resource management software package. Batch queueing software generally has three components: a server, a scheduler, and some sort of executor (mom). A generic diagram showing the relationship between the various components of the Torque queuing software is illustrated below.

[Diagram: jobs enter the Torque Server from ProMAX, the SeisSpace UI, and qmgr commands; the Torque Server consults the Torque Scheduler and dispatches work to the Torque Mom.]


    Generic Queued Job Workflow

1. A job is submitted to the queuing system server via a command like "qsub".

2. The server communicates with the scheduler and requests the number of nodes the job needs.

3. The scheduler gathers current node or workstation resource utilization and reports back to the server which nodes to use.

4. The server communicates with the mom(s) to start the job on the node(s) allocated.

Note that a single Linux workstation has one mom daemon, as the diagram above shows, but a Linux cluster can have hundreds to thousands of compute nodes with one mom on each.

Torque and SGE (Sun Grid Engine) are typical of the available queuing packages. For this release we tested and documented batch job queuing using Torque. This package can be freely downloaded from http://www.clusterresources.com/downloads/torque.


    Torque Installation and Configuration Steps

    1. Download and install Torque source code

    2. Set torque configuration parameters

    3. Compile and link the Torque source code

    4. Install the Torque executables and libraries

    5. Configure the Torque server and mom

    6. Test Torque Queue Submission

    7. Start Torque server, scheduler, and mom at boot

    8. Build the Torque packages for use in installing Torque on cluster

    compute nodes, then install these packages

    9. Integrate ProMAX and SeisSpace with Torque

    10. Recommendations for Torque queues

    Download and Install Torque Source Code

Landmark does not distribute Torque, so you will have to download the latest source tar bundle, which looks similar to torque-xx.yy.zz, from the following URL:

    http://www.clusterresources.com/downloads/torque

The latest version of Torque we tested is 2.3.3, on a RedHat 4 Update 5 system.

    Note: PBS and Torque are used interchangeably throughout this document.

    As the root user, untar the source code for building the Torque server,scheduler, and mom applications.

    > mkdir /apps/torque

    > cd /apps/torque

    > tar -zxvf /torque-xx.yy.zz.tar.gz

    > cd torque-xx.yy.zz

If you decide you want to build the Torque graphical queue monitoring utilities xpbs and xpbsmon (recommended), there are some requirements. Make sure tcl, tclx, tk, and their devel rpms are installed for the architecture type of your system, such as i386 or x86_64. Since the tcl-devel-8.*.rpm and tk-devel-8.*.rpm files may not be included with several of the RHEL distributions, you may need to download them. There may be other versions that work as well. Any missing RPMs will need to be installed.

Here is an example of required RPMs from a RHEL 4.5 x86_64 installation:

    [root@sch1 prouser]# rpm -qa | grep tcl-8

    > tcl-8.4.7-2

    [root@sch1 prouser]# rpm -qa | grep tcl-devel-8

    > tcl-devel-8.4.7-2

    [root@sch1 prouser]# rpm -qa | grep tclx-8

    > tclx-8.3.5-4

    [root@sch1 prouser]# rpm -qa | grep tk-8

    > tk-8.4.7-2

    [root@sch1 prouser]# rpm -qa | grep tk-devel-8

    > tk-devel-8.4.7-2

Here is an example of required RPMs from a RHEL 5.2 x86_64 installation:

    > rpm -qa | grep libXau-dev

    libXau-devel-1.0.1-3.1

    > rpm -qa | grep tcl-devel-8

    tcl-devel-8.4.13-3.fc6

    > rpm -qa | grep xorg-x11-proto

    xorg-x11-proto-devel-7.1-9.fc6

    > rpm -qa | grep libX11-devel

    libX11-devel-1.0.3-8.el5

    > rpm -qa | grep tk-devel

    tk-devel-8.4.13-3.fc6

    > rpm -qa | grep libXdmcp-devel

    libXdmcp-devel-1.0.1-2.1

    > rpm -qa | grep mesa-libGL-devel


    mesa-libGL-devel-6.5.1-7.2.el5

    > rpm -qa | grep tclx-devel

    tclx-devel-8.4.0-5.fc6

    Set Torque Configuration Parameters

We will now compile and link the server, scheduler, and mom all at the same time, then later generate specific Torque "packages" to install on all compute nodes, which run just the moms. There are many ways to install and configure Torque queues; here we present just one.

Torque queue setup for a single workstation is exactly the same as for the master node of a cluster, except for some changes discussed later. You should be logged in as root on the master node if you are installing on a Linux cluster, or logged into your workstation as root.

Here is RHEL 4.5 x86_64:

> ./configure --enable-mom --enable-server --with-scp --with-server-default= --enable-gui --enable-docs --with-tclx=/usr/lib64

Here is RHEL 5.2 x86_64:

> ./configure --enable-mom --enable-server --with-scp --with-server-default= --enable-gui --enable-docs --with-tcl=/usr/lib64 --without-tclx

Note that we pointed to /usr/lib64 for the 64-bit tclx libraries. This would be /usr/lib on 32-bit systems.

With the use of "--with-scp" we are selecting ssh for file transfers between the server and moms. This means that ssh needs to be set up such that no passwords are required, in both directions, between the server and moms for all users.
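Passwordless ssh is typically arranged with the standard OpenSSH key steps; this is a hedged sketch for one user (on a cluster where home directories are NFS-shared, the same authorized_keys file serves all nodes):

```shell
# Generate a key once for this user, then authorize it for the same user
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

Verify with "ssh <node> hostname" in both directions; no password prompt should appear.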

    Compile and Link the Torque Source Code

We will now compile and link the Torque binaries.

    > make

    Install the Torque Executables and Libraries

We will now install the Torque executables and libraries.

> make install


    Configure the Torque Server and Mom

Instructions for installing and configuring Torque in this document treat a single workstation and the master node of a cluster the same, then discuss where the configuration of a cluster is different.

Let's go ahead and set up two example queues for our workstation or cluster. The first thing we will do is configure our master node or single workstation for the Torque server and mom daemons.

    > cd /var/spool/torque/server_priv

Now let's define which nodes our queues will be communicating with. The first thing to do is to build the /var/spool/torque/server_priv/nodes file. This file states the nodes that are to be monitored and have jobs submitted to them, the type of node, the number of CPUs the node has, and any special node properties.

    Here is an example nodes file:

    master np=2 ntype=cluster promax

    n1 np=2 ntype=cluster promax seisspace

    n2 np=2 ntype=cluster promax seisspace

    n3 np=2 ntype=cluster seisspace

    .

    .

    nxx np=2 ntype=cluster seisspace

The promax and seisspace entries are called properties. It is possible to assign queue properties so that a queue only submits jobs to nodes carrying that same property. Instead of the entries n1, n2, etc., you would enter your workstation's hostname or the hostnames of your compute nodes.
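As a hedged sketch (the queue name here is hypothetical), a queue can be bound to nodes carrying a given property with a qmgr directive such as:

```
set queue seisspace resources_default.neednodes = seisspace
```

Jobs submitted to that queue then only run on nodes whose entry in the nodes file lists the seisspace property.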

Now let's initialize the pbs mom /var/spool/torque/mom_priv/config file. Here is an example of what one would look like:

    # Log all but debug events, but 127 is good for normal logging.

    $logevent 127

# Set log size and deletion parameters so we don't fill /var

    $log_file_max_size 1000

    $log_file_roll_depth 5

    # Make node unschedulable if load >4.0; continue when load drops

  • 8/11/2019 Ss Sys Admin

    35/59

    35 SeisSpace System Administration

    Known ProblemsOther DocsOther Docs

    $max_load 4.0

    # Define server node

    $pbsserver

    # Use cp rather than scp or rcp for local (nfs) file delivery

    $usecp *:/export /export

The $max_load and $ideal_load parameters will have to be tuned for your system over time, and are gauged against the current entry in the /proc/loadavg file. You can also use the "uptime" command to see what the current load average of the system is.

How many and what type of processes can the node handle before it is overloaded? For example, if you have a quad-core machine then a $max_load of 4.0 and an $ideal_load of 3.0 would be just fine. For the $pbsserver entry, be sure to put the hostname of your Torque server.
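The value the mom compares against can be checked directly; the 1-minute load average is the first field of /proc/loadavg:

```shell
# Read the 1-minute load average that the mom gauges $max_load against
read load1 rest < /proc/loadavg
echo "current 1-minute load: $load1"
```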

After a job is finished, the stdout and stderr files are copied back to the server so they can be viewed. The $usecp entry specifies the file systems for which a simple "cp" command can be used rather than "scp" or "rcp". The output of the "df" command shows what should go into the $usecp entry. For example:

    df

    Filesystem 1K-blocks Used Available Use% Mounted on

    sch1:/data 480721640 327473640 148364136 69% /data

    The $usecp entry would be "$usecp *:/data /data"

Now let's start the Torque server so we can load its database with our new queue configuration.

    > /usr/local/sbin/pbs_server -t create

Warning: if you have an existing set of Torque queues, the "-t create" option will erase their configuration.

Now we need to add and configure some queues. We have documented a simple script which should help automate this process. You can type these instructions in by hand, or build a script to run. Here is what this script looks like:

    #!/bin/ksh

    /usr/local/bin/qmgr -e
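As a hedged sketch, typical qmgr input for two example queues (serial and parallel, matching the qconfig_pbs names used later) might look like the following; exact attributes will vary by site:

```
create queue serial
set queue serial queue_type = Execution
set queue serial resources_default.nodes = 1
set queue serial enabled = True
set queue serial started = True
create queue parallel
set queue parallel queue_type = Execution
set queue parallel enabled = True
set queue parallel started = True
set server scheduling = True
set server default_queue = serial
```

Confirm the result with "/usr/local/bin/qmgr -c 'print server'".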


    > /usr/local/bin/xpbsmon &

If you built the GUI monitoring utilities, you should see the xpbsmon monitor window.


    Testing Torque Queue Submission

Before integrating ProMAX with Torque, it is a good idea to test the Torque setup by submitting a job (script) to Torque from the command line. Here is an example script called pbs_queue_test:

#!/bin/ksh
#PBS -S /bin/ksh
#PBS -N pbs_queue_test
#PBS -j oe
#PBS -r y
#PBS -o /pbs_queue_output
#PBS -l nodes=1
######### End of Job ##########

hostname
echo ""
env
echo ""
cat $PBS_NODEFILE

You will need to modify the #PBS -o line of the script to direct the output to an NFS mounted filesystem which can be seen by the master node or single workstation. Submit the job to Torque as follows using a non-root user:

    > /usr/local/bin/qsub -q serial -m n /pbs_queue_test

If the job ran successfully, there should be a file called /pbs_queue_output containing the results of the script.

Starting the Torque Server, Scheduler, and Mom at boot

To start the Torque daemons when the machines boot up, use the following scripts (pbs_server, pbs_sched, and pbs_mom) on the master node or single workstation.

The following /etc/init.d/pbs_server script starts pbs_server on Linux:

#!/bin/sh
#
# pbs_server    This script will start and stop the PBS Server
#
# chkconfig: 345 85 85
# description: PBS is a versatile batch system for SMPs and clusters
#
# Source the library functions
. /etc/rc.d/init.d/functions
BASE_PBS_PREFIX=/usr/local
ARCH=$(uname -m)
AARCH="/$ARCH"
if [ -d "$BASE_PBS_PREFIX$AARCH" ]
then
    PBS_PREFIX=$BASE_PBS_PREFIX$AARCH
else
    PBS_PREFIX=$BASE_PBS_PREFIX
fi
PBS_HOME=/var/spool/torque
# let's see how we were called
case "$1" in
    start)
        echo -n "Starting PBS Server: "
        if [ -r $PBS_HOME/server_priv/serverdb ]
        then
            daemon $PBS_PREFIX/sbin/pbs_server
        else
            daemon $PBS_PREFIX/sbin/pbs_server -t create
        fi
        echo
        ;;
    stop)
        echo -n "Shutting down PBS Server: "
        killproc pbs_server
        echo
        ;;
    status)
        status pbs_server
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo "Usage: pbs_server {start|stop|restart|status}"
        exit 1
esac

The following /etc/init.d/pbs_sched script starts pbs_sched on Linux:

#!/bin/sh
#
# pbs_sched    This script will start and stop the PBS Scheduler
#
# chkconfig: 345 85 85
# description: PBS is a versatile batch system for SMPs and clusters
#
# Source the library functions
. /etc/rc.d/init.d/functions
BASE_PBS_PREFIX=/usr/local
ARCH=$(uname -m)
AARCH="/$ARCH"
if [ -d "$BASE_PBS_PREFIX$AARCH" ]
then
    PBS_PREFIX=$BASE_PBS_PREFIX$AARCH
else
    PBS_PREFIX=$BASE_PBS_PREFIX
fi
# let's see how we were called
case "$1" in
    start)
        echo -n "Starting PBS Scheduler: "
        daemon $PBS_PREFIX/sbin/pbs_sched
        echo
        ;;
    stop)
        echo -n "Shutting down PBS Scheduler: "
        killproc pbs_sched
        echo
        ;;
    status)
        status pbs_sched
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo "Usage: pbs_sched {start|stop|restart|status}"
        exit 1
esac

The following /etc/init.d/pbs_mom script starts pbs_mom on Linux:

#!/bin/sh
#
# pbs_mom    This script will start and stop the PBS Mom
#
# chkconfig: 345 85 85
# description: PBS is a versatile batch system for SMPs and clusters
#
# Source the library functions
. /etc/rc.d/init.d/functions
BASE_PBS_PREFIX=/usr/local
ARCH=$(uname -m)
AARCH="/$ARCH"
if [ -d "$BASE_PBS_PREFIX$AARCH" ]
then
    PBS_PREFIX=$BASE_PBS_PREFIX$AARCH
else
    PBS_PREFIX=$BASE_PBS_PREFIX
fi
# let's see how we were called
case "$1" in
    start)
        if [ -r /etc/security/access.conf.BOOT ]
        then
            cp -f /etc/security/access.conf.BOOT /etc/security/access.conf
        fi
        echo -n "Starting PBS Mom: "
        daemon $PBS_PREFIX/sbin/pbs_mom -r
        echo
        ;;
    stop)
        echo -n "Shutting down PBS Mom: "
        killproc pbs_mom
        echo
        ;;
    status)
        status pbs_mom
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo "Usage: pbs_mom {start|stop|restart|status}"
        exit 1
esac

The following commands set up the scripts so the OS will start them at boot:

    > /sbin/chkconfig pbs_server on

    > /sbin/chkconfig pbs_sched on

    > /sbin/chkconfig pbs_mom on

    Installing Torque On The Compute Nodes

Now that Torque seems to be working, let's install it on the compute nodes. To perform this we need to generate some Torque self-extracting scripts called "packages". In these packages we need to also include Torque mom system startup (init.d) scripts, as well as mom configuration information. Note that this step is not necessary for the single workstation.

> cd /apps/torque-xx.yy.zz
> mkdir pkgoverride; cd pkgoverride


    Connecting ProMAX and Torque

ProMAX by default is set to use Torque (PBS) queues. The $PROMAX_HOME/etc/qconfig_pbs file defines which Torque queues are available for use, the name associations, the function to be called in building a job execution script, and any variables which get passed to the function script. You should modify this file to conform with the Torque queues that you have created.

    #

    # PBS batch queues

    #

    name = serial

    type = batch

description = "Serial Execution Batch Jobs"
function = pbs_submit

    menu = que_res_pbs.menu

    properties = local

    machine =

    #

    name = parallel

    type = batch

description = "Parallel Execution Batch Jobs"
function = pbs_submit

    properties = local

    menu = que_res_pbs.menu

    machine =

With the configuration above, the SeisSpace job submit window presents these two queues.


If you have configured your queues for a cluster, and have confirmed that they are working properly, you need to do a couple of things to disable the master node from being used as a compute node.

    1. Turn off the pbs_mom.

    > /sbin/service pbs_mom stop

    2. Disable the pbs_mom from starting at boot.

    > /sbin/chkconfig pbs_mom off

3. Remove the master node from the /var/spool/torque/server_priv/nodes file.

    Recommendations for Torque queues

Based on our batch job queue testing efforts, we offer the following guidelines for configuring your Torque batch job queues.


It is important that the queue does not release too many jobs at the same time. You specify the number of available nodes and CPUs per node in the /var/spool/torque/server_priv/nodes file. Each job is submitted to the queue with a request for a number of CPU units. The default for ProMAX jobs is 1 node and 1 CPU, or 1 CPU unit. That is, to release a job, there must be at least one node that has 1 CPU unallocated.

There can be instances when jobs do not quickly release from the queue although resources are available. It can take a few minutes for the jobs to release. You can change the scheduler_iteration setting with the Torque qmgr command. The default is 600 seconds (or 10 minutes). We suggest a value of 30 seconds. Even with this setting, dead time of up to 2 minutes has been observed. It can take some time before the loadavg begins to fall after the machine has been loaded.
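For example, the suggested 30-second setting can be applied with a qmgr directive like this (a sketch; run it through /usr/local/bin/qmgr):

```
set server scheduler_iteration = 30
```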

By default, Torque installs itself into the /var/spool/torque, /usr/local/bin and /usr/local/sbin directories. Always address qmgr by its full name of /usr/local/bin/qmgr. The directory path /usr/local/bin is added to the PATH statement inside the queue management scripts by setting the PBS_BIN environment variable. If you are going to alter the PBS makefiles and have PBS installed in a location other than /usr/local, make sure you change the PBS_BIN environment setting in the ProMAX sys/exe/pbs/* files, and in the SeisSpace etc/SSclient script example.

Run the xpbs and xpbsmon programs, located generally in the /usr/local/bin directory, to monitor how jobs are being released and how the CPUs are monitored for availability. Black boxes in the xpbsmon user interface indicate that the node CPU load is greater than what has been configured, and no jobs can be spawned there until the load average drops. It is normal for nodes to show as different colored boxes in the xpbsmon display. This means that the nodes are busy and not accepting any work. You can also modify the automatic update time in the xpbsmon display. However, testing has shown that the automatic updating of the xpbs display may not be functioning.

Landmark suggests that you read the documentation for Torque. These documents include more information about the system and ways to customize the configuration, and can be found on the Torque website.

Torque requires that you have the hostnames and IP addresses in the hosts files of all the nodes.

Note: hostname is the name of your machine; hostname.domainname can be found in /etc/hosts, and commonly ends with .com:

ip address hostname.domain.com hostname
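A hypothetical /etc/hosts fragment following that format (addresses and names are illustrative only):

```
172.16.0.1   master.cluster.com   master
172.16.0.2   n1.cluster.com       n1
172.16.0.3   n2.cluster.com       n2
```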

    http://../rel_notes/known_problems.pdfhttp://../index.pdfhttp://../index.pdfhttp://../index.pdfhttp://../Index.pdfhttp://../rel_notes/known_problems.pdf
  • 8/11/2019 Ss Sys Admin

    46/59

    46 SeisSpace System Administration

    Known ProblemsOther DocsOther Docs

For DHCP users, ensure that all of the processing and manager nodes always get the same IP address.

We present one method of installing and configuring Torque job queues. There are many alternative methods that will be successful so long as the following conditions exist:

Install Torque for all nodes of the cluster. The installation can be done on each machine independently, or you can use a common NFS mounted file system, or your cluster management software may contain a preconfigured image.

Install all components including the server and scheduler on one node. This is known as the server node and serves the other main processing nodes. Normally this will be the cluster manager node. On a single workstation the server, scheduler, and mom daemons are all installed.

The following files must be the same on all installations on all machines:

    /var/spool/torque/server_name

    /var/spool/torque/mom_priv/config

These files are only used by the server and scheduler on the manager machine:

    /var/spool/torque/server_priv/nodes

The UID and GID for users must be consistent across the master and compute nodes.

All application, data, and home directories must be mounted the same on the master and compute nodes.


    Flat File-based Flow Replication

This section discusses how flow replication is implemented in SeisSpace. It also discusses where and when the flat files are created and how they are restored and managed.

For more information about using the Flow Replication tools, please refer to the chapter titled Replicating Flows in the Using SeisSpace guide.

In Flat File-based Flow Replication, all flow replication data are stored in flat files in the $PROMAX_DATA_HOME/AREA/LINE and $PROMAX_DATA_HOME/AREA/LINE/FLOW directories.

LINE/
  replicaParms.txt --- tab-delimited file with replica parameters, editable in Excel; can be a symbolic link
  replicaParms.txt~ --- a backup that is generated whenever a replicaParms.txt file is successfully loaded
  replicaPrefs.xml --- some constants stored about the replica table, such as column width and display order

Starting with the 5000.0.1.0 release in early 2009, a file locking mechanism was added to manage the replicaParms.txt file. When a user opens the replica table and adds columns or changes values, that user writes a lock file. Other users will not be able to make edits to the replica table until the user who owns the lock file saves his/her work and releases the lock.

LINE/FLOW (template flow)
  exec.#.pwflow ---
  exec.#.log      | These are the files associated with the template.
  exec.#.qsh      | There may be multiple versions depending on
  exec.#.qerr     | how many times the template was run to test it.
  packet.#.job  ---

  jobs.stat --- This file is not used at this time but contains the status of the main flow

  exec.r#.#.pwflow --- These are the files associated with each replica flow.
  exec.r#.#.log      | The first # after the r is the sequence number.
  exec.r#.#.qsh    --- The second # is the replica instance number (more detail on replica instances later).

  replicas.#.stat --- a binary file that contains all of the job status information for replicas of a particular version

  replicasInfo.xml --- a simple xml file that indicates that the flow is a template


The general methodology follows the idea that some of the replicas may need to be rerun, or rebuilt and then rerun, for a variety of different reasons. After you rerun or rebuild and rerun some of the replicas, you will see multiple versions of the flow, printout, and qsh files for each instance of the replica, and multiple replicas.#.stat files. In the following example, replicas 1 and 2 have been built and run 4 times, replicas 3 and 4 have been run 3 times, and so on.

$ ls exec.r1.*
exec.r1.2.log  exec.r1.2.pwflow  exec.r1.2.qsh
exec.r1.3.log  exec.r1.3.pwflow  exec.r1.3.qsh

$ ls exec.r2.*
exec.r2.2.log  exec.r2.2.pwflow  exec.r2.2.qsh
exec.r2.3.log  exec.r2.3.pwflow  exec.r2.3.qsh

$ ls exec.r3.*
exec.r3.1.log  exec.r3.1.pwflow  exec.r3.1.qsh
exec.r3.2.log  exec.r3.2.pwflow  exec.r3.2.qsh

$ ls exec.r4.*
exec.r4.1.log  exec.r4.1.pwflow  exec.r4.1.qsh
exec.r4.2.log  exec.r4.2.pwflow  exec.r4.2.qsh

$ ls exec.r5.*
exec.r5.0.log  exec.r5.0.pwflow  exec.r5.0.qsh
exec.r5.1.log  exec.r5.1.pwflow  exec.r5.1.qsh

$ ls exec.r6.*
exec.r6.0.log  exec.r6.0.pwflow  exec.r6.0.qsh
exec.r6.1.log  exec.r6.1.pwflow  exec.r6.1.qsh

$ ls exec.r7.*
exec.r7.0.log  exec.r7.0.pwflow  exec.r7.0.qsh

$ ls exec.r8.*
exec.r8.0.log  exec.r8.0.pwflow  exec.r8.0.qsh

$ ls exec.r9.*
exec.r9.0.log  exec.r9.0.pwflow  exec.r9.0.qsh

$ ls exec.r10.*
exec.r10.0.log  exec.r10.0.pwflow  exec.r10.0.qsh


$ ls -al *stat*
jobs.stat  replicas.0.stat  replicas.1.stat  replicas.2.stat  replicas.3.stat

Notice that the earlier numbered replicas, such as 1 and 2, have instance numbers 2 and 3, whereas replicas 3 and 4 have instance numbers 1 and 2. There is a preference setting that can be used to put a limit on the number of versions of replicas to keep. In this case the preference was set to keep, and automatically purge to, 2 versions of the replica flows; the two most recent are retained.
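Because each replica instance leaves its own exec.r<seq>.<inst>.* files behind, you can audit what is on disk by parsing those file names. The following is a minimal sketch, not a SeisSpace utility; the function name is ours, and it assumes you run it from inside a template's FLOW directory:

```shell
# list_replica_instances: print one "replica <seq> instance <inst>" line per
# exec.r<seq>.<inst>.pwflow file in the current directory, sorted numerically
# by sequence number and then by instance number.
list_replica_instances() {
  for f in exec.r*.*.pwflow; do
    [ -e "$f" ] || continue      # glob matched nothing; no replicas built yet
    rest=${f#exec.r}             # e.g. "5.1.pwflow"
    seq=${rest%%.*}              # sequence number, e.g. "5"
    inst=${rest#*.}              # e.g. "1.pwflow"
    inst=${inst%.pwflow}         # instance number, e.g. "1"
    echo "replica $seq instance $inst"
  done | sort -n -k2,2 -k4,4
}
```

Run from the FLOW directory of the example above, this would print two lines each for replicas 1 through 6 and one line each for replicas 7 through 10.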

The job status information for all of these versions is stored in the different replicas.#.stat files. The status that is shown in the Replica Job Table (RJT) will be the status of the flow in the matching numbered stat file. The replicas.3.stat file will only have information for those flows that had a 3rd instance. The stat files contain the job status, such as Complete, Failed, or User Terminated. The "Built" and "Unknown" status values are not stored. A flow is marked as Built if there is no known status for it in the matching stat file and the flow files exist on disk. If multiple versions of replicated flows exist, the status that will be shown is the status in the stat file of the highest numbered replica.
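Since the RJT displays the status taken from the highest-numbered instance, it can be useful to check which instance of a given replica is the most recent on disk. A hedged sketch (the function name is ours; the stat files themselves are binary, so this only inspects the flow file names, which mirror the stat numbering):

```shell
# highest_instance: given a replica sequence number, print the highest
# instance number present on disk, i.e. the version whose entry in the
# matching replicas.<n>.stat file the RJT would display.
highest_instance() {             # usage: highest_instance <seq>
  ls exec.r"$1".*.pwflow 2>/dev/null \
    | sed 's/.*\.\([0-9][0-9]*\)\.pwflow$/\1/' \
    | sort -n | tail -1
}
```

For example, with the listings above, `highest_instance 1` would print 3 and `highest_instance 7` would print 0.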

sequence number
                  1 1 1 1 1
1 2 3 4 5 6 7 8 9 0 1 2 3 4
---------------------------
. . . . x . . . x . . . . .   2.stat
. . . x x . . . x x x . . .   1.stat
x x x x x x x x x x x x x x   0.stat

If you delete a replica using the delete function in the RJT, all existing instances of the replicated flows will be deleted and the job status will be removed from all of the stat files. For example, if the replica flows for sequence 5 are deleted, the status will be removed from all existing stat files. The status of the flow will be set to "Unknown" until the replica is rebuilt.

sequence number
                  1 1 1 1 1
1 2 3 4 5 6 7 8 9 0 1 2 3 4
---------------------------
. . . . . . . . x . . . . .   2.stat
. . . x . . . . x x x . . .   1.stat
x x x x . x x x x x x x x x   0.stat


If you delete all of the replicas, all of the replica flow folders will be deleted, but the replicas.#.stat files will not be deleted. All of the status values in all of the stat files will be deleted.

sequence number
                  1 1 1 1 1
1 2 3 4 5 6 7 8 9 0 1 2 3 4
---------------------------
. . . . . . . . . . . . . .   2.stat
. . . . . . . . . . . . . .   1.stat
. . . . . . . . . . . . . .   0.stat

If you make a new set of replicas, the instance numbering will start at 0 again.

If there are no replica flows left in the template flow, you can safely delete all of the stat files.
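That final cleanup can be scripted defensively, refusing to touch the stat files while any replica flow remains. A minimal sketch under the file-layout assumptions above (the function name is ours; run it from inside the template's FLOW directory):

```shell
# remove_replica_stat_files: delete the leftover replicas.#.stat files, but
# only when no exec.r<seq>.<inst>.pwflow replica flows remain in the current
# (template FLOW) directory.
remove_replica_stat_files() {
  for f in exec.r*.*.pwflow; do
    if [ -e "$f" ]; then
      echo "replica flows still exist; not deleting stat files" >&2
      return 1
    fi
  done
  rm -f replicas.*.stat
}
```

Note that the guard is what makes this safe: deleting the stat files while replica flows still exist would reduce those flows' status to "Built", as described above.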


Process Promotion and Demotion: Dev/Alpha/Beta/Prod

SeisSpace supports the capability of having several versions of the same module or process in different stages of development. In this case, you may want to switch between the versions in a flow without having to build and parameterize a new menu.

This capability only extends to SeisSpace module development; it does not apply to ProMAX tools.

An example scenario may be that you have a program that you are working on in your development environment and periodically you want to release a version to production, but you want the development version to be available as well so that a tester can easily test a new development version against the current production version.

In this case, a user can insert a process into a flow by choosing the process from either the production processes list or the development processes list in the developer's tree. The user can then switch back and forth and have the menus update with like parameters and execute the different versions of the program.

The examples below use several typical scenarios to illustrate the process promotion/demotion capability.

Note: These examples assume that your SeisSpace development environment has been configured using the PROWESS_HOME/port/bin/Makeseisspace script.

First example - Simple single-developer environment with two versions of a module

A simple example for an external development site might look something like this:

    There are two "systems" that the users need access to simultaneously.

The customer's standard Landmark-provided installation in a common shared directory.

The customer developer's development system in the developer's home directory.

The standard Landmark system has no knowledge of the customer's tool.

The developer has two versions, a "production" version and a "dev" version.


Note: You want to make sure that getToolName is commented out.

    // public String getToolName() {
    //     return "com.djg.prowess.tool.example0addamplitude.Example0AddAmplitudeTool";
    // }

In the production version of the *Tool.java file you would have the first lines:

    // Each tool is in a java package with its proc (menu) file
    package com.djg.prowess.tool.example0addamplitude;

    and in the dev version of the *Tool.java file you would have the first line:

    // Each tool is in a java package with its proc (menu) file
    package com.djg.prowess.tool.example0addamplitude.dev;

The production version of the Makefile would have the PACKAGE line:

    PACKAGE := com/djg/prowess/tool/example0addamplitude

The dev version of the Makefile would have the PACKAGE line:

    PACKAGE := com/djg/prowess/tool/example0addamplitude/dev
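Since the only source difference between the two versions is the ".dev" package suffix, the dev copy of the source can be derived mechanically. The following is a hedged sketch of that idea, not part of the Makeseisspace tooling; it builds a stand-in source file in a scratch directory so the rewrite can be shown end to end:

```shell
# Demo setup in a scratch directory: a stand-in for the production source
# file (the file name follows the example above).
cd "$(mktemp -d)"
src=Example0AddAmplitudeTool.java
printf 'package com.djg.prowess.tool.example0addamplitude;\n' > "$src"

# Derive the dev copy by appending ".dev" to the package statement; the
# same idea applies to the PACKAGE line in the dev Makefile.
mkdir -p dev
sed 's/^package \(com\.djg\.prowess\.tool\.example0addamplitude\);/package \1.dev;/' \
  "$src" > "dev/$src"
```

After this, dev/Example0AddAmplitudeTool.java starts with the .dev package statement shown earlier; the remaining class body is copied unchanged.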

OPTION 1 - Make the two versions of the module available in the developer's Processes List:

Edit the PROWESS.xml file in the prowess/etc directory under the developer's home directory:

    [ssuser@nuthatch flowbuilder]$ pwd
    /home/ssuser/prowess/etc/flowbuilder

    [ssuser@nuthatch flowbuilder]$ more PROWESS.xml
    com.djg.prowess.tool.example0addamplitude.Example0AddAmplitudeProc
    com.djg.prowess.tool.example0addamplitude.dev.Example0AddAmplitudeProc

Note: The "|dev" designation and the addition of "dev" to the PROC file path name for the development version of the tool.
