More Details on Fluent

8/2/2019 More Details on Fluent


Chapter 32. Parallel Processing

The following sections describe the parallel-processing features of FLUENT.

Section 32.1: Introduction to Parallel Processing

Section 32.2: Starting the Parallel Version of the Solver

Section 32.3: Using the Fluent Launcher (Windows only)

Section 32.4: Using a Parallel Network of Workstations

Section 32.5: Partitioning the Grid

Section 32.6: Checking and Improving Parallel Performance

Section 32.7: Running Parallel FLUENT under SGE

Section 32.8: Running Parallel FLUENT under LSF

Section 32.9: Running Parallel FLUENT under Other Resource Management Tools

32.1 Introduction to Parallel Processing

The FLUENT serial solver manages file input and output, data storage, and flow field calculations using a single solver process on a single computer. FLUENT's parallel solver allows you to compute a solution by using multiple processes that may be executing on the same computer, or on different computers in a network. Figures 32.1.1 and 32.1.2 illustrate the serial and parallel FLUENT architectures.

Parallel processing in FLUENT involves an interaction between FLUENT, a host process, and a set of compute-node processes. FLUENT interacts with the host process and the collection of compute nodes using a utility called cortex that manages FLUENT's user interface and basic graphical functions.

Parallel FLUENT splits up the grid and data into multiple partitions, then assigns each grid partition to a different compute process (or node). The number of partitions is an integral multiple of the number of compute nodes available to you (e.g., 8 partitions for 1, 2, 4, or 8 compute nodes). The compute-node processes can be executed on a massively-parallel computer, a multiple-CPU workstation, or a network of workstations using the same or different operating systems.
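As a quick illustration of the divisibility rule, the following Python sketch lists the compute-node counts over which a given number of partitions can be spread evenly. It is not part of FLUENT, and the function names are hypothetical.

```python
from __future__ import annotations


def compatible_node_counts(num_partitions: int) -> list[int]:
    """Return every node count n such that num_partitions is an integral multiple of n."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be a positive integer")
    return [n for n in range(1, num_partitions + 1) if num_partitions % n == 0]


def partitions_per_node(num_partitions: int, num_nodes: int) -> int:
    """Number of grid partitions each compute node would receive."""
    if num_nodes < 1 or num_partitions % num_nodes != 0:
        raise ValueError(
            f"{num_partitions} partitions cannot be spread evenly over {num_nodes} nodes"
        )
    return num_partitions // num_nodes


print(compatible_node_counts(8))  # [1, 2, 4, 8]
print(partitions_per_node(8, 4))  # 2
```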

c Fluent Inc. January 11, 2005 32-1



Figure 32.1.1: Serial FLUENT Architecture (cortex, the solver with its cell, face, and node data, and file input/output to disk)

i In general, as the number of compute nodes increases, turnaround time for the solution will decrease. However, parallel efficiency decreases as the ratio of communication to computation increases, so you should be careful to choose a problem that is large enough for the parallel machine.
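The standard definitions of speedup (serial time divided by parallel time) and efficiency (speedup divided by the number of compute nodes) make this trade-off concrete. The sketch below pairs them with a deliberately simple, hypothetical cost model in which computation divides evenly among the nodes while communication overhead grows with the node count. It is not how FLUENT measures performance; it only shows why a larger problem keeps efficiency higher for the same number of nodes.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup: serial wall-clock time divided by parallel wall-clock time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, nodes: int) -> float:
    """Parallel efficiency: speedup per compute node (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / nodes


def toy_time(work: float, comm_cost: float, nodes: int) -> float:
    """Hypothetical cost model: computation splits evenly across the nodes,
    while communication overhead grows linearly with the number of nodes."""
    return work / nodes + comm_cost * (nodes - 1)


# Small vs. large problem, same communication cost, 4 compute nodes.
for work in (10.0, 100.0):
    t1 = toy_time(work, 1.0, 1)
    t4 = toy_time(work, 1.0, 4)
    print(f"work={work}: efficiency on 4 nodes = {efficiency(t1, t4, 4):.2f}")
# work=10.0: efficiency on 4 nodes = 0.45
# work=100.0: efficiency on 4 nodes = 0.89
```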

FLUENT uses a host process that does not contain any grid data. Instead, the host process only interprets commands from FLUENT's graphics-related interface, cortex.

The host passes those commands, via a socket communicator, to a single designated compute node called compute-node-0. This specialized compute node distributes the host commands to the other compute nodes. Each compute node simultaneously executes the same program on its own data set. Communication from the compute nodes to the host is possible only through compute-node-0 and only when all compute nodes have synchronized with each other.

Each compute node is virtually connected to every other compute node and relies on its communicator to perform such functions as sending and receiving arrays, synchronizing, performing global operations (such as summations over all cells), and establishing machine connectivity. A FLUENT communicator is a message-passing library. For example, the message-passing library could be a vendor implementation of the Message Passing Interface (MPI) standard, as depicted in Figure 32.1.2.

All of the parallel FLUENT processes (as well as the serial process) are identified by a unique integer ID. The host collects messages from compute-node-0 and performs operations (such as printing, displaying messages, and writing to a file) on all of the data, in the same way as the serial solver.
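The data flow described above can be mimicked in a few lines of plain Python. This is a conceptual model only: it uses no MPI and none of FLUENT's internals, and the class and function names are hypothetical. Each compute node sums its own cells, the partial results are combined into a global sum, and only compute-node-0 passes the result to the host, which holds no grid data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    """One compute-node process holding the cells of its own partition."""
    node_id: int
    cell_values: list[float]

    def local_sum(self) -> float:
        # Each node works only on its own data set.
        return sum(self.cell_values)


@dataclass
class Host:
    """Holds no grid data; only records results relayed by compute-node-0."""
    messages: list[str] = field(default_factory=list)

    def receive(self, text: str) -> None:
        self.messages.append(text)


def global_sum(nodes: list[ComputeNode], host: Host) -> float:
    # Every node contributes a partial sum; the combination happens only after
    # all nodes have reported, and only node 0 talks to the host.
    total = sum(node.local_sum() for node in nodes)
    node0 = next(node for node in nodes if node.node_id == 0)
    host.receive(f"node {node0.node_id} reports global sum {total}")
    return total


nodes = [ComputeNode(0, [1.0, 2.0]), ComputeNode(1, [3.0]), ComputeNode(2, [4.0, 5.0])]
host = Host()
print(global_sum(nodes, host))  # 15.0
print(host.messages)            # ['node 0 reports global sum 15.0']
```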


Figure 32.1.2: Parallel FLUENT Architecture (cortex and the host, connected by a socket to compute node 0; compute nodes 0 to 3, each running FLUENT on its own cell, face, and node data and communicating through MPI; file input/output to disk)




Recommended Usage of Parallel FLUENT

The recommended procedure for using parallel FLUENT is as follows:

1. Start up the parallel solver and spawn additional compute nodes (if necessary). See Sections 32.2 and 32.4 for details.

2. Read your case file and have FLUENT partition the grid automatically upon loading it. It is best to partition after the problem is set up, since partitioning has some model dependencies (e.g., adaption on non-conformal interfaces, sliding-mesh and shell-conduction encapsulation).

Note that there are other approaches for partitioning, including manual partitioning in either the serial or the parallel solver. See Section 32.5: Partitioning the Grid for details.

3. Review the partitions and perform partitioning again, if necessary. See Section 32.5.5: Checking the Partitions for details on checking your partitions.

4. Calculate a solution. See Section 32.6: Checking and Improving Parallel Performance for information on checking and improving the parallel performance.

32.2 Starting the Parallel Version of the Solver

The way you start the parallel version of FLUENT depends on whether you are using a dedicated parallel machine or a workstation cluster.

32.2.1 Starting the Parallel Solver on a UNIX System

You can run FLUENT on a UNIX dedicated parallel machine or a network of UNIX workstations. The procedures for starting these versions are described in this section.

Running on a Multiprocessor UNIX Machine

To run FLUENT on a dedicated parallel machine (i.e., a multiprocessor workstation or a massively parallel machine), type the usual startup command without a version (i.e., fluent), and then use the Select Solver panel (Figure 32.2.1) to specify the parallel architecture and version information.

File → Run...

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning the 3D and Double Precision options on or off, and turn on the Parallel option.





Figure 32.2.1: The Select Solver Panel

2. Under Options, select the message-passing library in the Communicator drop-down list. The Default library is recommended, because it selects the library that should provide the best overall parallel performance for your dedicated parallel machine.

If you prefer to select a specific library, you can choose either Vendor MPI or Shared Memory MPI (MPICH). Vendor MPI selects the message-passing library optimized by your hardware vendor. If the parallel toolkit supplied by your hardware vendor is installed on your machine, FLUENT will detect it automatically when the Default option is selected. Shared Memory MPI (MPICH) selects the MPICH message-passing library, a public-domain version of MPI.

3. Set the number of CPUs in the Processes field.

4. Click the Run button to start the parallel version. No additional setup is required once the solver starts.

If you prefer to start the parallel version from the command line, you can type

fluent version -tn [-pcomm] [-loadhost] [-pathpath]

where version is 2d, 3d, 2ddp, or 3ddp, and n is replaced by the number of CPUs to be used. The remaining arguments are optional, as indicated by the square brackets around them. (If you enter one or more of these optional arguments, do not include the square brackets.) comm is replaced by the name of the parallel communication library, host is replaced by the hostname of the machine on which to launch the compute nodes (by default, it is set to the machine you're using when entering this command), and path is replaced by the root path to the Fluent.Inc installation directory.

i In general, you will need to specify -pcomm only if you want to override the default communication library (which should provide the best overall parallel performance).
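A small helper can assemble this command line from its parts, which is handy when scripting many runs. This is a hypothetical Python sketch: it only builds the argument list following the syntax shown above (each option written together with its value, with no space) and does not check whether a given communicator is available on your machine.

```python
from __future__ import annotations

VERSIONS = {"2d", "3d", "2ddp", "3ddp"}


def build_fluent_command(version: str, nprocs: int, comm: str | None = None,
                         load_host: str | None = None,
                         path: str | None = None) -> list[str]:
    """Assemble: fluent version -tn [-pcomm] [-loadhost] [-pathpath]."""
    if version not in VERSIONS:
        raise ValueError(f"unknown version {version!r}; expected one of {sorted(VERSIONS)}")
    if nprocs < 0:
        raise ValueError("the number of processes cannot be negative")
    args = ["fluent", version, f"-t{nprocs}"]
    if comm:        # only needed to override the default communication library
        args.append(f"-p{comm}")
    if load_host:   # machine on which to launch the compute nodes
        args.append(f"-load{load_host}")
    if path:        # root path of the Fluent.Inc installation
        args.append(f"-path{path}")
    return args


print(" ".join(build_fluent_command("3ddp", 4)))             # fluent 3ddp -t4
print(" ".join(build_fluent_command("3d", 8, comm="smpi")))  # fluent 3d -t8 -psmpi
```

On a machine where fluent is on the PATH, the returned list could be handed to Python's subprocess.run(); that step is left out of the sketch.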

The available communicators for dedicated parallel UNIX machines are listed below (Tables 32.2.1 and 32.2.2), along with their associated communication libraries, the corresponding syntax, and the supported architectures (see Step 2, above, for a description of these libraries):

Table 32.2.1: Available communicators for UNIX platforms (per platform)

Platform  Processor        Architecture      Communicators
Linux     32 bit           lnx86             beo, net, nmpi, scampi, smpi
Linux     64 bit Itanium   lnia64            net, nmpi, smpi
Ultra     32 bit           ultra             net, nmpi, smpi, vmpi
Ultra     64 bit           ultra64           net, smpi, vmpi
SGI       32 bit           irix65_mips4      net, nmpi, smpi, vmpi
SGI       64 bit           irix65_mips4_64   net, nmpi, smpi, vmpi
HP        32 bit           hpux11            net, nmpi, smpi, vmpi
HP        64 bit PA-RISC   hpux11_64         net, nmpi, smpi, vmpi
HP        64 bit Itanium   hpux11_ia64       net, vmpi
DEC       64 bit           alpha             net, nmpi, smpi, vmpi, tmpi
Fujitsu   64 bit           fujitsu_pp        net, nmpi, vmpi
IBM       32 bit           aix51             net, nmpi, smpi, vmpi
IBM       64 bit           aix51_64          net, nmpi, smpi, vmpi


Table 32.2.2: Available communicators for UNIX platforms (per communicator)

Communicator  Flag      Library              Spawns nodes  Vendor impl. (costs)  DMM***  SMM**  Platform
net           -pnet     socket               yes           no                    yes     yes    all platforms
nmpi          -pnmpi    network MPI (MPICH)  no            no                    yes     yes    all platforms
smpi          -psmpi    shared MPI (MPICH)   no            no                    no      yes    all platforms
vmpi          -pvmpi    vendor MPI           no            yes                   yes     yes    all platforms except Linux
beo*          -pbeo     beowulf              no            no                    yes     yes    Linux
scampi*       -pscampi  SCAMPI               no            no                    yes     yes    Linux
tmpi          -ptmpi    MPI                  no            no                    yes     yes    DEC

(Flag = syntax flag; Library = communication library; Spawns nodes = supports spawning nodes; Vendor impl. (costs) = a vendor implementation is available at extra cost; DMM/SMM = used with a Distributed/Shared Memory Machine.)

* Not formally qualified in FLUENT, but the vendor might support it.

** SMM (Shared Memory Machine): the memory is shared between the processors on a single machine.

*** DMM (Distributed Memory Machine): each processor has its own memory associated with it.

nmpi is recommended for use with DMM if vmpi is not available, and smpi is recommended for use with SMM if vmpi is not available.
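The two tables can be encoded as lookup data. The Python sketch below covers only a few of the architectures from Table 32.2.1, and the helper name is hypothetical. It applies the footnote's rule, read as: prefer vmpi when the platform lists it, otherwise use nmpi on a DMM or smpi on an SMM. In practice, the Default option in the Select Solver panel remains the recommended choice.

```python
from __future__ import annotations

# Subset of Table 32.2.1: architecture -> available communicators.
PLATFORM_COMMUNICATORS = {
    "lnx86": {"beo", "net", "nmpi", "scampi", "smpi"},
    "lnia64": {"net", "nmpi", "smpi"},
    "ultra64": {"net", "smpi", "vmpi"},
    "aix51": {"net", "nmpi", "smpi", "vmpi"},
    "alpha": {"net", "nmpi", "smpi", "vmpi", "tmpi"},
}

# "Spawns nodes" column of Table 32.2.2.
SPAWNS_NODES = {"net": True, "nmpi": False, "smpi": False, "vmpi": False,
                "beo": False, "scampi": False, "tmpi": False}


def recommended_communicator(arch: str, shared_memory: bool) -> str | None:
    """vmpi if listed for the platform, else smpi (SMM) or nmpi (DMM) if listed."""
    available = PLATFORM_COMMUNICATORS[arch]
    if "vmpi" in available:
        return "vmpi"
    choice = "smpi" if shared_memory else "nmpi"
    return choice if choice in available else None


print(recommended_communicator("lnx86", shared_memory=True))   # smpi
print(recommended_communicator("lnx86", shared_memory=False))  # nmpi
print(recommended_communicator("aix51", shared_memory=False))  # vmpi
print(SPAWNS_NODES["net"], SPAWNS_NODES["nmpi"])               # True False
```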




Running on a UNIX Workstation Cluster

To run FLUENT on a network of UNIX workstations, type the usual startup command without a version (i.e., fluent), and then use the Select Solver panel (Figure 32.2.1) to specify the parallel architecture and version information.

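File → Run...

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning the 3D and Double Precision options on or off, and turn on the Parallel option.

2. Under Options, select the Socket message-passing library in the Communicator drop-down list.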

i When you start the parallel network version, you must select Socket or Network MPI (MPICH) in the Communicator drop-down list, unless the vendor MPI library (described earlier in this section) supports clustering. If you keep the Default option, one of the MPI parallel versions will start instead, and you will be unable to spawn additional compute nodes.

3. Set the number of initial compute node processes to spawn on the host machine in the Processes field. You can start with 1 or 0 nodes and spawn the rest later on, as described in Section 32.4.1: Configuring the Network.

4. (optional) Specify the name of a file containing a list of machines, one per line, in the Hosts File field. If the number of Processes is set to 0, FLUENT will spawn a compute node on each machine listed in the file.

5. Click the Run button to start the parallel network version.

If you prefer to start the parallel network version from the command line, you can type

fluent version -t1 -pnet

(to use the socket communicator) or

fluent version -t1 -pnmpi

(to use the network MPI communicator) to start the solver with 1 compute node on the host workstation. You can then spawn additional processes on remote workstations using the Network Configuration panel, as described in Section 32.4.1: Configuring the Network.

You can type

fluent version -t0 -pnet [-cnf=hostsfile]

(to use the socket communicator) or

fluent version -t0 -pnmpi [-cnf=hostsfile]

(to use the network MPI communicator) to start a host process that controls compute nodes situated on remote machines. If the optional -cnf=hostsfile is specified, a compute node will be spawned on each machine listed in the file hostsfile. (If you enter this optional argument, do not include the square brackets.) Otherwise, you can spawn the processes as described in Section 32.4.1: Configuring the Network.
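For scripted cluster runs, the two forms of the network command and the hosts file can be handled as below. The Python sketch is hypothetical (function and machine names are made up); it assumes the hosts file is plain text with one machine name per line, as described above, and ignores blank lines when listing the machines that would each receive a compute node with -t0.

```python
from __future__ import annotations

NETWORK_COMMUNICATORS = {"net", "nmpi"}  # socket and network MPI (MPICH)


def network_command(version: str, initial_nodes: int, comm: str = "net",
                    hosts_file: str | None = None) -> list[str]:
    """fluent version -t<initial_nodes> -p<comm> [-cnf=hostsfile]"""
    if comm not in NETWORK_COMMUNICATORS:
        raise ValueError("use net or nmpi for a workstation cluster")
    args = ["fluent", version, f"-t{initial_nodes}", f"-p{comm}"]
    if hosts_file is not None:
        args.append(f"-cnf={hosts_file}")
    return args


def machines_in_hosts_file(text: str) -> list[str]:
    """One machine per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


print(" ".join(network_command("3d", 1)))                   # fluent 3d -t1 -pnet
print(" ".join(network_command("3d", 0, "nmpi", "hosts")))  # fluent 3d -t0 -pnmpi -cnf=hosts
print(machines_in_hosts_file("ws1\nws2\nws3\n"))            # ['ws1', 'ws2', 'ws3']
```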

32.2.2 Starting the Parallel Solver on a LINUX System

You can run FLUENT on a LINUX dedicated parallel machine or a network of LINUX workstations. The procedures for starting these versions are described in this section.

Running on a Multiprocessor LINUX Machine

To run FLUENT on a dedicated parallel machine (i.e., a multiprocessor workstation or a massively parallel machine), type the usual startup command without a version (i.e., fluent), and then use the Select Solver panel (Figure 32.2.1) to specify the parallel architecture and version information.

File → Run...

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning the 3D and Double Precision options on or off, and turn on the Parallel option.

2. Under Options, select the message-passing library in the Communicator drop-down list. The Default library is recommended, because it selects the library that should provide the best overall parallel performance for your dedicated parallel machine.

If you prefer to select a specific library, you can choose either Vendor MPI or Shared Memory MPI (MPICH). Vendor MPI selects the message-passing library optimized by your hardware vendor. If the parallel toolkit supplied by your hardware vendor is installed on your machine, FLUENT will detect it automatically when the Default option is selected. Shared Memory MPI (MPICH) selects the MPICH message-passing library, a public-domain version of MPI.

3. Set the number of CPUs in the Processes field.

4. Click the Run button to start the parallel version. No additional setup is required once the solver starts.

If you prefer to start the parallel version from the command line, you can type

fluent version -tn [-pcomm] [-loadhost] [-pathpath]

where version is 2d, 3d, 2ddp, or 3ddp, and n is replaced by the number of CPUs to be used. The remaining arguments are optional, as indicated by the square brackets around them. (If you enter one or more of these optional arguments, do not include the square brackets.) comm is replaced by the name of the parallel communication library, host is replaced by the hostname of the machine on which to launch the compute nodes (by default, it is set to the machine you're using when entering this command), and path is replaced by the root path to the Fluent.Inc installation directory.

i In general, you will need to specify -pcomm only if you want to override the default communication library (which should provide the best overall parallel performance).

The available communicators for dedicated parallel lnx86 LINUX machines are listed in Tables 32.2.1 and 32.2.2, along with the associated communication libraries and corresponding syntax.

FLUENT supplies the necessary components for the ssh, nmpi, smpi, and net communicators. For the remaining communicators, you need to contact the vendor directly.

See step 2, above, for a description of these libraries.

Running on a LINUX Workstation Cluster

To run FLUENT on a network of LINUX workstations, type the usual startup command without a version (i.e., fluent), and then use the Select Solver panel (Figure 32.2.1) to specify the parallel architecture and version information.

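File → Run...

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning the 3D and Double Precision options on or off, and turn on the Parallel option.

2. Under Options, select the Socket message-passing library in the Communicator drop-down list.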

i When you start the parallel network version, you must select Socket or Network MPI (MPICH) in the Communicator drop-down list, unless the vendor MPI library (described earlier in this section) supports clustering. If you keep the Default option, one of the MPI parallel versions will start instead, and you will be unable to spawn additional compute nodes.

3. Set the number of initial compute node processes to spawn on the host machine in the Processes field. You can start with 1 or 0 nodes and spawn the rest later on, as described in Section 32.4.1: Configuring the Network.

4. (optional) Specify the name of a file containing a list of machines, one per line, in the Hosts File field. If the number of Processes is set to 0, FLUENT will spawn a compute node on each machine listed in the file.

5. Click the Run button to start the parallel network version.





If you prefer to start the parallel network version from the command line, you can type

fluent version -t1 -pnet

(to use the socket communicator) or

fluent version -t1 -pnmpi

(to use the network MPI communicator) to start the solver with 1 compute node on the host workstation. You can then spawn additional processes on remote workstations using the Network Configuration panel, as described in Section 32.4.1: Configuring the Network.

You can type

fluent version -t0 -pnet [-cnf=hostsfile]

(to use the socket communicator) or

fluent version -t0 -pnmpi [-cnf=hostsfile]

(to use the network MPI communicator) to start a host process that controls compute nodes situated on remote machines. If the optional -cnf=hostsfile is specified, a compute node will be spawned on each machine listed in the file hostsfile. (If you enter this optional argument, do not include the square brackets.) Otherwise, you can spawn the processes as described in Section 32.4.1: Configuring the Network.

Running With Multiple Network Cards

For Linux machines (lnx86, lnia64, and lnamd64) that have multiple network cards, you can choose a specific network card for your calculations when using either the net or the mpi communicators. When nodes on a cluster have multiple network cards (Fast Ethernet and Gigabit Ethernet, for example), FLUENT allows you to choose a particular network card for the computation by specifying the appropriate name or IP address in the host file.
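As an illustration, if each node's interfaces are known, a host file that routes the computation over one particular card can be generated by writing that card's address for every node. The node names, interface labels, and addresses below are hypothetical (the addresses come from documentation-only IP ranges), and the Python sketch only produces the file contents.

```python
from __future__ import annotations

# Hypothetical cluster: each node has two network cards with made-up addresses.
CLUSTER = {
    "node1": {"fast-ethernet": "192.0.2.11", "gigabit": "198.51.100.11"},
    "node2": {"fast-ethernet": "192.0.2.12", "gigabit": "198.51.100.12"},
}


def hosts_file_for_interface(cluster: dict[str, dict[str, str]], interface: str) -> str:
    """Write one address per line so that every node is reached through `interface`."""
    lines = []
    for name, interfaces in cluster.items():
        if interface not in interfaces:
            raise KeyError(f"{name} has no interface {interface!r}")
        lines.append(interfaces[interface])
    return "\n".join(lines) + "\n"


print(hosts_file_for_interface(CLUSTER, "gigabit"), end="")
# 198.51.100.11
# 198.51.100.12
```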

32.2.3 Starting the Parallel Solver on a Windows System

You can run FLUENT on a Windows dedicated parallel machine or a network of Windows machines. The procedures for starting these versions are described in this section.

Running on a Multiprocessor Windows Machine

On a Windows system, you can start the dedicated parallel version of FLUENT from the MS-DOS Command Prompt window. To start the parallel version on x processors, type

fluent version -tx

at the prompt, replacing version with the solver version (2d, 3d, 2ddp, or 3ddp) and x with the number of processors (e.g., fluent 3d -t3 to run the 3D version on 3 processors). (See Section 1.1.3: Starting FLUENT on a Windows System for information about modifying your user environment if the fluent command is not recognized.)

Running on a Windows Cluster

There are several ways to run FLUENT in parallel on a network of Windows machines: using one of the communicators that is included with the FLUENT distribution, or using either a vendor-supplied or a public domain message-passing interface.

The available communicators for dedicated parallel ntx86 Windows machines, the associated communication libraries for them, and the corresponding syntax are listed below:

Table 32.2.3: Available communicators for Windows platform (per platform)

Platform  Processor  Architecture  Communicators
Windows   32 bit     ntx86         net, nmpi, smpi, vmpi

Table 32.2.4: Available communicators for Windows platform (per communicator)

Communicator  Flag    Library              Spawns nodes  Vendor impl. (costs)  DMM**  SMM*
net           -pnet   socket               yes           no                    yes    yes
nmpi          -pnmpi  network MPI (MPICH)  no            no                    yes    yes
smpi          -psmpi  shared MPI (MPICH)   no            no                    no     yes
vmpi          -pvmpi  vendor MPI           no            yes                   yes    yes

(Flag = syntax flag; Library = communication library; Spawns nodes = supports spawning nodes; Vendor impl. (costs) = a vendor implementation is available at extra cost; DMM/SMM = used with a Distributed/Shared Memory Machine.)

* SMM (Shared Memory Machine): the memory is shared between the processors on a single machine.

** DMM (Distributed Memory Machine): each processor has its own memory associated with it.

nmpi is recommended for use with DMM if vmpi is not available, and smpi is recommended for use with SMM if vmpi is not available.

See the installation instructions for Windows parallel for details about obtaining and installing one of these programs. The startup instructions below assume that you have properly set up the necessary software, based on the appropriate installation instructions.



Starting the Socket-Based Parallel Version of FLUENT

If you are using the socket version for network communication, type the following in an MS-DOS Command Prompt window:

fluent version -tnprocs -pnet [-cnf=hostfile] -pathsharename

where

version must be replaced by the version of FLUENT you want to run (2d, 3d, 2ddp, or 3ddp).

-pathsharename specifies the shared network name for the Fluent.Inc directoryin UNC form.

For example, if FLUENT has been installed on computer1, then you should replace sharename by the UNC name for the shared directory, \\computer1\fluent.inc.

-cnf=hostfile (optional) specifies the hostfile, which contains a list of the computers on which you want to run the parallel job. If the hostfile is not located in the directory where you are typing the startup command, you will need to supply the full pathname to the file. (If you include the -cnf option, do not include the square brackets; see the example below.)

You can use a plain text editor like Notepad to create the hostfile. The only restriction on the filename is that there should be no spaces in it. For example, hosts.txt is an acceptable hostfile name, but my hosts.txt is not.

Your hostfile (e.g., hosts.txt) might contain the following entries:

computer1

computer2

i The first computer in the list must be the name of the local computer you are working on. The last entry must be followed by a blank line.

If a computer in the network is a multiprocessor, you can list it more than once. For example, if computer1 has 2 CPUs, then, to take advantage of both CPUs, the hosts.txt file should list computer1 twice:

computer1

computer1

computer2

If you do not include the -cnf option, FLUENT will start nprocs (see below) processes on the computer where you type the startup command. You can then use the Network Configuration panel in FLUENT to interactively spawn additional nodes on the cluster. See Section 32.4: Using a Parallel Network of Workstations for details.




-tnprocs specifies the number of processes to use. If the -cnf option is present, the hostfile argument is used to determine which computers to use for the parallel job. For example, if there are 10 computers listed in the hostfile and you want to run a job with 5 processes, set nprocs to 5 (i.e., -t5) and FLUENT will use the first 5 machines listed in the hostfile.

You can use the Network Configuration panel to kill processes or spawn additional processes after startup. See Section 32.4: Using a Parallel Network of Workstations for details.

As an example, the full command line to start a 3D socket-based parallel job on the first 3 computers listed in a hostfile called hosts.txt is as follows:

fluent 3d -t3 -pnet -cnf=hosts.txt -path\\computer1\fluent.inc
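The hostfile rules above (no spaces in the filename, the local computer first, multiprocessor machines listed once per CPU, a blank line after the last entry, and -t selecting the first nprocs entries) can be collected into a short Python sketch. The function names are hypothetical, and the handling of the trailing blank line follows the note above rather than anything verified against FLUENT.

```python
from __future__ import annotations


def check_hostfile_name(filename: str) -> None:
    """The only restriction on the hostfile name: no spaces."""
    if " " in filename:
        raise ValueError("hostfile names must not contain spaces")


def hostfile_entries(local: str, machines: list[tuple[str, int]]) -> list[str]:
    """Expand (name, cpus) pairs; a multiprocessor machine is listed once per CPU."""
    entries = [name for name, cpus in machines for _ in range(cpus)]
    if not entries or entries[0] != local:
        raise ValueError("the first entry must be the local computer")
    return entries


def hostfile_text(entries: list[str]) -> str:
    # Terminate the last entry and follow it with a blank line.
    return "\n".join(entries) + "\n\n"


def machines_used(entries: list[str], nprocs: int) -> list[str]:
    """With -cnf present, -t<nprocs> uses the first nprocs entries of the hostfile."""
    return entries[:nprocs]


check_hostfile_name("hosts.txt")
entries = hostfile_entries("computer1", [("computer1", 2), ("computer2", 1)])
print(hostfile_text(entries), end="")  # computer1, computer1, computer2, then a blank line
print(machines_used(entries, 2))       # ['computer1', 'computer1']
```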

Starting the MPI-Based Parallel Version of FLUENT

If you are using either vendor-supplied or public domain MPI software for network communication, type the following in an MS-DOS Command Prompt window:

fluent version -tnprocs -pcomm -cnf=hostfile -pathsharename

where comm can be either nmpi or vmpi and the remaining options have the same meanings as for the socket-based startup described above, with the following differences:

The hostfile specification is required. You can neither spawn nor kill nodes on the cluster using the Network Configuration panel when MPI software is used.

The first computer listed in the hostfile must be the name of the local computer you are working on.

As an example, the full command line to start a 3D vendor-MPI-based parallel job on the first 3 computers listed in a hostfile called hosts.txt is as follows:

fluent 3d -t3 -pvmpi -cnf=hosts.txt -path\\computer1\fluent.inc
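The extra requirements of the MPI-based startup can be checked before the command is issued. In the hypothetical Python sketch below, the function only validates its inputs against the rules just listed and returns the command string; given the same values, it reproduces the example command above.

```python
from __future__ import annotations

MPI_COMMUNICATORS = {"nmpi", "vmpi"}


def windows_mpi_command(version: str, nprocs: int, comm: str, hostfile: str,
                        share: str, hostfile_entries: list[str], local: str) -> str:
    """fluent version -tnprocs -pcomm -cnf=hostfile -pathsharename (MPI variant)."""
    if comm not in MPI_COMMUNICATORS:
        raise ValueError("comm must be nmpi or vmpi")
    if not hostfile:
        # Nodes cannot be spawned later from the Network Configuration panel.
        raise ValueError("a hostfile is required when MPI software is used")
    if not hostfile_entries or hostfile_entries[0] != local:
        raise ValueError("the first computer in the hostfile must be the local computer")
    return f"fluent {version} -t{nprocs} -p{comm} -cnf={hostfile} -path{share}"


cmd = windows_mpi_command("3d", 3, "vmpi", "hosts.txt", r"\\computer1\fluent.inc",
                          ["computer1", "computer2", "computer3"], local="computer1")
print(cmd)  # fluent 3d -t3 -pvmpi -cnf=hosts.txt -path\\computer1\fluent.inc
```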




32.3 Using the Fluent Launcher (Windows only)


The Fluent Launcher (Figure 32.3.1) is a stand-alone Windows application that allows you to launch FLUENT jobs from a computer with a Windows operating system to a cluster of computers. The Fluent Launcher takes the options that you specify in the main Fluent Launcher panel and the Fluent Setup panel (see Section 32.3.1: Fluent Launcher Path Setup and Section 32.3.2: Fluent Launcher Machine Setup), and uses those settings to create a FLUENT parallel command. This command will then be distributed to your network, where another application typically manages the session(s).

You can create a shortcut on your desktop pointing to the Fluent Launcher executable at

FLUENT_INC\fluent6.x\launcher\bin\launcher.exe

where FLUENT_INC is the root path to where FLUENT is installed (usually the value of the FLUENT_INC environment variable) and x indicates the release version of FLUENT.

Figure 32.3.1: The Fluent Launcher Panel

The Fluent Launcher allows you to perform the following:

1. Set options for your FLUENT executable, such as the area (release or prototype), the release number, and the version.

2. Indicate either a serial or parallel execution, along with the number of parallel processes and the communicator to use for parallel computations.




3. Set additional options, such as a working directory, batch mode, or a journal file.

When you are ready to launch your serial or parallel application, click the Launch button.

i For parallel applications, you are required to have the RSH daemon installed on each machine.

Using the Fluent Launcher From Another Machine

If you wish to use the Fluent Launcher from another machine, you can create a shortcut on that machine pointing to the original executable (at FLUENT_INC\fluent6.x\launcher\bin\launcher.exe, where FLUENT_INC is the root path to where FLUENT is installed, usually the value of the FLUENT_INC environment variable, and x indicates the release version of FLUENT).

i Do not copy or move the launcher.exe file from its original directory to any other directory; otherwise, the Fluent Launcher application will not work.

Setting Executable Options With the Fluent Launcher

Under Executable Options, you can use the Fluent Launcher to indicate the version of the FLUENT executable that you want to run. You can also specify a release number and the area from which you are running the code.

Under Area, you can choose either release or prototype. The release option represents the final version of the current software (either a FLUENT release or a FLUENT maintenance release). The prototype option represents a FLUENT prototype or pre-release (beta) version of the software.

Under Release, you can specify the number associated with a given release, maintenance release, or prototype application.

Under Version, you can specify the dimensionality and the precision of the FLUENT product. There are four possible choices: 2d, 2ddp, 3d, or 3ddp. The 2d and 3d options provide single-precision results for two-dimensional or three-dimensional problems, respectively. The 2ddp and 3ddp options provide double-precision results for two-dimensional or three-dimensional problems, respectively.





Setting Parallel Options With the Fluent Launcher

Under Parallel Options, you can use the Fluent Launcher to indicate whether you want to run FLUENT in serial mode or in parallel mode.

To run FLUENT in serial mode, make sure the Parallel option is turned off.

To run FLUENT in parallel, make sure the Parallel option is turned on. When the Parallel option is turned on, you can indicate the number of parallel processes that you will be running, as well as the type of parallel communicator that you need to use.

Use the Processes field to indicate the number of parallel processes, which can range from 1 to 1024. If Processes is set to 1, you might want to consider running the FLUENT job in serial mode instead.

Use the Communicator field to indicate the type of parallel communicator that you require. There are several options, based on the operating system of the parallel cluster. See Tables 32.2.1, 32.2.2, and 32.2.3 for more information.

Setting Additional Options With the Fluent Launcher

Under Additional Options, you can use the Fluent Launcher to specify a working directory and to indicate whether you want to run FLUENT in batch mode, list executed commands, or use a journal file.

In the Directory field, enter the path of your current working directory or click Browse... to browse through your directory structure.

Select the Journal File option to instruct the Fluent Launcher application to use a journal file. Once it is selected, provide the path to and the name of the journal file. Using the journal file, you can automatically load the case, compile any user-defined functions, iterate until the solution converges, and write results to an output file.

Select the Batch Mode option to run FLUENT jobs, and exit when they finish, without the graphical user interface (GUI).




32.3.1 Fluent Launcher Path Setup

The Fluent Launcher can be used to set up path information for your FLUENT jobs through the Paths tab in the Fluent Setup panel (Figure 32.3.2). To access the Fluent Setup panel, click the Setup... button in the Fluent Launcher.

Figure 32.3.2: The Paths Tab in the Fluent Setup Panel

The Paths tab in the Fluent Setup panel allows you to use a set of custom path configurations. New setup information is saved for this session and future sessions when you click the Apply button.

When you are finished setting up your custom path configuration, click the Close button to dismiss the Fluent Setup panel.

Windows Setup

When you choose to use your own path configuration information, you can then indicate the Release path on the Windows platform. This field holds the path to the release area for Windows executables.

i Make sure that the path is a UNC path (i.e., accessible to all nodes).

If you have turned on the Enable Prototype option, then you have the additional option of selecting the path to the prototype area for Windows executables using the Prototype field under Windows Paths.





UNIX Setup

When you choose to use your own path configuration information, you can then indicate the Release path on the UNIX platform. This field holds the path to the release area for UNIX executables.

If you have turned on the Enable Prototype option, then you have the additional option of selecting the path to the prototype area for UNIX executables using the Prototype field under UNIX Paths.

i Note that UNIX paths are not verified.

32.3.2 Fluent Launcher Machine Setup

The Fluent Launcher can be used to set up different machine configurations for your FLUENT jobs through the Machines tab in the Fluent Setup panel (Figure 32.3.3). To access the Fluent Setup panel, click the Setup... button in the Fluent Launcher.

The Machines tab in the Fluent Setup panel allows you to use a different machine configuration. The new setup configuration is saved for the current session and future sessions when you click the Apply button. Machines listed at the top of the list will be used first.

Figure 32.3.3: The Machines Tab in the Fluent Setup Panel

Using the Machines tab in the Fluent Setup panel, you can create and edit a list of the machine names that you want involved in the parallel FLUENT job.

You can add a machine name to the Current Machines list by entering a name in the Machine Name field and clicking the Add button.




You can remove a machine name from the Current Machines list by selecting the name in the list and clicking the Remove button.

You can change the order of the names in the Current Machines list by selecting a name in the list and clicking the Up button to move it one position closer to the top of the list. Likewise, you can move a name one position closer to the bottom of the list by selecting the name and clicking the Down button.

When you are finished setting up your machine configuration, click the Close button to dismiss the Fluent Setup panel.

32.3.3 Fluent Launcher Example

The Fluent Launcher takes the options that you have specified in the main Fluent Launcher panel and the Fluent Setup panel, and uses those settings to create a FLUENT parallel command. This command will then be distributed to your network, where another application typically manages the session(s).

For example, suppose that in the main Fluent Launcher panel, under Executable Options, you selected release for the Area, 6.1.28 for the Release, and 3d for the Version. Then, under Parallel Options, you selected Parallel, chose 2 for the number of Processes, and selected net for the Communicator. Then, in the Fluent Setup panel (opened by clicking the Setup... button), you specified \\Server\Fluent.Inc for Release under Windows Path, and added my_pc to the list of Current Machines in the Machines tab. Finally, you clicked the Apply button in the Fluent Setup panel and then clicked the Launch button. The Fluent Launcher would then generate the following parallel command:

FLUENT_INC\ntbin\ntx86\fluent -r6.1.28 3d -t2 -pnet -path\FLUENT_HOME -cnf="machines_file"

where FLUENT_INC indicates the directory where Fluent.Inc is located and machines_file indicates the location of the machine configuration file that the Fluent Launcher generates. This file contains the names of the machines (e.g., my_pc) listed in the Machines tab of the Fluent Setup panel.
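To show how the panel settings map onto the command line, the sketch below rebuilds the command from this example in Python. It is not part of the Fluent Launcher: `build_fluent_command` and `machines_file_text` are hypothetical helpers, the option strings are copied from the command above (with FLUENT_INC and machines_file left as placeholders), and the one-name-per-line layout of the machines file is an assumption, since the file format is not documented here.

```python
def build_fluent_command(fluent_inc, release, version, processes,
                         communicator, path, machines_file):
    """Assemble a parallel FLUENT command in the shape of the example above."""
    exe = fluent_inc + r"\ntbin\ntx86\fluent"
    return (f"{exe} -r{release} {version} -t{processes} "
            f"-p{communicator} -path{path} -cnf=\"{machines_file}\"")


def machines_file_text(machines):
    """Hypothetical machines-file layout: one name per line, in list order,
    so machines at the top of the Current Machines list come first."""
    return "".join(name + "\n" for name in machines)


if __name__ == "__main__":
    print(build_fluent_command("FLUENT_INC", "6.1.28", "3d", 2, "net",
                               r"\FLUENT_HOME", "machines_file"))
    print(machines_file_text(["my_pc"]), end="")
```

Keeping the list order in the file is the one design point that matters here, because the Machines tab uses the machines at the top of the list first.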




32.4 Using a Parallel Network of Workstations


You can create a virtual parallel machine by spawning (and killing) compute node processes on workstations connected by a network. Multiple compute node processes are allowed to exist on the same workstation, even if the workstation contains only a single CPU.

32.4.1 Configuring the Network

If you want to spawn compute nodes on several different machines, or if you want to make any changes to the current network configuration (e.g., if you accidentally spawned too many compute nodes on the host machine when you started the solver), you can use the Network Configuration panel (Figure 32.4.1).

Parallel Network Configure...

Figure 32.4.1: The Network Configuration Panel

i Note that not all communicators allow you to configure a network of spawned compute nodes if you do not start FLUENT using host files. Only -pnet allows you to manually spawn additional compute nodes before reading the case file. Using -pnmpi, for example, does not allow you to configure the network of spawned compute nodes.




Structure of the Network

Compute nodes are labeled sequentially starting at 0. In addition to the compute node processes, there is one host process. The host process is automatically started when FLUENT starts, and it is killed when FLUENT exits; it cannot be killed while FLUENT is running. Compute nodes, however, can be killed at any time, with the exception that compute node 0 can only be killed if it is the last remaining compute node process. The host process always spawns compute node 0, and compute node 0 spawns all other compute nodes.

Steps for Spawning Compute Nodes

The basic steps for spawning compute nodes are as follows:

1. Choose the host machine(s) on which to spawn compute nodes in the Available Hosts list. If the desired machine is not listed, you can use the Host Entry fields to manually add a host (as described below), or you can copy the desired host from the hosts database (as described in Section 32.4.2: The Hosts Database).

2. Set the number of compute node processes to spawn on each selected host machine in the Spawn Count field.

3. Click the Spawn button; the new node(s) will be spawned and added to the Spawned Compute Nodes list.
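The spawning and killing rules described in this section can be summarized as a small state model. `VirtualNetwork` is a hypothetical class that only mirrors the documented rules (sequential labels starting at 0, node 0 spawned by the host, every other node spawned by node 0, node 0 killable only when it is the last compute node); it does not communicate with FLUENT. Two details are assumptions: IDs are not reused while nodes remain, and numbering restarts at 0 once every compute node has been killed.

```python
class VirtualNetwork:
    """Toy model of the documented spawn/kill rules (hypothetical, no FLUENT calls)."""

    def __init__(self):
        self.nodes = {}     # compute node ID -> hostname
        self.parent = {}    # compute node ID -> process that spawned it
        self._next_id = 0   # assumption: IDs are not reused while nodes remain

    def spawn(self, hostname, count):
        """Spawn `count` compute nodes on `hostname` and return their IDs."""
        if not self.nodes:
            self._next_id = 0   # assumption: numbering restarts when no nodes remain
        new_ids = []
        for _ in range(count):
            node_id = self._next_id
            self._next_id += 1
            self.nodes[node_id] = hostname
            # The host process spawns node 0; node 0 spawns every other node.
            self.parent[node_id] = "host" if node_id == 0 else 0
            new_ids.append(node_id)
        return new_ids

    def kill(self, node_id):
        """Kill a compute node, refusing to kill node 0 while others remain."""
        if node_id not in self.nodes:
            raise KeyError(node_id)
        if node_id == 0 and len(self.nodes) > 1:
            raise ValueError("compute node 0 must be the last node killed")
        del self.nodes[node_id]
        del self.parent[node_id]
```

For example, spawning two nodes on balin and one on filio yields nodes 0, 1 and 2, and an attempt to kill node 0 is refused until nodes 1 and 2 are gone.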

Additional functions related to network configuration are described below.

Adding Hosts Manually

To add a host to the Available Hosts list in the Network Configuration panel manually, enter the internet name of the remote machine in the Hostname field under Host Entry, enter your login name on that machine in the Username field (unless your accounts all have the same login name, in which case you need not specify a username), and then click the Add button. The specified host will be added to the Available Hosts list.





Deleting Hosts

To delete a host from the Available Hosts list in the Network Configuration panel, select the host and click the Delete button. The host name will be removed from the Available Hosts list, but the hosts database (see Section 32.4.2: The Hosts Database) will not be affected.

Killing Compute Nodes

If you spawn an unwanted compute node, you can easily remove it by selecting it in the Spawned Compute Nodes list and clicking the Kill button.

i Remember that compute node 0 can only be killed if it is the last remaining compute node process.

Saving a Hosts File

If you have compiled a group of Available Hosts that you may want to use again in another session, you can save a hosts file containing all entries in the Available Hosts list. Click the Save... button and, in the resulting Select File dialog box, enter the name of the file and save it. In a future session, you can load the contents of this file into the hosts database (see Section 32.4.2: The Hosts Database) and then copy the hosts over to the Network Configuration panel in order to reproduce the current Available Hosts list.

Common Problems Encountered During Node Spawning

The spawning process will try to establish a connection with a new compute node, but if it receives no response from the new compute node after 50 seconds, it will assume the spawn was unsuccessful. The spawn will be unsuccessful, for example, if the remote machine is unable to find the FLUENT executable. To manually test whether the spawning machine can start a new compute node, you can type

rsh [-l username] hostname fluent -t0 -v

from a shell prompt on the spawning machine. hostname should be replaced with the internet name of the machine on which you want to spawn a compute node, and username should be replaced with your login name on the remote machine specified by hostname.




i If all your accounts have the same login name, you do not need to specify a username. (The square brackets around -l username indicate that it is not always required; if you do enter a login name, do not include the square brackets.) Note that on some systems, the remote shell command is remsh instead of rsh.
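If you need to repeat this test for many hosts, you could script it. The Python sketch below is an illustration, not a FLUENT utility: the helper names are hypothetical, the command is the same `rsh [-l username] hostname fluent -t0 -v` shown above, the 50-second limit is applied as a subprocess timeout, and the output is scanned for the two failure messages described below. It deliberately ignores the exit status, because a classic rsh may not pass back the remote command's status.

```python
import subprocess

SPAWN_TIMEOUT_S = 50  # the spawner gives up after 50 seconds without a reply

FAILURE_HINTS = {
    "Login incorrect": "the spawning machine cannot rsh to this host",
    "fluent: Command not found": "the FLUENT shell script is not on the remote path",
}


def spawn_test_argv(hostname, username=None, remote_shell="rsh"):
    """Build the manual spawn-test command; pass remote_shell="remsh" where needed."""
    argv = [remote_shell]
    if username:  # -l is only needed when login names differ
        argv += ["-l", username]
    return argv + [hostname, "fluent", "-t0", "-v"]


def diagnose(output):
    """Return a hint for a known failure message, or None."""
    for message, hint in FAILURE_HINTS.items():
        if message in output:
            return hint
    return None


def run_spawn_test(hostname, username=None, remote_shell="rsh"):
    """Run the test and return its combined output (exit status is not used)."""
    try:
        result = subprocess.run(spawn_test_argv(hostname, username, remote_shell),
                                capture_output=True, text=True,
                                timeout=SPAWN_TIMEOUT_S)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return str(exc)
    return result.stdout + result.stderr
```

A typical use is `diagnose(run_spawn_test("filio"))` for each candidate host, where filio is just the sample hostname used later in this section.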

The spawn test could fail for several reasons:

Login incorrect. The machine spawning a new compute node must be able to rsh to the machine where the new process will reside, or the spawn will fail. There are several ways to enable this capability; consult your systems administrator for assistance.

fluent: Command not found. The rsh to the remote machine succeeded, but the path to the FLUENT shell script could not be found on that machine. If you are using csh, the path to the FLUENT shell script should be added to the path variable in your .cshrc file. If that also fails, you can use the parallel/network/path text command to set the path to the Fluent.Inc installation directory directly before spawning the compute node.

parallel network path





32.4.2 The Hosts Database

When you are creating a parallel network of workstations, it is convenient to start with a list of machines that are part of your local network (a hosts file). You can load a file containing these names into the hosts database and then select the hosts that are available for creating a parallel configuration (or network) on a cluster of workstations using the Hosts Database panel (Figure 32.4.2).

Parallel Network Database...

Figure 32.4.2: The Hosts Database Panel

(You can also open this panel by clicking the Database... button in the Network Configuration panel.)

If the hosts file fluent.hosts or .fluent.hosts exists in your home directory, its contents are automatically added to the hosts database at startup. Otherwise, the hosts database will be empty until you read in a hosts file.




Reading Hosts Files

If you have a hosts file containing a list of machines on your local network, you can load this file into the Hosts Database panel by clicking the Load... button and specifying the file name in the resulting Select File dialog box. Once the contents of the file have been read, the host names will appear in the Hosts list. (FLUENT will automatically add the IP (Internet Protocol) address for each recognized machine. If a machine is not currently on the local network, it will be labeled unknown.)

Copying Hosts to the Network Configuration Panel

If you want to copy one or more of the Hosts in the Hosts Database panel to the Available Hosts list in the Network Configuration panel, select the desired name(s) in the Hosts list and click the Copy button. The selected hosts will be added to the list of Available Hosts on which you can spawn nodes.

32.4.3 Checking Network Connectivity

For any compute node, you can print network connectivity information that includes the hostname, architecture, process ID, and ID of the selected compute node and all machines connected to it. The ID of the selected compute node is marked with an asterisk.

The ID for the FLUENT host process is always host. The compute nodes are numbered sequentially starting from node-0. All compute nodes are completely connected. In addition, compute node 0 is connected to the host process.
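The connection pattern described above can be written down explicitly. The sketch below is illustrative only: `connectivity` and `neighbours` are hypothetical helpers that list the links between the host process and compute nodes node-0 through node-(n-1), following the two stated rules.

```python
def connectivity(num_nodes):
    """Set of undirected links between FLUENT processes.

    Compute nodes are completely connected, and node-0 is also linked to the host.
    """
    names = [f"node-{i}" for i in range(num_nodes)]
    links = {("host", "node-0")} if num_nodes else set()
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            links.add((names[i], names[j]))
    return links


def neighbours(links, process):
    """Sorted list of processes directly connected to `process`."""
    return sorted(b if a == process else a for a, b in links if process in (a, b))
```

With four compute nodes this gives six node-to-node links plus the single host link, so the host appears only in node-0's neighbour list; the node-to-node count grows as n(n-1)/2.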

To obtain connectivity information for a compute node, you can use the Parallel Connectivity panel (Figure 32.4.3).

Parallel Show Connectivity...

Figure 32.4.3: The Parallel Connectivity Panel

Indicate the compute node ID for which connectivity information is desired in the Compute Node field, and then click the Print button. Sample output for compute node 0 is shown below:




------------------------------------------------------------------------------
ID    Comm.  Hostname  O.S.      PID    Mach ID  HW ID  Name
------------------------------------------------------------------------------
host  net    balin     Linux-32  17272  0        7      Fluent Host
n3    smpi   balin     Linux-32  17307  1        10     Fluent Node
n2    smpi   filio     Linux-32  17306  0        -1     Fluent Node
n1    smpi   bofur     Linux-32  17305  0        1      Fluent Node
n0*   smpi   balin     Linux-32  17273  2        11     Fluent Node

O.S. is the architecture, Comm. is the communicator, PID is the process ID number, Mach ID is the compute node ID, and HW ID is an identifier specific to the communicator used.

You can also check the connectivity of a compute node in the Network Configuration panel by selecting it in the Spawned Compute Nodes list and clicking the Connectivity button. If you click the Connectivity button without selecting any of the Spawned Compute Nodes, the Parallel Connectivity panel will open, and you can specify the node there, as described above. If you select more than one of the Spawned Compute Nodes, clicking the Connectivity button will print connectivity information for each selected node.

32.5 Partitioning the Grid

Information about grid partitioning is provided in the following sections:

Section 32.5.1: Overview of Grid Partitioning

Section 32.5.2: Partitioning the Grid Automatically

Section 32.5.3: Partitioning the Grid Manually

Section 32.5.4: Grid Partitioning Methods

Section 32.5.5: Checking the Partitions

Section 32.5.6: Load Distribution

32.5.1 Overview of Grid Partitioning

When you use the parallel solver in FLUENT, you need to partition or subdivide the grid into groups of cells that can be solved on separate processors (see Figure 32.5.1). You can either use the automatic partitioning algorithms when reading an unpartitioned grid into the parallel solver (the recommended approach, described in Section 32.5.2: Partitioning the Grid Automatically), or perform the partitioning yourself in the serial solver or after reading a mesh into the parallel solver (as described in Section 32.5.3: Partitioning the Grid Manually). In either case, the available partitioning methods are those described in Section 32.5.4: Grid Partitioning Methods. You can partition the grid before or after you set up the problem (by defining models, boundary conditions, etc.), although it is better to partition after the setup, due to some model dependencies (e.g., adaption on non-conformal interfaces, sliding-mesh and shell-conduction encapsulation).

i If your case file contains sliding meshes, or non-conformal interfaces on which you plan to perform adaption during the calculation, you will have to partition it in the serial solver. See Sections 32.5.2 and 32.5.3 for more information.

Note that the relative distribution of cells among compute nodes will be maintained during grid adaption, except if non-conformal interfaces are present, so repartitioning after adaption is not required. See Section 32.5.6: Load Distribution for more information.

If you use the serial solver to set up the problem before partitioning, the machine on which you perform this task must have enough memory to read in the grid. If your grid is too large to be read into the serial solver, you can read the unpartitioned grid directly into the parallel solver (using the memory available in all the defined hosts) and have it automatically partitioned. In this case you will set up the problem after an initial partition has been made. You will then be able to manually repartition the case if necessary. See Sections 32.5.2 and 32.5.3 for additional details and limitations, and Section 32.5.5: Checking the Partitions for details about checking the partitions.

Figure 32.5.1: Partitioning the Grid (before partitioning: a single domain with its boundary; after partitioning: Partition 0 and Partition 1, separated by an interface)




button. It is recommended that you not partition cell zones independently (by turning off the Across Zones check button) unless cells in different zones will require significantly different amounts of computation during the solution phase (e.g., if the domain contains both solid and fluid zones).

(d) If you have chosen the Principal Axes or Cartesian Axes method, you can improve the partitioning by enabling the automatic testing of the different bisection directions before the actual partitioning occurs. To use pretesting, turn on the Pre-Test option. Pretesting is described in Section 32.5.4: Pretesting.

(e) Click OK.

If you have a case file in which you have already partitioned the grid, and the number of partitions is an integral multiple of the number of compute nodes, you can keep the default selection of Case File in the Auto Partition Grid panel. This instructs FLUENT to use the partitions in the case file.

2. Read the case file.

File Read Case...

Reporting During Auto Partitioning

As the grid is automatically partitioned, some information about the partitioning process will be printed in the text (console) window. If you want additional information, you can print a report from the Partition Grid panel after the partitioning is completed.

Parallel Partition...

When you click the Print Active Partitions or Print Stored Partitions button in the Partition Grid panel, FLUENT will print the partition ID, number of cells, faces, and interfaces, and the ratio of interfaces to faces for each active or stored partition in the console window. In addition, it will print the minimum and maximum cell, face, interface, and face-ratio variations. See Section 32.5.5: Interpreting Partition Statistics for details. You can examine the partitions graphically by following the directions in Section 32.5.5: Checking the Partitions.

32.5.3 Partitioning the Grid Manually

Automatic partitioning in the parallel solver (described in Section 32.5.2: Partitioning the Grid Automatically) is the recommended approach to grid partitioning, but it is also possible to partition the grid manually in either the serial solver or the parallel solver. After automatic or manual partitioning, you will be able to inspect the partitions created (see Section 32.5.5: Checking the Partitions) and optionally repartition the grid, if necessary. Again, you can do so within the serial or the parallel solver, using the Partition Grid panel. A partitioned grid may also be used in the serial solver without any loss in performance.





Guidelines for Partitioning the Grid

The following steps are recommended for partitioning a grid manually:

1. Partition the grid using the default bisection method (Principal Axes) and optimization (Smooth).

2. Examine the partition statistics, which are described in Section 32.5.5: Interpreting Partition Statistics. Your aim is to achieve small values of Interface ratio variation and Global interface ratio while maintaining a balanced load (Cell variation). If the statistics are not acceptable, try one of the other bisection methods.

3. Once you determine the best bisection method for your problem, you can turn on Pre-Test (see Section 32.5.4: Pretesting) to improve it further, if desired.

4. You can also improve the partitioning using the Merge optimization, if desired.

Instructions for manual partitioning are provided below.

Using the Partition Grid Panel

For grid partitioning, you need to select the bisection method for creating the grid partitions, set the number of partitions, select the zones and/or registers, and choose the optimizations to be used. For some methods, you can also perform pretesting to ensure that the best possible bisection is performed. Once you have set all the parameters in the Partition Grid panel to your satisfaction, click the Partition button to subdivide the grid into the selected number of partitions using the prescribed method and optimization(s). See above for recommended partitioning strategies.

You can set the relevant inputs in the Partition Grid panel (Figure 32.5.3 in the parallel solver, or Figure 32.5.4 in the serial solver) in the following manner:

Parallel Partition...

1. Select the bisection method in the Method drop-down list. The choices are the techniques described in Section 32.5.4: Bisection Methods.

2. Set the desired number of grid partitions in the Number integer number field. You can use the counter arrows to increase or decrease the value, instead of typing in the box. The number of grid partitions must be an integral multiple of the number of processors available for parallel computing.




Figure 32.5.3: The Partition Grid Panel in the Parallel Solver

Figure 32.5.4: The Partition Grid Panel in the Serial Solver





3. You can choose to independently apply partitioning to each cell zone, or you can allow partitions to cross zone boundaries using the Across Zones check button. It is recommended that you not partition cell zones independently (by turning off the Across Zones check button) unless cells in different zones will require significantly different amounts of computation during the solution phase (e.g., if the domain contains both solid and fluid zones).

4. You can select Encapsulate Grid Interfaces if you would like the cells surrounding all non-conformal grid interfaces in your mesh to reside in a single partition at all times during the calculation. If your case file contains non-conformal interfaces on which you plan to perform adaption during the calculation, you will have to partition it in the serial solver, with the Encapsulate Grid Interfaces and Encapsulate for Adaption options turned on.

5. If you have enabled the Encapsulate Grid Interfaces option in the serial solver, the Encapsulate for Adaption option will also be available. When you select this option, additional layers of cells are encapsulated such that transfer of cells will be unnecessary during parallel adaption.

6. You can activate and control the desired optimization methods (described in Section 32.5.4: Optimizations) using the items under Optimizations. You can activate the Merge and Smooth schemes by turning on the Do check button next to each one. For each scheme, you can also set the number of Iterations. Each optimization scheme will be applied until appropriate criteria are met, or the maximum number of iterations has been executed. If the Iterations counter is set to 0, the optimization scheme will be applied until completion, without limit on the maximum number of iterations.

7. If you have chosen the Principal Axes or Cartesian Axes method, you can improve the partitioning by enabling the automatic testing of the different bisection directions before the actual partitioning occurs. To use pretesting, turn on the Pre-Test option. Pretesting is described in Section 32.5.4: Pretesting.

8. In the Zones and/or Registers lists, select the zone(s) and/or register(s) that you want to partition. For most cases, you will select all Zones (the default) to partition the entire domain. See below for details.

9. Click the Partition button to partition the grid.

10. If you decide that the new partitions are better than the previous ones (if the grid was already partitioned), click the Use Stored Partitions button to make the newly stored cell partitions the active cell partitions. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file.




11. When you use the dynamic mesh model in your parallel simulations, the Partition panel includes an Auto Repartition option and a Repartition Interval setting. These parallel partitioning options are provided because FLUENT migrates cells when local remeshing and smoothing are performed; as a result, the partition interface becomes very wrinkled and the load balance may deteriorate. By default, the Auto Repartition option is selected, in which case the percentage of interface faces and the loads are monitored automatically, and FLUENT determines the most appropriate repartition interval based on various simulation parameters. If the Auto Repartition option gives unsatisfactory results, you can use the Repartition Interval setting instead, which lets you specify the interval (in time steps or iterations, respectively) at which a repartition is enforced. When repartitioning is not desired, set the Repartition Interval to zero.

i Note that when dynamic meshes and local remeshing are used, updated meshes may be slightly different in parallel FLUENT (compared to serial FLUENT, or to a parallel solution created with a different number of compute nodes), resulting in very small differences in the solutions.
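The Repartition Interval rule from item 11 can be stated in a couple of lines. This is a hypothetical helper, not FLUENT code; in particular, treating step 0 as never triggering a repartition is an assumption.

```python
def should_repartition(step, interval):
    """True when a repartition is enforced at this time step (or iteration).

    interval == 0 disables forced repartitioning, as described in item 11.
    Assumption: step 0 (before any solution update) never triggers it.
    """
    return interval > 0 and step > 0 and step % interval == 0
```

With an interval of 5, repartitions would be forced at steps 5, 10, 15, and so on; with 0, never.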

Partitioning Within Zones or Registers

The ability to restrict partitioning to cell zones or registers gives you the flexibility to apply different partitioning strategies to subregions of a domain. For example, if your geometry consists of a cylindrical plenum connected to a rectangular duct, you may want to partition the plenum using the Cylindrical Axes method, and the duct using the Cartesian Axes method.

If the plenum and the duct are contained in two different cell zones, you can select one at a time and perform the desired partitioning, as described in Section 32.5.3: Using the Partition Grid Panel. If they are not in two different cell zones, you can create a cell register (basically a list of cells) for each region using the functions that are used to mark cells for adaption. These functions allow you to mark cells based on physical location, cell volume, gradient or isovalue of a particular variable, and other parameters. See Chapter 27: Grid Adaption for information about marking cells for adaption. Section 27.11.1: Manipulating Adaption Registers provides information about manipulating different registers to create new ones. Once you have created a register, you can partition within it as described above.

i Note that partitioning within zones or registers is not available when Metis is selected as the partition Method.

For dynamic mesh applications (see item 11 above), FLUENT stores the partition method used to partition the respective zone. Therefore, if repartitioning is done, FLUENT uses the same method that was used to partition the mesh.



Reporting During Partitioning

As the grid is partitioned, information about the partitioning process will be printed in the text (console) window. By default, the solver will print the number of partitions created, the number of bisections performed, the time required for the partitioning, and the minimum and maximum cell, face, interface, and face-ratio variations. (See Section 32.5.5: Interpreting Partition Statistics for details.) If you increase the Verbosity to 2 from the default value of 1, the partition method used, the partition ID, number of cells, faces, and interfaces, and the ratio of interfaces to faces for each partition will also be printed in the console window. If you decrease the Verbosity to 0, only the number of partitions created and the time required for the partitioning will be reported.

You can request a portion of this report to be printed again after the partitioning is completed. When you click the Print Active Partitions or Print Stored Partitions button in the parallel solver, FLUENT will print the partition ID, number of cells, faces, and interfaces, and the ratio of interfaces to faces for each active or stored partition in the console window. In addition, it will print the minimum and maximum cell, face, interface, and face-ratio variations. In the serial solver, you will obtain the same information about the stored partition when you click Print Partitions. See Section 32.5.5: Interpreting Partition Statistics for details.
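To make the reported quantities concrete, the sketch below computes the per-partition values listed above from a face list. It is a hypothetical illustration: the face representation (cell pairs, with `None` for the missing neighbour on a boundary face) and the choice to count an interface face once for each of the two partitions it separates are assumptions, and it reports plain minimum and maximum values rather than FLUENT's variation statistics, which are defined in Section 32.5.5.

```python
from collections import defaultdict


def partition_report(faces, part):
    """Per-partition (cells, faces, interfaces, interface/face ratio).

    faces: (c0, c1) cell pairs, with c1 = None on boundary faces.
    part:  part[c] is the partition ID of cell c.
    An interface face is counted once for each partition it touches.
    """
    cells = defaultdict(int)
    for p in part:
        cells[p] += 1
    n_faces = defaultdict(int)
    n_interfaces = defaultdict(int)
    for c0, c1 in faces:
        owners = {part[c0]} if c1 is None else {part[c0], part[c1]}
        for p in owners:
            n_faces[p] += 1
            if len(owners) == 2:  # face separates two partitions
                n_interfaces[p] += 1
    return {p: (cells[p], n_faces[p], n_interfaces[p],
                n_interfaces[p] / n_faces[p] if n_faces[p] else 0.0)
            for p in sorted(cells)}


def min_max(report, column):
    """Min and max of one column (0=cells, 1=faces, 2=interfaces, 3=ratio)."""
    values = [row[column] for row in report.values()]
    return min(values), max(values)
```

For a strip of four cells split 3:1, the smaller partition has the larger interface ratio (0.5 against 0.25), which is the kind of imbalance the printed statistics help you spot.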

i Recall that to make the stored cell partitions the active cell partitions you must click the Use Stored Partitions button. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file.

Resetting the Partition Parameters

If you change your mind about your partition parameter settings, you can easily return to the default settings assigned by FLUENT by clicking the Default button. When you click the Default button, it will become the Reset button. The Reset button allows you to return to the most recently saved settings (i.e., the values that were set before you clicked Default). After execution, the Reset button will become the Default button again.

32.5.4 Grid Partitioning Methods

Partitioning the grid for parallel processing has three major goals:

Create partitions with equal numbers of cells.

Minimize the number of partition interfaces, i.e., decrease the partition boundary surface area.

Minimize the number of partition neighbors.




Balancing the partitions (equalizing the number of cells) ensures that each processor has an equal load and that the partitions will be ready to communicate at about the same time. Since communication between partitions can be a relatively time-consuming process, minimizing the number of interfaces can reduce the time associated with this data interchange. Minimizing the number of partition neighbors reduces the chances for network and routing contentions. In addition, minimizing partition neighbors is important on machines where the cost of initiating message passing is expensive compared to the cost of sending longer messages. This is especially true for workstations connected in a network.
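The three goals can be measured directly for any candidate partitioning. In the hypothetical sketch below, cells are numbered from 0, `faces` holds `(c0, c1)` cell pairs with `c1 = None` on boundary faces, and `part[c]` is the partition of cell `c`; these data structures are assumptions for illustration, not FLUENT's internal representation.

```python
def partition_quality(faces, part, nparts):
    """Measures for the three goals: cells per partition (balance), total
    interface faces, and the number of neighbouring partitions of each partition."""
    cells = [0] * nparts
    for p in part:
        cells[p] += 1
    interface_faces = 0
    neighbours = [set() for _ in range(nparts)]
    for c0, c1 in faces:
        if c1 is None:          # boundary face: never an interface
            continue
        p0, p1 = part[c0], part[c1]
        if p0 != p1:
            interface_faces += 1
            neighbours[p0].add(p1)
            neighbours[p1].add(p0)
    return cells, interface_faces, [len(n) for n in neighbours]
```

On a strip of six cells, the assignments [0, 0, 1, 1, 2, 2] and [0, 1, 0, 1, 2, 2] are equally balanced, but the second has twice as many interface faces (4 instead of 2), so it would cost more communication.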

The partitioning schemes in FLUENT use bisection algorithms to create the partitions, but unlike other schemes, which require the number of partitions to be a power of two, these schemes have no limitations on the number of partitions. For each available processor, you will create the same number of partitions (i.e., the total number of partitions will be an integral multiple of the number of processors).
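The integral-multiple rule can be checked before partitioning. In this hypothetical sketch, `assign_partitions` is an illustrative name and the contiguous assignment of partitions to processors is only one possible mapping; the text above does not say how FLUENT maps partitions to compute nodes.

```python
def assign_partitions(num_partitions, num_processors):
    """Give every processor the same number of partitions.

    Raises ValueError unless num_partitions is an integral multiple of
    num_processors. The contiguous block mapping is an illustrative assumption.
    """
    if num_processors <= 0 or num_partitions % num_processors != 0:
        raise ValueError(f"{num_partitions} partitions cannot be shared evenly "
                         f"among {num_processors} processors")
    per_node = num_partitions // num_processors
    return {node: list(range(node * per_node, (node + 1) * per_node))
            for node in range(num_processors)}
```

For example, 8 partitions work with 1, 2, 4, or 8 processors, while a request for 3 processors is rejected.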

Bisection Methods

The grid is partitioned using a bisection algorithm. The selected algorithm is applied to the parent domain, and then recursively applied to the child subdomains. For example, to divide the grid into four partitions, the solver will bisect the entire (parent) domain into two child domains, and then repeat the bisection for each of the child domains, yielding four partitions in total. To divide the grid into three partitions, the solver will bisect the parent domain to create two partitions (one approximately twice as large as the other), and then bisect the larger child domain again to create three partitions in total.
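The recursive scheme can be illustrated in one dimension. This is a minimal sketch, assuming that each cut is placed so that each child receives cells in proportion to the number of partitions it must still produce, and that the larger child takes the extra partition when the count is odd; `recursive_bisection` is a hypothetical name, and FLUENT's actual cut placement is not documented here.

```python
def recursive_bisection(values, nparts):
    """Split 1D cell coordinates into `nparts` groups by recursive bisection."""
    ordered = sorted(values)
    if nparts == 1:
        return [ordered]
    left_parts = (nparts + 1) // 2   # the larger child takes the extra partition
    # Cut in proportion to the partitions each child must still produce,
    # e.g. a 2:1 split when three partitions are requested.
    cut = round(len(ordered) * left_parts / nparts)
    return (recursive_bisection(ordered[:cut], left_parts)
            + recursive_bisection(ordered[cut:], nparts - left_parts))
```

With nine cells and three partitions, the first cut produces groups of six and three cells, and the six-cell child is bisected again, giving three equal partitions.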

The grid can be partitioned using one of the algorithms listed below. The most efficient choice is problem-dependent, so you can try different methods until you find the one that is best for your problem. See Section 32.5.3: Guidelines for Partitioning the Grid for recommended partitioning strategies.

Cartesian Axes bisects the domain based on the Cartesian coordinates of the cells (see Figure 32.5.5). It bisects the parent domain and all subsequent child subdomains perpendicular to the coordinate direction with the longest extent of the active domain. It is often referred to as coordinate bisection.

Cartesian Strip uses coordinate bisection but restricts all bisections to the Cartesian direction of longest extent of the parent domain (see Figure 32.5.6). You can often minimize the number of partition neighbors using this approach.

Cartesian X-, Y-, Z-Coordinate bisects the domain based on the selected Cartesian coordinate. It bisects the parent domain and all subsequent child subdomains perpendicular to the specified coordinate direction. (See Figure 32.5.6.)
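The difference between Cartesian Axes and Cartesian Strip lies only in how the cutting direction is chosen. In the hypothetical sketch below, cell centres are coordinate tuples, each cut uses a simple proportional split, and `coordinate_bisection` is an illustrative name, not a FLUENT function. Fixing `axis` yourself would correspond to the X-, Y-, Z-Coordinate variants.

```python
def extent(points, axis):
    """Length of the bounding box of `points` along `axis`."""
    coords = [p[axis] for p in points]
    return max(coords) - min(coords)


def coordinate_bisection(points, nparts, strip=False, axis=None):
    """Recursive coordinate bisection of cell centres (tuples of coordinates).

    strip=False: each (sub)domain is cut normal to its own longest extent
                 (the Cartesian Axes idea).
    strip=True:  every cut uses the parent domain's longest direction
                 (the Cartesian Strip idea).
    Assumes there are at least as many points as partitions.
    """
    if nparts == 1:
        return [list(points)]
    if axis is None or not strip:
        axis = max(range(len(points[0])), key=lambda a: extent(points, a))
    ordered = sorted(points, key=lambda p: p[axis])
    left_parts = (nparts + 1) // 2
    cut = round(len(ordered) * left_parts / nparts)
    return (coordinate_bisection(ordered[:cut], left_parts, strip, axis)
            + coordinate_bisection(ordered[cut:], nparts - left_parts, strip, axis))
```

On a 4 x 4 block of cells split four ways, the Axes variant yields 2 x 2 blocks (each partition has two neighbours), while the Strip variant yields four one-cell-wide strips (the end strips have only one neighbour each, at the price of longer interfaces).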





Cartesian R Axes bisects the domain based on the shortest radial distance from the cell centers to that Cartesian axis (x, y, or z) which produces the smallest interface size. This method is available only in 3D.

Cartesian RX-, RY-, RZ-Coordinate bisects the domain based on the shortest radial distance from the cell centers to the selected Cartesian axis (x, y, or z). These methods are available only in 3D.

Cylindrical Axes bisects the domain based on the cylindrical coordinates of the cells. This method is available only in 3D.

Cylindrical R-, Theta-, Z-Coordinate bisects the domain based on the selected cylindrical coordinate. These methods are available only in 3D.

Metis uses the METIS software package for partitioning irregular graphs, developed by Karypis and Kumar at the University of Minnesota and the Army HPC Research Center. It uses a multilevel approach in which the vertices and edges on the fine graph are coalesced to form a coarse graph. The coarse graph is partitioned, and then uncoarsened back to the original graph. During coarsening and uncoarsening, algorithms are applied to permit high-quality partitions. Detailed information about METIS can be found in its manual [161].

i Note that when using the socket version (-pnet), the METIS partitioner is not available. In this case, METIS partitioning can be obtained using the partition filter, as described below.

Polar Axes bisects the domain based on the polar coordinates of the cells (see Figure 32.5.9). This method is available only in 2D.

Polar R-Coordinate, Polar Theta-Coordinate bisects the domain based on the selected polar coordinate (see Figure 32.5.9). These methods are available only in 2D.

Principal Axes bisects the domain based on a coordinate frame aligned with the principal axes of the domain (see Figure 32.5.7). This reduces to Cartesian bisection when the principal axes are aligned with the Cartesian axes. The algorithm is also referred to as moment, inertial, or moment-of-inertia partitioning.

This is the default bisection method in FLUENT.

Principal Strip uses moment bisection but restricts all bisections to the principal axis of longest extent of the parent domain (see Figure 32.5.8). You can often minimize the number of partition neighbors using this approach.

Principal X-, Y-, Z-Coordinate bisects the domain based on the selected principal coordinate (see Figure 32.5.8).
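For a 2D domain, the principal axes of the cell centres follow from their second moments, so one moment (inertial) bisection can be sketched without any libraries. This is an illustration under stated assumptions (2D only, all cells weighted equally, cut at the median of the projected coordinates), and the function names are hypothetical; a 3D version would need the eigenvectors of a 3x3 moment matrix instead of the closed-form angle used here.

```python
import math


def principal_direction(points):
    """Unit vector along the major principal axis of 2D cell centres."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Angle of the eigenvector with the largest eigenvalue of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


def principal_bisect(points):
    """One moment bisection: cut normal to the major principal axis at the median."""
    ux, uy = principal_direction(points)
    ordered = sorted(points, key=lambda p: p[0] * ux + p[1] * uy)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]
```

For cells lined up along a diagonal, the principal direction is the diagonal itself, whereas for an axis-aligned block it is the x or y axis, which is why the method reduces to Cartesian bisection in that case.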


Spherical Axes bisects the domain based on the spherical coordinates of the cells. This method is available only in 3D.

Spherical Rho-, Theta-, Phi-Coordinate bisects the domain based on the selected spherical coordinate. These methods are available only in 3D.

Figure 32.5.5: Partitions Created with the Cartesian Axes Method (contours of cell partition; partition IDs 0 to 3)

Optimizations

Additional optimizations can be applied to improve the quality of the grid partitions. The heuristic of bisecting perpendicular to the direction of longest domain extent is not always the best choice for creating the smallest interface boundary. A pretesting operation (see Section 32.5.4: Pretesting) can be applied to automatically choose the best direction before partitioning. In addition, the following iterative optimization schemes exist:

Smooth attempts to minimize the number of partition interfaces by swapping cells between partitions. The scheme traverses the partition boundary and gives cells to the neighboring partition if the interface boundary surface area is decreased. (See Figure 32.5.10.)

Merge attempts to eliminate orphan clusters from each partition. An orphan cluster is a group of cells with the common feature that each cell within the group has at least one face which coincides with an interface boundary. (See Figure 32.5.11.) Orphan clusters can degrade multigrid performance and lead to large communication costs.
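The Smooth scheme can be pictured as a greedy pass over boundary cells. The Python sketch below is a simplified boundary refinement in that spirit, not FLUENT's algorithm: it assumes every face has unit area (so "interface area" becomes "number of interface faces"), moves a cell only if that strictly reduces the interface count, and keeps partition sizes within a hypothetical `tolerance` of the mean. All names are illustrative.

```python
from typing import Dict, List


def grid_adjacency(nx: int, ny: int) -> Dict[int, List[int]]:
    """Face neighbors of cells in an nx-by-ny structured grid; cell id = x * ny + y."""
    adj: Dict[int, List[int]] = {x * ny + y: [] for x in range(nx) for y in range(ny)}
    for x in range(nx):
        for y in range(ny):
            c = x * ny + y
            if x + 1 < nx:
                adj[c].append(c + ny)
                adj[c + ny].append(c)
            if y + 1 < ny:
                adj[c].append(c + 1)
                adj[c + 1].append(c)
    return adj


def interface_faces(adj: Dict[int, List[int]], labels: Dict[int, int]) -> int:
    """Faces shared by cells in different partitions, each face counted once."""
    return sum(1 for u, nbrs in adj.items() for v in nbrs
               if u < v and labels[u] != labels[v])


def smooth(adj: Dict[int, List[int]], labels: Dict[int, int],
           tolerance: int = 1, max_passes: int = 10) -> Dict[int, int]:
    labels = dict(labels)
    sizes: Dict[int, int] = {}
    for p in labels.values():
        sizes[p] = sizes.get(p, 0) + 1
    target = len(labels) / len(sizes)
    for _ in range(max_passes):
        moved = False
        for u in sorted(adj):
            own = labels[u]
            counts: Dict[int, int] = {}
            for v in adj[u]:
                counts[labels[v]] = counts.get(labels[v], 0) + 1
            for p, n_p in counts.items():
                # Moving u from `own` to `p` removes n_p interface faces
                # and creates counts[own] new ones.
                gain = n_p - counts.get(own, 0)
                if (p != own and gain > 0
                        and sizes[p] + 1 <= target + tolerance
                        and sizes[own] - 1 >= target - tolerance):
                    labels[u] = p
                    sizes[own] -= 1
                    sizes[p] += 1
                    moved = True
                    break
        if not moved:
            break
    return labels


if __name__ == "__main__":
    adj = grid_adjacency(2, 4)
    start = {0: 0, 1: 0, 2: 1, 3: 1, 4: 0, 5: 1, 6: 0, 7: 1}
    end = smooth(adj, start)
    print(interface_faces(adj, start), "->", interface_faces(adj, end))  # 6 -> 2
```

The balance tolerance matters: without it, the cheapest way to remove every interface face would be to move all cells into one partition.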






Figure 32.5.6: Partitions Created with the Cartesian Strip or Cartesian X-Coordinate Method


Figure 32.5.7: Partitions Created with the Principal Axes Method





Figure 32.5.8: Partitions Created with the Principal Strip or Principal X-Coordinate Method


Figure 32.5.9: Partitions Created with the Polar Axes or Polar Theta-Coordinate Method





Figure 32.5.10: The Smooth Optimization Scheme

Figure 32.5.11: The Merge Optimization Scheme




In general, the Smooth and Merge schemes are relatively inexpensive optimization tools.

Pretesting

If you choose the Principal Axes or Cartesian Axes method, you can improve the bisection by testing different directions before performing the actual bisection. If you choose not to use pretesting (the default), FLUENT will perform the bisection perpendicular to the direction of longest domain extent.

If pretesting is enabled, it will occur automatically when you click the Partition button in the Partition Grid panel, or when you read in the grid if you are using automatic partitioning. The bisection algorithm will test all coordinate directions and choose the one which yields the fewest partition interfaces for the final bisection.

Note that using pretesting will increase the time required for partitioning. For 2D problems, partitioning will take 3 times as long as without pretesting, and for 3D problems it will take 4 times as long.
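Pretesting amounts to trying each coordinate direction and keeping the one that produces the fewest interface faces. The Python sketch below shows that selection for a single median bisection on a structured grid; FLUENT applies the test inside its recursive bisection and may measure interfaces differently, so treat `median_cut` and `pretest` as hypothetical illustrations rather than the solver's routine.

```python
from typing import Dict, List, Sequence, Tuple


def structured_grid(nx: int, ny: int) -> Tuple[List[Tuple[float, float]], Dict[int, List[int]]]:
    """Cell centroids and face neighbors of an nx-by-ny grid; cell id = x * ny + y."""
    points = [(float(x), float(y)) for x in range(nx) for y in range(ny)]
    adj: Dict[int, List[int]] = {c: [] for c in range(nx * ny)}
    for x in range(nx):
        for y in range(ny):
            c = x * ny + y
            if x + 1 < nx:
                adj[c].append(c + ny)
                adj[c + ny].append(c)
            if y + 1 < ny:
                adj[c].append(c + 1)
                adj[c + 1].append(c)
    return points, adj


def median_cut(points: Sequence[Tuple[float, ...]], adj: Dict[int, List[int]],
               axis: int) -> int:
    """Interface faces created by one median bisection perpendicular to `axis`."""
    order = sorted(range(len(points)), key=lambda c: points[c][axis])
    side = [0] * len(points)
    for c in order[len(order) // 2:]:
        side[c] = 1
    return sum(1 for u, nbrs in adj.items() for v in nbrs
               if u < v and side[u] != side[v])


def pretest(points: Sequence[Tuple[float, ...]],
            adj: Dict[int, List[int]]) -> Tuple[int, Dict[int, int]]:
    """Test every coordinate direction; return the best axis and all cut counts."""
    cuts = {axis: median_cut(points, adj, axis) for axis in range(len(points[0]))}
    best = min(cuts, key=cuts.get)
    return best, cuts


if __name__ == "__main__":
    points, adj = structured_grid(6, 2)
    print(pretest(points, adj))  # (0, {0: 2, 1: 6})
```

Because every direction is bisected once before the real cut, the extra cost grows with the number of directions tested, which is consistent with the longer partitioning times quoted above.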

Using the Partition Filter

As noted above, you can use the METIS partitioning method through a filter, in addition to selecting it within the Auto Partition Grid and Partition Grid panels. To perform METIS partitioning on an unpartitioned grid, use the File/Import/Partition/Metis... menu item.

File → Import → Partition → Metis...

FLUENT will use the METIS partitioner to partition the grid, and then read the partitioned grid into the solver. The number of partitions will be equal to the number of processes. You can then proceed with the model definition and solution.

i Direct import to the parallel solver through the partition filter requires that the host machine has enough memory to run the filter for the specified grid. If not, you will need to run the filter on a machine that does have enough memory. You can either start the parallel solver on the machine with enough memory and repeat the process described above, or run the filter manually on the new machine and then read the partitioned grid into the parallel solver on the host machine.

To manually partition a grid using the partition filter, enter the following command:

utility partition input-filename partition-count output-filename

where input-filename is the filename for the grid to be partitioned, partition-count is the number of partitions desired, and output-filename is the filename for the partitioned grid. You can then read the partitioned grid into the solver (using the standard File/Read/Case... menu item) and proceed with the model definition and solution.
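If you run the filter from a script (for example, to prepare several partition counts in a batch), the command line above can be assembled programmatically. The Python sketch below only builds and launches `utility partition input-filename partition-count output-filename` as written above; it assumes the `utility` launcher is on your PATH, and the helper names and file names (`grid.msh`, `grid-8.cas`) are hypothetical.

```python
import shlex
import subprocess
from typing import List


def partition_filter_argv(input_filename: str, partition_count: int,
                          output_filename: str) -> List[str]:
    """Argument list for: utility partition input-filename partition-count output-filename."""
    if partition_count < 1:
        raise ValueError("partition-count must be a positive integer")
    return ["utility", "partition", input_filename, str(partition_count),
            output_filename]


def run_partition_filter(input_filename: str, partition_count: int,
                         output_filename: str) -> None:
    """Run the filter; assumes `utility` is on PATH. Raises CalledProcessError on failure."""
    argv = partition_filter_argv(input_filename, partition_count, output_filename)
    print("running:", shlex.join(argv))
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_partition_filter(...) to execute it.
    print(shlex.join(partition_filter_argv("grid.msh", 8, "grid-8.cas")))
```

Passing an argument list to `subprocess.run` (rather than a single shell string) avoids quoting problems with unusual filenames.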





When the File/Import/Partition/Metis... menu item is used to import an unpartitioned grid into the parallel solver, the METIS partitioner partitions the entire grid. You may also partition each cell zone individually, using the File/Import/Partition/Metis Zone... menu item.

File → Import → Partition → Metis Zone...

This method can be useful for balancing the workload. For example, if a case has a fluid zone and a solid zone, the computation in the fluid zone is more expensive than in the solid zone, so partitioning each zone individually will result in a more balanced workload.
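A small arithmetic sketch shows why per-zone partitioning helps. It assumes a hypothetical cost model in which a fluid cell costs three times as much as a solid cell, and a worst case in which a whole-grid split with equal cell counts happens to place each zone on its own compute node. Both the cost weights and the function names are illustrative assumptions, not FLUENT data.

```python
from typing import Dict, List


def split_each_zone(zone_cells: Dict[str, int], n_nodes: int) -> List[Dict[str, int]]:
    """Divide every cell zone separately and as evenly as possible among the nodes."""
    nodes: List[Dict[str, int]] = [{} for _ in range(n_nodes)]
    for zone, n in zone_cells.items():
        base, extra = divmod(n, n_nodes)
        for k in range(n_nodes):
            nodes[k][zone] = base + (1 if k < extra else 0)
    return nodes


def work_per_node(nodes: List[Dict[str, int]], cost_per_cell: Dict[str, float]) -> List[float]:
    """Total work per node under the assumed per-cell cost of each zone."""
    return [sum(n * cost_per_cell[z] for z, n in node.items()) for node in nodes]


if __name__ == "__main__":
    cost = {"fluid": 3.0, "solid": 1.0}            # hypothetical relative costs
    whole_grid = [{"fluid": 600}, {"solid": 600}]  # equal cell counts, unequal work
    per_zone = split_each_zone({"fluid": 600, "solid": 600}, 2)
    print(work_per_node(whole_grid, cost))  # [1800.0, 600.0]
    print(work_per_node(per_zone, cost))    # [1200.0, 1200.0]
```

Both layouts give each node 600 cells, but only the per-zone layout gives each node the same amount of work.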

32.5.5 Checking the Partitions

After partitioning a grid, you should check the partition information and examine the partitions graphically.

Interpreting Partition Statistics

You can request a report to be printed after partitioning (either automatic or manual) is completed. In the parallel solver, click the Print Active Partitions or Print Stored Partitions button in the Partition Grid panel. In the serial solver, click the Print Partitions button.

FLUENT distinguishes between two cell partition schemes within a parallel problem: the active cell partition and the stored cell partition. Initially, both are set to the cell partition that was established upon reading the case file. If you re-partition the grid using the Partition Grid panel, the new partition will be referred to as the stored cell partition. To make it the active cell partition, you need to click the Use Stored Partitions button in the Partition Grid panel. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file. This distinction is made mainly to allow you to partition a case on one machine or network of machines and solve it on a different one. Because the two partitions are kept separately, you could use the parallel solver with a certain number of compute nodes to subdivide a grid into a different, arbitrary number of partitions suitable for another parallel machine, save the case file, and then load it into the designated machine.

When you click Print Partitions in the serial solver, you will obtain information about the stored partition.

The output generated by the partitioning process includes information about the recursive subdivision and iterative optimization processes. This is followed by information about the final partitioned grid, including the partition ID, number of cells, number of faces, number of interface faces, ratio of interface faces to faces for each partition, and number of neighboring partitions, followed by the cell, face, interface, neighbor, mean cell, and face ratio variations and the global ratios. Each variation gives the minimum and maximum values of the respective quantity over the present partitions. For example, in the sample output below, partitions 0 and 3 have the minimum number of interface faces (10), and partitions 1 and 2 have the maximum number of interface faces (19); hence the variation is (10 - 19).

Your aim is to achieve small values of Interface ratio variation and Global interface ratio while maintaining a balanced load (Cell variation).

>> Partitions:
P   Cells  I-Cells  Cell Ratio  Faces  I-Faces  Face Ratio  Neighbors
0   134    10       0.075       217    10       0.046       1
1   137    19       0.139       222    19       0.086       2
2   134    19       0.142       218    19       0.087       2
3   137    10       0.073       223    10       0.045       1
------
Partition count           = 4
Cell variation            = (134 - 137)
Mean cell variation       = ( -1.1% - 1.1%)
Intercell variation       = (10 - 19)
Intercell ratio variation = ( 7.3% - 14.2%)
Global intercell ratio    = 10.7%
Face variation            = (217 - 223)
Interface variation       = (10 - 19)
Interface ratio variation = ( 4.5% - 8.7%)
Global interface ratio    = 3.4%
Neighbor variation        = (1 - 2)
Computing connected regions; type ^C to interrupt.
Connected region count = 4
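The summary lines can be reproduced from the per-partition table, which helps when comparing partitioning attempts outside FLUENT. The Python sketch below recomputes them from the sample rows. The text does not define Global interface ratio; the formula used here (each interface face counted once, giving 29 shared faces out of 851 distinct faces) is an inference that happens to reproduce the 3.4% shown, so treat it as an assumption. Names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class PartitionRow(NamedTuple):
    cells: int
    i_cells: int   # I-Cells column
    faces: int
    i_faces: int   # I-Faces column


def partition_stats(rows: List[PartitionRow]) -> Dict[str, object]:
    n = len(rows)
    total_cells = sum(r.cells for r in rows)
    mean_cells = total_cells / n
    cell_ratios = [r.i_cells / r.cells for r in rows]
    face_ratios = [r.i_faces / r.faces for r in rows]
    mean_dev = [(r.cells - mean_cells) / mean_cells for r in rows]
    # Assumption: each interface face is listed under both partitions that
    # share it, so the column sums count it twice.
    shared_faces = sum(r.i_faces for r in rows) / 2
    distinct_faces = sum(r.faces for r in rows) - shared_faces
    return {
        "cell_variation": (min(r.cells for r in rows), max(r.cells for r in rows)),
        "mean_cell_variation": (min(mean_dev), max(mean_dev)),
        "intercell_ratio_variation": (min(cell_ratios), max(cell_ratios)),
        "global_intercell_ratio": sum(r.i_cells for r in rows) / total_cells,
        "interface_ratio_variation": (min(face_ratios), max(face_ratios)),
        "global_interface_ratio": shared_faces / distinct_faces,
    }


sample = [PartitionRow(134, 10, 217, 10), PartitionRow(137, 19, 222, 19),
          PartitionRow(134, 19, 218, 19), PartitionRow(137, 10, 223, 10)]

if __name__ == "__main__":
    for name, value in partition_stats(sample).items():
        print(name, value)
```

Rounded to one decimal place, these values match the percentages in the sample output.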

Note that partition IDs correspond directly to compute node IDs when a case file is read into the parallel solver. When the number of partitions in a case file is larger than the number of compute nodes, but is evenly divisible by the number of compute nodes, the distribution is such that partitions with IDs 0 to (M - 1) are mapped onto compute node 0, partitions with IDs M to (2M - 1) onto compute node 1, etc., where M is equal to the ratio of the number of partitions to the number of compute nodes.
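The mapping rule above reduces to integer division by M. The short Python sketch below states it explicitly; the function names are hypothetical, and the check for divisibility mirrors the condition given in the text.

```python
from typing import List


def partition_to_node(partition_id: int, n_partitions: int, n_nodes: int) -> int:
    """IDs 0..M-1 go to node 0, M..2M-1 to node 1, and so on."""
    if n_partitions % n_nodes != 0:
        raise ValueError("number of partitions must be a multiple of the number of compute nodes")
    if not 0 <= partition_id < n_partitions:
        raise ValueError("partition ID out of range")
    m = n_partitions // n_nodes  # M, partitions per compute node
    return partition_id // m


def node_map(n_partitions: int, n_nodes: int) -> List[int]:
    """Compute node for every partition ID, in ID order."""
    return [partition_to_node(p, n_partitions, n_nodes) for p in range(n_partitions)]


if __name__ == "__main__":
    print(node_map(8, 2))  # [0, 0, 0, 0, 1, 1, 1, 1]
    print(node_map(8, 4))  # [0, 0, 1, 1, 2, 2, 3, 3]
```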





Examining Partitions Graphically

To further aid interpretation of the partition information, you can draw contours of the grid partitions, as illustrated in Figures 32.5.5 through 32.5.9.

Display → Contours...

To display the active cell partition or the stored cell partition (which are described above), select Active Cell Partition or Stored Cell Partition in the Cell Info... category of the Contours Of drop-down list, and turn off the display of Node Values. (See Section 29.1.2: Displaying Contours and Profiles for information about displaying contours.)

i If you have not already done so in the setup of your problem, you will need to perform a solution initialization in order to use the Contours panel.

32.5.6 Load Distribution

If the speeds of the processors that will be used for a parallel calculation differ significantly, you can specify a load distribution for partitioning, using the load-distribution text command.

parallel → partition → set → load-distribution

For example, if you will be solving on three compute nodes, and one machine is twice as fast as the other two, then you may want to assign twice as many cells to the first machine as to the others (i.e., a load vector of (2 1 1)). During subsequent grid partitioning, partition 0 will end up with twice as many cells as partitions 1 and 2.
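The effect of a load vector can be computed directly: each partition receives a share of the cells proportional to its weight. The Python sketch below assumes a largest-remainder rule for the leftover cells when the division is not exact; how FLUENT rounds is not stated here, so that rule and the function name are assumptions.

```python
from typing import List, Sequence


def cells_per_partition(total_cells: int, load: Sequence[int]) -> List[int]:
    """Split total_cells in proportion to the load vector (largest remainders)."""
    weight = sum(load)
    quotas = [divmod(total_cells * w, weight) for w in load]
    counts = [q for q, _ in quotas]
    leftover = total_cells - sum(counts)
    # Give the remaining cells to the partitions with the largest remainders.
    by_remainder = sorted(range(len(load)), key=lambda i: -quotas[i][1])
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    print(cells_per_partition(1000, [2, 1, 1]))  # [500, 250, 250]
    print(cells_per_partition(1001, [2, 1, 1]))  # [501, 250, 250]
```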

Note that for this example, you would then need to start up FLUENT such that compute node 0 is the fast machine, since partition 0, with twice as many cells as the others, will be mapped onto compute node 0. Alternatively, in this situation, you could enable the load balancing feature (described in Section 32.6.3: Load Balancing) to have FLUENT automatically attempt to discern any difference in load among the compute nodes.

i If you adapt a grid that contains non-conformal interfaces, and you want to rebalance the load on the compute nodes, you will have to save your case and data files after adaption, read the case and data files into the serial solver, repartition using the Encapsulate Grid Interfaces and Encapsulate for Adaption options in the Partition Grid panel, and save case and data files again. You will then be able to read the manually repartitioned case and data files into the parallel solver, and continue the solution from where you left off.


32.6 Checking and Improving Parallel Performance

To determine how well the parallel solver is working, you can measure computation and communication times, and the overall parallel efficiency, using the performance meter. You can also control the amount of communication between compute nodes in order to optimize the parallel solver, and take advantage of the automatic load balancing feature of FLUENT.

32.6.1 Checking Parallel Performance

The performance meter allows you to report the wall clock time elapsed during a computation, as well as message-passing statistics. Since the performance meter is always activated, you can access the statistics by printing them after the computation is completed. To view the current statistics, use the Parallel/Timer/Usage menu item.

Parallel → Timer → Usage

Performance statistics will be printed in the text window (console). To clear the performance meter so that you can eliminate past statistics from the future report, use the Parallel/Timer/Reset menu item.

Parallel → Timer → Reset

32.6.2 Improving Input/Output Speed

By default, FLUENT reads in and automatically distributes the complete domain over the entire network of compute nodes, increasing the speed of your parallel processes.

If the host machine has sufficient memory, you can slightly improve the parallel performance using the text command interface (TUI).

parallel → set → fast-io?

The fast-io? command keeps the same speed benefits. However, the complete domain is read on the host machine first and then distributed, so the host machine must have sufficient memory.

32.6.3 Optimizing the Parallel Solver

Increasing the Report Interval

In FLUENT, you can reduce communication and improve parallel performance by increasing the report interval for residual printing/plotting or other solution monitoring reports. You can modify the value for Reporting Interval in the Iterate panel.

Solve → Iterate...




i Note that you will be unable to interrupt iterations until the end of each report interval.

Load Balancing

A dynamic load balancing capability is available in FLUENT. The principal reason for using parallel processing is to reduce the turnaround time of your simulation, ideally by a factor proportional to the collective speed of the computing resources used. If, for example, you were using four CPUs to solve your problem, then you would expect to reduce the turnaround time by a factor of four. This is of course the ideal situation, and assumes that there is very little communication needed among the CPUs, that the CPUs are all of equal speed, and that the CPUs are dedicated to your job. In practice, this is often not the case. For example, CPU speeds can vary if you are solving in parallel on a heterogeneous collection of workstations, other jobs may be competing for use of one or more of the CPUs, and network traffic either from within the parallel solver or generated from external sources may delay some of the necessary communication among the CPUs.
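The gap between the ideal factor-of-four reduction and reality can be quantified with the usual definitions: speedup is serial time divided by parallel time, and parallel efficiency is speedup divided by the number of CPUs. The Python sketch below adds a toy model (an assumption, ignoring communication) in which an iteration ends only when the slowest node finishes its cells, so a single half-speed or shared CPU paces the whole run. Function names are hypothetical.

```python
from typing import Sequence


def speedup(t_serial: float, t_parallel: float) -> float:
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, n_cpus: int) -> float:
    """1.0 is the ideal case: turnaround reduced by a factor of n_cpus."""
    return t_serial / (n_cpus * t_parallel)


def iteration_time(cells: Sequence[int], speeds: Sequence[float]) -> float:
    """Toy model: the iteration ends when the slowest node finishes its cells."""
    return max(c / s for c, s in zip(cells, speeds))


if __name__ == "__main__":
    # Four equal partitions; the last CPU runs at half speed (e.g. shared with another job).
    t_par = iteration_time([250, 250, 250, 250], [1.0, 1.0, 1.0, 0.5])
    t_ser = 1000 / 1.0  # all 1000 cells on one full-speed CPU
    print(speedup(t_ser, t_par), parallel_efficiency(t_ser, t_par, 4))  # 2.0 0.5
```

In this model, one slow CPU halves the efficiency even though the cells are split evenly, which is the situation that load distribution and load balancing are meant to correct.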

If you enable dynamic load balancing in FLUENT, the load across the computational and networking resources will be monitored periodically. If the load balancer determines that performance can be improved by redistributing the cells among the compute nodes, it will automatically do so. There is a time penalty associated with load balancing itself, and so it is disabled by default. If you will be using a dedicated homogeneous resource, or if you are using a heterogeneous resource but have accounted for differences in CPU speeds during partitioning by specifying a load distribution (see Section 32.5.6: Load Distribution), then you may not need to use load balancing.

i Note that when the shell conduction model is used, you will not be able to turn on load balancing.

To enable and control FLUENT's automatic load balancing feature, use the Load Balance panel (Figure 32.6.1). Load balancing will automatically detect and analyze parallel performance, and redistribute cells between the existing compute nodes to optimize it.

Parallel → Load Balance...

The procedure for using load balancing is as follows:

1. Turn on the Load Balancing option.

2. Select the bisection method to create new grid partitions in the Partition Method drop-down list. The choices are the techniques described in Section 32.5.4: Bisection Methods. As part of the automatic load balancing procedure, the grid will be repartitioned into several small partitions using the specified method. The resulting partitions will then be distributed among the compute nodes to achieve a more balanced load.




Figure 32.6.1: The Load Balance Panel

3. Specify the desired Balance Interval. When a value of 0 is specified, FLUENT will internally determine the best value to use, initially using an interval of 25 iterations. You can override this behavior by specifying a non-zero value. FLUENT will then attempt to perform load balancing after every N iterations, where N is the specified Balance Interval. You should be careful to select an interval that is large enough to outweigh the cost of performing the load balancing operations.
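The procedure above involves two decisions: how to distribute the small partitions among compute nodes of possibly different speeds, and whether a rebalance pays for itself within one Balance Interval. FLUENT does not document its distribution rule here, so the Python sketch below substitutes a standard scheduling heuristic, longest-processing-time-first (LPT), and a simple cost-versus-savings test. Both are assumptions for illustration, and the function names are hypothetical.

```python
from typing import List, Sequence


def distribute(weights: Sequence[float], speeds: Sequence[float]) -> List[List[int]]:
    """LPT greedy: place the heaviest pieces first, each on the node that
    would finish it soonest given that node's relative speed."""
    loads = [0.0] * len(speeds)
    assignment: List[List[int]] = [[] for _ in speeds]
    for piece in sorted(range(len(weights)), key=lambda i: -weights[i]):
        node = min(range(len(speeds)),
                   key=lambda k: (loads[k] + weights[piece]) / speeds[k])
        assignment[node].append(piece)
        loads[node] += weights[piece]
    return assignment


def rebalance_pays_off(t_iter_now: float, t_iter_balanced: float,
                       balance_interval: int, balance_cost: float) -> bool:
    """True if the time saved over one interval exceeds the cost of rebalancing."""
    return balance_interval * (t_iter_now - t_iter_balanced) > balance_cost


if __name__ == "__main__":
    # Six equal pieces; node 0 is twice as fast as node 1.
    print([len(p) for p in distribute([1.0] * 6, [2.0, 1.0])])  # [4, 2]
    print(rebalance_pays_off(2.0, 1.5, 25, 10.0))  # True
    print(rebalance_pays_off(2.0, 1.5, 10, 10.0))  # False
```

The second function illustrates why the interval should be large enough: the same per-iteration saving can fail to cover the rebalancing cost over a short interval.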

Note that you can interrupt the calculation at any time, turn the load balancing feature off (or on), and then continue the calculation.

i If problems arise in your computations due to adaption, you can turn off the automatic load balancing, which occurs any time that mesh adaption is performed in parallel.

To instruct the solver to skip the load balancing step, issue the following command:

(disable-load-balance-after-adaption)

To return to the default behavior use the following command:

(enable-load-balance-after-ad
