



Implementation of Parallel Computing for Multiphase Flows using the Lattice Boltzmann Method

Jaime Mudrich (DOE Fellow), Rinaldo G. Galdamez (DOE Fellow), and Seckin Gokaltun, Ph.D.
Applied Research Center, Florida International University, Miami, FL

Methodology

• Lattice Boltzmann method is based on the Boltzmann transport equation

• Domain is discretized with lattice nodes instead of rigorous meshing

• Independence from mesh allows for complex domains like porous media

• Masses at nodes collide and then stream information to neighbors


[Figure: Subroutine Time Profile. Iterative computation time (s) vs. domain size (2,500 to 640,000 nodes) for the Prestream, Hydrodynamics, Poststream, and Stats subroutines.]

Introduction

• 53 million gallons of radioactive waste at Hanford site

• Stored in leaking single shell tanks (SST)

• Double shell tanks (DST) introduced in 1968

• Unlike the SSTs, DSTs show no leaking

• Waste is being transported from SSTs to DSTs

• Transport of heterogeneous waste clogs piping

• Pulsed-air mixing used to “stir” heterogeneous material

• LBM simulates bubbles rising to predict mixing

[Figure: Rising bubbles mix the slurry.]

[Figure: SST (A) and DST (B).]

Parallel Processing Background

• Master processor divides the problem domain amongst multiple slaves

• Message passing interface (MPI) allows CPUs to exchange information across subdomains

• Reduction of processing time is ultimately limited by communications between processors and the components of the program that must run sequentially

• Effectiveness of parallelization is measured by speedup, S(N), for N processors

• When increasing the number of CPUs yields only minimal performance improvement, the optimal quantity has been reached

In the serial configuration, only one processor is used to solve the entire domain of the problem

In the parallel configuration, multiple processors split the domain of the problem, reducing overall computation time

Processors communicate with their neighbors through MPI to “patch” the subdomains

Finally, the master collects the results from the various slave nodes
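A minimal sketch of this decomposition and neighbor communication, assuming mpi4py and NumPy; the 1D column split, array shapes, and variable names are illustrative assumptions, not the poster's actual code:

```python
# Sketch: 1D domain decomposition with ghost-column ("halo") exchange via MPI.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

NX, NY = 400, 160                    # global lattice size (illustrative, assumes NX % size == 0)
nx_local = NX // size
field = np.zeros((nx_local + 2, NY)) # interior columns plus one ghost column on each side
field[1:-1, :] = rank                # placeholder data owned by this rank

left  = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# "Patch" the subdomains: exchange boundary columns with neighboring processors.
comm.Sendrecv(sendbuf=field[1, :].copy(),  dest=left,  recvbuf=field[-1, :], source=right)
comm.Sendrecv(sendbuf=field[-2, :].copy(), dest=right, recvbuf=field[0, :],  source=left)

# The master (rank 0) collects the interior blocks from all slave ranks.
blocks = comm.gather(field[1:-1, :], root=0)
if rank == 0:
    full_domain = np.vstack(blocks)  # reassembled (NX, NY) array
```

Run with, for example, `mpiexec -n 4 python decompose_sketch.py` (file name hypothetical).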

Speedup (Amdahl’s Law): S(N) = T(1) / T(N)

T(1) = single-processor computation time
T(N) = computation time with N processors
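A short worked example of the speedup metric; the 30 h and 3 h figures come from the 640,000-node case reported below, while the 5% serial fraction is an assumed illustration:

```python
# Speedup S(N) = T(1) / T(N), and the ceiling imposed by the sequential part of the code.
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def amdahl_limit(serial_fraction, n_procs):
    # Best possible speedup when serial_fraction of the work cannot be parallelized.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

print(speedup(30.0, 3.0))        # 30 h serial vs. 3 h parallel -> S(25) = 10.0
print(amdahl_limit(0.05, 25))    # an assumed 5% serial part caps speedup near 11.4
```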


[Figure: D2Q9 lattice nodes with the nine distribution functions f0 to f8; histogram view of the distribution function, f.]

Collision: f_a*(x, t) = f_a(x, t) - (1/τ)[f_a(x, t) - f_a^eq(x, t)]

Equilibrium distribution: f_a^eq(x) = w_a ρ(x) [1 + 3(e_a·u)/c^2 + 9(e_a·u)^2/(2c^4) - 3(u·u)/(2c^2)]

Stream: f_a(x + e_a Δt, t + Δt) = f_a*(x, t)

…where…

x = position of particles
u = macroscopic velocity at the node
ρ = macroscopic density at the node
w_a = constant, direction-specific weight
c = model speed of sound
τ = dimensionless relaxation time
e_a = basis vector at node (9 total)
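A compact single-phase sketch of the collide-and-stream cycle above on a D2Q9 lattice, assuming NumPy, periodic boundaries, and an illustrative relaxation time; the multiphase terms of the actual solver are omitted:

```python
import numpy as np

# D2Q9 basis vectors e_a and weights w_a (lattice units, c_s^2 = 1/3)
e = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)
tau = 0.8                                        # dimensionless relaxation time (illustrative)
NX, NY = 64, 64
f = w[:, None, None] * np.ones((9, NX, NY))      # start from rest with density 1

def equilibrium(rho, ux, uy):
    eu = e[:, 0, None, None] * ux + e[:, 1, None, None] * uy   # e_a . u
    return w[:, None, None] * rho * (1 + 3*eu + 4.5*eu**2 - 1.5*(ux**2 + uy**2))

for step in range(100):
    rho = f.sum(axis=0)                                        # macroscopic density
    ux = (f * e[:, 0, None, None]).sum(axis=0) / rho           # macroscopic velocity
    uy = (f * e[:, 1, None, None]).sum(axis=0) / rho
    f -= (f - equilibrium(rho, ux, uy)) / tau                  # collision (BGK)
    for a in range(9):                                         # stream to neighboring nodes
        f[a] = np.roll(f[a], shift=(e[a, 0], e[a, 1]), axis=(0, 1))
```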

Overlapping profiles for the serial and parallel cases indicate accurate results for the parallel code
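A simple way to quantify that agreement (the file names below are hypothetical): load the two cross-sectional pressure profiles and report the maximum relative error, which the poster gives as 0.27%.

```python
import numpy as np

# Hypothetical output files: one pressure value per position along the cross-section.
p_serial = np.loadtxt("pressure_serial.dat")
p_parallel = np.loadtxt("pressure_parallel.dat")

rel_err = np.abs(p_parallel - p_serial) / np.abs(p_serial)
print(f"max relative error: {100 * rel_err.max():.2f}%")
```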

[Figure: Speed Up for Parallelized Subroutines. Speedup vs. number of processors (0 to 80) for the Prestream, Hydrodynamics, and Poststream subroutines, compared against the Ideal linear speedup.]

Speedup trends: near-linear behavior confirms correct parallelization.

For 640,000 nodes, the parallel code reduces the job from thirty hours to only three hours

[Figure: Prestream subroutine computation time (s) vs. number of processors (0 to 70).]

Subroutine parallel time profiles. The computation times all converge at about N = 25, representing the optimal processor count.

[Figure: Hydrodynamics subroutine computation time (s) vs. number of processors (0 to 70).]

[Figure: Poststream subroutine computation time (s) vs. number of processors (0 to 70).]
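One way to automate the choice suggested by these converging curves, sketched here with made-up timings and an assumed 5% improvement threshold (not the poster's procedure):

```python
def optimal_processor_count(times, min_gain=0.05):
    """times maps processor count N to measured computation time (s); returns the
    smallest N beyond which adding processors improves the time by less than min_gain."""
    counts = sorted(times)
    for prev, cur in zip(counts, counts[1:]):
        gain = (times[prev] - times[cur]) / times[prev]
        if gain < min_gain:
            return prev
    return counts[-1]

# Illustrative timings shaped like the curves above (seconds per iteration).
measured = {5: 3.0e-2, 10: 1.6e-2, 15: 1.1e-2, 20: 0.90e-2, 25: 0.85e-2, 30: 0.84e-2}
print(optimal_processor_count(measured))   # -> 25 with these made-up numbers
```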

[Figure: Time Response of Solver to Domain Size. Processing time (s) vs. domain size (nodes) for the Serial code and the Parallel code on 25 CPUs.]

[Figure: Cross-sectional Pressure Profile. Pressure vs. position for the Serial and Parallel solutions; the two curves overlap, with a maximum error of 0.27%.]


Time Profile for Serial LBM Code

• For the multiphase simulations being studied, the iterative algorithm comprises three steps

• Diagnostics were performed to identify sluggish areas (a timing sketch follows this list)

• “Hydrodynamics”, “Prestream”, and “Poststream” will benefit the most from parallelization

• These subroutines will be split amongst various processors to share the load, speeding up the solution
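The diagnostics can be as simple as wall-clock timers accumulated per subroutine; a sketch in which the subroutine names in the commented loop are placeholders for the solver's actual routines:

```python
import time
from collections import defaultdict

timings = defaultdict(float)

def timed(name, fn, *args, **kwargs):
    """Run fn and add its wall-clock time to the named subroutine's total."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    timings[name] += time.perf_counter() - t0
    return result

# Inside the iterative loop, each step would be wrapped like:
#     timed("Prestream", prestream, f)
#     timed("Hydrodynamics", hydrodynamics, f)
#     timed("Poststream", poststream, f)
# After the run, `timings` holds per-subroutine totals of the kind plotted in the time-profile figure.
```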

Conclusions and Future Work

• Parallelization with the optimal number of processors results in significant savings in computer time (a factor of 10 for N = 25 and 640,000 lattice nodes)

• Parallelization allows for simulation of larger domains or longer times

• Future work will include extension of the code from 2D to 3D

• In addition, fluid-solid interactions will also be implemented

Acknowledgements

This research was supported by the U.S. Department of Energy through the DOE-FIU Science and Technology Workforce Development Program, under grant No. DE-EM0000598. Special thanks to Leonel Lagos, Ph.D., PMP®, Director of the DOE-FIU Science and Technology Workforce Development Program.

Validation of the Parallel LBM Code

• Study performed on a static bubble with a density ratio of 1,000 and a uniform initial pressure distribution

• The solution was checked against Laplace’s law for surface tension, described below

Laplace’s Law: ΔP = σ / R

σ = surface tension
ΔP = pressure difference across the fluid interface
R = radius of the bubble
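A hedged sketch of that check: sample the pressure well inside and well outside the bubble and compare the jump against σ/R (2D form). The array layout and the synthetic test field are assumptions for illustration, not the poster's data.

```python
import numpy as np

def laplace_error(pressure, center, radius, sigma):
    """Relative deviation of the measured pressure jump across a 2D bubble from sigma / R."""
    cx, cy = center
    dp_measured = pressure[cx, cy] - pressure[0, 0]   # bubble centre minus a far-field point
    dp_theory = sigma / radius                        # Laplace's law in 2D
    return abs(dp_measured - dp_theory) / dp_theory

# Synthetic field that satisfies the law exactly, just to exercise the check.
NX = NY = 200
sigma, R, cx, cy = 0.1, 30, 100, 100
x, y = np.meshgrid(np.arange(NX), np.arange(NY), indexing="ij")
pressure = np.where((x - cx)**2 + (y - cy)**2 < R**2, 1.0 + sigma / R, 1.0)
print(laplace_error(pressure, (cx, cy), R, sigma))    # -> 0.0 for this synthetic field
```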

Results

• Parallelization allows for more simulations to be performed in a much shorter time

• Using the parallel code and the experimentally determined optimal quantity of processors (N = 25), the following simulation was performed

This series illustrates a case of three equal-radius bubbles with minimal separation. LBM captures the coalescence of the top bubbles. Density ratio = 100, interface width = 5 lattice units, vertical acceleration = -2.0 x 10^-7 lattice units per lattice time squared, interfacial tension = 0.1, and relaxation time for both fluids = 2.71 x 10^-2.

[Simulation frames at T = 0, 10, 15, 20, 25, 50, 75, and 100.]
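A geometric sketch of how such an initial condition could be laid out. The bubble positions, domain size, and sharp interface are assumptions (the poster's model uses a diffuse interface of 5 lattice units), and the multiphase LBM model itself is not shown:

```python
import numpy as np

NX, NY = 300, 300
R = 30                                           # equal bubble radius (illustrative)
centers = [(150, 100), (110, 160), (190, 160)]   # three bubbles with small separation (assumed layout)

rho_liquid, rho_gas = 1.0, 0.01                  # density ratio of 100, as in the caption
rho = np.full((NX, NY), rho_liquid)

x, y = np.meshgrid(np.arange(NX), np.arange(NY), indexing="ij")
for cx, cy in centers:
    rho[(x - cx)**2 + (y - cy)**2 < R**2] = rho_gas   # light phase inside each bubble
```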