Lattice Boltzmann algorithm in 3D with MPI

Faisal Shahzad

27-01-2011


Overview

• A quick overview of the 1st presentation (OpenMP implementation)

– LBM

– Performance modeling

– Performance achieved

• MPI implementation for LBM3D

– Different MPI program approaches, program flow

– What to communicate between processes

– How communication is done

• Performance measurements for LBM3D_MPI

– Performance graphs

– Scalability results

27/01/2011 MuCoSim WS10: LBM in 3D with MPI


Lattice Boltzmann theory

• Domain discretization: time and space (D3Q19 lattice)

– Fluid particles are positioned at certain lattice sites

– May move only in certain, fixed directions

• Distribution functions define the probability of a movement in a certain direction

• Discretized form of time and space

[Figure: D3Q15 and D3Q19 lattice stencils]
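To make the "certain, fixed directions" concrete, here is a minimal sketch that enumerates the D3Q19 and D3Q15 velocity sets. The ordering of the directions is an assumption of this sketch (it simply follows `itertools.product`); real LBM codes use their own numbering.

```python
from itertools import product

# D3Q19: all vectors with components in {-1, 0, 1} except the 8 cube
# corners, i.e. at most two non-zero components.
D3Q19 = [c for c in product((-1, 0, 1), repeat=3)
         if sum(abs(k) for k in c) <= 2]

# D3Q15: the rest vector, the 6 axis vectors and the 8 cube corners.
D3Q15 = [c for c in product((-1, 0, 1), repeat=3)
         if sum(abs(k) for k in c) in (0, 1, 3)]

print(len(D3Q19), len(D3Q15))  # 19 15

# Directions that leave a cell through its +x face:
plus_x = [c for c in D3Q19 if c[0] == 1]
print(len(plus_x))  # 5
```

The last count is the reason only a handful of distribution functions have to cross any given face of a subdomain.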


Lattice Boltzmann theory

• Stream step

- Pull Scheme

- Push Scheme

• Collide step

- New distribution functions

• Boundary conditions:

- No slip (real wall with friction)

- Free slip (symmetry boundary condition)

- Moving no slip (movement of a wall with friction => induces flow)
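The difference between the pull and push schemes of the stream step can be sketched on a toy lattice. The example below uses a periodic 1D lattice with three velocities (D1Q3) instead of D3Q19, purely to keep it short; the function names are illustrative and not taken from the presented code.

```python
# D1Q3 toy lattice: velocities 0, +1, -1, periodic in x.
VELOCITIES = [0, 1, -1]

def stream_push(f):
    """Push: each cell writes its values to the neighbour it streams to."""
    n = len(f[0])
    out = [[0.0] * n for _ in VELOCITIES]
    for q, c in enumerate(VELOCITIES):
        for x in range(n):
            out[q][(x + c) % n] = f[q][x]
    return out

def stream_pull(f):
    """Pull: each cell reads the values from the neighbour that streams to it."""
    n = len(f[0])
    out = [[0.0] * n for _ in VELOCITIES]
    for q, c in enumerate(VELOCITIES):
        for x in range(n):
            out[q][x] = f[q][(x - c) % n]
    return out

# f[q][x]: distribution q at cell x, filled with recognisable values.
f = [[float(10 * q + x) for x in range(5)] for q in range(3)]
print(stream_push(f) == stream_pull(f))  # True
```

Both variants produce the same result; they differ in whether the scattered memory accesses happen on the store side (push) or on the load side (pull).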


Implementation of LBM

• Two types of data layouts

• Assume domain >> cache

Array of Structures (AoS)

• Collision optimized

• Layout: one record per cell (Cell 0, Cell 1, ..., Cell N), each holding f0 f1 f2 ... f19

• While colliding, all data in one cache line

• While streaming: accessing 19 neighbors; data is evicted from cache before the cache line is fully used

• Performance less than expected from memory bandwidth

Structure of Arrays (SoA)

• Streaming/propagation optimized

• Layout: one array per distribution function (f0, f1, ..., f19), each holding cell 0 cell 1 cell 2 ... cell N

• While colliding, 19 cache lines are loaded and all elements are used for collision in the next cells

• While streaming: 19 cache lines are loaded (+ RFO); all elements are used for streaming in the next cells

• Better in-memory performance than AoS
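The two layouts differ only in how a (cell, direction) pair is mapped to a position in one flat array. A minimal sketch of the two index functions, with hypothetical names and 19 distribution functions per cell as in D3Q19:

```python
Q = 19  # distribution functions per cell in D3Q19

def aos_index(cell, q):
    """Array of Structures: the Q values of one cell are contiguous."""
    return cell * Q + q

def soa_index(cell, q, n_cells):
    """Structure of Arrays: one contiguous run of n_cells values per q."""
    return q * n_cells + cell

n_cells = 1000

# Distance (in elements) between the same q in two neighbouring cells:
aos_stride = aos_index(1, 3) - aos_index(0, 3)
soa_stride = soa_index(1, 3, n_cells) - soa_index(0, 3, n_cells)
print(aos_stride, soa_stride)  # 19 1
```

With SoA, consecutive cells of one direction sit next to each other, so a loaded cache line is fully used by the following cells. With AoS, the same direction is 19 elements apart from cell to cell.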


Performance modeling

[Figure: node topology diagrams (cores, caches, chipset or memory interfaces, memory) for the three test systems]

• Woody node: Xeon 5160, dual socket

• TinyBlue node: Xeon 5550, dual socket

• Lima node: Xeon 5650, dual socket

            Peak performance   Peak memory bandwidth   STREAM (Scale) bandwidth   STREAM (Scale) bandwidth-RFO
            (Gflops)           (GB/s)                  (GB/s)                     (GB/s)
Woody       48                 21                      7                          4
TinyBlue    85                 63                      37                         27
Lima        127                63                      40                         26


Performance modeling

Performance estimates:

1 LUP (lattice update): ≈ 250 flops, 19*8*2 bytes, 19*8*3 bytes (with RFO)

            Peak performance based   Peak performance based      Performance based on
            on peak flops (MLUPs)    on peak memory BW (MLUPs)   STREAM BW RFO (MLUPs)
Woody       208                      46                          15
TinyBlue    370                      140                         88
Lima        555                      140                         88

Maximum achieved performance on one node:

                Maximum achieved performance, OpenMP (MLUPs)
Woody node      14
TinyBlue node   67
Lima node       75
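The bandwidth-based estimates follow from dividing a bandwidth by the bytes moved per lattice update. The sketch below redoes that arithmetic for the peak memory bandwidth column, assuming the 19*8*3 bytes per update (with RFO) is the figure used; the function name is illustrative. It gives about 46 MLUPs for Woody and about 138 MLUPs for the two 63 GB/s nodes, close to the tabulated 46 and 140. The exact rounding used on the slide is not stated.

```python
BYTES_PER_LUP = 19 * 8 * 2      # load and store 19 doubles per update
BYTES_PER_LUP_RFO = 19 * 8 * 3  # plus the read-for-ownership on the store

def mlups_from_bandwidth(gb_per_s, bytes_per_lup=BYTES_PER_LUP_RFO):
    """Bandwidth-bound estimate in million lattice updates per second."""
    return gb_per_s * 1e9 / bytes_per_lup / 1e6

# Peak memory bandwidth in GB/s, copied from the table of test systems.
peak_memory_bw = {"Woody": 21, "TinyBlue": 63, "Lima": 63}

for node, bw in peak_memory_bw.items():
    print(node, round(mlups_from_bandwidth(bw)))
```

Comparing these bounds with the achieved OpenMP numbers shows how far each node is from its memory-bound limit.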


MPI Implementation

• MPI

– Single program, multiple data (SPMD): each process runs the same program with different data

– Two approaches

• Master-Worker

• All Workers


MPI Implementation - Program Flow

[Figure: program flow diagrams]

MPI Implementation for LBM - Domain Decomposition
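The decomposition figure did not survive extraction, so the scheme actually used is not known here. As one common possibility, the sketch below assumes a 1D slab decomposition along a single axis and computes which planes each process would own; `slab_range` is a hypothetical helper, not a function from the presented code.

```python
def slab_range(n_cells, n_ranks, rank):
    """Half-open [start, stop) range of planes owned by `rank`.

    The first (n_cells % n_ranks) ranks get one extra plane so that the
    load differs by at most one plane between ranks.
    """
    base, extra = divmod(n_cells, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# A 200-cell axis split over 4 processes:
for rank in range(4):
    print(rank, slab_range(200, 4, rank))
```

Each slab then needs one ghost layer per neighbouring slab, which is where the communication described on the following slides comes in.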


MPI Implementation for LBM

Boundary condition at walls

[Figure: boundary condition at a wall, followed by the stream step in the following iteration]

Boundary condition at ghost layers - communication required

• Communication after each stream-collide step
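Why ghost layers need communication can be shown on a toy case. The sketch below splits a periodic 1D lattice (D1Q3, not D3Q19) into two subdomains with one ghost cell per side, fills the ghost cells from the neighbour, and checks that streaming the pieces reproduces streaming the whole lattice. Plain list copies stand in for MPI messages, and all names are illustrative.

```python
VELOCITIES = [0, 1, -1]  # D1Q3 toy lattice

def pull_stream_interior(f):
    """Pull-stream every cell except the ghost cells at both ends."""
    n = len(f[0])
    out = [row[:] for row in f]
    for q, c in enumerate(VELOCITIES):
        for x in range(1, n - 1):
            out[q][x] = f[q][x - c]
    return out

def exchange_ghosts(left, right):
    """Copy boundary cells into the neighbour's ghost cells (periodic)."""
    for q in range(len(VELOCITIES)):
        right[q][0] = left[q][-2]   # left's last real cell
        left[q][-1] = right[q][1]   # right's first real cell
        left[q][0] = right[q][-2]   # periodic wrap-around
        right[q][-1] = left[q][1]

# Reference: 8 cells streamed as a single periodic domain.
glob = [[float(10 * q + x) for x in range(8)] for q in range(3)]
ref = [[glob[q][(x - c) % 8] for x in range(8)]
       for q, c in enumerate(VELOCITIES)]

# Two subdomains of 4 real cells, each padded with one ghost cell per side.
left = [[0.0] + glob[q][:4] + [0.0] for q in range(3)]
right = [[0.0] + glob[q][4:] + [0.0] for q in range(3)]

exchange_ghosts(left, right)
left, right = pull_stream_interior(left), pull_stream_interior(right)
joined = [left[q][1:-1] + right[q][1:-1] for q in range(3)]
print(joined == ref)  # True
```

This toy version copies every distribution into the ghost cells, which corresponds to the naive variant; only the distributions pointing into the neighbour are actually needed.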


MPI Implementation for LBM - How communication is done

• In the D3Q19 scheme, 5 distribution functions per cell need to be communicated in each direction.

• pack() function:

– Fills the buffer with the necessary distribution functions

– Takes care of the direction in which to communicate

• Send-Recv

• unpack() function:

– Extracts the distribution functions from the received buffer

– Takes care of the direction from which the buffer is received
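A rough sketch of what pack() and unpack() might look like is given below. It is not the presented implementation: the data structure (a dict from cell coordinates to a list of 19 values), the direction ordering and the signatures are all assumptions, and the buffer is handed over directly where a real code would do the Send-Recv.

```python
from itertools import product

# D3Q19 velocity set; the ordering is an assumption of this sketch.
D3Q19 = [c for c in product((-1, 0, 1), repeat=3)
         if sum(abs(k) for k in c) <= 2]

def face_directions(axis, sign):
    """Indices of the distributions that cross the face normal to `axis`."""
    return [q for q, c in enumerate(D3Q19) if c[axis] == sign]

def pack(f, face_cells, axis, sign):
    """Copy the outgoing distributions of every face cell into a flat buffer."""
    qs = face_directions(axis, sign)
    return [f[cell][q] for cell in face_cells for q in qs]

def unpack(f, ghost_cells, axis, sign, buffer):
    """Write a received buffer into the ghost cells, in the same order."""
    qs = face_directions(axis, sign)
    values = iter(buffer)
    for cell in ghost_cells:
        for q in qs:
            f[cell][q] = next(values)

# Tiny demo: a 2 x 2 communication face between two subdomains.
nx = 4
yz = list(product(range(2), range(2)))
sender = {(x, y, z): [100 * x + 10 * y + z + q / 100 for q in range(19)]
          for x in range(nx) for y, z in yz}
receiver = {(0, y, z): [0.0] * 19 for y, z in yz}

send_face = [(nx - 1, y, z) for y, z in yz]   # last plane of the sender
ghost_face = [(0, y, z) for y, z in yz]       # ghost plane of the receiver

buf = pack(sender, send_face, axis=0, sign=+1)
# ... an MPI send/receive of `buf` would happen here ...
unpack(receiver, ghost_face, axis=0, sign=+1, buffer=buf)
print(len(buf))  # 20 = 4 cells * 5 distributions
```

Passing `axis` and `sign` to both functions is one way to let them "take care of the direction": the same pair selects which 5 of the 19 values are written to and read from the buffer.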


MPI Implementation for LBM

Example: domain of 200 x 200 x 200

• Naïve implementation: communicate all distributions

19*8*2 bytes per cell → 13 MB per iteration (comm face)

• Communicate only necessary distributions:

5*8*2 bytes per cell → 3.2 MB (comm face)
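The face volumes can be checked with a few lines of arithmetic. The sketch assumes one 200 x 200 communication face and 8-byte values, and takes the factor 2 from the slide as given (presumably send plus receive). Counted this way the reduced scheme comes to exactly 3.2 MB, while the naïve scheme comes to about 12.2 MB, slightly below the 13 MB quoted above; the slide figure may include something not stated here.

```python
def face_bytes(ny, nz, n_distributions, bytes_per_value=8, factor=2):
    """Bytes exchanged per iteration across one ny x nz communication face."""
    return ny * nz * n_distributions * bytes_per_value * factor

naive = face_bytes(200, 200, 19)    # all 19 distributions
reduced = face_bytes(200, 200, 5)   # only the 5 that cross the face

print(naive / 1e6, reduced / 1e6)   # 12.16 3.2
print(naive / reduced)              # 3.8
```

Sending only the necessary distributions therefore cuts the message size by a factor of 19/5 = 3.8.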


Performance measurement - Woody

[Figure: Woody node topology (cores, caches, chipset, memory)]

likwid-mpirun -np 1 -pin N:0 ./lbm params.dat

likwid-mpirun -np 2 -pin N:0_1 ./lbm params.dat

likwid-mpirun -np 4 -NperDomain S:2 ./lbm params.dat

likwid-pin -t intel -c S0:0-1@S1:0-1 ./lbm params.dat


Performance measurement - TinyBlue

[Figure: TinyBlue node topology (cores, caches, memory interfaces, memory)]

likwid-mpirun -np 1 -pin N:0 ./lbm params.dat

likwid-mpirun -np 2 -pin N:0_1 ./lbm params.dat

likwid-mpirun -np 4 -NperDomain S:4 ./lbm params.dat

likwid-mpirun -np 8 -NperDomain S:4 ./lbm params.dat

likwid-pin -t intel -c S0:0-3@S1:0-3 ./lbm params.dat


Performance measurement - Lima

likwid-mpirun -np 1 -pin N:0 ./lbm params.dat

likwid-mpirun -np 2 -pin N:0_1 ./lbm params.dat

likwid-mpirun -np 4 -pin N:0_1_2_3 ./lbm params.dat

likwid-mpirun -np 6 -NperDomain S:6 ./lbm params.dat

likwid-mpirun -np 12 -NperDomain S:4 ./lbm params.dat

likwid-pin -t intel -c S0:0-5@S1:0-5 ./lbm params.dat


Performance measurement – Scaling Results on TinyBlue

likwid-mpirun -np 8 -NperDomain S:4 ./lbm params.dat

likwid-mpirun -np 16 -NperDomain S:4 ./lbm params.dat

likwid-mpirun -np 32 -NperDomain S:4 ./lbm params.dat

likwid-mpirun -np 64 -NperDomain S:4 ./lbm params.dat


Performance measurement – Scaling Results on Lima

likwid-mpirun -np 12 -NperDomain S:4 ./lbm params.dat

likwid-mpirun -np 24 -NperDomain S:4 ./lbm params.dat

likwid-mpirun -np 48 -NperDomain S:4 ./lbm params.dat

likwid-mpirun -np 96 -NperDomain S:4 ./lbm params.dat

likwid-mpirun -np 192 -NperDomain S:4 ./lbm params.dat

likwid-mpirun -np 384 -NperDomain S:4 ./lbm params.dat


Thank you!

Questions and comments are welcome

• Special thanks:

– Mr. Johannes Habich