35
Speculative Atomics Case-study of the GPU Optimization of the Material Point Method for Graphics Gergely Klar UCLA Computer Graphics & Vision Laboratory

Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Speculative AtomicsCase-study of the GPU Optimization of the Material Point Method for Graphics

Gergely KlarUCLA Computer Graphics & Vision Laboratory

Page 2: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

MotivationA. Stomakhin, C. Schroeder, L. Chai, J. Teran, A. Selle, A Material Point Method for Snow Simulation,ACM Transactions on Graphics (SIGGRAPH 2013), 32(4), pp. 102:1-102:10, 2013.

Images ©Disney.

Page 3: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

MPM Simulations

● Amazing new effects● Extremely computationally demanding

○ millions of particles○ 200 million cell grids○ up to 30 minutes per frame

Page 4: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

MPM Simulations

● Amazing new effects● Extremely computationally demanding

○ millions of particles○ 200 million cell grids○ up to 30 minutes per frame

GPU to the rescue!

Page 5: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

MPM Overview

1. Particles-to-grid2. Grid operations3. Particles-from-grid4. Particle operation

Page 6: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Parallelizing steps

1. Particles-to-grid2. Grid operations3. Particles-from-grid4. Particle operations

Page 7: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Parallelizing steps

1. Particles-to-grid 2. Grid operations3. Particles-from-grid

✓ 4. Particle operations --- trivial

Page 8: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Parallelizing steps

1. Particles-to-grid 2. Grid operations

✓ 3. Particles-from-grid --- trivial✓ 4. Particle operations --- trivial

Page 9: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Parallelizing steps

1. Particles-to-grid ✓ 2. Grid operations --- case by case✓ 3. Particles-from-grid --- trivial✓ 4. Particle operations --- trivial

Page 10: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Parallelizing steps

✘ 1. Particles-to-grid --- challenge✓ 2. Grid operations --- case by case✓ 3. Particles-from-grid --- trivial✓ 4. Particle operations --- trivial

Page 11: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

What’s wrong with particles-to-grid?

It is a classical scatter operation● up to 10 particles per cell initially● 53 support (region of influence) per particle⇒ Many race conditions, if processed particle-by-particle

Page 12: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

How to do super fast scatter?

1. Use atomics operations!2. Protect coalesced access!3. Disperse the particles!

Page 13: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Why not gather?

Classical work around:Turn scatter it into gather!

Problems:● No constraint on particle count per cell● Overhead● Need to do it in every step!

Page 14: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Why not gather?

Classical work around:Turn scatter it into gather!

Problems:● No constraint on particle count per cell● Overhead● Need to do it in every step!

Page 15: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Let’s use atomics!

(and I mean atomic instructions)

Pro: Don’t need to worry about race conditionsCon: Can get very slow

Page 16: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Speculative atomics

Guaranteeing no race conditions is expensive.Atomic instructions are getting better, minimal overhead when there’s no need to block

Idea: Don’t try to eliminate all race conditions, but reduce their likelihood and use atomic

instructions

Page 17: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Bad case

Page 18: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Bad case

Page 19: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Worst case

Page 20: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Best case

Page 21: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Best case

Page 22: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

How to get the best case all the time?

Concurrent threads need to process particles that● are from different cells● are consecutive in memory● access consecutive grid cells

Page 23: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Bad idea: Indirect indexingprocessParticle(p[particleIndexShuffler(threadIdx)])

Problem: particles processed in parallel are spread out in memory → can not use coalesced memory access → poor performanceSolution: need to physically shuffle particles

Page 24: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Blocked layoutParticles in memory

Particles in space

Page 25: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Blocked layout in action

Page 26: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Blocked layout

1. Assign cell ID to each particle by the grid cell it is in. cell IDs define groups.

2. Sort particle IDs by cell IDs3. Rearrange particles by picking the 1st from

each group, then the 2nd, etc.

Page 27: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Blocked layout Pseudocode 1/2

kernelComputeCellID // per particle

cellX = floor(x[pID]/dx)

cellY = floor(y[pID]/dx)

cellZ = floor(z[pID]/dx)

cellID[pID] = cellX+

cellY*GRID_WIDTH+

cellZ*GRID_WIDTH*GRID_HEIGHT

Page 28: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Blocked layout Pseudocode 2/2

sequence(pIDs)

sort_by_key(cellIDs, pIDs)

exclusive_scan_by_key(cellIDs, constant(1), newIDs)

sort_by_key(newIDs,

permutation(zip(<all particle data>),

pIDs))

Page 29: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Re-blocking

No need to rearrange the particles in every step, but● Goal of MPM is to simulate deformations

and fracturing● Particles can move around a lot● Monitor scatter performance, re-block if

drops below threshold

Page 30: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Results

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Relative speeds compared to the CPU implementation

CPU MPM: 100%

GPU Naïve Atomics: 9.8%

GPU Gather: 4.9%

GPU Blocked Layout: 2.2%

Page 31: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Results

CPU GPU Naïve GPU Gather GPU B. Layout

CPU 1x 10.20x 20.41x 45.45x

GPU Naïve 1x 2.00x 4.45x

GPU Gather 1x 2.23x

GPU B. Layout 1x

Relative speed-ups compared to other methods

Page 32: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Summary● Speculative Atomics:

○ The overhead of unnecessary atomic instructions is lower than the overhead of complicated gather transforms

● Blocked layout:○ Special arrangement of particles to minimize the number

of race conditions● Not just for MPM:

○ Can be adopted to other particles-and-grids simulations as well

Page 33: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Takeaway

1. Don’t be afraid to use atomic operations!2. Use coalesced access at all costs!3. Spread out the particles (but not too far)!

Page 34: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Questions?

Page 35: Speculative Atomics: A Case-Study of the GPU Optimization ... · Title: Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics Author:

Thank you!