85
Enabling High Performance Computational Dynamics in a Heterogeneous Hardware Ecosystem Assistant Prof. Dan Negrut Simulation-Based Engineering Lab Department of Mechanical Engineering University of Wisconsin – Madison Toward High-Performance Computing Support for the Simulation and Planning of Robot Contact Tasks June 27, 2011

Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Enabling High Performance Computational Dynamics

in a Heterogeneous Hardware Ecosystem

Assistant Prof. Dan Negrut

Simulation-Based Engineering Lab

Department of Mechanical Engineering

University of Wisconsin – Madison

Toward High-Performance Computing Support

for the Simulation and Planning of Robot Contact Tasks

June 27, 2011

Page 2: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Acknowledgements

� People:

� Alessandro Tasora – University of Parma, Italy

� Mihai Anitescu – Argonne National Lab

� Lab Students:

� Hammad Mazhar

� Toby Heyn

� Andrew Seidl

� Justin Madsen

� Naresh Khude

� Makarand Datar

� Financial support� National Science Foundation, Young Investigator Career Award

� FunctionBay, S. Korea

� NVIDIA

� Microsoft

� Caterpillar

2

� Andrew Seidl

� Arman Pazouki

� Hamid Ansari

� Makarand Datar

� Dan Melanz

� Markus Schmid

Page 3: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Talk Overview

� Overview of the engineering problems of interest

� Large-scale Multibody Dynamics� Problem formulation, solution method, and parallel implementation

� Overview of Heterogeneous Computing Template (HCT)

� Numerical Experiments

� Validation efforts

� Conclusions3

Page 4: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Computational Multibody Dynamics

Simulation generated in ADAMS

4

Page 5: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Multi-Physics…Fluid-Solid Interaction: Navier-Stokes + Newton-Euler.

5

Page 6: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Computational Dynamics

6

Page 7: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Rover Mobility on Granular Terrain

� Wheeled/tracked vehicle mobility on granular terrain

� Also interested in scooping and loading granular material

7

Simulation in Chrono::Engine

Page 8: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Frictional Contact Simulation[Commercial Solution]

� Model Parameters:� Spheres: 60 mm diameter and mass 0.882 kg� Forces: smoothing with stiffness of 1E5, force

exponent of 2.2, damping coefficient of 10.0, and a penetration depth of 0.1

� Simulation length: 3 seconds

8

Page 9: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Frictional Contact: Two Different Approaches

Considered

� Discrete Element Method (DEM) - draws on a “smoothing” (penalty) approach

� Lots of heuristics

� Slow

� General purpose

� Used in ADAMS� Used in ADAMS

� DVI-based (Differential Variational Inequalities)� A set of differential equations combined with inequality constraints

� Fast (stable for significantly larger integration step-sizes)

� Less general purpose

� Used widely in computer games

9

Page 10: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

The Modeling Component

10

Page 11: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Equations of Motion: Multibody Dynamics

11

Page 12: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Traditional Discretization Scheme

positionstime step index

speeds Reaction

impulsesApplied ForcesMass Mat.

Coulomb 3D fricion

model

Complementarity

Condition

Stabilization

term

(Stewart & Trinkle, 1996)12

Page 13: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Relaxed Discretization Scheme Used

Relaxation Term

(Anitescu & Tasora, 2008)13

Page 14: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

� Introduce the convex hypercone...

The Cone Complementarity Problem

(CCP)

... and its polar hypercone:

� First order optimality conditions lead to Cone Complementarity Problem

CCP assumes following form: Find γ such that

14

Page 15: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Putting Things in Perspective…

15

� Friction model posed as an optimization problem

� Working with velocity and impulses rather than acceleration and forces

� Contact complementarity expression altered to lead to CCP

� Three key points led to above algorithm:

Page 16: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Implementation

� Method outlined implemented using two loops

� Outer loop – runs the time stepping

� Inner loop – CCP Algorithm (solves CCP problem at each time step)

16

Page 17: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Granular Dynamics:How Parallel Computing is Leveraged

1. Parallel Collision Detection

2. (Body parallel) Force kernel

3. (Contact parallel) Contact preprocessing kernel

4. (Contact parallel) CCP contact kernel

Ou

ter L

oo

p

4. (Contact parallel) CCP contact kernel

5. (Constraint parallel) CCP constraint kernel

6. (Reduction-slot parallel) Velocity reduction kernel

7. (Body parallel) Body velocity update kernel

8. (Body parallel) Time integration kernel

Inn

er

Lo

op

17

Ou

ter L

oo

p

Page 18: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Inner Loop (CCP Algorithm)

18

Page 19: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Large Scale Granular Dynamics

� Numerical solution can leverage parallel computing

19

Page 20: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

800

1000

1200

Tesla 10-series

Tesla 20-series

CPU vs. GPU – Flop Rate(GFlop/Sec)

Single Precision

Double Precision

0

200

400

600

800

2003 2004 2005 2006 2007 2008 2009 2010

Tesla 8-series

Nehalem3 GHz

Westmere3 GHz

Tesla 20-series

Tesla 10-series

20

Page 21: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

CPU vs. GPU– Memory Bandwidth[GB/sec]

100

120

140

160

Tesla 10-series

Tesla 20-series

21

0

20

40

60

80

100

2003 2004 2005 2006 2007 2008 2009 2010

Tesla 8-series

Nehalem 3 GHz

Westmere3 GHz

Page 22: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Mixing 40,000 Spheres on the GPU

22

Page 23: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

300K Spheres in Tank[parallel on the GPU]

23

Page 24: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

1.1 Million Rigid Spheres[parallel on the GPU]

24

Page 25: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

A Heterogeneous Computing Templatefor for

Computational Dynamics

25

Page 26: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Heterogeneous Cluster

26

Second fastest cluster at University of Wisconsin-Madison

Page 27: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Computation Using Multiple CPUs[DEM solution]

27

Page 28: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Computation Using Multiple CPUs[DEM solution]

28

Page 29: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Computation Using Multiple CPUs[DEM solution]

29

Page 30: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

True Heterogeneous Computing

� Use 1 GPU per MPI process

� Each process uses its GPU to perform the collision detection � Each process uses its GPU to perform the collision detection

task during each time step

� As before, CPU takes care of communication between sub-domains

� State data is copied to GPU, CD is performed, and collision data is copied back

� GPU is used as an accelerator/co-processor

30

Page 31: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Demonstration

16,000 bodies

2 sub-domains

CPU:

CPU+GPU: 9.43 hrs

31

Page 32: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Heterogeneous Computing TemplateFive Major Components

� Computational Dynamics requires

� Domain decomposition

� Proximity computation

� Inter-domain data exchange� Inter-domain data exchange

� Numerical algorithm support

� Post-processing (visualization)

� HCT represents the library support and associated API that capture this five component abstraction

32

Page 33: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Typical Simulation Results…

� LEFT: Infinity norm of the residual vs. iteration index in the CCP solution� Convergence rate (slope of curve) becomes smaller as the iteration index increases.

33

� RIGHT: Infinity norm of the CCP residual after rmax iterations as function of

granular material depth (number of spheres stacked on each other).

Page 34: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Searching for Better Methods

� Frictionless case (bound constraints in place)

� Gauss-Jacobi (CE)

� Projected conjugate gradient (ProjCG)

� Gradient projected conjugate gradient (GPCG)� Gradient projected conjugate gradient (GPCG)

� Gradient projected MINRES (GPMINRES)

� Friction case (cone constraints - ongoing)

� Newton’s Method for large bound-constrained problems� Uses re-parameterization to handle friction cones (replace with bound constraints)

34

Page 35: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Numerical Experiments

� Test Problem: 40,000 bodies ⇒ 157,520 contacts

� Frictionless

35

Page 36: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Test Problem (MATLAB)

Method Iterations

Final Residual

Norm

γmin γmax Time [sec]

CE 1000 6.11 x 10-2 0.0 2.0598 1849.5

ProjCG 1002 5.6344 x 10-4 0.0 2.2286 1235.6

GPCG 1600 1.0675 x 10-4 0.0 2.6349 382.3644

GPMinres 1100 9.5239 x 10-5 0.0 2.3090 238.0744

PCG 1000 2.4053 x 10-4 -1.1116 2.5254 27.9686

GMRES 1000 4.5315 x 10-5 -1.1635 2.5227 736.3007

MINRES 1000 1.6979 x 10-5 -1.1316 2.5253 41.5790

36

Page 37: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Proximity ComputationProximity Computation

37

Page 38: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

GPU Collision Detection (CD)

� 30,000 feet perspective:

� Carry out spatial partitioning of the volume occupied by the bodies� Carry out spatial partitioning of the volume occupied by the bodies� Place bodies in bins (cubes, for instance)

� Follow up by brute force search for all bodies touching each bin� Embarrassingly parallel

38

Page 39: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

� Example: 2D collision detection, bins are squares

Basic Idea: Search for Contacts in Different Bins

in Parallel

39

Page 40: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Ellipsoid-Ellipsoid CD: Results

30

35

40

45

Tim

e (

seco

nd

s)

Time vs. Number of Contacts

120

140

160

180

Speedup - GPU vs. CPU

40

0

5

10

15

20

25

30

0 500,000 1,000,000 1,500,000

Tim

e (

seco

nd

s)

Number of Contacts

0

20

40

60

80

100

120

0 500,000 1,000,000 1,500,000

Sp

eed

up

Number of Contacts

Page 41: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Speedup - GPU vs. CPU (Bullet library)[results reported are for spheres]

140

160

180

200

GPU: NVIDIA Tesla C1060CPU: AMD Phenom II Black X4 940 (3.0 GHz)

41

0

20

40

60

80

100

120

140

0 1 2 3 4 5 6

X S

peed

up

Contacts (Millions)

Page 42: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

2.5

3

3.5

4

Tim

e (

sec)

Parallel Implementation:

Number of Contacts vs. Detection Time

[results reported are for spheres]

0

0.5

1

1.5

2

0 2 4 6 8 10 12 14 16 18 20 22 24

Tim

e (

sec)

Contacts (Millions)42

Page 43: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Multiple-GPU Collision Detection

Assembled Quad GPU Machine

Processor: AMD Phenom II X4 940 Black

Memory: 16GB DDR2

Graphics: 4x NVIDIA Tesla C1060

Power supply 1: 1000W

Power supply 2: 750W

43

Page 44: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

SW/HW Setup

Main Data Set

Results16 GB RAM

Thread

0

Thread

1

Thread

3

Thread

2

GPU

0

GPU

1

GPU

3

GPU

2

Open MP Quad Core AMD

Microprocessor

Tesla C1060

4x4 GB Memory

4x30720 threadsCUDA

44

Page 45: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Results – Contacts vs. Time

140

160

180

200

Tim

e (

Sec)

Quad Tesla C1060 Configuration

45

0

20

40

60

80

100

120

0 1 2 3 4 5 6

Tim

e (

Sec)

Contacts (Billions)

Page 46: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Conclusions

� Work aimed at enabling high-fidelity discrete models using a physics-based approach

� Approach draws on unique CPU + GPU parallel approach, which � Approach draws on unique CPU + GPU parallel approach, which leverages a Heterogeneous Computing Template

� Accomplishments to date� Billion body parallel collision detection

� Parallel solution of cone complementarity problem, about 12 million unknowns

� Early validation results encouraging

� Aiming at billion bodies simulations

46

Page 47: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Ongoing/Future Work

� Massively parallel linear algebra for solution of CCP problem

More general collision detection code� More general collision detection code

� Multiphysics:

� Fluid-solid interaction

� Electrostatics

47

Page 48: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Thank You.Thank You.

48

Page 49: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

49

Page 50: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

50

Page 51: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Validation.Validation.

51

Simulation is doomed to succeed.(Rod Brooks, roboticist)

Page 52: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Model

Simulate

Validate

� Validation at “microscale” – University of Wisconsin-Madison� Work in progress

� Validation at “macroscale” – University of Parma, Italy

52

Page 53: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Flat Hopper Tests

Video recording from a test (a case that starts from high crystallization)

53

Page 54: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Flat Hopper Tests

3D rendering from a simulation (4x slower than real-time) 54

Page 55: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Flat Hopper Tests

� Comparison experimental - simulated

Experimental Simulated

55

Page 56: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Validation atMicroscale

� Sand flow rate measurements

� Approx. 40K bodies

� Glass beads

� Diameter: 100-500 microns

56

Page 57: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Experimental Setup

Disruptor beads

CPU connection

Nanopositioner controller

Translational stage

Load cell

Nanopositioner

57

Page 58: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Flow Measurement, 500 micron Spheres

58

Page 59: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Flow Simulation, 500 micron Spheres

59

Page 60: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Flow Measurement Results, 3mm Gap Size

Weig

ht

[N]

60

Weig

ht

[N]

Time [sec]

Page 61: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Flow Measurement Results, 2.5mm Gap Size

Weig

ht

[N]

61

Weig

ht

[N]

Time [sec]

Page 62: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Flow Measurement Results, 2mm Gap Size

Weig

ht

[N]

62

Weig

ht

[N]

Time [sec]

Page 63: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Flow Measurement Results, 1.5mm Gap Size

63

Weig

ht

[N]

Time [sec]

Page 64: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Validation Experiment:Repose Angle

� Simulation� Experiment

64

φ = 19.5◦ for µ = 0.39

Page 65: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Validation ExperimentFlow and Stagnation

65

Page 66: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Validation,Flow and Stagnation

66

Page 67: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Validation,Flow and Stagnation

67

Page 68: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

68

Page 69: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

69

Page 70: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Spherical Decomposition

� Represent complex geometry as a union of spheres

� Fast parallel collision detection on GPU

� Allows non-convex geometry

� NOTE: Used only for CD, the dynamics is performed using original geometry

70

Page 71: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Examples…

� Chain model

� 10 links

� 7,797 spheres per link

� Plow model

� 31,791 spheres in plow

blade model

� 15,000 spheres

representing terrain

71

71

Page 72: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Numerical Results: Pebble Bed Nuclear Reactor

� Two types of tests were run

� On the GPU

� CD: NVIDIA 8800 GT

� CCP: NVIDIA Tesla C870

� On the CPU

� Single threaded

� Quad Core Intel Xeon E5430

� 2.66 GHz

� The reactor contains spheres which flow out the bottom the nozzle and are recycled to the top of the reactor

� Performed simulations with 16K, 32K, 64K, and 128K bodies72

Page 73: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

73

Page 74: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Results: Average Duration for Taking One Integration Step [∆∆∆∆t=0.01 s]

y = 0.0007x - 6.3447R² = 0.996

70

80

90

100

Pebble Bed Reactor Demo, GPU vs. CPU

GPU Average Total Time CPU average Total Time Linear (GPU Average Total Time) Linear (CPU average Total Time)

y = 6E-05x - 0.51R² = 0.9936

0

10

20

30

40

50

60

70

0 16000 32000 48000 64000 80000 96000 112000 128000

Av

era

ge T

ota

l T

ime

# of spheres in pebble bed reactor74

Page 75: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Algorithmic Challenges: Gauss-Jacobi Doesn’t Scale Well

� Convergence stalls for certain classes of problems

� Very large problems

� Problems where I have many body stacked on each other

� Inherent problem with Gauss-Jacobi, pertains the propagation of � Inherent problem with Gauss-Jacobi, pertains the propagation of

information

� Granular dynamics problem have an intrinsic degree of

redundancy

75

Page 76: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Ellipsoid-Ellipsoid CD: Visualization

76

Page 77: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Example: Ellipsoid-Ellipsoid CD

1n

1P 2

ε

1 2 1 2 1 2

1 2

1 1( ) ( )2 2λ λ

= = + +d P - P M M c b - b

2 22

1 2 1 2,i i i i j i j i j

α α α α α α α α α

∂ ∂ ∂ ∂∂ ∂= − = −

∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂

P P P Pd d

Rotation MatrixA :

77

c

2n

2P

y

x

z

3

1 1( )2 8

T

i iα λ αλ

∂ ∂= −

∂ ∂

P cM Mcc M

2 TM = AR A

1 2 3( , , )diag r r r=R

Translation of

ellipsoids center

b :

2 1

4

Tλ = n Mn

2

3 5

3

2

3

1 3( )

8 32

1[ ) ) ]

8

1 1( )2 8

T T

i j j i

T T

i i j

T

i j

α α α αλ λ

α α αλ

λ α αλ

∂ ∂ ∂= − +

∂ ∂ ∂ ∂

∂ ∂ ∂−

∂ ∂ ∂

∂+ −

∂ ∂

P c cM Mcc M c M

c c c(c M M + Mc( M

cM Mcc M

Page 78: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Going Back to Practical Problem of Interest

� Tracked Vehicle on Granular Terrain…

78

Page 79: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Spherical Decomposition Steps

2: Cubit

3: Spherical Padding

1: Take shoe element

3: Spherical Padding

79

Page 80: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Track Components1,594,908 spheres per track

80

Page 81: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Granular Terrain Model

� Represent terrain as collection of discrete particles

� Match terrain surface profile

� Capture changing granularity with depth

81

Page 82: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Track Simulation 1

Parameters:• Driving speed: 1.0 rad/sec

• Length: 12 seconds

• Time step: 0.005 sec

• Computation time: 18.5 hours

• Particle radius: .027273 m

• Terrain: 284,715 particles

82

Page 83: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Track Simulation 2

Parameters:• Driving speed: 1.0 rad/sec

• Length: 10 seconds

• Time step: 0.005 sec

• Computation time: 17.8 hours

• Particle radius: .025±.0025 m

83

• Terrain: 467,100 particles

Page 84: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Results: Track ‘Footprint’

84

Page 85: Enabling High Performance Computational Dynamics in a …trink/RSS-2011/Slides_and_Posters/negrut.pdf · 2011-06-25 · Enabling High Performance Computational Dynamics in a Heterogeneous

Results: PositionsTrack Simulation 1

85