GPU Accelerated Surface Reconstruction from Particles

Erik Edespong

2010-10-28

LiU-ITN-TEK-A--10/065--SE

Department of Science and Technology (Institutionen för teknik och naturvetenskap), Linköping University (Linköpings universitet), SE-601 74 Norrköping, Sweden



Thesis work carried out in Media Technology at the Institute of Technology, Linköping University

Erik Edespong

Supervisor: Magnus Wrenninge

Examiner: Jonas Unger

Norrköping, 2010-10-28


Copyright (Upphovsrätt)

This document is kept available on the Internet, or its future replacement, for a long period of time from the date of publication, provided that no extraordinary circumstances arise.

Access to the document implies permission for anyone to read, download, and print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. A later transfer of the copyright cannot revoke this permission. All other use of the document requires the consent of the author. To guarantee authenticity, security and accessibility, there are solutions of a technical and administrative nature.

The author's moral rights include the right to be named as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or distinctive character.

For further information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Erik Edespong


GPU Accelerated Surface Reconstruction from Particles

Erik Edespong

[email protected]

October 31, 2010


Abstract

Realistic fluid effects, such as smoke and water, have been pursued by the visual effects industry for a long time. In recent years, particle simulations have gained a lot of popularity for achieving such effects. One problem noted by researchers has been the difficulty of generating surfaces from the particles. This thesis investigates current techniques for particle surface reconstruction. In addition, a GPU-based implementation using constrained mesh smoothing is described. The result is globally smooth surfaces which closely follow the distribution of the particles, though some problems are still apparent. The GPU implementation of the algorithm is approximately an order of magnitude faster than its CPU counterpart, but is held back by bottlenecks in sections still running on the CPU.


Contents

1 Introduction
  1.1 Method summary
  1.2 Structure of the report

2 Background survey
  2.1 Fluid simulation
  2.2 Particle surfacing
    2.2.1 Metaballs
    2.2.2 Surface splatting
    2.2.3 Level set based methods
    2.2.4 Beyond metaballs
    2.2.5 Anisotropic kernels
  2.3 Surface extraction
    2.3.1 Marching Cubes
    2.3.2 Marching Tetrahedra
    2.3.3 Marching Tiles
  2.4 Surface smoothing
    2.4.1 Unweighted graph Laplacian
    2.4.2 Weighted surface bilaplacian
    2.4.3 Solvers
    2.4.4 Constraints

3 GPU programming and CUDA
  3.1 CUDA
  3.2 Architecture
  3.3 Memory model
  3.4 Programming interface
    3.4.1 Optimizations

4 Implementation
  4.1 Uniform hashed grid
  4.2 Distance field
  4.3 Mesh creation
  4.4 Smoothing
    4.4.1 Finding matrices
    4.4.2 Solvers
  4.5 Constraints
  4.6 Erosion
  4.7 Attribute transfer
  4.8 Houdini

5 Results
  5.1 Computer specifications
  5.2 Example: Fountain
  5.3 Example: Splash
  5.4 Example: Color and velocity transfer
  5.5 Performance

6 Conclusion
  6.1 Future work

Bibliography


Chapter 1

Introduction

In the realm of visual effects, the creation of natural phenomena such as water, smoke and fire has been pursued since the early 1980s, with the release of the movie Star Trek II: The Wrath of Khan [Par82][Ree83]. Since such phenomena are very complex, manual techniques such as key-framing are not a viable option. Instead, these types of effects are created by setting up physical models for the movement of the fluids and simulating the outcome. Because people are used to seeing, for example, water in real life, it is very important that the water acts and looks as expected if it is to be perceived as real. For example, water is expected to have a certain viscosity, and if it runs too slowly this will be easily spotted.

Fluid simulation approaches are usually divided into two main categories: grid-based (Eulerian) approaches and particle-based (Lagrangian) approaches. Both have their own strengths and weaknesses, which will be discussed in Chapter 2.

As one of the world's top visual effects studios, Sony Pictures Imageworks has been creating natural phenomena for a long time. The company has proprietary software for both of the simulation approaches mentioned above, the latest addition being a Smoothed Particle Hydrodynamics (SPH) library by Lundqvist in 2009 [Lun09]. This library was integrated with Side Effects' Houdini [Sof10] for use by the artists in the production pipeline.

One thing missing from Lundqvist's library is the possibility to convert the particle representation of the fluid to an actual renderable surface. For this task, a tool that ships with Houdini is used. The built-in tool does, however, have a number of shortcomings which make it unfavorable to use. First, it is slow when used with large numbers of particles over large simulation domains; it often takes more time to reconstruct a surface from the particles than it does to simulate their movement. Second, it is difficult to create smooth surfaces without significantly reducing the small-scale detail and introducing visually disturbing artifacts to the surface. Finally, the tool is quite hard to use for artists without a technical background, with many non-intuitive settings that need to be tuned to get a reasonable surface.

The objective of this thesis is to investigate the creation of surfaces from particle data and the capturing of details of small-scale fluid simulations, such as objects falling into water and creating splashes. The goal is to create a prototype for this task that can be integrated into Houdini. Moreover, the prototype should utilize the parallel capabilities of today's GPUs for speed benefits. In addition, a CPU version should be created for reference and for use on computers without sufficient GPU support.

1.1 Method summary

The method used in the implementation of this thesis is based on the work of Williams [Wil09]. The method creates an initial surface from a field where each voxel holds the distance to the nearest particle. The surface is created using a new approach based on Marching Tetrahedra called Marching Tiles. The initial surface is quite crude, so a number of iterative smoothing steps are applied in order to make the surface look good. Normally, the problem with smoothing is that surfaces collapse, but the approach applies constraints on the surface after each smoothing step to make sure that the surface still represents the underlying particle distribution.
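As an illustration of the first step, the field holding the distance to the nearest particle can be sketched as follows. This is a minimal brute-force sketch and not the thesis implementation (which uses a uniform hashed grid, see Chapter 4); the function name, the dictionary storage and the choice of sampling at voxel centres are assumptions made for the example.

```python
import math

def nearest_particle_distance_field(particles, origin, spacing, dims):
    """Map each voxel index (i, j, k) to the distance from the voxel centre
    to the closest particle. Brute force: every voxel visits every particle."""
    field = {}
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                # Hypothetical convention: values are sampled at voxel centres.
                centre = (origin[0] + (i + 0.5) * spacing,
                          origin[1] + (j + 0.5) * spacing,
                          origin[2] + (k + 0.5) * spacing)
                field[(i, j, k)] = min(math.dist(centre, p) for p in particles)
    return field
```

The cost of this sketch grows with the number of voxels times the number of particles, which is why a spatial acceleration structure is needed in practice.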


1.2 Structure of the report

Chapter 2 introduces the necessary background for Eulerian and Lagrangian methods and surveys past as well as current techniques for particle surface reconstruction. In Chapter 3, GPU computing and the CUDA computing architecture are presented together with considerations for performance benefits. Implementation details for the necessary steps of particle surface reconstruction using the GPU are presented in Chapter 4. Chapter 5 presents the results of the thesis with both images and timing comparisons between the CPU and GPU versions. Finally, Chapter 6 draws conclusions from the results and suggests further improvements.


Chapter 2

Background survey

This chapter begins with a description of the two main fluid simulation approaches. Following this, different methods for creating surfaces from particle data are presented, with a focus on methods that sample the particles on an underlying field. Then, different ways of extracting surfaces from such fields are described. Finally, the constrained surface smoothing method used in the implementation of this thesis is described.

2.1 Fluid simulation

The goal when simulating natural phenomena is to solve equations which describe the motion of the fluid. A popular set of equations is the Navier-Stokes equations, formulated by Claude-Louis Navier in 1822 and by George Gabriel Stokes in 1845. These equations are used in a wide range of areas and were first used in computer graphics by Kajiya et al. in 1984 [KVH84]. The difference between computer graphics and many other areas, such as scientific simulation, is that the result does not need to be accurate, or even remotely correct. As long as the result looks realistic to the human eye, it is acceptable, and thus many approximations and simplifications can be made. As noted in Chapter 1, there exist two main approaches to fluid simulation which are used for visual effects: Eulerian and Lagrangian, both of which are described below.


Eulerian approaches are characterized by a simulation domain that is discretized into a static three-dimensional field of voxels. The voxels in the field each hold physical properties such as velocity, density and temperature, and the values at each voxel are calculated by solving the governing equations in every time step. The field offers fast access to the data for any position within the field by means of interpolation from nearby voxels that can be accessed instantly.
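The interpolation mentioned above can be sketched as a trilinear lookup. The text does not state which interpolation scheme is meant, so trilinear interpolation is an assumption here, as are the nested-list grid layout `grid[i][j][k]` with unit voxel spacing and the requirement that the query position lies inside the grid.

```python
def trilinear(grid, x, y, z):
    """Interpolate values stored at integer grid points grid[i][j][k]
    at a continuous position (x, y, z) inside the grid."""
    i, j, k = int(x), int(y), int(z)
    # Clamp so that i + 1, j + 1 and k + 1 are still valid indices.
    i = min(i, len(grid) - 2)
    j = min(j, len(grid[0]) - 2)
    k = min(k, len(grid[0][0]) - 2)
    fx, fy, fz = x - i, y - j, z - k

    def lerp(a, b, t):
        return a + (b - a) * t

    # Interpolate along x on the four cell edges, then along y, then along z.
    c00 = lerp(grid[i][j][k], grid[i + 1][j][k], fx)
    c10 = lerp(grid[i][j + 1][k], grid[i + 1][j + 1][k], fx)
    c01 = lerp(grid[i][j][k + 1], grid[i + 1][j][k + 1], fx)
    c11 = lerp(grid[i][j + 1][k + 1], grid[i + 1][j + 1][k + 1], fx)
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz)
```

Only the eight surrounding grid values are read, each of which is found by direct indexing, which is the "instant access" property that particle representations lack.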

Although grid-based techniques have been the most common way of simulating fluids [Bri08], they have some inherent problems. The finite and fixed nature of the field used for computation can make the representation of fine details difficult. This can of course be addressed by increasing the resolution of the field, but at great cost in computation and memory consumption. Data structures such as octrees have also been used for this problem by Losasso et al. [LGF04], as have hybrid methods such as Particle Level Sets. The Particle Level Set method, introduced by Enright et al. [EFFM02], spawns particles at strategic places in the field and then moves them at each time step. The values of the field are then corrected by the particles if they differ too much. Still, even with more advanced methods, the representation of very fine detail is difficult due to the fixed nature of the grid.

The surface, or interface, of a fluid is implicit in Eulerian approaches. It is often represented by a level set φ where each point in the field holds the signed distance to the interface. Values of φ < 0 are considered to be inside the fluid while values of φ > 0 are outside. This implies that the surface of the fluid lies at φ = 0 [OF03].
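The sign convention can be illustrated with the simplest possible level set, the signed distance to a sphere. The sphere and the function names are chosen only for illustration and do not come from the thesis.

```python
import math

def sphere_level_set(x, centre, radius):
    """Signed distance from point x to a sphere: negative inside,
    zero on the surface, positive outside."""
    return math.dist(x, centre) - radius

def classify(phi):
    """Apply the sign convention described in the text."""
    if phi < 0:
        return "inside"
    if phi > 0:
        return "outside"
    return "surface"
```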

Once the level set is available, retrieving the surface for rendering is straightforward: a marching algorithm such as Marching Cubes (see Section 2.3) is used to extract an iso-surface.

Whereas Eulerian approaches simulate the fluid at fixed points in space, Lagrangian techniques use a cloud of particles that move with the fluid and represent it. One of the big advantages of this is that the simulation domain is unrestricted and not bound by any grid, allowing each particle to move freely.


While grid-based methods need to calculate values at each point in the field,

even if there is no fluid in the voxel, particle-based approaches only need to do

calculations where there are particles, since this is where the fluid is located.

Another advantage Lagrangian methods have is that each particle has a

mass, making problems with mass and volume loss non-existent as long as the

number of particles is kept constant.

One of the disadvantages of Lagrangian methods noted by researchers is the difficulty of extracting a high-quality renderable surface [EFFM02][PTB+03]. With the increasing interest in Lagrangian methods, especially Smoothed Particle Hydrodynamics, the problem of surface reconstruction from particles has also gained some attention and is the main focus of this thesis.

It should be noted that the surfacing techniques below are not restricted

to any particular simulation technique. Any algorithm that simulates particles

and has the properties needed by the surfacing algorithms can be used. SPH is

used in the examples and results because it is the main system targeted during

the work of this thesis.

2.2 Particle surfacing

2.2.1 Metaballs

The field of reconstructing surfaces from particles got one of its first contributions from Blinn [Bli82] when he introduced blobbies, or metaballs, in 1982. The approach creates an influence field from the particles. Each point in the field evaluates the positions of the particles with a function and sums the results to get a final influence value. By doing this it is possible to get a smooth blending between particles, see Figure 2.1. A general description of how metaballs are constructed can be seen in Equation 2.1, where $x_i$ is the position of particle $i$, $x$ is the voxel position in the field and $W$ is a function that describes the influence, also called a smoothing kernel.

The Gaussian function seen in Equation 2.2, with $a$ and $b$ being constants, is the influence function that Blinn used, but any function that is smooth and goes to zero at a maximum distance can be used.

\[ \phi(x) = \sum_i W(x - x_i) \tag{2.1} \]

\[ W(r) = a e^{-b \|r\|} \tag{2.2} \]

For functions like that of Equation 2.2, the influence drops off very rapidly with increasing distance between $x$ and $x_i$, but never reaches zero. Because accuracy is not of critical importance, it is wise to specify a radius $h$ outside of which the particle influence is not considered. Equation 2.3 shows another smoothing kernel, used by Rosenberg et al. [RB08]. This kernel is computationally faster than Equation 2.2, partly because it avoids square roots and the exponential, but mainly because particles outside of $h$ do not need to be processed.

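\[ W(r, h) = \begin{cases} \left( \dfrac{\|r\|}{\sqrt{2}\,h} \right)^4 - \left( \dfrac{\|r\|}{\sqrt{2}\,h} \right)^2 + 0.25 & \text{if } \|r\| < h \\ 0 & \text{otherwise} \end{cases} \tag{2.3} \]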

Once the field is computed, it is very easy to extract the isosurface using

one of the methods described in Section 2.3.
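Equations 2.1 and 2.3 can be combined into a small sketch of a metaball field evaluation. The function names are chosen for this example only. The kernel is written in terms of the squared distance, which is one way to obtain the absence of square roots noted above, since only even powers of the distance appear in Equation 2.3.

```python
def kernel(r2, h):
    """Equation 2.3 expressed with the squared distance r2 = ||r||^2.
    The value is 0.25 at r = 0 and falls to zero at r = h."""
    if r2 >= h * h:
        return 0.0
    s2 = r2 / (2.0 * h * h)          # (||r|| / (sqrt(2) h))^2
    return s2 * s2 - s2 + 0.25

def metaball_field(x, particles, h):
    """Equation 2.1: sum the kernel contributions of all particles at x."""
    total = 0.0
    for p in particles:
        r2 = sum((a - b) ** 2 for a, b in zip(x, p))
        total += kernel(r2, h)
    return total
```

This sketch still visits every particle for every field point; the saving from the finite radius only materializes when a spatial data structure is used to skip particles that are farther away than h.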

Though they are easy to implement, metaballs have a number of disadvantages as well. The surface extracted from them is very dependent on the iso-level chosen for extraction as well as the distance $h$ used in the summation [APKG07]. Metaballs are also bad at representing flat surfaces; see the rightmost image of Figure 2.1, where the underlying spherical nature of the particle representation is clearly exposed. Adams et al. [APKG07] also note that disturbing temporal discontinuities appear when adding or removing particles.

Figure 2.1: Surfacing using metaballs. When two particles are brought together, the combined influence will create a smooth blending (middle), but the technique is bad at representing flat surfaces (right).

2.2.2 Surface splatting

Another early technique, introduced by Reeves [Ree83], involves rendering particles used to model fuzzy objects by additively blending them onto the image plane. If there are enough particles, the individual particles will blend together and create a smooth surface. The downside of this is that the technique requires a large number of particles to yield good results. The method also suffers from problems with single particles being visible at the edges of the surface [RB08]. This technique, also called splatting, is also used by Sims [Sim90], where it is implemented to render waterfalls, snow storms and similar effects. In more recent work, billboards (camera-aligned 2D textures) are used for surfacing data from SPH simulations by Müller et al. [MCG03], but in subsequent papers this approach seems to have been abandoned [MSKG05].

Though it is easy to implement and gives plausible results for particle simulations [MCG03], it does not seem to have gained much traction in visual effects. Instead, the method has had more success in the fields of visualization of scanned point data and real-time applications [ZPvBG01][BHZK05][ALD06].

2.2.3 Level set based methods

Premoze et al. [PTB+03] introduced a technique based on the level set method [OS88] to construct surfaces for their particle fluid simulations. The level set method is used to track the surface by first creating a force field from the particle distribution for the first frame and then using a partial differential equation to iteratively evolve the surface until it converges. For subsequent frames, the field from the previous frame is used for initialization, allowing for continuity of the surface between frames.

The downside of this method is that it seems to be very slow compared to other surface reconstruction algorithms. The authors also note that the method has difficulties with creating sharp boundaries when colliding with objects or other fluids.

2.2.4 Beyond metaballs

As described in Section 2.2.1, metaballs create a field by summing the influence of nearby particles, as expressed by Equation 2.1. A problem with the early smoothing kernels is that if many particles are grouped closely together, the field will have very high values in that area, creating a large, unnatural blob. In 2003, Müller et al. [MCG03] used the density from their SPH simulation to normalize the particle contribution to the field, see Equation 2.4.

\[ \phi(x) = \sum_i m_i \frac{1}{\rho_i} W(x - x_i, h_i) \tag{2.4} \]
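The effect of the density normalization can be seen in a small sketch. Here $m_i$ and $\rho_i$ are read as the mass and density of particle $i$. In a real pipeline the densities come from the SPH simulation; in this example they are estimated with the same kernel, which is an assumption, as are the function names, the reuse of the kernel from Equation 2.3 and the single shared support radius h.

```python
def kernel(r2, h):
    """Equation 2.3 in terms of the squared distance r2."""
    if r2 >= h * h:
        return 0.0
    s2 = r2 / (2.0 * h * h)
    return s2 * s2 - s2 + 0.25

def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def densities(particles, masses, h):
    """Assumed density estimate: rho_i = sum_j m_j W(x_i - x_j, h)."""
    return [sum(m * kernel(dist2(p, q), h) for q, m in zip(particles, masses))
            for p in particles]

def normalized_field(x, particles, masses, rho, h):
    """Equation 2.4: each contribution is scaled by m_i / rho_i."""
    return sum(m / r * kernel(dist2(x, p), h)
               for p, m, r in zip(particles, masses, rho))
```

With this normalization, a cluster of coincident particles yields the same field value at its centre regardless of how many particles it contains, whereas the plain sum of Equation 2.1 grows linearly with the particle count.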

Noting that this approach solves "some scaling issues but does not significantly reduce the bumps on what should be flat surfaces", Zhu and Bridson [ZB05] proposed a different field creation method. It is based on the principle that the influence field should be represented by the signed distance field for lone particles. Given a single particle position $x_0$ with radius $r_0$, the resulting field for a lone particle is:

\[ \phi(x) = \|x - x_0\| - r_0 \tag{2.5} \]

With this as a base, they formulate a generalized solution for an arbitrary

number of particles by using a weighted average of the positions and radii:

\[ \phi(x) = \|x - a(x)\| - r(x) \tag{2.6} \]

\[ a(x) = \frac{\sum_i w(x - x_i, h_i)\, x_i}{\sum_i w(x - x_i, h_i)} \tag{2.7} \]

\[ r(x) = \frac{\sum_i w(x - x_i, h_i)\, r_i}{\sum_i w(x - x_i, h_i)} \tag{2.8} \]

where $w$ is a smooth, decaying weight function with finite influence radius $h_i$. It is easy to see that for particles with no neighbors within $h_i$, the field reduces to that of Equation 2.5. Because the approach uses weighted averages of the particle positions, spurious blobs may appear in concave areas where the actual surface position does not correspond to the position given by Equation 2.6. Even so, the authors note a big improvement in comparison to previous techniques.

Figure 2.2: Comparison between metaballs (red line), Zhu and Bridson [ZB05] (green curve) and Adams et al. [APKG07] (blue line). Image from [APKG07].

Adams et al. [APKG07] improved the method of Zhu and Bridson [ZB05] by tracking the particle-surface distance for each particle using a signed distance field. They used this field in each time step to adjust the per-particle distance to the surface. By making use of this distance instead of the radius in Equation 2.8, together with the weight function in Equation 2.9, the authors note an improvement, as can be seen in Figure 2.2.

\[ w(r, h) = \begin{cases} \left( 1 - (\|r\| / h)^2 \right)^3 & \text{if } \|r\| < h \\ 0 & \text{otherwise} \end{cases} \tag{2.9} \]
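Equations 2.6 to 2.8, evaluated with the weight function of Equation 2.9, can be sketched as below. The per-particle distance tracking of Adams et al. is not included; the sketch uses plain particle radii as in Equation 2.8. The function names, the single support radius h shared by all particles, and the choice to return `None` where no particle is in range are assumptions of this example.

```python
import math

def weight(dist, h):
    """Equation 2.9: smooth weight that vanishes at distance h."""
    if dist >= h:
        return 0.0
    return (1.0 - (dist / h) ** 2) ** 3

def averaged_field(x, particles, radii, h):
    """Equations 2.6-2.8: distance from x to the weighted average position,
    minus the weighted average radius."""
    wsum = 0.0
    pos = [0.0, 0.0, 0.0]
    rad = 0.0
    for p, rp in zip(particles, radii):
        w = weight(math.dist(x, p), h)
        wsum += w
        for c in range(3):
            pos[c] += w * p[c]
        rad += w * rp
    if wsum == 0.0:
        return None                      # no particle within h: field undefined
    a = tuple(c / wsum for c in pos)     # Equation 2.7
    r = rad / wsum                       # Equation 2.8
    return math.dist(x, a) - r           # Equation 2.6
```

For a single particle the weights cancel and the result is the signed distance of Equation 2.5, which is the design principle stated by Zhu and Bridson.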

2.2.5 Anisotropic kernels

Basing their work on Müller et al. [MCG03] (Section 2.2.4), Yu and Turk [YT] note two fundamental problems which make the surfaces of the original approach look bumpy. The first problem is the nature of Lagrangian methods, which makes the distribution of particles very irregular. Second, the authors note that the spherical shape of the kernels used is not good for representing the particle density distribution near a surface.

To solve the problem of irregularity, the authors suggest smoothing the particle positions before the surface reconstruction. The smoothed particle centers are described by Equation 2.10, where $w$ is a weighting function with finite support which determines the influence from nearby particles and $\lambda$ is a constant such that $0 < \lambda < 1$.

\[ x'_i = (1 - \lambda)\, x_i + \lambda \frac{\sum_j w_{ij}\, x_j}{\sum_j w_{ij}} \tag{2.10} \]

This equation can be solved for all positions simultaneously by formulating it as a sparse matrix system and using an iterative solver, which is discussed further in Section 2.4.
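A direct evaluation of Equation 2.10, using the original positions on the right-hand side, can be sketched as follows. It does not set up the sparse matrix system mentioned above. The weight function is borrowed from Equation 2.9 purely as an example of a finite-support weight, and each particle is counted as its own neighbor; both are assumptions of this sketch, as are the function names.

```python
import math

def weight(dist, h):
    """Example finite-support weight (the form of Equation 2.9)."""
    if dist >= h:
        return 0.0
    return (1.0 - (dist / h) ** 2) ** 3

def smooth_positions(positions, h, lam):
    """Blend each position with the weighted mean of its neighbourhood,
    x'_i = (1 - lam) x_i + lam * sum_j w_ij x_j / sum_j w_ij."""
    smoothed = []
    for xi in positions:
        wsum = 0.0
        acc = [0.0, 0.0, 0.0]
        for xj in positions:
            w = weight(math.dist(xi, xj), h)
            wsum += w
            for c in range(3):
                acc[c] += w * xj[c]
        # wsum >= 1 because the particle itself contributes with weight 1.
        smoothed.append(tuple((1.0 - lam) * xi[c] + lam * acc[c] / wsum
                              for c in range(3)))
    return smoothed
```

With lam = 0 the positions are returned unchanged, and larger values pull each particle further towards the local weighted mean.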

To solve the second problem, Yu and Turk [YT] proposed a generalization of the smoothing kernels used by the techniques in the previous sections. With the exception of Premoze et al. [PTB+03], who use the tangential and perpendicular velocity of the particles to scale their kernel, most other authors use spherically shaped kernels. Instead of using a constant support radius $h$, as in Equation 2.3, Yu and Turk propose that a positive definite matrix $G$ should be used. This matrix is used to rotate and scale $r$ to create a kernel that better represents the density near a particle. The matrix $G$ is determined by applying a weighted version of Principal Component Analysis [KC03] to the particle positions within a neighborhood. This method constructs a weighted covariance matrix, $C$, and then performs an eigendecomposition, where the eigenvectors give the principal axes and the eigenvalues give the factors with which to scale the kernel.
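The first half of this procedure, the weighted covariance matrix, can be sketched as below. The function name and the normalization by the sum of weights are assumptions of this example; the eigendecomposition itself is left out and would in practice be delegated to a linear algebra library.

```python
def weighted_covariance(neighbours, weights):
    """Weighted covariance matrix C of the neighbour positions, taken about
    their weighted mean and normalised by the sum of the weights."""
    wsum = sum(weights)
    mean = [sum(w * p[c] for p, w in zip(neighbours, weights)) / wsum
            for c in range(3)]
    cov = [[0.0, 0.0, 0.0] for _ in range(3)]
    for p, w in zip(neighbours, weights):
        d = [p[c] - mean[c] for c in range(3)]
        for a in range(3):
            for b in range(3):
                cov[a][b] += w * d[a] * d[b] / wsum
    return cov
```

For neighbors that all lie in the plane z = 0, the matrix has no variance along z, so the corresponding eigenvalue is zero and the kernel would be flattened in that direction, which matches the behavior described for particles near a surface.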

Figure 2.3: Comparison between isotropic kernels and the kernel description

presented by Yu and Turk. Figure from [YT].


The result of this is that the kernels that lie near a surface are flattened in the normal direction and stretched in the tangential directions of the surface, while the kernels of lone particles and particles within the volume remain spherical due to the same distribution of density in every direction. A comparison between this technique and that of Zhu and Bridson [ZB05] can be seen in Figure 2.3.

2.3 Surface extraction

The techniques described in the previous sections that build a field, φ, from particles need some kind of algorithm for turning the field into a polygon mesh. The most popular way of doing this is to create an iso-surface from a constant value in the field, which can be done in a number of different ways.

2.3.1 Marching Cubes

In 1987, Lorensen and Cline [LC87] introduced the Marching Cubes algorithm. The authors divide the field, φ, into cubes, where each corner is classified as either inside or outside the surface based on the selected iso-value. They then conclude that a polygon can only pass through the cube in a finite number of ways. For a cube, this number is $2^8 = 256$, which can be reduced to 15 cases by using symmetry. Three of these cases can be seen in Figure 2.4. By using a lookup table, it is possible to quickly get the topological structure of a cube given the inside/outside state of each corner. The final polygon can then be obtained by creating vertices at interpolated positions along the cube's edges. By doing this ("marching") for every cube in the field, a mesh can be constructed from the resulting polygons.
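The two per-cube operations described above, building the lookup index and placing a vertex on an edge, can be sketched as follows. The lookup table itself is omitted. The corner ordering, the convention that values below the iso-value are inside, and the function names are assumptions of this example.

```python
def cube_case_index(corner_values, iso):
    """Pack the inside/outside state of the eight corners into one of the
    2^8 = 256 lookup indices. A corner is 'inside' if its value is below iso."""
    index = 0
    for bit, value in enumerate(corner_values):
        if value < iso:
            index |= 1 << bit
    return index

def edge_vertex(p0, p1, v0, v1, iso):
    """Place a vertex where the linearly interpolated field crosses iso
    along the edge from p0 to p1. Assumes v0 and v1 lie on opposite sides."""
    t = (iso - v0) / (v1 - v0)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))
```

Indices 0 and 255 correspond to cubes that lie entirely outside or entirely inside the surface and therefore produce no polygons.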

The problem with the original approach is the possibility of ambiguities in the cases, causing non-manifold geometry [D88]. Solutions to this problem have been introduced by, among others, [MSS94], [Che95] and [LLWVT03], who introduce complementary cases to the original 15 to resolve the ambiguities.


Figure 2.4: Three of the fifteen base cases for Marching Cubes with the inside/outside state of each corner visualized.

2.3.2 Marching Tetrahedra

Another method solving the ambiguity of Marching Cubes was introduced by Bloomenthal [Blo88], who proposed the use of tetrahedra instead of cubes. In addition to removing ambiguities, there are only three unique ways in which a polygon can pass through a tetrahedron, depicted in Figure 2.5, making implementation even easier. As noted by [SML], though, this technique creates surfaces with more triangles than Marching Cubes, and artificial bumps may appear in the surface due to the choice of orientation of the tetrahedra.
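One standard way of arriving at a count of three is to group the $2^4 = 16$ inside/outside configurations of the four corners by symmetry, as sketched below. Whether the three cases counted this way are exactly the three depicted in Figure 2.5 cannot be verified from the text, so this grouping is an assumption, as is the function name.

```python
from itertools import product

def tetrahedron_classes():
    """Group the 16 corner configurations into classes that are equivalent
    under relabelling of the corners and swapping inside with outside."""
    classes = {}
    for config in product((0, 1), repeat=4):
        inside = sum(config)
        # k corners inside is equivalent to 4 - k inside after swapping signs.
        key = min(inside, 4 - inside)
        classes.setdefault(key, []).append(config)
    return classes
```

Under this grouping, class 0 contains the two configurations with no crossing, class 1 the eight configurations where one corner is separated from the other three (a triangle), and class 2 the six configurations where the corners are split two and two (a quadrilateral).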

Figure 2.5: The three ways in which a polygon can pass through a tetrahedron.

2.3.3 Marching Tiles

In his thesis, Williams [Wil09] noted that any technique based on marching

structures with dihedral angles greater than or equal to 90°, such as Marching Cubes

or Marching Tetrahedra, can produce vertices with valence (the number of edges

incident on a vertex) as low as four. Such vertices represent poor sampling of the

underlying domain and can also cause problems when applying numerical operators

such as smoothing on the surface.

To solve this problem, Williams introduces a new space-filling tile consisting

of 46 different tetrahedra. This tile, as seen in Figure 2.6, is constructed such

that each edge has at least five tetrahedra incident on it, thus guaranteeing a

valence of at least five. The tile can be marched in the same way as the other

techniques, but some more lookups need to be done in order to get the correct

tetrahedron for a given place in the field. The structure of the tile and how to

construct it can be found in [Wil09].

Figure 2.6: The space-filling tile used by Williams from two different angles. A

lookup table is used to find which tetrahedron to march depending on location

in the field.

2.4 Surface smoothing

Mesh smoothing has been an area of active research for a long time and has

many applications such as denoising data and design of high-quality surfaces

[BPK+07]. A popular method for smoothing meshes is called Laplacian smooth-

ing, where the vertices of a mesh are gradually moved in the direction of the

Laplacian of each vertex. Depending on the approximation used for the

Laplacian, different results can be achieved [Bra04].

The concept of smoothing has been used in the creation of surfaces from

particles before. For example, Houdini [Sof10] can use a Gaussian filter to

smooth the field before creating the initial surface and also smooth the resulting

mesh. While such techniques do create a smoother surface, the problem is that

pre-surface filtering may create unwanted artifacts as seen in Figure 2.7. Both

filtering and post-surface smoothing also suffer from the lack of connection

to the underlying particle representation and are thus not able to distinguish

between noise and actual small-scale structures.

Figure 2.7: Shows a surface created with the Houdini surfacer, with the underlying

particles overlaid. Note that the surface does not coat all particles and

that there are artifacts in the form of polygons where there are no particles.

In his thesis, Williams [Wil09] presents an approach for creating smooth and

flat surfaces. Instead of using complex kernels to create the initial influence

field, a distance field is used where each voxel holds the distance to the nearest

particle. The surface is created at $r_{outer}$ using Marching Tiles (Section 2.3.3).

The surface is then iteratively smoothed while applying constraints to make

sure that the surface stays true to the underlying particles. The basic idea of

the method can be seen in Figure 2.8.

The method poses the smoothing as a problem of constrained nonlinear opti-

mization and two different quadratic objective functions are used for measuring

the smoothness of the surface.

Figure 2.8: The vertices of the constructed surface (red line) will be located

between $r_{outer}$ and $r_{inner}$ when the algorithm is finished. Figure from [Wil09].

2.4.1 Unweighted graph Laplacian

The first smoothing step used by Williams is based on an approximation of the

Laplacian that uses the inter-vertex connectivity of the mesh, introduced by

[Tau95]. The discrete Laplacian of a vertex $x_i$ is approximated by:

$$\nabla^2 x_i = \sum_{j \in N_1(i)} W_{ij} x_j \qquad (2.11)$$

where $N_1(i)$ is the inclusive 1-ring neighborhood of vertex $i$; that is, all
the vertices that share an edge with vertex $i$ plus the vertex itself. $W_{ij}$ is set
to $-1$ for $j \neq i$ and $W_{ii}$ is set to the number of elements in $N_1(i)$, creating a
weighted connectivity graph for all the vertices. It is now possible to form

$$\nabla^2 x = A x \qquad (2.12)$$

where $x$ is an $N \times 1$ vector with all the vertices in the mesh and $A$ is an $N \times N$
sparse matrix containing the weights for each of the vertices. Iteratively
minimizing $x^T A x$ will now result in each vertex moving towards the center of all its
neighbors, improving the shape of the triangles.
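
The effect of this smoothing step, each vertex moving towards the centre of its neighbours, can be illustrated with a small sketch. It is not the solver used by Williams or in this thesis: it assumes the common "umbrella" form in which the diagonal weight equals the number of edge-connected neighbours, and the function name and the `step` parameter are hypothetical.

```python
def umbrella_smooth(vertices, neighbors, step=0.5):
    """One smoothing pass: move every vertex toward its neighbours' centroid.

    vertices  : list of (x, y, z) tuples
    neighbors : neighbors[i] lists the vertices sharing an edge with vertex i
    step      : fraction of the way to move (assumed parameter, 0..1)
    """
    smoothed = []
    for i, v in enumerate(vertices):
        ring = neighbors[i]
        if not ring:
            smoothed.append(tuple(v))  # isolated vertex: nothing to average
            continue
        centroid = [sum(vertices[j][k] for j in ring) / len(ring) for k in range(3)]
        # All new positions are computed from the old ones (Jacobi-style update).
        smoothed.append(tuple(v[k] + step * (centroid[k] - v[k]) for k in range(3)))
    return smoothed
```

Repeating the pass without any constraint shrinks the mesh, which is the behaviour that the constraints in Section 2.4.4 are meant to counteract.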

2.4.2 Weighted surface bilaplacian

In the second smoothing step, Williams minimizes $x^T B x$, where $B$ is a quadratic
approximation of the surface Bilaplacian:

$$B = W^T D^{-1} W \qquad (2.13)$$

In Equation 2.13, $W$ is an $N \times N$ matrix, $N$ being the number of vertices in the
mesh, where an element $W_{ij}$ is set to the sum of the cotangents of the opposite
angles for the two triangles incident on the edge connecting vertex $i$ and vertex
$j$, see Figure 2.9 (a) [DC98]. $D$ is a diagonal matrix where element $D_{ii}$ is the
total area of all triangles that contain vertex $i$, see Figure 2.9 (b). Minimizing
the objective function will improve the curvature of the mesh, creating a better
looking surface.

Figure 2.9: (a) Shows the angles used for element $W_{ij}$. (b) Shows all areas

incident on a vertex $x_i$.

2.4.3 Solvers

When the A and B matrices have been retrieved it is possible to set up and

solve the smoothing optimization problem as a linear system. This system

can then be solved by using for example the Gauss-Seidel iterative method or

Successive Over-relaxation (used by Williams [Wil09]). Other researchers have

also used preconditioned bi-conjugate gradient for similar problems with good

results [DC98].

The constraints described below do make the optimization problem non-

linear, but they are handled separately and are not encoded into the matrices,

thus it is still possible to use linear solvers.

2.4.4 Constraints

The problem with just smoothing is that it can cause significant shrinkage of the

mesh. If enough iterations are made, the mesh will collapse to the barycenter of

the vertices. Williams solves this by applying a constraint to the vertices after

every smoothing step. The constraint is given by Equation 2.14, where $x$ is a vertex
and $P_i$ a particle:

$$r_{inner} \leq \min_i \lVert x - P_i \rVert \leq r_{outer} \qquad (2.14)$$

Enforcing it keeps the resulting surface a good representation of the underlying
particles. The author also notes that, given particles arranged on a grid of size $d$,
a flat surface will be produced if Equation 2.15 is fulfilled:

$$r_{outer}^2 > \frac{d^2}{2} + r_{inner}^2 \qquad (2.15)$$

Chapter 3

GPU programming and CUDA

Performance has always been a driving factor when hardware vendors create new

products, whether it is the transfer speed of USB ports, the response time of a

touch screen or the number of pixels in a digital camera. The clock frequency

of the central processing unit (CPU) of a computer is no different and there has

been an exponential increase in performance over the past decades.

The CPU is constructed for running a single thread of sequential instruc-

tions and works well for many tasks, but this is a major drawback when having

multiple tasks that can run independently of each other, e.g. heavy calcula-

tions. Although multi-core CPUs have been around since 2004, the underlying

hardware is still made for sequential computations. This is probably why re-

searchers without access to massively parallel supercomputers started looking

at General Purpose computing on the Graphics Processing Unit (GPGPU) for

doing their computations. The GPU is very different from the CPU in that it

is made with the intention of performing tasks in parallel, e.g. rendering pixels

on a screen, and can be a great deal faster for computations than the CPU if

the computations can be parallelized.

Before the introduction of the GeForce 8 GPU from NVIDIA, the only possibility
for GPGPU was to make use of graphics APIs such as OpenGL or DirectX. This made
GPGPU hard to master and also restricted the type of calculations that could be
made, due to the limitations of the APIs. In November 2006, together with the
GeForce 8 card, NVIDIA also introduced the CUDA computing architecture, which
was made in order to remove these limitations. Since then, other APIs have been
released, most notably OpenCL, which is a cross-platform API for general GPU
programming.

The GPU programming work of this thesis is done exclusively in CUDA,

and therefore the remainder of this chapter will focus on the architecture and

programming model of CUDA.

3.1 CUDA

CUDA, or Compute Unified Device Architecture, is as hinted above a general

purpose hardware and software architecture aimed at massively parallel and

computationally intensive applications.

CUDA was developed by NVIDIA and, at the time of writing, is only supported by
GPUs from NVIDIA, such as the GeForce series. These GPUs are viewed as
coprocessors to the main CPU, capable of executing tens or hundreds of thousands
of threads in parallel.

3.2 Architecture

As the underlying architecture is constructed for parallel programming, it is

necessary to have knowledge about it in order to fully utilize its potential.

Every CUDA-enabled GPU contains a number of streaming multiprocessors,
each containing a number of cores. The multiprocessors have a Single Instruction,
Multiple Data (SIMD) architecture. This means that at a given time,
every core in the streaming multiprocessor executes the same instruction, but
on different data [Cor10c].

Because not all code is suitable for running in parallel, a CUDA program

will typically consist of passages run on either the CPU (the host from now on)

or the GPU (the device from now on). The passages with no data parallelism

are implemented on the host using normal C or C++ code and the sections that

exhibit a large amount of data parallelism are implemented using device code (see

Section 3.4).

Figure 3.1: An example of a grid of blocks. Image from NVIDIA CUDA Pro-

gramming Guide [Cor10c].

When a device function, called a kernel, is called, hundreds to tens of thou-

sands of lightweight threads are launched to operate on the data. These threads

are put into one of possibly many blocks, which in turn form a two-dimensional

grid. The size of the grid and the blocks can be specified by the programmer.

The grid and block indices are also known to each thread, making it possible

to form them to fit the data they operate on. Figure 3.1 shows an example of

a grid of blocks. The first kernel launches a grid of size (3,2) where each block

contains 15 threads (5,3).
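
To make the grid and block layout concrete, the following sketch emulates on the host how each thread could derive a global two-dimensional index from its block and thread indices, in the same way a kernel typically combines the CUDA built-ins `blockIdx`, `blockDim` and `threadIdx`. The Python function names are hypothetical and the loops run sequentially; on the device every thread would evaluate only its own index.

```python
def global_thread_index(block_idx, thread_idx, block_dim):
    """Global (x, y) index of a thread: blockIdx * blockDim + threadIdx."""
    x = block_idx[0] * block_dim[0] + thread_idx[0]
    y = block_idx[1] * block_dim[1] + thread_idx[1]
    return x, y


def enumerate_threads(grid_dim, block_dim):
    """List the global index of every thread launched for a grid of blocks."""
    indices = []
    for by in range(grid_dim[1]):
        for bx in range(grid_dim[0]):
            for ty in range(block_dim[1]):
                for tx in range(block_dim[0]):
                    indices.append(
                        global_thread_index((bx, by), (tx, ty), block_dim))
    return indices
```

For the configuration mentioned above, a grid of size (3,2) with blocks of size (5,3), this enumerates 6 blocks of 15 threads each, covering a 15 by 6 index range.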

In order to allow for future changes in the hardware such as adding more

cores, CUDA is designed for transparent scalability. This makes it possible to

create applications that will run on future generations of GPUs without having

to change the code because the architecture supports such changes.

3.3 Memory model

There are several different memory types available to the CUDA programmer.

Each of these memories has different size, scope and lifetime. Figure 3.2 shows

the different memories and their relationships.

• Global memory is the main memory of the GPU and is the only one giving

read and write access to both the host and the device. It is accessible

from all threads but it is not cached, making access times high even if a

coalesced access pattern is followed.

• Constant memory is like global memory but it can only be written to from

the host. It is also cached, making it read from the slow device memory

only on a cache miss.

• Texture memory is a cached memory accessible to all threads with hardware

support for special operations such as linear and higher order interpolation.

• Registers are extremely fast device memories used to store local variables.

They are local to a thread and the memory will be released once the thread

is done executing.

• Local memory is a part of the global memory, but the memory is still local

to a thread. Since the on-chip memory is limited, the use of too many

registers will cause them to be placed in the local memory instead.

• Shared memory is a very fast on-chip memory which is shared between the

threads of a block, allowing for collaboration without having to resort to

the slow global memory. The lifetime of the shared memory is the lifetime

of a kernel.

Figure 3.2: A diagram over the CUDA memories. Image from NVIDIA CUDA

Programming Guide [Cor10c].

3.4 Programming interface

There are currently two Application Programming Interfaces (APIs) for

programming CUDA.

The first one is the CUDA Driver API, which is a low-level interface that

gives good control over the device and has no dependency on the runtime library.

However, the driver API is known to have more verbose code and it is harder

to debug than C for CUDA, which is the second API. [Cor10a]

C for CUDA is a higher level interface implemented on top of the driver API.

It is built as an extension to normal C, exposing new syntax for creating

and launching kernels etc. It also contains a number of functions for data

transfer to and from the GPU, thread synchronization and providing OpenGL

interoperability to name a few.

The GPU implementation for this thesis is all done using C for CUDA.

3.4.1 Optimizations

Simply getting a CUDA program to compile is not enough to get great perfor-

mance. There are a number of things that need to be considered in order to

achieve fast programs.

• The highest priority should always be to maximize the parts of the pro-

gram that can run in parallel. By choosing correct algorithms and execu-

tion configuration for each kernel launch (maximizing the resource usage),

extreme speedups can be achieved.

• Memory optimization should be done by first minimizing the data transferred
between the host and the device. Such transfers have low bandwidth
and are very slow compared to transfers between other memories. Access
to global memory should always be coalesced, and kernels should access
global memory as little as possible, letting the threads within a
block use the shared memory instead.

• Control flow instructions such as if- and for-statements can have a big
impact on performance if the threads within a warp (a logical grouping of
threads) are not running the same code. If one thread in a warp takes a
different route in an if-statement which takes longer to execute, all the
other threads in the same warp will wait for that thread to finish. By
grouping threads or the data wisely, such stalls can be avoided.

• Instruction optimization can be made by using alternative versions of
heavy arithmetic functions optimized for speed. These functions often
trade precision for speed, though, so they should only be
used when they do not affect the quality of the result.

More in-depth information about CUDA and many more optimization strategies
can be found in the CUDA Programming Guide [Cor10c], the Best Practices
Guide [Cor10a], or in the two introductory books Programming Massively Parallel
Processors by Kirk and Hwu [KmWH10] and CUDA by Example by Sanders
and Kandrot [SK10].

Chapter 4

Implementation

The implementation made in this thesis closely follows the work of Williams

[Wil09], which seemed to yield very good results while at the same time having

many passages allowing for massively parallel operations on the GPU. A CPU

version of the surfacer was constructed as well. This was done because not all

artists had good enough graphics cards to handle huge amounts of data and also

because the surface reconstruction should be able to run on the rendering farm,

currently only consisting of CPUs. Though different algorithms are used for

some of the operations, the end result should be the same.

Figure 4.1 shows an overview of the data flow and the different operations

performed from particles to a final surface for a normal run of the algorithm. The

following sections will explain these steps in more depth after a data structure

for fast searching is described.

4.1 Uniform hashed grid

In the field creation, the constraint and the attribute transfer steps of Figure

4.1, a spatial search is needed to quickly find particles near a certain point, be

it a position in the field or a vertex on the surface.

Figure 4.1: All the steps in the implementation of the surfacing algorithm.

For this, an implementation based on the work of Le Grand [LG07] is used.
In this method, space is divided into a fixed-size grid of size $(d_x, d_y, d_z)$ with
cells of equal size, $(c_x, c_y, c_z)$. A hash value according to Equation 4.1
is then computed for every particle position, $P$, and stored together with the
particle ID. The particles are then put into a one-dimensional array which is
sorted using a parallel radix sort [SHG09] such that all particles within a cell
are adjacent to each other. Two additional arrays are then created to store the
start and end indices of each cell in the big array, allowing for fast lookup of
nearby particles for positions that hash to the same value.

$$P_{hash} = m \, d_x d_y + l \, d_x + k \qquad (4.1)$$

$$k = \left\lfloor \frac{P_x}{c_x} \right\rfloor \bmod d_x, \quad l = \left\lfloor \frac{P_y}{c_y} \right\rfloor \bmod d_y, \quad m = \left\lfloor \frac{P_z}{c_z} \right\rfloor \bmod d_z \qquad (4.2)$$

By choosing an appropriate size for the cells, the closest particle to a point

can be found by looking in the cell that the point is hashed to, or in

one of the adjacent cells.
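
The construction of the hashed grid can be sketched sequentially as below. This is only an illustration under stated assumptions: the function names are hypothetical, Python's built-in sort stands in for the parallel radix sort, and an index of -1 is an assumed marker for an empty cell.

```python
import math


def cell_hash(p, cell_size, grid_dim):
    """Hash of a position following Equations 4.1 and 4.2."""
    k = math.floor(p[0] / cell_size[0]) % grid_dim[0]
    l = math.floor(p[1] / cell_size[1]) % grid_dim[1]
    m = math.floor(p[2] / cell_size[2]) % grid_dim[2]
    return m * grid_dim[0] * grid_dim[1] + l * grid_dim[0] + k


def build_grid(particles, cell_size, grid_dim):
    """Sort (hash, particle id) pairs and record each cell's start and end."""
    # The built-in sort stands in for the parallel radix sort.
    pairs = sorted((cell_hash(p, cell_size, grid_dim), i)
                   for i, p in enumerate(particles))
    n_cells = grid_dim[0] * grid_dim[1] * grid_dim[2]
    cell_start = [-1] * n_cells  # -1 marks an empty cell (assumed convention)
    cell_end = [-1] * n_cells
    for pos, (h, _) in enumerate(pairs):
        if cell_start[h] == -1:
            cell_start[h] = pos
        cell_end[h] = pos + 1  # exclusive end index
    return pairs, cell_start, cell_end


def particles_in_cell(h, pairs, cell_start, cell_end):
    """Ids of all particles whose position hashes to cell h."""
    if cell_start[h] == -1:
        return []
    return [pid for _, pid in pairs[cell_start[h]:cell_end[h]]]
```

A nearest-particle query would call `particles_in_cell` for the cell of the query point and for its adjacent cells, then keep the closest candidate.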

The uniform hashed grid is fast, light-weight in terms of memory and easy

to implement. A sample implementation of this structure can be found in the

CUDA SDK [SG08], though it is heavily coupled with the specific implementa-

tion.

4.2 Distance field

The CPU version creates the distance field by rasterizing the particles on the

underlying field and then checking nearby voxels inside the influence radius for

their distance value and updating them if necessary. Since it is not required to

store distance in voxels that are not close to a surface, a sparse field from the

Field3D library [Wre10] was used for storage.

The rasterization process causes race conditions when run in parallel if

atomic operations1 are not used. These operations were not widely supported

on the graphics cards installed at Imageworks, so a different technique was used

for the GPU version. A sparse data structure was constructed to reduce the

memory footprint, much like the sparse field above. The first step was to calculate

which of the sparse blocks would need allocation. Blocks that were

too far from any particles to contribute to the surface could simply be skipped.

Then for all the allocated blocks, each voxel position was visited to search for

the distance to the nearest particle by using the uniform hash grid. The size

of the hashed grid was such that if no particle was found within the cell or its

neighbors, the distance could safely be set to ∞.
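
The difference from the rasterizing approach is that every voxel gathers its own value, so no two threads write to the same location. A sequential sketch of this gather is given below. It is not the thesis implementation: the sparse block allocation is left out, a dictionary of cells replaces the hashed grid, the cell size is assumed to equal the search radius, and distances beyond that radius are set to infinity as an assumed policy.

```python
import math
from collections import defaultdict


def cell_of(point, cell_size):
    """Integer cell coordinates of a point in a uniform grid."""
    return tuple(math.floor(c / cell_size) for c in point)


def gather_distance_field(particles, origin, voxel_size, dims, radius):
    """Distance from every voxel centre to its nearest particle (gather)."""
    # With a cell edge equal to the radius, every particle within `radius`
    # of a point lies in the 3x3x3 block of cells around that point.
    cells = defaultdict(list)
    for p in particles:
        cells[cell_of(p, radius)].append(p)

    field = {}
    for ix in range(dims[0]):
        for iy in range(dims[1]):
            for iz in range(dims[2]):
                # Each voxel writes only its own value: no race conditions.
                pos = tuple(origin[a] + (i + 0.5) * voxel_size
                            for a, i in enumerate((ix, iy, iz)))
                cx, cy, cz = cell_of(pos, radius)
                best = math.inf
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        for dz in (-1, 0, 1):
                            key = (cx + dx, cy + dy, cz + dz)
                            for q in cells.get(key, ()):
                                d = math.sqrt(sum((a - b) ** 2
                                                  for a, b in zip(pos, q)))
                                best = min(best, d)
                # Assumed policy: anything beyond the radius counts as far away.
                field[(ix, iy, iz)] = best if best <= radius else math.inf
    return field
```

On the GPU, the three outer loops would correspond to one thread per voxel inside each allocated block.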

4.3 Mesh creation

For the mesh creation, Marching Tiles (Section 2.3.3) was implemented. Im-

ageworks already had an implementation of Marching Cubes both for the CPU

and GPU, so they were integrated into the tool as well.

Marching Tiles was done on the GPU by first computing which of the tetrahedra
in the tiles would actually contain triangles. This was done so there would
be no need to march through empty tetrahedra and also to know how many
triangles the final mesh would have so proper allocations could be made. Then the
actual marching of the tetrahedra was done and all the edges in the field that
would create a vertex were saved together with the two end values. The reason
for this is that simply creating the triangles directly would produce just a set of
unconnected triangles and not a mesh. Because of this it is necessary to weld the
vertices, or in this case the edges, together. The benefit of welding edges instead
of the final vertex positions is that the edges have discrete coordinates while the
vertices have floating point coordinates which may be subject to rounding errors
due to interpolation direction etc. The edge weld was constructed by modifying the
algorithm from [HB10] and then the final vertex values were retrieved, creating
a mesh with correct connectivity information.

1 A set of instructions that appear to the rest of the system to run instantaneously.
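
The idea of welding on discrete edge keys can be sketched as follows. Note that this is not the modified algorithm from [HB10]: a dictionary is used here in place of a parallel method, and the function name and the representation of an edge as a pair of integer grid points are assumptions made for the example.

```python
def weld_edges(triangle_edges):
    """Turn triangles given by grid edges into an indexed triangle list.

    triangle_edges : list of triangles, each a tuple of three edges, where an
                     edge is a pair of integer grid points ((i, j, k), (i, j, k))
    Returns the unique edges (one future vertex each) and, per triangle,
    the three indices into that list.
    """
    vertex_of_edge = {}
    indexed = []
    for tri in triangle_edges:
        ids = []
        for a, b in tri:
            # Order the end points so the key does not depend on direction.
            key = (a, b) if a <= b else (b, a)
            # Integer keys compare exactly, unlike interpolated positions.
            ids.append(vertex_of_edge.setdefault(key, len(vertex_of_edge)))
        indexed.append(tuple(ids))
    unique_edges = sorted(vertex_of_edge, key=vertex_of_edge.get)
    return unique_edges, indexed
```

The final vertex positions would then be interpolated once per unique edge, so that triangles sharing an edge also share the vertex index.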

4.4 Smoothing

As can be seen in Figure 4.1, the smoothing is first done for $x^T A x$ and then
two times for $x^T B x$, once with constraints and once without. The first two
steps move the vertices to good positions, while the unconstrained step improves
the smoothness of the surface normals while not moving the vertices too much
[Wil09]. The result of different numbers of smoothing steps minimizing $x^T A x$
can be seen in Figure 4.2.

The matrices processed in the implementation are all very sparse. Each row

in a matrix will have as many elements as there are vertices in the mesh but

only an average of seven elements will be non-zero. For memory reasons, the

matrices in the implementation were all represented using a sparse matrix stor-

age scheme called Compressed Row Storage. While this method makes element

access slower, the memory consumption is extremely small in comparison to a

dense matrix.
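
As an illustration of the storage scheme, the sketch below converts a small dense matrix to Compressed Row Storage and multiplies it with a vector. The function and array names (`values`, `col_idx`, `row_ptr`) are common conventions chosen for the example, not names taken from the implementation.

```python
def dense_to_csr(rows):
    """Store only the non-zero entries of a dense matrix, row by row."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))  # where the next row starts
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x touching only the stored non-zero entries."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[p] * x[col_idx[p]]
        y.append(acc)
    return y
```

Reading a single element requires a search within its row, which is the slower element access mentioned above, while a full product visits each stored entry exactly once.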

Figure 4.2: Different number of smoothing steps for $x^T A x$ with constraints.

(top) Initial surface. (middle) 2 steps, (bottom) 6 steps

4.4.1 Finding matrices

The process of computing the A and B matrices was implemented on the CPU

because of the serial nature of the procedure and a lack of time to find appro-

priate parallel versions. A is found by creating an array of associative arrays

and looping over the connectivity list of the mesh, adding the vertices of each

triangle to the appropriate associative array. The sparse matrix is then con-

structed by extracting the values from the arrays. The matrix B is constructed

in a similar fashion.
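
A possible reading of this procedure for the matrix A is sketched below, with one dictionary per row playing the role of the associative array. The function names are hypothetical, and the diagonal is assumed to hold the number of edge-connected neighbours so that each row sums to zero; the weights used in the actual implementation may differ.

```python
def build_graph_laplacian(num_vertices, triangles):
    """One dictionary per matrix row, filled from the connectivity list."""
    rows = [dict() for _ in range(num_vertices)]
    for tri in triangles:
        for a in tri:
            for b in tri:
                if a != b:
                    rows[a][b] = -1.0  # a and b share an edge
    for i, row in enumerate(rows):
        # Assumed diagonal weight: the number of edge-connected neighbours.
        row[i] = float(len(row))
    return rows


def rows_to_csr(rows):
    """Extract the dictionaries into compressed row storage arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j in sorted(row):
            values.append(row[j])
            col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr
```

Because an edge shared by two triangles is visited twice, the dictionary assignment also removes the duplicate entries without extra bookkeeping.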

4.4.2 Solvers

For the actual solving of the linear systems, a Jacobi solver adapted for sparse

data was used. Even though this method might converge slower than Gauss-

Seidel or Successive Over-Relaxation, the fact that it is very easy to parallelize

and that the parallel version is much faster makes up for this, see Section 5.5.
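
The reason the Jacobi method parallelizes so easily is that every unknown is updated from the previous iterate only. The following sketch shows this for a generic system Ax = b stored in compressed row form. It is an illustration under assumptions: the function name, the fixed iteration count and the zero starting guess are choices made for the example, and the diagonal is assumed to be non-zero.

```python
def jacobi(values, col_idx, row_ptr, b, iterations=50):
    """Jacobi iterations for A x = b with A in compressed row storage."""
    n = len(b)
    x = [0.0] * n  # assumed starting guess
    for _ in range(iterations):
        x_new = [0.0] * n
        # Every row reads only the old x, so the rows are independent and
        # could each be handled by its own GPU thread.
        for i in range(n):
            diag = 0.0
            acc = b[i]
            for p in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[p]
                if j == i:
                    diag = values[p]
                else:
                    acc -= values[p] * x[j]
            x_new[i] = acc / diag  # assumes a non-zero diagonal entry
        x = x_new
    return x
```

Gauss-Seidel would instead reuse the values already updated in the same sweep, which is what introduces the dependency between rows.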

4.5 Constraints

The shrinking effect of the smoothing which can be seen as a negative effect in

some cases is what makes surfaces flat in this implementation. The constraints

make sure that the vertices are allowed to move to better positions while still

being true to the particle positions. The constraints are enforced after each

smoothing step by simply using a hashed grid to look up the closest particle to

each vertex and moving the vertex along the radial vector if it is found to be closer

than $r_{inner}$ or farther away than $r_{outer}$.
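
For a single vertex, the enforcement step could look like the sketch below. The nearest particle is passed in directly instead of being looked up in the hashed grid, and the function name as well as the handling of a vertex that coincides with its particle are assumptions made for the example.

```python
import math


def enforce_radii(vertex, nearest_particle, r_inner, r_outer):
    """Move a vertex radially so that r_inner <= distance <= r_outer."""
    d = [v - p for v, p in zip(vertex, nearest_particle)]
    dist = math.sqrt(sum(c * c for c in d))
    if dist == 0.0:
        # The radial direction is undefined; leave the vertex alone (assumed).
        return tuple(vertex)
    target = min(max(dist, r_inner), r_outer)  # clamp the distance
    scale = target / dist
    return tuple(p + c * scale for p, c in zip(nearest_particle, d))
```

A vertex that already lies between the two radii is returned with a scale of one, so only violating vertices are moved.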

4.6 Erosion

In an attempt to be able to represent very thin structures without using a huge

number of particles with small radii, an optional erosion step was added,

where each vertex is moved along its negative normal direction. The result of

this operation can be seen in Figure 4.3.

While easy to implement and straightforward, this method very easily creates

self-intersecting triangles if a large value is used. A surface that looks good

in one frame might also have many self-intersections in the next one if large

values are used.

Figure 4.3: The effect of eroding a surface along the vertex normals.

4.7 Attribute transfer

As a last optional step, the ability to transfer attributes from the particles to

the vertices of the mesh was added. Such attributes can for example be color,

see Figure 5.3, or velocity which can be used for motion blur. This was done

by specifying an influence radius for the vertices and then using that as the cell size

of the spatial hash to find all relevant particles within the radius and using a

weighted average to get the final attribute value.
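
A minimal sketch of such a weighted average for a scalar attribute is given below. The weight function is not specified in the text, so a linear falloff with distance is assumed here purely for illustration; the function name is hypothetical, and a brute-force loop over all particles replaces the spatial hash lookup.

```python
import math


def transfer_attribute(vertex, particles, attributes, radius):
    """Weighted average of a scalar particle attribute around a vertex."""
    total_weight = 0.0
    total = 0.0
    for p, a in zip(particles, attributes):
        d = math.sqrt(sum((v - c) ** 2 for v, c in zip(vertex, p)))
        if d < radius:
            w = 1.0 - d / radius  # assumed weight: linear falloff
            total_weight += w
            total += w * a
    if total_weight == 0.0:
        return None  # no particle within the influence radius
    return total / total_weight
```

A vector attribute such as color or velocity could be handled by applying the same weights to each component.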

4.8 Houdini

The final program was designed to be independent of any particular software

package taking only a list of particles and a structure containing the surfacing

parameters. In the prototype made, the functionality of the surfacer was ex-

posed by constructing a Surface Operator (SOP) for Houdini using the Houdini

Development Kit. A screenshot of the SOP can be seen in Figure 4.4. The

parameters seen in the figure mask many of the real ones for ease of use for the

artist.

Figure 4.4: A screenshot of the Houdini operator created for the particle surface

reconstruction.

Chapter 5

Results

5.1 Computer specifications

All the work in this thesis, including the timing experiments, was done on an

octa-core 2.5 GHz computer with 24 GB of RAM and a Quadro FX 3700 graphics

card with 512 MB of memory. Note that the CPU version was not threaded, thus

only making use of one core.

5.2 Example: Fountain

In Figure 5.1, a sequence from an SPH simulation of a fountain pouring a stream

of water can be seen. The simulation had approximately 120,000 particles and

was surfaced on a 325x600x100 grid. The surfacing took about two seconds on

the GPU, and 30 seconds if done on the CPU in production quality.

Figure 5.1: Frames from the surfacing of a fountain simulation.

5.3 Example: Splash

Figure 5.2 shows the surfacing of a splash where an object collides with a big

pool of particles. The simulation as well as the surfacing was done in three

different passes in order to add more small detail to the final sequence. Note

that this makes the small scale particles unable to interact with the pool in the

simulation.

Figure 5.2: An object crashes into a big pool of particles.

5.4 Example: Color and velocity transfer

The ability to transfer attributes such as color and velocity can be seen in Figure

5.3. A cube acts as source for all the particles which are colored based on their

coordinates.

Figure 5.3: Example of color transfer from particles onto geometry.

5.5 Performance

The tests in the following sections were run ten times each and the result is

the average of these runs. The tests were run on the same dataset, which consisted
of 475,000 particles that were sampled and surfaced on a 500x650x350 field. Figure
5.4 shows a bar chart of the time for the different parts of the surfacing, both
for the CPU and the GPU version. Figure 5.5 shows the percentages of the
processing times for the GPU version. Note that these results are only an example
and that results may vary considerably depending on particle distribution
and field size. However, it can be seen that the GPU version is significantly
faster where the GPU can be fully utilized. It can also be seen that
overhead and the sections that are done on the CPU take up more than 70%
of the computation time for the GPU version.

Figure 5.4: Illustrates the execution time for different parts of the program on

both the CPU and the GPU.

Figure 5.5: Shows how the processing time is divided between the different parts

of the GPU program.

Chapter 6

Conclusion

The goal of this thesis was to evaluate the field of reconstructing surfaces from particle data and to create a GPU-based prototype of such a technique for integration into Houdini. The implemented technique should be faster than the current technique and also be easier for artists to work with. Furthermore, it should make it easier to represent flat surfaces without introducing artifacts.

Because the method presented in this thesis involves more steps and is more computationally expensive than the native Houdini version (which basically performs the first two steps of Figure 4.1), it may not be fair to compare them directly. The general impression when working with the GPU version is that it is still faster than the native application. When no smoothing or constraint enforcement is performed, the GPU version is much faster.

The usability of the plugin is not easy to estimate, because the prototype was not rolled out for testing by artists until one week prior to the completion of the thesis. The feedback received, though, seemed positive, and the SOP does hide a lot of unnecessary options from the end user, instead exposing a combination of them as, for example, "Smoothness" or "Mesh detail".

The technique used for surfacing and smoothing, based on the work of Williams [Wil09], creates a final surface that is globally smooth and also flat under certain conditions (Equation 2.15). The technique also creates a surface that stays true to the particle positions. The smoothing can, however, create collapsing features under certain conditions, when the vertices snap to the "wrong" particles due to large smoothing step sizes. The method is also sensitive to changes in the field, which becomes a problem when surfacing particles that have almost no velocity. Many of the simulations used for testing had particles that were supposed to be settled but still oscillated slightly, which caused the resulting surface to flicker due to differences in the field from one frame to the next. This becomes especially evident when using reflective materials for the surface.

6.1 Future work

The paper by Yu and Turk [YT] has some interesting ideas and the results are visually very pleasing. The implementation of anisotropic kernels should be the first priority for further work. Yu and Turk also acknowledge that their solution would benefit from the work of Williams [Wil09], and a combined solution would be worth trying. The parts that would need to change are the creation of the field and the constraint enforcement, where a pure distance to the nearest particle can no longer be used. Instead, the constraints need to be based on the individual elliptical kernel of each particle.

Another thing that the evaluating artists seemed to want was the possibility to smooth some parts of the mesh differently from others. This could easily be done either by creating a field that scales the factors in the matrices, or automatically, based on properties such as velocity. Using velocity to scale the elements in the matrices, it would be possible to smooth more heavily in areas where particles are not moving and to smooth less where a lot is going on.
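
As a rough sketch of the velocity-based variant, the function below maps a particle velocity to a scale factor that is large for resting particles and small for fast ones. Everything here is an assumption made for illustration: the name `smoothing_scale`, the linear falloff, and the parameters `v_ref`, `min_scale` and `max_scale` are hypothetical, and how the factor would be applied to the matrix elements is left open.

```python
import math


def smoothing_scale(velocity, v_ref=1.0, min_scale=0.2, max_scale=1.0):
    """Return a smoothing scale factor that decreases linearly with speed.

    velocity  -- (vx, vy, vz) of the particle nearest to a mesh vertex
    v_ref     -- hypothetical speed at which smoothing reaches min_scale
    min_scale -- scale used for particles moving at v_ref or faster
    max_scale -- scale used for particles at rest
    """
    if v_ref <= 0.0:
        raise ValueError("v_ref must be positive")
    speed = math.sqrt(sum(c * c for c in velocity))
    t = min(speed / v_ref, 1.0)  # 0 for resting particles, 1 for fast ones
    return max_scale * (1.0 - t) + min_scale * t


if __name__ == "__main__":
    for v in [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0)]:
        print(v, smoothing_scale(v))
```

A linear ramp is only the simplest choice; any monotonically decreasing mapping from speed to scale would express the same idea.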

The smoothing of the particle positions described by Yu and Turk [YT] should help with the temporal flickering. This smoothing could possibly also be weighted by velocity, letting particles with low velocity depend more on their neighbors than high-velocity particles, where flickering does not pose a problem.
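
The velocity weighting suggested above could look roughly like the sketch below, which blends each particle position with the mean position of its neighbors and lets the blend factor shrink as the particle speed grows. This is an illustration under stated assumptions, not the formulation of [YT]: the neighbor mean is unweighted for simplicity, and the names `blend_factor`, `smooth_position`, `v_ref` and `max_blend` are hypothetical.

```python
import math


def blend_factor(velocity, v_ref=1.0, max_blend=0.9):
    """Blend factor in [0, max_blend]; largest for particles at rest."""
    if v_ref <= 0.0:
        raise ValueError("v_ref must be positive")
    speed = math.sqrt(sum(c * c for c in velocity))
    return max_blend * (1.0 - min(speed / v_ref, 1.0))


def smooth_position(position, velocity, neighbor_positions,
                    v_ref=1.0, max_blend=0.9):
    """Move a particle towards the mean of its neighbors, less so when it is fast."""
    if not neighbor_positions:
        return tuple(position)
    n = len(neighbor_positions)
    # Unweighted neighbor mean (assumption; a kernel-weighted mean could be used).
    mean = [sum(p[k] for p in neighbor_positions) / n for k in range(3)]
    lam = blend_factor(velocity, v_ref, max_blend)
    return tuple((1.0 - lam) * position[k] + lam * mean[k] for k in range(3))


if __name__ == "__main__":
    neighbors = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    print(smooth_position((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), neighbors))
    print(smooth_position((1.0, 0.0, 0.0), (5.0, 0.0, 0.0), neighbors))
```

With these assumptions a settled particle is pulled strongly towards its neighborhood, which damps small oscillations, while a particle moving at `v_ref` or faster keeps its simulated position.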

As can be seen in Section 5.5, there are parts of the program that are over 100 times faster than the corresponding CPU parts, but the overall performance is held back by bottlenecks. One of the major bottlenecks is the computations that are still done on the CPU.
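
Why a large speedup of individual parts does not translate into a similar overall speedup can be illustrated with Amdahl's law: if a fraction p of the original runtime is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below evaluates this formula. The numbers used in it are hypothetical and are not measurements from this thesis, and the function name `overall_speedup` is introduced only for this example.

```python
def overall_speedup(accelerated_fraction, part_speedup):
    """Amdahl's law: overall speedup when only part of the runtime is accelerated.

    accelerated_fraction -- share of the original runtime that is accelerated (0..1)
    part_speedup         -- speedup factor of that accelerated share (> 0)
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if part_speedup <= 0.0:
        raise ValueError("part_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / part_speedup)


if __name__ == "__main__":
    # Hypothetical example: 90% of the work runs 100 times faster.
    print(round(overall_speedup(0.9, 100.0), 2))  # about 9.17
```

In this hypothetical case the unaccelerated 10% limits the overall speedup to below 10, no matter how fast the accelerated part becomes, which is the effect the CPU-bound sections have on the GPU version.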

One possible optimization would be to reuse the B matrix in the second step instead of recomputing it, which would save time and possibly not affect the final surface negatively. Multithreading the CPU parts would also benefit the GPU version, but if possible it would be better to move all of the computations to the GPU. This would not only speed up the computations; the current overhead of moving data back and forth would be removed as well. CUDA is also evolving quickly, and new libraries and features appear frequently, many of which might be useful. One example is CUSPARSE [Cor10b], an NVIDIA-developed library for handling sparse matrices that was added in September 2010.

Acknowledgements

I would like to thank my supervisor Magnus Wrenninge for all his help and

invaluable input.

I would also like to thank the talented people at Sony Pictures Imageworks

with whom I have had the fortune to interact, especially Chris Allen, Sosh

Mirsepassi and Viktor Lundqvist.

Finally, I would like to thank Jonas Unger for helping me get into the IPAX program, and Swedbank-Sparbanksstiftelsen Alfa as well as Professor Anders Ynnerman’s Foundation for the scholarships that I received.

Bibliography

[ALD06] Bart Adams, Toon Lenaerts, and Philip Dutré. Particle Splatting: Interactive Rendering of Particle-Based Simulation Data, 2006.

[APKG07] Bart Adams, Mark Pauly, Richard Keiser, and Leonidas J. Guibas.

Adaptively sampled particle fluids. In SIGGRAPH ’07: ACM SIG-

GRAPH 2007 papers, page 48, New York, NY, USA, 2007. ACM.

[BHZK05] Mario Botsch, Alexander Hornung, Matthias Zwicker, and Leif

Kobbelt. High-quality surface splatting on today’s GPUs. Proceed-

ings Eurographics/IEEE VGTC Symposium Point-Based Graphics,

0:17–141, 2005.

[Bli82] James F. Blinn. A Generalization of Algebraic Surface Drawing.

ACM Trans. Graph., 1(3):235–256, 1982.

[Blo88] Jules Bloomenthal. Polygonization of implicit surfaces. Computer

Aided Geometric Design, 5(4):341 – 355, 1988.

[BPK+07] Mario Botsch, Mark Pauly, Leif Kobbelt, Pierre Alliez, Bruno Lévy, Stephan Bischoff, and Christian Rössl. Geometric modeling based on polygonal meshes. In SIGGRAPH ’07: ACM SIGGRAPH 2007 courses, page 1, New York, NY, USA, 2007. ACM.

[Bra04] Nicholas Bray. Notes on mesh smoothing, 2004.

[Bri08] Robert Bridson. Fluid simulation. A K Peters, 2008.

[Che95] Evgeni V. Chernyaev. Marching Cubes 33: Construction of Topo-

logically Correct Isosurfaces. Technical report, Institute for High

Energy Physics, 142284, Protvino, Moscow Region, Russia, 1995.

[Cor10a] NVIDIA Corporation. CUDA C Best Practices Guide, Version 3.2,

2010. [Online; accessed 18-September-2010].

[Cor10b] NVIDIA Corporation. CUDA CUSPARSE Library, 2010.

[Cor10c] NVIDIA Corporation. NVIDIA CUDA C Programming Guide, Version 3.2, 2010. [Online; accessed 18-September-2010].

[D88] Martin J. Dürst. Re: additional reference to "marching cubes". SIGGRAPH Comput. Graph., 22(5):243, 1988.

[DC98] Mathieu Desbrun and Marie-Paule Cani. Active Implicit Sur-

face for Animation. In Wayne A. Davis, Kellogg S. Booth, and

Alain Fournier, editors, Graphics Interface 1998, June, 1998, pages

143–150, Vancouver, BC, Canada, June 1998. Canadian Human-

Computer Communications Society. Published under the name

Marie-Paule Cani-Gascuel.

[EFFM02] Douglas Enright, Ronald Fedkiw, Joel Ferziger, and Ian Mitchell.

A hybrid particle level set method for improved interface capturing.

J. Comput. Phys., 183(1):83–116, 2002.

[HB10] Jared Hoberock and Nathan Bell. Thrust: Weld vertices

example. http://code.google.com/p/thrust/source/browse/

examples/, October 2010.

[KC03] Yehuda Koren and Liran Carmel. Visualization of Labeled Data

Using Linear Transformations. Information Visualization, IEEE

Symposium on, 0:16, 2003.

[KmWH10] David B. Kirk and Wen-mei W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann Publishers, 1st edition, 2010.

[KVH84] James T. Kajiya and Brian P Von Herzen. Ray tracing volume

densities. SIGGRAPH Comput. Graph., 18(3):165–174, 1984.

[LC87] William E. Lorensen and Harvey E. Cline. Marching cubes: A high

resolution 3D surface construction algorithm. In SIGGRAPH ’87:

Proceedings of the 14th annual conference on Computer graphics

and interactive techniques, pages 163–169, New York, NY, USA,

1987. ACM.

[LG07] Scott Le Grand. GPU Gems 3 - Broad-Phase Collision Detection

with CUDA, chapter 32, pages 697–722. Addison-Wesley, 2007.

[LGF04] Frank Losasso, Frederic Gibou, and Ron Fedkiw. Simulating water

and smoke with an octree data structure. In SIGGRAPH ’04:

ACM SIGGRAPH 2004 Papers, pages 457–462, New York, NY,

USA, 2004. ACM.

[LLWVT03] Thomas Lewiner, Hélio Lopes, Antônio Wilson Vieira, and Geovan Tavares. Efficient implementation of marching cubes cases with topological guarantees. Journal of Graphics Tools, 8(2):1–15, December 2003.

[Lun09] Viktor Lundqvist. A smoothed particle hydrodynamic simulation utilizing the parallel processing capabilites of the GPUs. Master’s thesis, Linköping University, Department of Science and Technology, 2009.

[MCG03] Matthias Müller, David Charypar, and Markus Gross. Particle-based fluid simulation for interactive applications. In SCA ’03: Proceedings of the 2003 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 154–159, Aire-la-Ville, Switzerland, 2003. Eurographics Association.

[MSKG05] Matthias Müller, Barbara Solenthaler, Richard Keiser, and Markus Gross. Particle-based fluid-fluid interaction. In SCA ’05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 237–244, New York, NY, USA, 2005. ACM.

[MSS94] Claudio Montani, Riccardo Scateni, and Roberto Scopigno. A

modified look-up table for implicit disambiguation of Marching

Cubes. The Visual Computer, 10(6):353–355, 1994.

[OF03] Stanley Osher and Ronald Fedkiw. Level Set Methods and Dynamic

Implicit Surfaces. Springer, 2003.

[OS88] Stanley Osher and James A. Sethian. Fronts propagating with

curvature-dependent speed: Algorithms based on Hamilton-Jacobi

formulations. J. Comput. Phys., 79(1):12–49, 1988.

[Par82] Paramount. Star Trek II: The Wrath of Khan (movie), 1982.

[PTB+03] Simon Premože, Tolga Tasdizen, James Bigler, Aaron Lefohn, and Ross T. Whitaker. Particle-Based Simulation of Fluids. Computer Graphics Forum, 22(3):401–410, 2003.

[RB08] Ilya D. Rosenberg and Ken Birdwell. Real-time particle isosurface

extraction. In I3D ’08: Proceedings of the 2008 symposium on

Interactive 3D graphics and games, pages 35–43, New York, NY,

USA, 2008. ACM.

[Ree83] William T. Reeves. Particle systems—a technique for modeling

a class of fuzzy objects. In SIGGRAPH ’83: Proceedings of the

10th annual conference on Computer graphics and interactive tech-

niques, pages 359–375, New York, NY, USA, 1983. ACM.

[SG08] Simon Green. CUDA Particles. NVIDIA Corporation, 2008.

[SHG09] Nadathur Satish, Mark Harris, and Michael Garland. Designing efficient sorting algorithms for manycore GPUs. In IPDPS ’09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, pages 1–10, Washington, DC, USA, 2009. IEEE Computer Society.

[Sim90] Karl Sims. Particle Animation and Rendering Using Data Parallel

Computation. In Computer Graphics, pages 405–413, 1990.

[SK10] Jason Sanders and Edward Kandrot. CUDA by Example: An Introduction to General-Purpose GPU Programming. Addison-Wesley Professional, 2010.

[SML] Will Schroeder, Ken Martin, and Bill Lorensen. The Visualization

Toolkit, Third Edition. Kitware Inc.

[Sof10] Side Effects Software. Houdini. http://www.sidefx.com/, Octo-

ber 2010.

[Tau95] Gabriel Taubin. A signal processing approach to fair surface design.

In SIGGRAPH ’95: Proceedings of the 22nd annual conference on

Computer graphics and interactive techniques, pages 351–358, New

York, NY, USA, 1995. ACM.

[Wil09] Brent Warren Williams. Fluid Surface Reconstruction from Parti-

cles. Master’s thesis, The University of British Columbia, 2009.

[Wre10] Magnus Wrenninge. Field3D: A file format and data structures for

storing 3D voxel and simulation data, 2010. Open source project

at Sony Pictures Imageworks.

[YT] Jihun Yu and Greg Turk. Reconstructing Surfaces of Particle-

Based Fluids Using Anisotropic Kernels. In 2010 ACM SIG-

GRAPH/Eurographics symposium on Computer animation.

[ZB05] Yongning Zhu and Robert Bridson. Animating sand as a fluid. In

SIGGRAPH ’05: ACM SIGGRAPH 2005 Papers, pages 965–972,

New York, NY, USA, 2005. ACM.

[ZPvBG01] Matthias Zwicker, Hanspeter Pfister, Jeroen van Baar, and Markus

Gross. Surface splatting. In SIGGRAPH ’01: Proceedings of the

28th annual conference on Computer graphics and interactive tech-

niques, pages 371–378, New York, NY, USA, 2001. ACM.
