Department of Science and Technology (Institutionen för teknik och naturvetenskap), Linköping University (Linköpings universitet), SE-601 74 Norrköping, Sweden
LiU-ITN-TEK-A--10/065--SE
GPU Accelerated Surface Reconstruction from Particles
Erik Edespong
2010-10-28
GPU Accelerated Surface Reconstruction from Particles
Master's thesis carried out in Media Technology at the Institute of Technology, Linköping University
Erik Edespong
Supervisor: Magnus Wrenninge
Examiner: Jonas Unger
Norrköping 2010-10-28
Upphovsrätt (Copyright)

This document is made available on the Internet (or its future replacement) for a considerable time from the date of publication, barring exceptional circumstances.

Access to the document implies permission for anyone to read, download and print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. A later transfer of copyright cannot revoke this permission. All other use of the document requires the author's consent. To guarantee authenticity, security and accessibility, there are solutions of a technical and administrative nature.

The author's moral rights include the right to be mentioned as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or distinctiveness.

For additional information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/
Copyright
The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/
© Erik Edespong
GPU Accelerated Surface
Reconstruction from Particles
Erik Edespong
October 31, 2010
Abstract
Realistic fluid effects, such as smoke and water, have long been pursued by the visual effects industry. In recent years, particle simulations have gained a lot of popularity for achieving such effects. One problem noted by researchers has been the difficulty of generating surfaces from the particles. This thesis investigates current techniques for particle surface reconstruction. In addition, a GPU-based implementation using constrained mesh smoothing is described. The result is globally smooth surfaces that closely follow the distribution of the particles, though some problems are still apparent. The algorithm is approximately an order of magnitude faster than its CPU counterpart, but is held back by bottlenecks in sections still running on the CPU.
Contents

1 Introduction
  1.1 Method summary
  1.2 Structure of the report

2 Background survey
  2.1 Fluid simulation
  2.2 Particle surfacing
    2.2.1 Metaballs
    2.2.2 Surface splatting
    2.2.3 Level set based methods
    2.2.4 Beyond metaballs
    2.2.5 Anisotropic kernels
  2.3 Surface extraction
    2.3.1 Marching Cubes
    2.3.2 Marching Tetrahedra
    2.3.3 Marching Tiles
  2.4 Surface smoothing
    2.4.1 Unweighted graph Laplacian
    2.4.2 Weighted surface bilaplacian
    2.4.3 Solvers
    2.4.4 Constraints

3 GPU programming and CUDA
  3.1 CUDA
  3.2 Architecture
  3.3 Memory model
  3.4 Programming interface
    3.4.1 Optimizations

4 Implementation
  4.1 Uniform hashed grid
  4.2 Distance field
  4.3 Mesh creation
  4.4 Smoothing
    4.4.1 Finding matrices
    4.4.2 Solvers
  4.5 Constraints
  4.6 Erosion
  4.7 Attribute transfer
  4.8 Houdini

5 Results
  5.1 Computer specifications
  5.2 Example: Fountain
  5.3 Example: Splash
  5.4 Example: Color and velocity transfer
  5.5 Performance

6 Conclusion
  6.1 Future work

Bibliography
Chapter 1
Introduction
In the realm of visual effects, the creation of natural phenomena such as water, smoke and fire has been pursued since the early 80's, starting with the movie Star Trek II: The Wrath of Khan [Par82][Ree83]. Since such phenomena are very complex, manual techniques such as key-framing are not a viable option. Instead, these types of effects are created by setting up physical models for the movement of the fluids and simulating the outcome. Because people are used to seeing, for example, water in real life, it is very important that the water acts and looks as expected if it is to be perceived as real: water is expected to have a certain viscosity, and if it flows too slowly the error will be easily spotted.
Fluid simulation approaches are usually divided into two main categories:
grid-based (Eulerian) approaches and particle-based (Lagrangian) approaches.
Both have their own strengths and weaknesses which will be discussed in Chap-
ter 2.
As one of the world’s top visual effects studios, Sony Pictures Imageworks has
been working with creating natural phenomena for a long time. The company
has proprietary software for both of the simulation approaches mentioned above,
the latest addition being a Smoothed Particle Hydrodynamic (SPH) library
by Lundqvist in 2009 [Lun09]. This library was integrated with Side Effects’
Houdini [Sof10] for use by the artists in the production pipeline.
One thing missing from Lundqvist's library is the possibility to convert the particle representation of the fluid to an actual renderable surface. For this
task, a tool shipped with Houdini is used. The built-in tool does, however, have a number of shortcomings which make it unfavorable to use. First of all, it is slow when used with large numbers of particles over large simulation domains; it often takes more time to reconstruct a surface from the particles than it does to simulate their movement. Secondly, it is difficult to create smooth surfaces without significantly reducing the small-scale detail as well as introducing visually disturbing artifacts to the surface. Finally, the tool is quite hard to use for artists without a technical background, with many non-intuitive settings that need to be tuned to get a reasonable surface.
The objective of this thesis is to investigate the creation of surfaces from
particle data and the capturing of details of small-scale fluid simulations such
as objects falling into water and creating splashes. The goal is to create a
prototype for this which is going to be integrated into Houdini. Moreover,
the prototype should utilize the parallel capabilities of today’s GPUs for speed
benefits. In addition to this, a CPU version should also be created for reference
and for use on computers without sufficient GPU support.
1.1 Method summary
The method used in the implementation of this thesis is based on the work of
Williams [Wil09]. The method creates an initial surface from a field where each
voxel holds the distance to the nearest particle. The surface is created by using a
new approach based on Marching Tetrahedra called Marching Tiles. The initial
surface is quite crude, so a number of iterative smoothing steps are applied in
order to make the surface look good. Normally, the problem with smoothing is that surfaces collapse, but the approach applies constraints on the surface after
each smoothing step in order to make sure that the surface still represents the
underlying particle distribution.
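As an illustration of the first step, the distance field described above (each voxel holding the distance to its nearest particle) can be sketched as follows. This is a brute-force Python sketch with illustrative names, not the thesis implementation, which builds the field on the GPU using a hashed grid (Chapter 4):

```python
# Brute-force distance field sketch: each voxel stores the distance to
# the nearest particle. Illustrative only; a real implementation would
# restrict the search with a spatial hash grid.
import math

def distance_field(particles, dims, cell_size):
    """Return a flat list of per-voxel distances to the nearest particle."""
    nx, ny, nz = dims
    field = []
    for k in range(nz):
        for j in range(ny):
            for i in range(nx):
                # Sample at the voxel center.
                x = (i + 0.5) * cell_size
                y = (j + 0.5) * cell_size
                z = (k + 0.5) * cell_size
                d = min(math.dist((x, y, z), p) for p in particles)
                field.append(d)
    return field

particles = [(0.5, 0.5, 0.5)]
field = distance_field(particles, (2, 2, 2), 1.0)
# The voxel containing the particle has distance 0 to it.
print(min(field))  # -> 0.0
```

The marching step then extracts the surface at the iso-value r_outer of this field.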
1.2 Structure of the report
Chapter 2 introduces the necessary background for Eulerian and Lagrangian
methods and surveys past as well as current techniques for particle surface
reconstruction. In Chapter 3, GPU computing and the CUDA computing ar-
chitecture are presented together with considerations for performance benefits.
Implementation details for the necessary steps of particle surface reconstruction
using the GPU are presented in Chapter 4. Chapter 5 presents the results of
the thesis with both images as well as timing comparisons between the CPU
and GPU versions. Finally, Chapter 6 draws conclusions from the results and suggests further improvements.
Chapter 2
Background survey
This chapter begins with a description of the two main fluid simulation ap-
proaches. Following this, different methods for creating surfaces from particle
data are presented, with a focus on methods that sample the particles onto an
underlying field. Then, different ways of extracting surfaces from such fields
are described. Finally, the constrained surface smoothing method used in the
implementation of this thesis is described.
2.1 Fluid simulation
The goal when simulating natural phenomena is to solve equations which describe the motion of the fluid. A popular set of equations are the Navier-Stokes equations, formulated by Claude-Louis Navier in 1822 and by George Gabriel Stokes in 1845. These equations are used in a wide range of areas and were first used in computer graphics by Kajiya et al. in 1984 [KVH84]. The difference between computer graphics and many other areas, such as scientific simulation, is that the result does not need to be accurate, or even remotely correct. As long as the result looks realistic to the human eye, it is acceptable, and thus a lot of approximations and simplifications can be made. As noted in Chapter 1, there exist two main approaches to fluid simulation used for visual effects: Eulerian and Lagrangian, both of which are described below.
Eulerian approaches are characterized by a simulation domain that is dis-
cretized into a static three dimensional field of voxels. The voxels in the field
each hold physical properties such as velocity, density and temperature and the
values at each voxel are calculated by solving the governing equations in every
time step. The field offers fast access to the data for any position within the field
by means of interpolation from nearby voxels that can be accessed instantly.
Although grid-based techniques have been the most common way of simu-
lating fluids [Bri08], they have some inherent problems. The finite and fixed
nature of the field used for computation can make the representation of fine
details difficult. This can of course be solved by increasing the size of the field,
however at great cost in computation and memory consumption. Data structures such as the octrees of Losasso et al. [LGF04] have also been used for this problem, as well as hybrid methods such as Particle Level Sets. The Particle Level Set method, introduced by Enright et al. [EFFM02], spawns particles at strategic places in the field and then lets them move at each time step. The values of the field are then corrected by the particles if they differ too much.
Still, even with more advanced methods, the representation of very fine detail
is difficult due to the fixed nature of the grid.
The surface, or interface, of a fluid is implicit in Eulerian approaches. It is
often represented by a level set φ where each point in the field holds the signed
distance to the interface. Values of φ < 0 are considered to be inside the fluid
while values of φ > 0 are outside. Implicitly, this gives that the surface of the
fluid lies at φ = 0 [OF03].
Retrieving the surface for rendering is trivial once the level set has been
retrieved by using a Marching algorithm such as Marching Cubes (see section
2.3) for extraction of an iso-surface.
Whereas Eulerian approaches simulate the fluid at fixed points in space, Lagrangian techniques use a cloud of particles that moves with the fluid and represents it. One of the big advantages of this is that the simulation domain is
unrestricted and not bound by any grid, allowing each particle to move freely.
While grid-based methods need to calculate values at each point in the field,
even if there is no fluid in the voxel, particle-based approaches only need to do
calculations where there are particles, since this is where the fluid is located.
Another advantage Lagrangian methods have is that each particle has a
mass, making problems with mass and volume loss non-existent as long as the
number of particles is kept constant.
One of the disadvantages with Lagrangian methods noted by researchers is
the difficulty of extracting a high-quality renderable surface [EFFM02][PTB+03]. With the increasing interest in Lagrangian methods, especially Smoothed Particle Hydrodynamics, the aspect of surface reconstruction from particles has also
gained some attention and is the main focus of this thesis.
It should be noted that the surfacing techniques below are not restricted
to any particular simulation technique. Any algorithm that simulates particles
and has the properties needed by the surfacing algorithms can be used. SPH is
used in the examples and results because it is the main system targeted during
the work of this thesis.
2.2 Particle surfacing
2.2.1 Metaballs
The field of reconstructing surfaces from particles got one of its first contributions from Blinn [Bli82], who introduced blobbies, or metaballs, in 1982. The
approach creates an influence field from the particles. Each point in the field
evaluates the positions of the particles with a function and sums the results to
get a final influence value. By doing this it is possible to get a smooth blending
between particles, see Figure 2.1. A general description of how metaballs are
constructed can be seen in Equation 2.1, where xi is the position of the particle,
x is the voxel position in the field and W is a function that describes the
influence, also called a smoothing kernel.
The Gaussian function seen in Equation 2.2, with a and b being constants,
is the influence function that Blinn used, but any function that is smooth and
goes to zero at a maximum value can be used.
φ(x) = ∑_i W(x − x_i)    (2.1)

W(r) = a e^(−b‖r‖)    (2.2)
In the case of functions like that of Equation 2.2, it can be seen that the
influence will drop off very rapidly for increasing distances between x and x_i,
but never reaches zero. Because accuracy is not of critical importance, it is wise
to specify a radius h, outside of which the particle influence is not considered.
Equation 2.3 shows another smoothing kernel used by Rosenberg et al. [RB08].
This kernel will be computationally faster than Equation 2.2 partly because
of the lack of square roots and the exponential, but mainly because particles
outside of h do not need to be processed.
W(r, h) = (‖r‖/(√2 h))⁴ − (‖r‖/(√2 h))² + 0.25,  if ‖r‖ < h
W(r, h) = 0,  otherwise    (2.3)
Once the field is computed, it is very easy to extract the isosurface using
one of the methods described in Section 2.3.
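As an illustration, the field construction of Equation 2.1 with the finite-support kernel of Equation 2.3 might be sketched as follows (Python for brevity; function names are illustrative, not from the thesis):

```python
# Metaball field sketch: phi(x) = sum_i W(x - x_i, h), using the
# finite-support kernel of Equation 2.3, which needs neither the square
# root nor the exponential, and skips particles outside the radius h.
def kernel(r2, h):
    """Kernel of Equation 2.3, written in terms of the squared distance r2."""
    if r2 >= h * h:
        return 0.0
    q2 = r2 / (2.0 * h * h)          # (||r|| / (sqrt(2) h))^2
    return q2 * q2 - q2 + 0.25       # smoothly reaches 0 at ||r|| = h

def influence(x, particles, h):
    """phi(x): summed influence of all particles at position x."""
    total = 0.0
    for p in particles:
        r2 = sum((a - b) ** 2 for a, b in zip(x, p))
        total += kernel(r2, h)
    return total

particles = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
# The second particle lies exactly at distance h and contributes nothing.
print(influence((0.0, 0.0, 0.0), particles, 1.0))  # -> 0.25
```

Working with the squared distance throughout is what avoids the square root the text mentions.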
Though they are easy to implement, metaballs have a number of disadvan-
tages as well. The surface extracted from them is very dependent on the iso-level
chosen for extraction as well as the distance h used in the summation [APKG07].
Figure 2.1: Surfacing using metaballs. When two particles are brought together,
the combined influence will create a smooth blending (middle), but the technique
is bad at representing flat surfaces (right).
Metaballs are also bad at representing flat surfaces, see rightmost image of Fig-
ure 2.1 where the underlying spherical nature of the particle representation is
clearly exposed. Adams et al. [APKG07] also note that disturbing temporal
discontinuities appear when adding or removing particles.
2.2.2 Surface splatting
Another early technique, introduced by Reeves [Ree83] involves rendering par-
ticles used to model fuzzy objects by additively blending them onto the image
plane. If there are enough particles, the individual particles will blend together
and create a smooth surface. The downside of this is that the technique re-
quires a large amount of particles to yield good results. The method also suffers
from problems with single particles being visible at the edges of the surface
[RB08]. This technique, also called splatting, is also used by Sims [Sim90]
where it is implemented to render waterfalls and snow storms etc. In more
recent work, billboards¹ are used for surfacing data from SPH simulations by
Muller [MCG03], but in subsequent papers this approach seems to have been
abandoned [MSKG05].
Though it is easy to implement and gives plausible results for particle sim-
ulations [MCG03], it does not seem to have gained much traction in visual
effects. Instead, the method has had more success in the fields of visualization
of scanned point data and real-time applications [ZPvBG01][BHZK05][ALD06].
2.2.3 Level set based methods
Premoze et al. [PTB+03] introduced a technique based on the level set methods
[OS88] to construct surfaces for their particle fluid simulations. The level set
method is used to track the surface by first creating a force field from the particle
distribution for the first frame and then using a partial differential equation to
iteratively evolve the surface until it converges. For subsequent frames, the field
from the previous frame is used for initialization, allowing for continuity for the
surface between frames.
¹ Camera-aligned 2D textures.
The downside with this method is that it seems to be very slow compared to
other surface reconstruction algorithms. The authors also note that the method
has difficulties with creating sharp boundaries when colliding with objects or
other fluids.
2.2.4 Beyond metaballs
As described in section 2.2.1, metaballs create a field by summing the influ-
ence of nearby particles, explained by Equation 2.1. A problem with the early
smoothing kernels is that if many particles are grouped closely together, the
field will have very high values in that area, creating a large, unnatural blob.
In 2003, Muller et al. [MCG03] used the density from their SPH simulation to
normalize the particle contribution to the field, see Equation 2.4.
φ(x) = ∑_i (m_i/ρ_i) W(x − x_i, h_i)    (2.4)
Noting that this approach solves “some scaling issues but does not significantly reduce the bumps on what should be flat surfaces”, Zhu and Bridson
[ZB05] proposed a different field creation method. It is based on the principle
that the influence field should be represented by the signed distance field for
lone particles. Given a single particle position x_0 with radius r_0, the resulting field is:

φ(x) = ‖x − x_0‖ − r_0    (2.5)
With this as a base, they formulate a generalized solution for an arbitrary
number of particles by using a weighted average of the positions and radii:
φ(x) = ‖x − a(x)‖ − r(x)    (2.6)

a(x) = ∑_i w(x − x_i, h_i) x_i / ∑_i w(x − x_i, h_i)    (2.7)

r(x) = ∑_i w(x − x_i, h_i) r_i / ∑_i w(x − x_i, h_i)    (2.8)
where w is a smooth decaying weight function with finite influence radius h_i. It is easy to see that for particles with no neighbors within h_i, the field reduces to the field of Equation 2.5. Because the approach uses averaged positions of the particles, spurious blobs may appear in concave areas where the actual surface position does not correspond to the position given by Equation 2.6. Even so,
the authors note a big improvement in comparison to previous techniques.
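The field of Equations 2.6 to 2.8 can be sketched as follows. This is a brute-force Python sketch with assumed helper names; the weight function is one possible choice of smooth decaying weight, not necessarily the one used by the original authors:

```python
# Zhu-Bridson style field sketch (Equations 2.6-2.8): weighted averages
# of neighbouring particle positions and radii define a signed distance.
def weight(r2, h):
    """An assumed smooth decaying weight with finite support h."""
    if r2 >= h * h:
        return 0.0
    t = 1.0 - r2 / (h * h)
    return t * t * t

def zhu_bridson_phi(x, particles, radii, h):
    """phi(x) = ||x - a(x)|| - r(x); reduces to Equation 2.5 for a lone particle."""
    wsum = 0.0
    avg_pos = [0.0, 0.0, 0.0]
    avg_rad = 0.0
    for p, rad in zip(particles, radii):
        r2 = sum((a - b) ** 2 for a, b in zip(x, p))
        w = weight(r2, h)
        wsum += w
        avg_rad += w * rad
        for d in range(3):
            avg_pos[d] += w * p[d]
    if wsum == 0.0:
        return h                      # far from all particles: outside
    a = [c / wsum for c in avg_pos]   # a(x), Equation 2.7
    rbar = avg_rad / wsum             # r(x), Equation 2.8
    dist = sum((xi - ai) ** 2 for xi, ai in zip(x, a)) ** 0.5
    return dist - rbar

# A lone particle gives the signed distance field of Equation 2.5:
phi = zhu_bridson_phi((0.5, 0.0, 0.0), [(0.0, 0.0, 0.0)], [0.2], 1.0)
print(round(phi, 6))  # -> 0.3  (||x - x0|| - r0)
```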
Figure 2.2: Comparison between metaballs (red line), Zhu et al. [ZB05] (green line) and Adams et al. [APKG07] (blue line). Image from [APKG07].
Adams et al. [APKG07] improved the method of Zhu et al. [ZB05] by
tracking the particle-surface distance for each particle by using a signed distance
field. They used this field in each time step to adjust the per-particle distance
to the surface. By making use of this distance instead of the radius in Equation
2.8 and the weight function in Equation 2.9, the authors note an improvement
as can be seen in Figure 2.2.
w(r, h) = (1 − (‖r‖/h)²)³,  if ‖r‖ < h
w(r, h) = 0,  otherwise    (2.9)
2.2.5 Anisotropic kernels
Basing their work on Muller et al. [MCG03] (Section 2.2.4), Yu and Turk
[YT] note two fundamental problems which make the surfaces of the original
approach look bumpy. The first problem is the nature of Lagrangian methods,
which makes the distribution of particles very irregular. Secondly, the authors
note that the spherical shape of the kernels used is not good for representing
the particle density distribution near a surface.
To solve the problem of irregularity, the authors suggest the use of a smooth-
ing step of the particle positions before the surface reconstruction. The smoothed
particle centers are described by Equation 2.10, where w is a weighting function
with finite support which determines the influence from nearby particles and λ
a constant such that 0 < λ < 1.
x′_i = (1 − λ) x_i + λ ∑_j w_ij x_j / ∑_j w_ij    (2.10)
This equation can be solved for all positions simultaneously by setting up a sparse matrix system and using an iterative solver, which will be discussed further in Section 2.4.
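A single explicit pass of Equation 2.10 can be sketched as follows. For simplicity this assumed sketch uses uniform weights w_ij = 1 inside the support radius rather than a smooth kernel, and loops over all pairs instead of using a sparse system:

```python
# One explicit pass of the position-smoothing step of Equation 2.10:
# each particle moves a fraction lambda toward the weighted mean of its
# neighbours. Uniform weights within the support radius are an assumption.
def smooth_positions(positions, h, lam=0.5):
    h2 = h * h
    new_positions = []
    for i, xi in enumerate(positions):
        wsum = 0.0
        acc = [0.0, 0.0, 0.0]
        for j, xj in enumerate(positions):
            if i == j:
                continue
            r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if r2 < h2:               # finite support
                wsum += 1.0
                for d in range(3):
                    acc[d] += xj[d]
        if wsum == 0.0:
            new_positions.append(xi)  # a lone particle stays put
            continue
        mean = [c / wsum for c in acc]
        new_positions.append(tuple((1 - lam) * a + lam * m
                                   for a, m in zip(xi, mean)))
    return new_positions

pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 1.0, 0.0)]
out = smooth_positions(pts, h=2.0, lam=0.5)
print(out[0])  # moved halfway toward the mean of its two neighbours
```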
To solve the second problem Yu and Turk [YT] proposed a generalization
of the smoothing kernels used by techniques in the previous sections. With the
exception of Premoze et al.[PTB+03], who use the tangential and perpendicular
velocity of the particles to scale their kernel, most other authors use spherically
shaped kernels. Instead of using a constant support radius h, as in Equation
2.3, Yu and Turk propose that a positive definite matrix G should be used.
This matrix is used to rotate and transform r to create a kernel that better
represents the density near a particle. The matrix G is determined by perform-
ing a weighted version of Principal Component Analysis [KC03] to the particle
positions within a neighborhood. This method constructs a weighted covariance
matrix, C, and then performs an eigendecomposition, where the eigenvectors give the principal axes and the eigenvalues give the scaling factors with which to scale the kernel.
Figure 2.3: Comparison between isotropic kernels and the kernel description
presented by Yu and Turk. Figure from [YT].
The result of this is that the kernels that lie near a surface are flattened
in the normal direction and stretched in the tangential direction of the surface
while lone particles and particles within the volume will be spherical due to
the same distribution of density in every direction. A comparison between this
technique and Zhu et al. can be seen in Figure 2.3.
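The weighted PCA step can be illustrated in 2D, where the eigendecomposition of the covariance matrix has a closed form. This is an assumed sketch (names and the weight function are illustrative) of how the kernel axes and scales are obtained:

```python
# Weighted PCA sketch in 2D: build a weighted covariance matrix of the
# neighbouring particle positions and eigendecompose it. The eigenvalues
# are the scaling factors of the anisotropic kernel; 2D keeps the
# eigendecomposition in closed form.
import math

def anisotropy_2d(neighbours, centre, h):
    h2 = h * h
    wsum, mx, my = 0.0, 0.0, 0.0
    ws = []
    for (x, y) in neighbours:
        r2 = (x - centre[0]) ** 2 + (y - centre[1]) ** 2
        w = max(0.0, 1.0 - r2 / h2)    # assumed decaying weight
        ws.append(w)
        wsum += w
        mx += w * x
        my += w * y
    mx, my = mx / wsum, my / wsum      # weighted mean position
    cxx = cxy = cyy = 0.0
    for (x, y), w in zip(neighbours, ws):
        dx, dy = x - mx, y - my
        cxx += w * dx * dx
        cxy += w * dx * dy
        cyy += w * dy * dy
    cxx, cxy, cyy = cxx / wsum, cxy / wsum, cyy / wsum
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix:
    tr, det = cxx + cyy, cxx * cyy - cxy * cxy
    disc = math.sqrt(max(0.0, tr * tr / 4.0 - det))
    return tr / 2.0 + disc, tr / 2.0 - disc   # major, minor scale

# Particles spread along x (a locally flat surface): the kernel should be
# stretched along x and flattened along y.
pts = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
major, minor = anisotropy_2d(pts, (0.0, 0.0), 2.0)
print(major > minor)  # -> True
```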
2.3 Surface extraction
The techniques described in the previous sections that build a field, φ, from particles need some kind of algorithm for turning the field into a polygon mesh. The most popular way of doing this is to extract an iso-surface at a constant value of the field, which can be done in a number of different ways.
2.3.1 Marching Cubes
In 1987, Lorensen and Cline [LC87] introduced the Marching Cubes algorithm. The authors divide the field, φ, into cubes, where each corner is assigned to be either inside or outside the surface based on the selected iso-value. They then conclude that a polygon can only pass through a cube in a finite number of ways. For a cube, this number is 2⁸ = 256, which can be reduced to 15 cases by using
symmetry. Three of these cases can be seen in Figure 2.4. By using a lookup-
table it is possible to quickly get the topological structure of a cube given the
inside/outside state of each corner. The final polygon can then be obtained by
creating vertices at interpolated positions along the cube’s edges. By doing this
(”marching”) for every cube in the field, a mesh can be constructed from the
resulting polygons.
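The bookkeeping described above can be sketched as follows: packing the inside/outside state of the eight corners into an 8-bit case index for the lookup table, and interpolating a vertex along a crossing edge. The full 256-entry triangle table is omitted from this sketch:

```python
# Marching Cubes bookkeeping sketch: the case index selects one of the
# 2^8 = 256 configurations in the lookup table, and vertices are placed
# by linear interpolation along edges that cross the iso-value.
def case_index(corner_values, iso):
    index = 0
    for bit, value in enumerate(corner_values):
        if value < iso:                # corner is inside the surface
            index |= 1 << bit
    return index

def interpolate_vertex(p0, p1, v0, v1, iso):
    """Vertex on the edge p0-p1 where the field crosses the iso-value."""
    t = (iso - v0) / (v1 - v0)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))

values = [-1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # one corner inside
print(case_index(values, 0.0))                       # -> 1 (of 256 cases)
print(interpolate_vertex((0, 0, 0), (1, 0, 0), -1.0, 1.0, 0.0))  # -> (0.5, 0.0, 0.0)
```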
The problem with the original approach is the possibility of ambiguities
in the cases, causing non-manifold geometry [D88]. Solutions to this problem have been introduced by, among others, [MSS94], [Che95] and [LLWVT03], who introduce complementary cases in addition to the original 15 to resolve the ambiguities.
Figure 2.4: Three of the fifteen base cases for Marching Cubes with the in-
side/outside state of each corner visualized.
2.3.2 Marching Tetrahedra
Another method solving the ambiguity of Marching Cubes was introduced by
Bloomenthal [Blo88] who proposed the use of tetrahedra instead of cubes. In
addition to removing ambiguities, there are only three unique ways in which
a polygon can pass through a tetrahedron, depicted in Figure 2.5, making implementation even easier. As noted by [SML], though, this technique creates
surfaces with more triangles than Marching Cubes and artificial bumps in the
surface may appear due to the choice of orientation of the tetrahedra.
Figure 2.5: Shows the three ways that a polygon can pass through a tetrahedron.
2.3.3 Marching Tiles
In his thesis, Williams [Wil09] noted that any technique based on marching
structures with dihedral angles greater than or equal to 90°, such as Marching Cubes or Marching Tetrahedra, can produce vertices with valence¹ as low as four. Such vertices represent poor sampling of the underlying domain and can also
¹ The number of incident edges on a vertex.
cause problems when applying numerical operators such as smoothing on the
surface.
To solve this problem, Williams introduces a new space-filling tile consisting of 46 different tetrahedra. This tile, as seen in Figure 2.6, is constructed such that each edge has at least five tetrahedra incident on it, thus guaranteeing a
valence of at least five. The tile can be marched in the same way as the other
techniques, but some more lookups need to be done in order to get the correct
tetrahedron for a given place in the field. The structure of the tile and how to
construct it can be found in [Wil09].
Figure 2.6: The space-filling tile used by Williams from two different angles. A
lookup table is used to find which tetrahedron to march depending on location
in the field.
2.4 Surface smoothing
Mesh smoothing has been an area of active research for a long time and has
many applications such as denoising data and design of high-quality surfaces
[BPK+07]. A popular method for smoothing meshes is called Laplacian smooth-
ing, where the vertices of a mesh are gradually moved in the direction of the
Laplacian of each vertex. Depending on the approximation used for the Laplacian, different results can be achieved [Bra04].
The concept of smoothing has been used in the creation of surfaces from
particles before. For example, Houdini [Sof10] can use a Gaussian filter to
smooth the field before creating the initial surface and also smooth the resulting
mesh. While such techniques do create a smoother surface, the problem is that
pre-surface filtering may create unwanted artifacts, as seen in Figure 2.7. Both filtering and post-surface smoothing also suffer from a lack of connection to the underlying particle representation and are thus not able to distinguish between noise and actual small-scale structures.
Figure 2.7: A surface created with the Houdini surfacer, with the underlying particles overlaid. Note that the surface does not coat all particles and
that there are artifacts in the form of polygons where there are no particles.
In his thesis, Williams [Wil09] presents an approach for creating smooth and
flat surfaces. Instead of using complex kernels to create the initial influence field, a distance field is used where each voxel holds the distance to the nearest particle. The surface is created at r_outer using Marching Tiles (Section 2.3.3).
The surface is then iteratively smoothed while applying constraints to make
sure that the surface stays true to the underlying particles. The basic idea of
the method can be seen in Figure 2.8.
The method poses the smoothing as a problem of constrained nonlinear opti-
mization and two different quadratic objective functions are used for measuring
the smoothness of the surface.
Figure 2.8: The vertices of the constructed surface (red line) will be located between r_outer and r_inner when the algorithm is finished. Figure from [Wil09].
2.4.1 Unweighted graph Laplacian
The first smoothing step used by Williams is based on an approximation of the
Laplacian that uses the inter-vertex connectivity of the mesh, introduced by
[Tau95]. The discrete Laplacian of a vertex xi is approximated by:
∇²x_i = ∑_{j∈N1(i)} W_ij x_j    (2.11)
where N1(i) is the inclusive 1-ring neighborhood of vertex i; that is, all the vertices that share an edge with vertex i plus the vertex itself. W_ij is set to −1 and W_ii is set to the number of elements in N1(i), creating a weighted connectivity graph for all the vertices. It is now possible to form
∇²x = Ax    (2.12)

where x is an N×1 vector containing all the vertices in the mesh and A is an N×N sparse matrix containing the weights for each of the vertices. Iteratively minimizing xᵀAx will now result in each vertex moving towards the center of all its neighbors, improving the shape of the triangles.
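Assembling the sparse matrix A of Equations 2.11 and 2.12 from the mesh edges can be sketched as follows (an illustrative dict-of-dicts sparse representation, not the thesis data structure):

```python
# Unweighted graph-Laplacian assembly sketch: W_ij = -1 for each edge
# neighbour, W_ii = |N1(i)| where N1(i) is the inclusive 1-ring
# (neighbours plus the vertex itself). Rows are stored as sparse dicts.
def graph_laplacian(num_vertices, edges):
    A = [dict() for _ in range(num_vertices)]
    for i, j in edges:
        A[i][j] = -1.0
        A[j][i] = -1.0
    for i in range(num_vertices):
        # Neighbour count plus one for the vertex itself.
        A[i][i] = float(len(A[i]) + 1)
    return A

# A single triangle: every vertex has two neighbours, so W_ii = 3.
A = graph_laplacian(3, [(0, 1), (1, 2), (2, 0)])
print(A[0][0], A[0][1])  # -> 3.0 -1.0
```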
2.4.2 Weighted surface bilaplacian
In the second smoothing step, Williams minimizes xTBx, where B is a quadratic
approximation of the surface Bilaplacian:
B = WTD−1W (2.13)
In Equation 2.13, W is an N×N matrix, N being the number of vertices in the
mesh, where element W_ij is set to the sum of the cotangents of the angles
opposite the edge connecting vertex i and vertex j, one from each of the two
triangles incident on that edge, see Figure 2.9 (a) [DC98]. D is a diagonal
matrix where element D_ii is the total area of all triangles that vertex i is
part of, see Figure 2.9 (b). Minimizing the objective function will improve the
curvature of the mesh, creating a better looking surface.
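The cotangent weights and triangle areas that make up W and D can be computed per edge and per triangle. The following sketch uses the standard identity cot θ = (u · v)/|u × v|; the helper names are mine, not the thesis code:

```cpp
#include <cassert>
#include <cmath>

struct V3 { double x, y, z; };

static V3 sub(const V3& a, const V3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(const V3& a, const V3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static V3 cross(const V3& a, const V3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static double norm(const V3& a) { return std::sqrt(dot(a, a)); }

// Cotangent of the angle at `apex` in triangle (apex, p, q):
// cot = (u . v) / |u x v| for the two edge vectors leaving the apex.
double cotAtVertex(const V3& apex, const V3& p, const V3& q) {
    V3 u = sub(p, apex), v = sub(q, apex);
    return dot(u, v) / norm(cross(u, v));
}

// W_ij for the interior edge (i, j): the sum of the cotangents of the
// two angles opposite the edge, at vertices a and b of the two incident
// triangles (Figure 2.9 (a)).
double cotanWeight(const V3& xi, const V3& xj, const V3& xa, const V3& xb) {
    return cotAtVertex(xa, xi, xj) + cotAtVertex(xb, xi, xj);
}

// One triangle's contribution to D_ii for each of its three vertices
// (Figure 2.9 (b)): half the cross-product magnitude.
double triangleArea(const V3& a, const V3& b, const V3& c) {
    return 0.5 * norm(cross(sub(b, a), sub(c, a)));
}
```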
Figure 2.9: (a) The angles used for element W_ij. (b) The areas incident on a
vertex x_i.
2.4.3 Solvers
When the A and B matrices have been retrieved it is possible to set up and
solve the smoothing optimization problem as a linear system. This system
can then be solved by using, for example, the Gauss-Seidel iterative method or
Successive Over-Relaxation (used by Williams [Wil09]). Other researchers have
also used a preconditioned bi-conjugate gradient method for similar problems
with good results [DC98].
The constraints described below do make the optimization problem nonlinear, but
they are handled separately and are not encoded into the matrices, so it is
still possible to use linear solvers.
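As an illustration of the solvers mentioned, a minimal SOR sweep and solve loop might look as follows; with ω = 1 it reduces to plain Gauss-Seidel. This is a generic dense-storage sketch for clarity, not the thesis implementation:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One Successive Over-Relaxation sweep for A x = b. omega = 1 gives
// plain Gauss-Seidel; 1 < omega < 2 typically accelerates convergence.
void sorSweep(const std::vector<std::vector<double>>& A,
              const std::vector<double>& b,
              std::vector<double>& x, double omega)
{
    const size_t n = b.size();
    for (size_t i = 0; i < n; ++i) {
        double sigma = 0.0;
        for (size_t j = 0; j < n; ++j)
            if (j != i) sigma += A[i][j] * x[j];  // uses freshly updated entries
        x[i] += omega * ((b[i] - sigma) / A[i][i] - x[i]);
    }
}

// Iterate until the residual norm is below tol; returns sweeps performed.
int sorSolve(const std::vector<std::vector<double>>& A,
             const std::vector<double>& b,
             std::vector<double>& x, double omega,
             double tol, int maxIter)
{
    for (int it = 0; it < maxIter; ++it) {
        sorSweep(A, b, x, omega);
        double res = 0.0;
        for (size_t i = 0; i < b.size(); ++i) {
            double ri = b[i];
            for (size_t j = 0; j < b.size(); ++j) ri -= A[i][j] * x[j];
            res += ri * ri;
        }
        if (std::sqrt(res) < tol) return it + 1;
    }
    return maxIter;
}
```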
2.4.4 Constraints
The problem with smoothing alone is that it can cause significant shrinkage of
the mesh; given enough iterations, the mesh will collapse to the barycenter of
the vertices. Williams solves this by applying a constraint to the vertices
after every smoothing step. By enforcing Equation 2.14, where x is a vertex and
P_i a particle, the resulting surface stays a good representation of the
underlying particles. The author also notes that, given particles arranged on a
grid of size d, a flat surface will be produced if Equation 2.15 is fulfilled.
r_{inner} \le \min_i \lVert x - P_i \rVert \le r_{outer} \qquad (2.14)

r_{outer}^2 > \frac{d^2}{2} + r_{inner}^2 \qquad (2.15)
Chapter 3
GPU programming and
CUDA
Performance has always been a driving factor when hardware vendors create new
products, whether it is the transfer speed of USB ports, the response time of a
touch screen or the number of pixels in a digital camera. The clock frequency
of the central processing unit (CPU) of a computer is no different and there has
been an exponential increase in performance over the past decades.
The CPU is constructed for running a single thread of sequential instructions
and works well for many tasks, but this is a major drawback when there are
multiple tasks that can run independently of each other, e.g. heavy
calculations. Although multi-core CPUs have been around since 2004, the
underlying hardware is still made for sequential computations. This is probably
why researchers without access to massively parallel supercomputers started
looking at General Purpose computing on the Graphics Processing Unit (GPGPU)
for their computations. The GPU is very different from the CPU in that it is
made with the intention of performing tasks in parallel, e.g. rendering the
pixels on a screen, and can be a great deal faster than the CPU if the
computations can be parallelized.
Before the introduction of the Geforce 8 GPU from NVIDIA, the only possibility
for GPGPU was to make use of graphics APIs such as OpenGL or DirectX. This made
GPGPU hard to master and also put restrictions on the types of calculations that
could be made, due to the limitations of the APIs. In
November 2006, together with the Geforce 8 card, NVIDIA also introduced the
CUDA computing architecture which was made in order to solve these limita-
tions. Since then, other APIs have been released, most notably OpenCL which
is a cross-platform API for general GPU programming.
The GPU programming work of this thesis is done exclusively in CUDA,
and therefore the remainder of this chapter will focus on the architecture and
programming model of CUDA.
3.1 CUDA
CUDA, or Compute Unified Device Architecture, is, as hinted above, a general
purpose hardware and software architecture aimed at massively parallel and
computationally intensive applications.
Developed by NVIDIA, CUDA is at the time of writing only supported by GPUs from
NVIDIA, such as the Geforce series. These GPUs are viewed as devices with
processors additional to the main CPU, capable of executing tens or hundreds of
thousands of threads in parallel.
3.2 Architecture
As the underlying architecture is constructed for parallel programming, it is
necessary to have knowledge about it in order to fully utilize its potential.
Every CUDA-enabled GPU contains a number of streaming multiprocessors, each
containing a number of cores. The multiprocessors have a Single Instruction,
Multiple Data (SIMD) architecture. This means that at any given time, every
core in a streaming multiprocessor executes the same instruction, but on
different data [Cor10c].
Because not all code is suitable for running in parallel, a CUDA program
will typically consist of passages run on either the CPU (the host from now on)
or the GPU (the device from now on). The passages with no data parallelism
are implemented on the host using normal C or C++ code, and the sections that
exhibit a large amount of data parallelism are implemented using device code
(see Section 3.4).
Figure 3.1: An example of a grid of blocks. Image from NVIDIA CUDA Pro-
gramming Guide [Cor10c].
When a device function, called a kernel, is called, hundreds to tens of
thousands of lightweight threads are launched to operate on the data. These
threads are grouped into one of possibly many blocks, which in turn form a two
dimensional grid. The sizes of the grid and the blocks can be specified by the
programmer. The grid and block indices are also known to each thread, making it
possible to shape them to fit the data they operate on. Figure 3.1 shows an
example of a grid of blocks: the kernel launches a grid of size (3,2) where
each block contains 15 threads (5,3).
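The execution configuration is chosen on the host. The following plain C++ sketch (helper names are hypothetical) mirrors the arithmetic behind the common CUDA idiom of ceiling-dividing the element count by the block size and letting each thread compute a flat index from its block and thread indices:

```cpp
#include <cassert>

// How many blocks of `threadsPerBlock` threads are needed to cover n
// data elements; the last block is partially filled and kernels guard
// the tail with a bounds check such as `if (index < n)`.
unsigned int blocksFor(unsigned int n, unsigned int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
}

// The flat index a thread would compute, mirroring the CUDA idiom
// blockIdx.x * blockDim.x + threadIdx.x.
unsigned int flatIndex(unsigned int blockIdx, unsigned int blockDim,
                       unsigned int threadIdx) {
    return blockIdx * blockDim + threadIdx;
}
```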
In order to allow for future hardware changes such as adding more cores, CUDA
is designed for transparent scalability, making it possible to create
applications that will run on future generations of GPUs without changes to the
code.
3.3 Memory model
There are several different memory types available to the CUDA programmer.
Each of these memories has different size, scope and lifetime. Figure 3.2 shows
the different memories and their relationships.
• Global memory is the main memory of the GPU and the only one that gives
read and write access to both the host and the device. It is accessible
from all threads but not cached, making access times high even when a
coalesced access pattern is followed.
• Constant memory is like global memory but can only be written from the
host. It is also cached, so the slow device memory is only read on a
cache miss.
• Texture memory is a cached memory accessible to all threads, with hardware
support for special operations such as linear and higher order interpolation.
• Registers are extremely fast device memories used to store local variables.
They are local to a thread and the memory will be released once the thread
is done executing.
• Local memory is a part of the global memory, but it is still local to a
thread. Since the on-chip memory is limited, using too many registers will
cause variables to spill into local memory instead.
• Shared memory is a very fast on-chip memory which is shared between the
threads of a block, allowing for collaboration without having to resort to
the slow global memory. The lifetime of the shared memory is the lifetime
of a kernel.
Figure 3.2: A diagram of the CUDA memories. Image from the NVIDIA CUDA
Programming Guide [Cor10c].
3.4 Programming interface
There are currently two Application Programming Interfaces (APIs) for
programming CUDA.
The first one is the CUDA Driver API, which is a low-level interface that
gives good control over the device and has no dependency on the runtime library.
However, the driver API is known to require more verbose code and to be harder
to debug than C for CUDA, the second API [Cor10a].
C for CUDA is a higher level interface implemented on top of the driver API.
It is built as an extension to normal C, exposing new syntax for creating
and launching kernels. It also contains functions for data transfer to and
from the GPU, thread synchronization and OpenGL interoperability, to name a
few.
The GPU implementation for this thesis is all done using C for CUDA.
3.4.1 Optimizations
Simply getting a CUDA program to compile is not enough to get great
performance. A number of things need to be considered in order to write fast
programs.
• The highest priority should always be to maximize the parts of the pro-
gram that can run in parallel. By choosing correct algorithms and execu-
tion configuration for each kernel launch (maximizing the resource usage),
extreme speedups can be achieved.
• Memory optimization should start by minimizing the data transferred
between the host and the device. Such transfers have low bandwidth and are
very slow compared to transfers between other memories. Accesses of global
memory should always be coalesced and should be as few as possible, letting
threads within a block use the shared memory instead.
• Control flow instructions such as if- and for-statements can have a big
impact on performance if the threads within a warp¹ are not running the
same code. If one thread in a warp takes a different route in an if-statement
which takes longer to execute, all the other threads in the same warp will
wait for that thread to finish. By grouping the threads or the data wisely,
such stalls can be avoided.
• Instruction optimization can be made by using alternative versions of
heavy arithmetic functions optimized for speed. These functions often have a
tradeoff when it comes to precision, though, so they should only be used when
they do not affect the quality of the result.
More in-depth information about CUDA and many more optimization strategies
can be found in the CUDA Programming Guide [Cor10c], the Best Practices
Guide [Cor10a], or in the two introductory books Programming Massively
Parallel Processors by Kirk and Hwu [KmWH10] and CUDA by Example by Sanders
and Kandrot [SK10].

¹A logical grouping of threads
Chapter 4
Implementation
The implementation made in this thesis closely follows the work of Williams
[Wil09], which seemed to yield very good results while at the same time having
many passages that allow for massively parallel operations on the GPU. A CPU
version of the surfacer was constructed as well, both because not all artists
had graphics cards good enough to handle huge amounts of data and because the
surface reconstruction should be able to run on the rendering farm, which
currently consists only of CPUs. Though different algorithms are used for some
of the operations, the end result should be the same.
Figure 4.1 shows an overview of the data flow and the different operations
performed from particles to a final surface for a normal run of the algorithm.
The following sections explain these steps in more depth, after a data
structure for fast searching has been described.
4.1 Uniform hashed grid
In the field creation, the constraint and the attribute transfer steps of Figure
4.1, a spatial search is needed to quickly find particles near a certain point, be
it a position in the field or a vertex on the surface.
For this, an implementation based on the work of Le Grand [LG07] is used.
In this method, space is divided into a fixed-size grid of (d_x, d_y, d_z) cells with
Figure 4.1: All the steps in the implementation of the surfacing algorithm.
cells of equal size (c_x, c_y, c_z). A hash value according to Equation 4.1 is
then computed for every particle position P and stored together with the
particle ID. The particles are then put into a one-dimensional array which is
sorted using a parallel radix sort [SHG09] such that all particles within a
cell are adjacent to each other. Two additional arrays are then created to
store the start and end indices of each cell in the big array, allowing fast
lookup of nearby particles for positions that hash to the same value.
P_{hash} = m\,d_x d_y + l\,d_x + k \qquad (4.1)

k = \left\lfloor \frac{P_x}{c_x} \right\rfloor \bmod d_x, \qquad
l = \left\lfloor \frac{P_y}{c_y} \right\rfloor \bmod d_y, \qquad
m = \left\lfloor \frac{P_z}{c_z} \right\rfloor \bmod d_z \qquad (4.2)
By choosing an appropriate size for the cells, the closest particle to a point
can be found by looking in the cell that the point hashes to, or in one of the
adjacent cells.
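The per-particle hash of Equations 4.1 and 4.2 is cheap to compute. A plain C++ sketch (illustrative; the names are mine, not the CUDA implementation) could be:

```cpp
#include <cassert>
#include <cmath>

// Grid description: cell size (cx, cy, cz) and cell counts (dx, dy, dz).
struct GridSpec { double cx, cy, cz; int dx, dy, dz; };

// Quantize one coordinate by the cell size and wrap it into the grid,
// keeping the modulo non-negative (C++ `%` can return negative values).
static int wrapCell(double p, double c, int d) {
    int k = static_cast<int>(std::floor(p / c)) % d;
    return k < 0 ? k + d : k;
}

// P_hash = m * dx * dy + l * dx + k, following Equations 4.1 and 4.2.
int cellHash(double px, double py, double pz, const GridSpec& g) {
    int k = wrapCell(px, g.cx, g.dx);
    int l = wrapCell(py, g.cy, g.dy);
    int m = wrapCell(pz, g.cz, g.dz);
    return m * g.dx * g.dy + l * g.dx + k;
}
```

After hashing, particles are sorted by this value so that each cell's particles are contiguous, and a per-cell start/end index pair gives constant-time access to them.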
The uniform hashed grid is fast, light-weight in terms of memory and easy
to implement. A sample implementation of this structure can be found in the
CUDA SDK [SG08], though it is heavily coupled with the specific implementa-
tion.
4.2 Distance field
The CPU version creates the distance field by rasterizing the particles onto
the underlying field, checking nearby voxels inside the influence radius for
their distance value and updating them if necessary. Since it is not required
to store distances in voxels that are not close to the surface, a sparse field
from the Field3D library [Wre10] was used for storage.
The rasterization process causes race conditions when run in parallel if
atomic operations (instructions that appear, to the rest of the system, to
execute instantaneously) are not used. These operations were not widely
supported on the graphics cards installed at Imageworks, so a different
technique was used for the GPU version. A sparse data structure was constructed
to reduce the memory footprint, much like the sparse field above. The first
step was to calculate which of the sparse blocks would need allocation. Blocks
that were too far from any particles to contribute to the surface could simply
be skipped.
Then, for all allocated blocks, each voxel position was visited to search for
the distance to the nearest particle using the uniform hash grid. The size of
the hashed grid was chosen such that if no particle was found within the cell
or its neighbors, the distance could safely be set to ∞.
4.3 Mesh creation
For the mesh creation, Marching Tiles (Section 2.3.3) was implemented.
Imageworks already had implementations of Marching Cubes for both the CPU and
the GPU, so these were integrated into the tool as well.
Marching Tiles was done on the GPU by first computing which of the tetrahedra
in the tiles would actually contain triangles. This was done so there would be
no need to march through empty tetrahedra, and also to know how many triangles
the final mesh would have so that proper allocations could be made. Then the
actual marching of the tetrahedra was performed, and every edge in the field
that would create a vertex was saved together with its two end values. The
reason for this is that simply creating the triangles directly would produce a
triangle soup rather than a mesh; it is therefore necessary to weld the
vertices, or in this case the edges, together. The benefit of welding edges
instead of the final vertex positions is that the edges have discrete
coordinates, while the vertices have floating point coordinates which may be
subject to rounding errors due to interpolation direction etc. The edge weld
was constructed by modifying the algorithm from [HB10], after which the final
vertex values were retrieved, creating a mesh with correct connectivity
information.
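The weld itself can be sketched as a sort-and-unique pass over the discrete edge keys, similar in spirit to the Thrust example that the implementation adapts. This CPU sketch uses hypothetical names and a single 64-bit key per edge:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Weld duplicated edge keys into shared vertex indices. Each key is a
// discrete edge id, so exact comparison is safe, unlike floating point
// vertex positions.
// Input:  one key per generated triangle corner (3 per triangle).
// Output: `indices[i]` is the welded vertex index of corner i, and the
// returned vector holds one key per unique vertex.
std::vector<std::uint64_t> weldEdges(const std::vector<std::uint64_t>& cornerKeys,
                                     std::vector<int>& indices)
{
    std::vector<std::uint64_t> unique = cornerKeys;
    std::sort(unique.begin(), unique.end());
    unique.erase(std::unique(unique.begin(), unique.end()), unique.end());

    // Map each corner back to its position in the deduplicated key list.
    indices.resize(cornerKeys.size());
    for (size_t i = 0; i < cornerKeys.size(); ++i)
        indices[i] = static_cast<int>(
            std::lower_bound(unique.begin(), unique.end(), cornerKeys[i]) -
            unique.begin());
    return unique;
}
```

Interpolating the final vertex position once per unique key, instead of once per corner, then yields shared vertices and correct connectivity.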
4.4 Smoothing
As can be seen in Figure 4.1, the smoothing is first done for x^T A x and then
twice for x^T B x, once with constraints and once without. The first two steps
move the vertices to good positions, while the unconstrained step improves the
smoothness of the surface normals without moving the vertices too much [Wil09].
The result of different numbers of smoothing steps minimizing x^T A x can be
seen in Figure 4.2.
The matrices processed in the implementation are all very sparse: each row has
as many elements as there are vertices in the mesh, but on average only seven
of them are non-zero. For memory reasons, the matrices were all represented
using a sparse matrix storage scheme called Compressed Row Storage. While this
makes element access slower, the memory consumption is extremely small compared
to a dense matrix.
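A Compressed Row Storage matrix keeps only the non-zero values, their column indices, and one offset per row. The following sketch of the layout and the matrix-vector product used during smoothing is illustrative, not the production code:

```cpp
#include <cassert>
#include <vector>

// Compressed Row Storage: all non-zero values in row order, their column
// indices, and rowPtr[i]..rowPtr[i+1] delimiting the entries of row i.
struct CrsMatrix {
    std::vector<double> values;
    std::vector<int> colIdx;
    std::vector<int> rowPtr;  // size = numRows + 1
};

// y = M x; each row touches only its stored non-zeros, so the cost is
// proportional to the number of non-zeros (about 7 per row here), not N.
std::vector<double> crsMatVec(const CrsMatrix& M, const std::vector<double>& x) {
    const int rows = static_cast<int>(M.rowPtr.size()) - 1;
    std::vector<double> y(rows, 0.0);
    for (int i = 0; i < rows; ++i)
        for (int k = M.rowPtr[i]; k < M.rowPtr[i + 1]; ++k)
            y[i] += M.values[k] * x[M.colIdx[k]];
    return y;
}
```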
Figure 4.2: Different numbers of smoothing steps for x^T A x with constraints.
(top) Initial surface, (middle) 2 steps, (bottom) 6 steps.
4.4.1 Finding matrices
The computation of the A and B matrices was implemented on the CPU because of
the serial nature of the procedure and a lack of time to find appropriate
parallel versions. A is found by creating an array of associative arrays and
looping over the connectivity list of the mesh, adding the vertices of each
triangle to the appropriate associative array. The sparse matrix is then
constructed by extracting the values from the arrays. The matrix B is
constructed in a similar fashion.
4.4.2 Solvers
For the actual solving of the linear systems, a Jacobi solver adapted for
sparse data was used. Even though this method may converge more slowly than
Gauss-Seidel or Successive Over-Relaxation, it is very easy to parallelize, and
the fact that the parallel version is much faster makes up for this, see
Section 5.5.
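A minimal serial sketch of one Jacobi sweep over sparse rows is shown below (the storage and names are mine, not the production code). The key property is that every entry of the new iterate depends only on the previous one, so the loop over rows parallelizes trivially, e.g. one GPU thread per row:

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// One Jacobi sweep for A x = b with each row of A stored as
// (column, value) pairs.
using SparseRow = std::vector<std::pair<int, double>>;

std::vector<double> jacobiSweep(const std::vector<SparseRow>& A,
                                const std::vector<double>& b,
                                const std::vector<double>& x)
{
    std::vector<double> xNew(x.size());
    for (size_t i = 0; i < A.size(); ++i) {  // rows are independent
        double diag = 0.0, sigma = 0.0;
        for (const auto& [j, v] : A[i]) {
            if (static_cast<size_t>(j) == i) diag = v;
            else sigma += v * x[j];  // only the previous iterate is read
        }
        xNew[i] = (b[i] - sigma) / diag;
    }
    return xNew;
}
```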
4.5 Constraints
The shrinking effect of the smoothing, which can be seen as a drawback in some
cases, is what makes surfaces flat in this implementation. The constraints make
sure that the vertices are allowed to move to better positions while still
staying true to the particle positions. The constraints are enforced after each
smoothing step by using a hashed grid to look up the closest particle to each
vertex and moving the vertex along the radial vector if it is found to be
inside or outside the stipulated radii.
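The per-vertex projection back into the shell of Equation 2.14 can be sketched as follows, given the closest particle already found via the hashed grid (names are mine, not the production code):

```cpp
#include <cassert>
#include <cmath>

struct P3 { double x, y, z; };

// Project a vertex back into the shell r_inner <= |v - p| <= r_outer
// around its closest particle p (Equation 2.14): if a radius is
// violated, the vertex is moved along the radial vector onto it.
P3 enforceConstraint(const P3& v, const P3& p, double rInner, double rOuter) {
    double dx = v.x - p.x, dy = v.y - p.y, dz = v.z - p.z;
    double d = std::sqrt(dx * dx + dy * dy + dz * dz);
    if (d >= rInner && d <= rOuter) return v;  // already inside the shell
    double target = d < rInner ? rInner : rOuter;
    double s = target / d;  // assumes d > 0
    return {p.x + dx * s, p.y + dy * s, p.z + dz * s};
}
```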
4.6 Erosion
In an attempt to represent very thin structures without using a huge number of
particles with small radii, an optional erosion step was added, where each
vertex is moved along its negative normal direction. The result of this
operation can be seen in Figure 4.3.

While easy to implement and straightforward, this method easily creates
self-intersecting triangles if a large value is used, and a surface that looks
good in one frame might then have many self-intersections in the next.
Figure 4.3: The effect of eroding a surface along the vertex normals.
4.7 Attribute transfer
As a last optional step, the ability to transfer attributes from the particles
to the vertices of the mesh was added. Such attributes can for example be
color, see Figure 5.3, or velocity, which can be used for motion blur. This was
done by specifying an influence radius for the vertices, using that radius as
the cell size of the spatial hash to find all relevant particles, and computing
the final attribute value as a weighted average.
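A sketch of the weighted average for a scalar attribute is shown below. The thesis does not specify the exact weighting kernel, so a linear falloff is assumed here for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Particle { double x, y, z; double attr; };

// Distance-weighted average of a scalar particle attribute at a vertex.
// Particles outside the influence radius are ignored; the weight falls
// off linearly with distance (an assumed kernel, for illustration).
double transferAttribute(double vx, double vy, double vz,
                         const std::vector<Particle>& nearby, double radius)
{
    double wSum = 0.0, aSum = 0.0;
    for (const auto& p : nearby) {
        double dx = p.x - vx, dy = p.y - vy, dz = p.z - vz;
        double d = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (d >= radius) continue;       // outside the influence radius
        double w = 1.0 - d / radius;     // linear falloff
        wSum += w;
        aSum += w * p.attr;
    }
    return wSum > 0.0 ? aSum / wSum : 0.0;
}
```

For vector attributes such as velocity or color, the same weights are applied per component.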
4.8 Houdini
The final program was designed to be independent of any particular software
package, taking only a list of particles and a structure containing the
surfacing parameters. In the prototype, the functionality of the surfacer was
exposed by constructing a Surface Operator (SOP) for Houdini using the Houdini
Development Kit. A screenshot of the SOP can be seen in Figure 4.4. The
parameters seen in the figure mask many of the real ones for ease of use for
the artist.
Figure 4.4: A screenshot of the Houdini operator created for the particle surface
reconstruction.
Chapter 5
Results
5.1 Computer specifications
All the work in this thesis, including the timing experiments, was done on an
eight-core 2.5 GHz computer with 24 GB of RAM and a Quadro FX 3700 graphics
card with 512 MB of memory. Note that the CPU version was not threaded and thus
only made use of one core.
5.2 Example: Fountain
In Figure 5.1, a sequence from an SPH simulation of a fountain pouring a stream
of water can be seen. The simulation had approximately 120,000 particles and
was surfaced on a 325×600×100 grid. The surfacing took about two seconds on the
GPU, and 30 seconds on the CPU, in production quality.
Figure 5.1: Frames from the surfacing of a fountain simulation.
5.3 Example: Splash
Figure 5.2 shows the surfacing of a splash where an object collides with a big
pool of particles. The simulation, as well as the surfacing, was done in three
different passes in order to add more small-scale detail to the final sequence.
Note that this makes the small-scale particles unable to interact with the pool
in the simulation.
Figure 5.2: An object crashes into a big pool of particles.
5.4 Example: Color and velocity transfer
The ability to transfer attributes such as color and velocity can be seen in
Figure 5.3. A cube acts as the source for all the particles, which are colored
based on their coordinates.
Figure 5.3: Example of color transfer from particles onto geometry.
5.5 Performance
The tests in the following sections were run ten times each, and the results
are the averages of these runs. The tests were run on the same dataset: 475,000
particles sampled and surfaced on a 500×650×350 field. Figure 5.4 shows a bar
chart of the times for the different parts of the surfacing, for both the CPU
and the GPU version. Figure 5.5 shows the percentages of the processing times
for the GPU version. Note that these results are only an example and that
results may vary considerably depending on particle distribution and field
size. However, it can be seen that the GPU version is significantly faster
where the GPU can be fully utilized. It can also be seen that overhead and the
sections that are done on the CPU take up more than 70% of the computation time
for the GPU version.
Figure 5.4: Illustrates the execution time for different parts of the program on
both the CPU and the GPU.
Figure 5.5: Shows how the processing time is divided between the different parts
of the GPU program.
Chapter 6
Conclusion
The goal of this thesis was to evaluate the field of reconstructing surfaces
from particle data and to create a GPU-based prototype for integration into
Houdini. The technique implemented should be faster than the current technique
and also easier for artists to work with. Furthermore, it should be easier to
represent flat surfaces without introducing artifacts.
Because the method presented in this thesis involves more steps and is more
computationally expensive than the native Houdini version (which basically
performs the first two steps of Figure 4.1), it may not be fair to compare them
directly. The general feeling when working with the GPU version is that it is
still faster than the native application, and when not doing any smoothing or
constraints, the GPU version is much faster.
The usability of the created plugin is not easy to estimate because the
prototype was not rolled out to be tested by artists until one week prior to
the completion of the thesis. The feedback received, though, seemed positive,
and the SOP does hide a lot of unnecessary options from the end user, instead
exposing combinations of them as, for example, "Smoothness" or "Mesh detail".
The technique used for surfacing and smoothing, based on the work of Williams
[Wil09], creates a final surface that is globally smooth and also flat under
certain conditions (Equation 2.15). The technique also creates a surface that
stays true to the particle positions. The smoothing can, however, create
collapsing features when the vertices snap to the "wrong" particles due to
large smoothing step sizes. The method is also sensitive to changes in the
field, which becomes a problem when surfacing particles with almost no
velocity. Many of the simulations used for testing had particles that were
supposed to be settled but still oscillated slightly, causing the resulting
surface to flicker due to differences in the field from one frame to the next.
This becomes extra evident when using reflective materials for the surface.
6.1 Future work
The paper by Yu et al. [YT] has some interesting ideas and the results are
visually very pleasing. The implementation of anisotropic kernels should be the
first priority for further work. Yu et al. also acknowledge that their solution
would benefit from the work of Williams [Wil09], and a combined solution would
be worth trying. The changes needed concern the creation of the field as well
as the constraint enforcement, where a pure distance to the nearest particle
can no longer be used; instead the constraints need to be based on the
individual elliptical kernel of each particle.
Another thing the evaluating artists seemed to want was the possibility to
smooth some parts of the mesh differently from others. This could easily be
done either by creating a field that scales the factors in the matrices, or
automatically based on properties such as velocity. By using velocity to scale
the elements in the matrices, it would be possible to smooth more heavily in
areas where particles are not moving and less where a lot is going on.
The smoothing of the particle positions mentioned by Yu et al. [YT] should help
with the temporal flickering. This smoothing could possibly also be weighted by
velocity, letting areas of particles with low velocity depend more on their
neighbors than high velocity particles, where flickering does not pose a
problem.
As can be seen in Section 5.5, there are parts of the program that are over 100
times faster than the corresponding CPU parts, but the overall performance is
clogged by bottlenecks. One of the big bottlenecks is the computations that are
still done on the CPU.
One possible optimization could be to reuse the B matrix in the second step
instead of recomputing it, which would save time and probably not affect the
final surface negatively. Multithreading the CPU parts would also benefit the
GPU version, but if possible it would be better to move all of the computations
to the GPU. This would not only speed up the computations; the current overhead
of moving data back and forth would be removed as well. CUDA is also moving
very fast, and new libraries and features arrive rapidly, many of which might
be useful, for example CUSPARSE [Cor10b], an NVIDIA-developed library for
handling sparse matrices added in September 2010.
Acknowledgements
I would like to thank my supervisor Magnus Wrenninge for all his help and
invaluable input.
I would also like to thank the talented people at Sony Pictures Imageworks
with whom I have had the fortune to interact, especially Chris Allen, Sosh
Mirsepassi and Viktor Lundqvist.
Finally I would like to thank Jonas Unger for helping me get into the IPAX
program, and Swedbank-Sparbanksstiftelsen Alfa as well as Professor Anders
Ynnerman's Foundation for the scholarships that I received.
Bibliography
[ALD06] Bart Adams, Toon Lenaerts, and Philip Dutré. Particle Splatting:
Interactive Rendering of Particle-Based Simulation Data, 2006.
[APKG07] Bart Adams, Mark Pauly, Richard Keiser, and Leonidas J. Guibas.
Adaptively sampled particle fluids. In SIGGRAPH ’07: ACM SIG-
GRAPH 2007 papers, page 48, New York, NY, USA, 2007. ACM.
[BHZK05] Mario Botsch, Alexander Hornung, Matthias Zwicker, and Leif
Kobbelt. High-quality surface splatting on today’s GPUs. Proceed-
ings Eurographics/IEEE VGTC Symposium Point-Based Graphics,
0:17–141, 2005.
[Bli82] James F. Blinn. A Generalization of Algebraic Surface Drawing.
ACM Trans. Graph., 1(3):235–256, 1982.
[Blo88] Jules Bloomenthal. Polygonization of implicit surfaces. Computer
Aided Geometric Design, 5(4):341 – 355, 1988.
[BPK+07] Mario Botsch, Mark Pauly, Leif Kobbelt, Pierre Alliez, Bruno
Levy, Stephan Bischoff, and Christian Rossl. Geometric model-
ing based on polygonal meshes. In SIGGRAPH ’07: ACM SIG-
GRAPH 2007 courses, page 1, New York, NY, USA, 2007. ACM.
[Bra04] Nicholas Bray. Notes on mesh smoothing, 2004.
[Bri08] Robert Bridson. Fluid simulation. A K Peters, 2008.
[Che95] Evgeni V. Chernyaev. Marching Cubes 33: Construction of Topo-
logically Correct Isosurfaces. Technical report, Institute for High
Energy Physics, 142284, Protvino, Moscow Region, Russia, 1995.
[Cor10a] NVIDIA Corporation. CUDA C Best Practices Guide, Version 3.2,
2010. [Online; accessed 18-September-2010].
[Cor10b] NVIDIA Corporation. CUDA CUSPARSE Library, 2010.
[Cor10c] NVIDIA Corporation. Nvidia cuda c programming guide, version
3.2, 2010. [Online; accessed 18-September-2010].
[D88] Martin J. Dürst. Re: additional reference to "marching cubes".
SIGGRAPH Comput. Graph., 22(5):243, 1988.
[DC98] Mathieu Desbrun and Marie-Paule Cani. Active Implicit Sur-
face for Animation. In Wayne A. Davis, Kellogg S. Booth, and
Alain Fournier, editors, Graphics Interface 1998, June, 1998, pages
143–150, Vancouver, BC, Canada, June 1998. Canadian Human-
Computer Communications Society. Published under the name
Marie-Paule Cani-Gascuel.
[EFFM02] Douglas Enright, Ronald Fedkiw, Joel Ferziger, and Ian Mitchell.
A hybrid particle level set method for improved interface capturing.
J. Comput. Phys., 183(1):83–116, 2002.
[HB10] Jared Hoberock and Nathan Bell. Thrust: Weld vertices
example. http://code.google.com/p/thrust/source/browse/
examples/, October 2010.
[KC03] Yehuda Koren and Liran Carmel. Visualization of Labeled Data
Using Linear Transformations. Information Visualization, IEEE
Symposium on, 0:16, 2003.
[KmWH10] David B. Kirk and Wen-mei W. Hwu. Programming Massively
Parallel Processors: A Hands-on Approach. Addison-Wesley Professional,
1st edition, 2010.
[KVH84] James T. Kajiya and Brian P Von Herzen. Ray tracing volume
densities. SIGGRAPH Comput. Graph., 18(3):165–174, 1984.
[LC87] William E. Lorensen and Harvey E. Cline. Marching cubes: A high
resolution 3D surface construction algorithm. In SIGGRAPH ’87:
Proceedings of the 14th annual conference on Computer graphics
and interactive techniques, pages 163–169, New York, NY, USA,
1987. ACM.
[LG07] Scott Le Grand. GPU Gems 3 - Broad-Phase Collision Detection
with CUDA, chapter 32, pages 697–722. Addison-Wesley, 2007.
[LGF04] Frank Losasso, Frederic Gibou, and Ron Fedkiw. Simulating water
and smoke with an octree data structure. In SIGGRAPH ’04:
ACM SIGGRAPH 2004 Papers, pages 457–462, New York, NY,
USA, 2004. ACM.
[LLWVT03] Thomas Lewiner, Hélio Lopes, Antônio Wilson Vieira, and Geovan
Tavares. Efficient implementation of marching cubes cases with
topological guarantees. Journal of Graphics Tools, 8(2), December 2003.
[Lun09] Viktor Lundqvist. A smoothed particle hydrodynamic simulation
utilizing the parallel processing capabilities of the GPU. Master's
thesis, Linköping University, Department of Science and Technology,
2009.
[MCG03] Matthias Müller, David Charypar, and Markus Gross. Particle-
based fluid simulation for interactive applications. In SCA ’03:
Proceedings of the 2003 ACM SIGGRAPH/Eurographics sym-
posium on Computer animation, pages 154–159, Aire-la-Ville,
Switzerland, Switzerland, 2003. Eurographics Association.
[MSKG05] Matthias Müller, Barbara Solenthaler, Richard Keiser, and Markus
Gross. Particle-based fluid-fluid interaction. In SCA ’05: Proceed-
ings of the 2005 ACM SIGGRAPH/Eurographics symposium on
Computer animation, pages 237–244, New York, NY, USA, 2005.
ACM.
[MSS94] Claudio Montani, Riccardo Scateni, and Roberto Scopigno. A
modified look-up table for implicit disambiguation of Marching
Cubes. The Visual Computer, 10(6):353–355, 1994.
[OF03] Stanley Osher and Ronald Fedkiw. Level Set Methods and Dynamic
Implicit Surfaces. Springer, 2003.
[OS88] Stanley Osher and James A. Sethian. Fronts propagating with
curvature-dependent speed: Algorithms based on Hamilton-Jacobi
formulations. J. Comput. Phys., 79(1):12–49, 1988.
[Par82] Paramount. Star Trek II: The Wrath of Khan (movie), 1982.
[PTB+03] Simon Premoze, Tolga Tasdizen, James Bigler, Aaron Lefohn, and
Ross T. Whitaker. Particle-Based Simulation of Fluids. Computer
Graphics Forum, 22(3):401–410, 2003.
[RB08] Ilya D. Rosenberg and Ken Birdwell. Real-time particle isosurface
extraction. In I3D ’08: Proceedings of the 2008 symposium on
Interactive 3D graphics and games, pages 35–43, New York, NY,
USA, 2008. ACM.
[Ree83] William T. Reeves. Particle systems—a technique for modeling
a class of fuzzy objects. In SIGGRAPH ’83: Proceedings of the
10th annual conference on Computer graphics and interactive tech-
niques, pages 359–375, New York, NY, USA, 1983. ACM.
[SG08] Simon Green, NVIDIA Corporation. CUDA Particles, 2008.
[SHG09] Nadathur Satish, Mark Harris, and Michael Garland. Designing
efficient sorting algorithms for manycore GPUs. In IPDPS ’09:
Proceedings of the 2009 IEEE International Symposium on Par-
allel&Distributed Processing, pages 1–10, Washington, DC, USA,
2009. IEEE Computer Society.
[Sim90] Karl Sims. Particle Animation and Rendering Using Data Parallel
Computation. In Computer Graphics, pages 405–413, 1990.
[SK10] Jason Sanders and Edward Kandrot. CUDA by Example: An In-
troduction to General-Purpose GPU Programming. Morgan Kauf-
mann Publishers, 2010.
[SML] Will Schroeder, Ken Martin, and Bill Lorensen. The Visualization
Toolkit, Third Edition. Kitware Inc.
[Sof10] Side Effects Software. Houdini. http://www.sidefx.com/, Octo-
ber 2010.
[Tau95] Gabriel Taubin. A signal processing approach to fair surface design.
In SIGGRAPH ’95: Proceedings of the 22nd annual conference on
Computer graphics and interactive techniques, pages 351–358, New
York, NY, USA, 1995. ACM.
[Wil09] Brent Warren Williams. Fluid Surface Reconstruction from Parti-
cles. Master’s thesis, The University of British Columbia, 2009.
[Wre10] Magnus Wrenninge. Field3D: A file format and data structures for
storing 3D voxel and simulation data, 2010. Open source project
at Sony Pictures Imageworks.
[YT] Jihun Yu and Greg Turk. Reconstructing Surfaces of Particle-
Based Fluids Using Anisotropic Kernels. In 2010 ACM SIG-
GRAPH/Eurographics symposium on Computer animation.
[ZB05] Yongning Zhu and Robert Bridson. Animating sand as a fluid. In
SIGGRAPH ’05: ACM SIGGRAPH 2005 Papers, pages 965–972,
New York, NY, USA, 2005. ACM.
[ZPvBG01] Matthias Zwicker, Hanspeter Pfister, Jeroen van Baar, and Markus
Gross. Surface splatting. In SIGGRAPH ’01: Proceedings of the
28th annual conference on Computer graphics and interactive tech-
niques, pages 371–378, New York, NY, USA, 2001. ACM.