
Screen-space VPL propagation for real-time indirect lighting

Vítor M. Aranha, Department of Computer Science, Federal University of Bahia, Salvador, BA, Brazil. Email: [email protected]

Marcio C. F. Macedo, Department of Computer Science, Federal University of Bahia, Salvador, BA, Brazil. Email: [email protected]

Antonio L. Apolinario Jr., Department of Computer Science, Federal University of Bahia, Salvador, BA, Brazil. Email: [email protected]

Abstract—Reproducing indirect lighting in real time for 3D scenes is a computationally intensive task, but its results add realism and visual fidelity to game-like scenarios and applications. Previous attempts to solve this task explored the use of paraboloid shadow maps for hemispherical visibility queries and Virtual Point Lights (VPLs) for light transport simulations. In our work we aim to reproduce up to two diffuse bounces of light while maintaining real-time framerates. We propose an extension to clusterization-based methods that samples fewer VPLs per paraboloid shadow map and captures more bounces of light efficiently. We also propose a projection-aware sampling method to improve sampling efficiency. Our experiments show that, with our approach, it is possible to generate two-bounce VPLs in real time even on low-end commodity GPUs, providing a fast method for indirect illumination.

I. INTRODUCTION

Reproducing the many interactions of light with the environment yields realistic and pleasing images while providing visual cues about the materials in a scene. Consequently, it is now commonplace for movie, game and architecture applications to employ global illumination techniques in their products in order to enhance their user experiences.

The full light transport phenomenon can be severely demanding to recreate computationally under strict time constraints such as those typically found in the game industry. To address this problem, many techniques were developed, such as Path-Tracing [1] and Photon Mapping [2]. The former devises a path-based formulation of the light travelling through a medium, tracing rays to check for visibility between objects. Photon Mapping estimates the distribution of light energy in the form of photons over a scene and then computes the radiance by gathering the contribution of the nearest-neighbor photons to obtain the final image. Instant Radiosity [3] is a light transport method that traces paths from the primary light source through the geometry and assigns Virtual Point Lights (VPLs) to the path vertices. After rendering the scene from the viewpoints of these VPLs, it is possible to reproduce realistic lighting conditions at interactive rates, given that enough VPLs are used. However, this technique requires a high number of rendering passes over the geometry, which prevents it from achieving real-time performance even on current commodity graphics hardware. Therefore, techniques that speed up these visibility queries are essential to algorithms that aim to reproduce global illumination effects in real time.

Alternatively, pull-push methods [4] that reconstruct many low-resolution shadow maps in parallel using the graphics pipeline have also been suggested, but while interactive rates can be achieved, geometry preprocessing and thousands of VPLs are necessary to produce coherent images, imposing constraints on the algorithm.

A group of algorithms, such as the Deep G-Buffer [5] and Directional Occlusion [6], specialize in using only screen-space information to approximate global illumination effects in real time, but while they achieve fast framerates, problems such as flickering and excessive viewer dependence make them unstable and limit their use cases.

Clusterization techniques offer a practical solution to minimize the number of times a scene must be rasterized. By rendering the geometry only for a small subset of Virtual Area Lights, groups of geometrically similar VPLs can reuse shadow tests and speed up the shading of first-bounce indirect illumination. Since VPLs are, in general, point light sources, cube mapping is a common approach to render a scene from the point of view of each VPL, but the fact that it requires six passes over the geometry, one for each cube face, makes it prohibitive for real-time scenarios with many dynamic elements on screen. A common solution to reduce the cost of that step to one geometry pass per VPL is to use paraboloid maps [7] rather than cube maps. However, despite being faster than cube maps, paraboloid projections may suffer from visual artifacts, as they require non-linear calculations while the hardware pipeline is bound to linear interpolation. Another shortcoming of a paraboloid map is that it projects an ellipse over a rectangular area on screen, while typical screen-space sampling methods rely on unit-box samples to choose texels from 2D images.

In this paper we propose Screen-Space VPL Propagation (SSVP), a method that merges the speed of screen-space algorithms with the stability of world-space approaches to global illumination. By using a pseudo-random sampling method for paraboloid shadow maps that approximates the projection of an elliptical paraboloid in a 2D texture, the probability of samples falling over empty or incoherent areas of the projected image is eliminated, allowing for plausible propagation of indirect bounces of light in the scene at fast real-time framerates. The 2D paraboloid maps are rendered as Reflective Shadow Maps [8], and propagation is achieved by pseudo-random uniform sampling of these textures to access position, normal and color data.

The main contributions of our technique are the following:

• A fast way to reproduce up to two bounces of diffuse indirect lighting for 3D scenes.

• An efficient method of sampling from paraboloid shadow maps.

II. RELATED WORK

Many algorithms attempt to solve the rendering equation for a diverse set of effects and time constraints. General techniques that focus on realistically representing a broad range of emerging lighting phenomena, such as Path Tracing [1] and Photon Mapping [2], are typically used in offline rendering scenarios like movies.

To maximize real-time performance we adhere to VPL techniques, also known as many-light rendering. Several techniques for VPL-based rendering have been proposed in the literature. Their popularity rose especially because they are easy to integrate into traditional GPU pipelines with good scalability and performance. In this section, we present a review of the most relevant work related to our approach. For further reading, we refer the reader to [9], a comprehensive survey on the historical development of many-light methods.

Reflective Shadow Maps (RSMs) [8] augment shadow maps by storing additional layers with the position, normal and radiant flux of each rendered texel. Shading of each pixel is performed by gathering the light contribution from VPLs sampled from the RSM. In this case, hundreds of VPLs are needed to provide plausible images. To improve RSM performance, a splatting approach [10] has also been suggested, though it does not attempt to solve visibility, leading to undesired light leaks.

Shading with Interleaved Sampling [11] is a common technique used with many-light methods to alleviate the cost of shading a pixel with multiple light sources by splitting a G-Buffer into smaller, lower-resolution interleaved buffers. Each buffer computes the contribution of a subset of lights only, and, in a further step, the buffers are merged again and a denoising filter is applied to remove undesired artifacts. A buffer translation algorithm was also presented to leverage coalesced memory access on GPUs and maximize performance.

The pull-push based approach of Imperfect Shadow Maps [4] reconstructs point-based geometry rendered from the points of view of thousands of VPLs into low-resolution buffers. Drawing inspiration from Virtual Spherical Lights [12], a clustering-based approach was developed in [13], reducing the number of visibility calculations. Although real-time performance was achieved, lighting was limited to single-bounce indirect illumination only.

Virtual Gaussian Spherical Lights [14] represent the radiant intensity of VPLs using spherical Gaussians. Built on VPLs generated by an RSM, the spherical Gaussian representation can be calculated by mipmapping the shadow map texture on the GPU. In this representation, VPLs are extracted by filtered importance sampling of RSMs. For high-frequency illumination, such as caustics, a bidirectional RSM is suggested. This approach supports real-time indirect illumination, but the lighting effect may be overblurred due to algorithm bias.

Sequential Monte Carlo Instant Radiosity [15] builds a temporally coherent distribution of VPLs with a heuristic sampling method that preserves certain groups of VPLs between frames. Light paths traced from the primary light source also take into account surfaces that are indirectly visible from the camera, aiming to place VPLs on points that will better influence the final image. The interactive frametimes of this method, however, make it unfit for dynamic real-time applications.

Screen-space approaches such as [6] and [16] aim to reuse rasterized scene data to approximate indirect illumination effects. Trading computation speed against visual quality is often the goal of these techniques. Unfortunately, solving temporal incoherence is a difficult task in this case, due to their high degree of viewer dependence.

With our technique we propose an improvement over the clustered visibility algorithm [13]. That related work is unable to reproduce more than one indirect light bounce, while our approach calculates up to a second bounce. We use a many-light screen-space approach based on the 2D paraboloid projection of the scene as an RSM. However, instead of sampling from the unit square, points are generated in a shape that minimizes out-of-bound texels, closely packing them together over the projection of the elliptical paraboloid. SSVP is a view-independent fast approximation to indirect lighting that aims at a compromise between speed and quality, while providing temporal coherency.

III. SCREEN-SPACE VPL PROPAGATION

Figure 1 shows an overview of SSVP's pipeline. We assume a G-Buffer [17] and an RSM [8] as inputs to our VPL propagation pipeline (Figure 1a). The main idea behind SSVP is to generate VPLs up to a second bounce without resorting to expensive ray tracing. VPLs are created from RSMs (Figure 1b) and clustered to reduce the cost of computing visibility for the entire set of point lights (Figure 1c). We leverage the paraboloid projection of a cluster's visibility to sample new surface points and efficiently generate new bounces of light (Figure 1d). In the next subsections we explain the details of each step in SSVP's pipeline.

A. First-Bounce VPL Creation

The scene is rendered from the point of view of the light source, and position, normals, albedo and depth are stored in 2D textures (Figure 1a). Shadow map and RSM generation are also decoupled here, so that we can adjust the quality and performance of each step separately. Some approaches suggest importance sampling of VPLs by their contribution to the scene [4], but while visual quality is favored in such techniques, when used in conjunction with VPL clusters, temporal coherence is harmed by incoherent VPL positions between frames. Therefore, standard uniform sampling is employed with a low-discrepancy sequence; in this work, 256 samples from a 2D Hammersley sequence [18] are used. The set of the first n 2D points of this sequence is defined in Equation 1, where γ is the Van der Corput sequence. Figure 1b illustrates the positioning of the first-bounce VPLs on the scene.

(a) G-Buffer Rendering. (b) First-bounce VPL Sampling. (c) First-bounce VPL Clustering. (d) Second-bounce VPL Sampling. (e) Final Rendering.

Fig. 1. An overview of our SSVP pipeline. First, the scene is rendered from the camera's viewpoint to a G-Buffer. Next, we render from the viewpoint of the main light source to generate our RSM. In both cases we store depth, world-space position, albedo and normals (a). First-bounce VPLs (red squares) are sampled over the scene from the RSM (b). Afterwards, clusters (green squares) are generated by grouping similar VPLs (c). For each cluster, we generate a paraboloid map and perform a unit-disk sampling to distribute second-bounce VPLs (blue squares) over the scene (d). Shading is performed by gathering the contribution of the VPLs in the final stage (e).

(k/n, γ(k)), for k = 0, 1, ..., n − 1 (1)
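As a concrete illustration, the construction of Equation 1 can be sketched in C++ as follows. The bit-reversal form of the base-2 Van der Corput sequence is a standard implementation choice assumed here, not something mandated by the paper:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Van der Corput radical inverse in base 2 (the γ term of Equation 1),
// computed by reversing the 32 bits of k and scaling by 1/2^32.
double van_der_corput(uint32_t k) {
    k = (k << 16u) | (k >> 16u);
    k = ((k & 0x00FF00FFu) << 8u) | ((k & 0xFF00FF00u) >> 8u);
    k = ((k & 0x0F0F0F0Fu) << 4u) | ((k & 0xF0F0F0F0u) >> 4u);
    k = ((k & 0x33333333u) << 2u) | ((k & 0xCCCCCCCCu) >> 2u);
    k = ((k & 0x55555555u) << 1u) | ((k & 0xAAAAAAAAu) >> 1u);
    return double(k) * 2.3283064365386963e-10; // 1 / 2^32
}

// The first n points of the 2D Hammersley sequence: (k/n, γ(k)).
std::vector<std::pair<double, double>> hammersley(uint32_t n) {
    std::vector<std::pair<double, double>> pts;
    pts.reserve(n);
    for (uint32_t k = 0; k < n; ++k)
        pts.emplace_back(double(k) / double(n), van_der_corput(k));
    return pts;
}
```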

B. VPL Clustering

To reduce the number of visibility queries required to generate the final scene, a traditional K-means algorithm is used to cluster the VPLs. The number of clusters is a user-defined input parameter. At the beginning of the first iteration, a random VPL is chosen as the centroid seed of each cluster. Then, each VPL is assigned to the cluster whose centroid is closest to it. This process is repeated until convergence, when the average position and normal of each centroid no longer change between iterations. In our tests, the number of iterations required was a function of the number of clusters used; no more than 5 iterations were needed to converge.

Because we would also like to mitigate the temporal incoherence between frames, we reuse the previous cluster positions in each successive frame. This approach also speeds up the stability and convergence of centroids between K-means iterations.

The traditional way of computing the distance between a VPL and a cluster centroid is to use only the Euclidean distance between them. However, this approach is subject to many artifacts, leading to centroids that are completely occluded, inside geometry, or leaking light. This happens because VPLs can be positioned closely together while pointing in completely different directions.

To solve this problem, the distance metric must take into account the similarity between normals, allowing the computation to capture geometric information such as coplanarity and main direction, mitigating possible artifacts. For this reason SSVP uses the metric proposed by Dong et al. [13], as shown in Equation 2.

Dij = w1‖Vi − Cj‖ + w2(ni · nj)⁺ (2)

Dij is the distance between the i-th VPL and the j-th cluster. Vi is the position of VPL i and Cj is the position of cluster j. ‖Vi − Cj‖ is the Euclidean distance between a cluster centroid and a VPL. The weighting factors w1 and w2 adjust the influence of the Euclidean distance and the dot product between normals. Favoring one over the other will lead to centroids being positioned inconveniently, inside surfaces or floating in the air. For this reason, based on empirical experience, we suggest balanced weights with similar values; here we used w1 = 0.6 and w2 = 0.4. Finally, (ni · nj)⁺ is the dot product between the normals of a VPL and a cluster, respectively, with the superscript + meaning that the cosine is clamped to positive values. All the attributes used in this stage are obtained from the first-bounce VPLs sampled in the previous stage.
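To make the metric concrete, the sketch below transcribes Equation 2 literally, assuming unit-length normals; the Vec3 type, its helpers, and the function names are minimal placeholders, not part of the paper's codebase:

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Equation 2: Dij = w1‖Vi − Cj‖ + w2(ni · nj)⁺, where (·)⁺ clamps the
// cosine between the (unit) normals to positive values. The weights
// w1 = 0.6 and w2 = 0.4 are the balanced values suggested in the text.
double vpl_cluster_distance(const Vec3& vplPos, const Vec3& vplNormal,
                            const Vec3& centroidPos, const Vec3& centroidNormal,
                            double w1 = 0.6, double w2 = 0.4) {
    Vec3 d = { vplPos.x - centroidPos.x,
               vplPos.y - centroidPos.y,
               vplPos.z - centroidPos.z };
    double euclidean = std::sqrt(dot(d, d));                           // ‖Vi − Cj‖
    double clampedCos = std::max(0.0, dot(vplNormal, centroidNormal)); // (ni · nj)⁺
    return w1 * euclidean + w2 * clampedCos;
}
```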

C. Second-Bounce VPL Sampling

Once the clusters are generated, the algorithm is ready to solve the visibility computations needed to sample second-bounce VPLs. In this stage, a hemispherical visibility map is rendered from the viewpoint of each cluster centroid (Figure 1d). The advantage of using paraboloid projections is that the number of rasterization passes over the geometry is reduced to one per cluster. In comparison, hemicube approaches would require 5 rasterization passes per cluster, greatly increasing the cost of sampling. This step is performed with geometry shaders in SSVP's pipeline to leverage layered textures. A resolution of 256² pixels is sufficient here.

A naive approach to sample texels from paraboloid maps is to reuse the sampling coordinates from the previous step in the pipeline. However, that scheme does not take into account that the 2D projection of the paraboloid is a filled ellipse, while the random values in our previous texture were computed inside the bounds of a unit square.

Instead, we need a method that generates better samples, arranged inside or very close to the filled ellipse at the center of the 2D texture. To achieve that, we analyze the equation of the paraboloid used for this projection [7]:

f(x, y) = 1/2 − (1/2)(x² + y²) (3)

An elliptic paraboloid is defined as follows:

f(x, y) = x²/a² + y²/b² (4)

with a and b being the curvature constants of the xy-plane.

If we rearrange Equation 3:

f(x, y) = 1/2 − x²/2 − y²/2 (5)

Equation 5 corresponds to an elliptical paraboloid translated along the Z axis by a factor of 1/2, assuming a² = b² = 2 in Equation 4. Moreover, since the terms a and b are equal, this is in fact a special case of the paraboloid: a circular one. In other words, any horizontal cut of this paraboloid defines a circle. For this reason, to generate the sampling pattern we resort to a procedural uniform sampling distribution on the unit circle. A mathematical derivation of how to generate uniform samples over the unit circle is provided in [19]. Our reasoning is led by the observation that a typical unit-box sampling over the projected area of the paraboloid is wasteful, as many samples may fall over empty regions of the texture. The results of sampling in the unit circle versus the naive approach can be seen in Figure 2; note that the projections appear elliptical due to the stretching that occurs when blitting from a square (256²) buffer to the final viewport (16:9 aspect ratio) for illustrative purposes.
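The unit-disk sampling step can be sketched as follows. The square-root warp on the radius is the standard way to keep the distribution uniform in area, and the remap to [0,1]² texel coordinates is an assumption about how the samples address the square paraboloid texture:

```cpp
#include <cmath>
#include <utility>

// Warp a uniform point (u, v) ∈ [0,1)² onto the unit disk. Taking the
// square root of u keeps the density uniform in area, since disk area
// grows with r².
std::pair<double, double> unit_disk_sample(double u, double v) {
    const double pi = 3.14159265358979323846;
    double r = std::sqrt(u);
    double theta = 2.0 * pi * v;
    return { r * std::cos(theta), r * std::sin(theta) };
}

// Remap a disk sample from [-1,1]² to [0,1]² texel coordinates, so every
// sample lands inside the circle projected at the center of the texture.
std::pair<double, double> disk_to_texcoord(double u, double v) {
    std::pair<double, double> p = unit_disk_sample(u, v);
    return { 0.5 * p.first + 0.5, 0.5 * p.second + 0.5 };
}
```

Feeding the same Hammersley points through this warp keeps their low-discrepancy structure while confining them to the projected circle.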

Because a bounce of light carries information about the previous surfaces it bounced off, the paraboloid map's position, albedo and normal textures are read to create the list of second-bounce VPLs and complete this sampling stage.

D. Shading the Scene

Once the list of first- and second-bounce VPLs is complete, the algorithm is ready to shade the scene (Figure 1e). To mitigate the cost of lighting calculations with many light sources we adhere to an interleaved shading approach [11], splitting the framebuffer into smaller, equal-sized tiles with resolution directly proportional to the number of VPLs. Each tile is shaded by a different subset of lights, and the tiles are then interleaved back together. Because each tile represents only a fraction of the total framebuffer resolution, the number of fragments to shade per light is reduced considerably, with a speed-up inversely proportional to the number of light sources per tile.
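The tile/light assignment can be illustrated with a simple indexing scheme. This is a hypothetical sketch of the interleaving idea, not necessarily the exact layout of SSVP or of [11]: with an n×n pattern, each pixel belongs to one of n² interleaved buffers, and each buffer shades every n²-th light.

```cpp
#include <vector>

// For an n×n interleaved pattern, pixel (x, y) belongs to one of n²
// interleaved buffers; that buffer shades only its own stratum of lights.
std::vector<int> lights_for_pixel(int x, int y, int pattern, int numLights) {
    int bufferIndex = (y % pattern) * pattern + (x % pattern);
    int numBuffers = pattern * pattern;
    std::vector<int> subset;
    for (int i = bufferIndex; i < numLights; i += numBuffers)
        subset.push_back(i);
    return subset; // e.g. 256 lights with a 4x4 pattern -> 16 lights per pixel
}
```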

Because we have two lists of sampled lights, one for first bounces and another for second bounces, it is necessary to normalize the shading computations by dividing the contribution of each VPL by the probability of it being selected. Because we adhere to a uniform distribution, that probability is constant and equal to 1/N, where N is the total number of VPLs sampled (including first and second bounces). Equation 6 describes how the outgoing radiance Lo at a point p is calculated:

Lo(p) = (1/N) Σ (i = 1 to N) φ cos⁺(θi) cos⁺(σi) / ‖p − vi‖² (6)

where Lo is the outgoing radiance at point p and φ is a constant that corresponds to the intensity of the VPL. θi is the angle between the surface normal and the light direction, and σi is the angle between the VPL's normal and the direction to the surface point; the superscript + denotes clamping the cosines to positive values. Finally, ‖p − vi‖² is the squared distance between the surface point p and the VPL vi.
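One summand of Equation 6 can be sketched as below. The clamped cosines follow the superscript-plus notation; Vec3, its helpers, and the function name are minimal placeholders assumed for illustration:

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// One summand of Equation 6: φ cos⁺(θ) cos⁺(σ) / ‖p − v‖², with θ the angle
// between the surface normal and the light direction, and σ the angle between
// the VPL's normal and the direction to the surface point. The 1/N factor is
// applied outside, once per shaded pixel.
double vpl_contribution(const Vec3& p, const Vec3& n,
                        const Vec3& vplPos, const Vec3& vplNormal, double phi) {
    Vec3 toVpl = { vplPos.x - p.x, vplPos.y - p.y, vplPos.z - p.z };
    double d2 = dot(toVpl, toVpl);                        // ‖p − v‖²
    double len = std::sqrt(d2);
    Vec3 wi = { toVpl.x / len, toVpl.y / len, toVpl.z / len };
    double cosTheta = std::max(0.0, dot(n, wi));          // cos⁺(θ)
    double cosSigma = std::max(0.0, -dot(vplNormal, wi)); // cos⁺(σ): VPL faces p
    return phi * cosTheta * cosSigma / d2;
}
```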

After shading the radiance buffer, the tiles are interleaved back together and denoising is carried out with an edge-aware Gaussian blur. Albedo information is not applied over the scene yet, as it would risk blurring finer details. Instead, in a final post-processing stage, we merge the denoised irradiance buffer with the albedo obtained from the G-Buffer and perform gamma correction and tone mapping over the buffer, as defined by Reinhard et al. [20], to adjust the range of irradiance values to the range of the monitor.
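The final post-processing step can be sketched per channel as follows. Applying Reinhard's simple operator L/(1+L) channel-wise rather than on luminance, and a display gamma of 2.2, are simplifying assumptions of this sketch:

```cpp
#include <cmath>

// Merge denoised irradiance with albedo, compress with Reinhard's simple
// operator L/(1+L) [20], then gamma-correct for the monitor.
double shade_channel(double irradiance, double albedo, double gamma = 2.2) {
    double c = irradiance * albedo;        // apply albedo after denoising
    double mapped = c / (1.0 + c);         // maps [0, ∞) into [0, 1)
    return std::pow(mapped, 1.0 / gamma);  // gamma correction
}
```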

IV. RESULTS

In this section we analyse the SSVP approach in three aspects: performance, visual quality and temporal coherence. Different parameter configurations are presented as we discuss their influence on the behavior of the algorithm. All tests were performed on a desktop with 16 GB of RAM, an NVIDIA GTX 1060 with 6 GB of VRAM, and an Intel Core i7-4790K CPU. No boost configurations were used. The prototype application was developed in C++ using the OpenGL 4.6 API and GLSL for shader programming, running on the Windows 10 operating system.

A. Rendering Performance

To evaluate the performance of SSVP, we must first analyse the computational cost of each stage of the pipeline. For these tests, a standard configuration was set to a Full-HD resolution of 1920x1080 pixels, 256 first-bounce VPLs and 256 second-bounce VPLs. We ran all of the experiments on the Crytek Sponza scene [21], a common model in the computer graphics literature due to its similarity to the game-like scenes used in industry.

A frametime analysis provides the relative time slices of each subroutine in a specific frame of the application. This test is commonly used when profiling real-time rendering software to detect performance bottlenecks and opportunities for optimization. Figure 3 shows a bar chart of the relative time percentages per pipeline stage.

Fig. 2. 120 uniform samples (blue points) in (a) the unit box and (b) the unit disk. In (a), samples are spread over the space without taking the central projection into account. In (b), samples match the projected area in a more coherent pattern.

Fig. 3. Average time slices of our algorithm for the Crytek Sponza scene, running on a viewport with 1920x1080 resolution, a 4x4 interleaved sampling pattern and multiple cluster configurations. Misc. includes the time to generate a shadow map, to sample first-bounce VPLs with the RSM, to cluster those VPLs, to sample second-bounce VPLs and to denoise the image.

From Figure 3 we can see that Shading is the most time-consuming stage for three of the four selected configurations, but its share decreases as the number of clusters grows: its relative time slice starts at 80% for 4 clusters and drops to 54% at 16 clusters. With 32 clusters, rendering times become dominated by paraboloid rendering and the total frametime surpasses the real-time threshold by a large amount. Pixel processing operations impose relatively heavy loads on the GPU compared to other tasks, so shading is expected to be the dominant stage of our algorithm.

However, with more and more paraboloid maps requiring computational resources to perform multiple rasterization passes over the geometry, this stage quickly becomes a bottleneck, surpassing shading as the most time-consuming step of SSVP. Notice the growth in time as we double the number of paraboloid maps until we reach 16 clusters, where a sudden spike makes it grow considerably; at 32 clusters the total frametime is more than three times larger than at 16. From this standpoint, the analysis of the relative cost reveals that paraboloid creation can quickly become prohibitive for the tight thresholds of real-time rendering. Table I provides the rendering times for each configuration. Notice how, for the configurations with 4 and 8 clusters, the algorithm was able to perform faster than the 16 ms threshold; for real-time applications this would be the ideal scenario.

TABLE I
TIME OF EACH SSVP PIPELINE STAGE AS THE NUMBER OF CLUSTERS CHANGES. TIME IS GIVEN IN MILLISECONDS. MISC CORRESPONDS TO THE TIME NEEDED TO GENERATE A SHADOW MAP, TO SAMPLE FIRST-BOUNCE VPLS, TO CLUSTER THOSE VPLS, TO SAMPLE SECOND-BOUNCE VPLS AND TO DENOISE THE IMAGE.

Stage        4 Clusters  8 Clusters  16 Clusters  32 Clusters
G-Buffer           1.03        1.03         1.03         1.03
Misc               0.26        0.29         0.30         0.30
Paraboloid         1.16        2.50         7.90        56.29
Shade             10.31       10.76        10.94        10.95
Denoise            0.13        0.14         0.14         0.18
Total (ms)        12.89       14.74        20.31        68.75

On the other hand, the fastest stages of SSVP's pipeline, K-means clustering and VPL sampling, account for no more than 0.8% and 0.2% of a single frame in any configuration. The reason behind these low percentages is that clustering is done in compute shaders to leverage GPU parallelism, and sampling is a trivial step that consists of reading texels from the paraboloid maps to create the second-bounce VPL list.

The time curve for the different paraboloid configurations grows in a seemingly linear fashion for 4, 8 and 16 clusters; beyond this point (at the change to 32) time appears to grow with a steeper slope. A preliminary investigation of this behavior suggests increasing cache misses and synchronization points as we grow the number of render passes in geometry shaders, but since no experiments with different hardware configurations were performed, a decisive evaluation cannot be provided here.

B. Visual Quality

To begin the visual quality tests we follow the trend in the computer graphics literature of carrying out a ground truth comparison. The ground truth image was generated with the industry-standard Path-Tracing technique, rendering the same Crytek Sponza model in the open-source research-oriented renderer Mitsuba [22], with 256 ray samples per pixel. Figure 4 presents a comparison between SSVP and the ground truth. It shows the impact of allowing direct light to bounce twice through the scene and compares the final image to the ground truth. The addition of second-bounce VPLs introduces effects such as color bleeding, seen near the curtains for example, and makes the shadows brighter, giving a proper sense of indirect lighting.

Ideally we would like to render a visibility map for each VPL in real time but, as shown in Table I, the cost of rendering more paraboloids negatively impacts performance. For this reason, to align with the goals of SSVP, the practical scenario requires a tradeoff between visual quality and performance that favors real-time, game-like applications. To this end, we carry out an evaluation of the visual impact of rendering more clusters and more second-bounce VPLs in the final image. We focus on effects that stand out most easily to the human eye, such as color bleeding, and artifacts such as banding.

Screenshots were taken while navigating the scene along the same path for 8, 16 and 32 clusters (Figure 5), with 64, 256 and 1024 total second-bounce VPLs. Notice the color bleeding near the pillars: colored lighting bouncing off the curtains spreads over the floor and the walls, painting them red, blue and green. In these images it is also possible to notice artifacts known as light leaking: due to the lack of precise visibility calculations, light sometimes spreads towards unwanted locations, making them brighter than intended. Visually these effects are related mainly to the number of second bounces being rendered, but larger numbers of clusters require more lights to achieve similar levels. Notice in Figure 5 how 256 second-bounce VPLs provide clearer colors with 8 clusters than with 16 and 32 clusters. At 1024 VPLs all configurations appear to converge, achieving similar results. No further effects arise from higher configurations, but light leaking becomes easier to notice.

A promptly noticeable effect of rendering with a reduced cluster count is the amount of banding artifacts on the edges of shadows for single-bounce lighting (shadows on the walls and in the back of the scene). As more visibility maps are included, these artifacts tend to soften, and for very large numbers of visibility maps (hundreds of thousands [3]) they converge into soft shadows. However, notice how adding second-bounce VPLs provides a large benefit in this case, hiding these artifacts by spreading more light over the surfaces and attenuating the banding. This reasoning guides the evaluation at this stage: the fastest configuration that still maintains visual plausibility is the one that best fits the objective of SSVP.

Going further, in Figure 6 we compare different cluster configurations over the scene using the HDR-VDP-2 metric [23], which evaluates the error between an image and a reference. The scene rendered with the most clusters (32) is used as the baseline, and the perceptual error metric is calculated for 16, 8 and 4 clusters.

The root-mean-square error computed for 4 clusters (Figure 6e) is 0.1888, for 8 clusters (Figure 6f) 0.1111, and for 16 clusters (Figure 6g) 0.0936. The higher this value, the more likely an observer is to perceive differences from the reference image. The results from the second-bounce comparison and HDR-VDP-2 imply that 16 clusters offer superior quality over smaller configurations.
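The root-mean-square errors quoted above follow the standard definition; a minimal sketch is shown below (HDR-VDP-2 itself is a far more elaborate, calibrated perceptual model, which this does not reproduce):

```python
import numpy as np

def rmse(image, reference):
    """Root-mean-square error between two images of the same shape.

    Both arrays are expected in the same range (e.g. [0, 1] luminance);
    lower values mean the test image is closer to the reference.
    """
    diff = np.asarray(image, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

In the comparison above, the 4-, 8- and 16-cluster renders play the role of `image` and the 32-cluster render the role of `reference`.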

C. Temporal Coherence

The temporal coherence of a method describes how stable a rendering algorithm is for animated scenes, with moving cameras or light sources.

To better analyse the coherence of SSVP we point the reader to the videos in the supplemental material. This analysis is complex and deserves research of its own, as can be seen in the literature [24]. SSVP nevertheless introduces some techniques to alleviate flickering when navigating the scene, by reusing cluster positions between frames and repeating the sampling patterns. Because of this, SSVP is stable when moving the camera around the scene while keeping the directional light source static. With dynamic light sources, however, a few visual artifacts can be seen, as in the black stripe over the red curtain in Figure 7, mainly as a byproduct of cluster visibility changing as centroids move along surfaces.
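The stabilization strategy described, reusing cluster positions between frames, can be sketched as warm-starting each frame's clustering with the previous frame's centroids; the following is an illustrative CPU-side sketch with assumed names, not the paper's shader code:

```python
import numpy as np

def update_clusters(positions, prev_centroids, iters=4):
    """One temporally coherent clustering step.

    Instead of re-seeding k-means randomly every frame (which makes
    cluster assignments, and hence visibility maps, flicker), start
    Lloyd's iterations from the previous frame's centroids so clusters
    drift smoothly as the scene animates.
    """
    centroids = prev_centroids.copy()
    for _ in range(iters):
        # Assign each VPL to its nearest (warm-started) centroid.
        d = np.linalg.norm(positions[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Refine centroids; empty clusters keep their previous position.
        for c in range(len(centroids)):
            members = positions[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids, labels
```

When the VPL distribution is static, successive calls leave the centroids unchanged, which is exactly the stability property that suppresses flickering for a moving camera.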

V. DISCUSSION

Starting from the premise that 60 frames per second (16 milliseconds per frame) is the target frame rate for hard real-time applications, illumination algorithms must run at frame times below that threshold, as such applications still need time for other tasks such as in-game logic and physics updates. The data in Table I shows that SSVP offers the speed needed for such highly interactive, game-like scenarios, with average frame times of 13 milliseconds for the compromise configuration provided. The cost of the lighting computations is mitigated here by interleaved sampling, but this stage is expected to become slower as the number of pixels grows, so a 1080p Full HD resolution is recommended to favor performance. SSVP's visual quality depends strongly on the number of clusters used to compute visibility. However, increasing the number of paraboloid maps too much defeats the purpose of the technique and imposes a heavy toll on frame rate. For this reason we would like to render the minimum number of paraboloid shadow maps that preserves the plausibility of the visuals. The results from the second-bounce comparison and HDR-VDP-2, combined, align with our empirical evaluation, leading us to conclude that 8 clusters suffice to preserve visual plausibility with render times within the established threshold of 16 milliseconds.
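The frame-budget reasoning can be made concrete with a small helper; the stage names, timings and reserve value below are placeholders chosen to sum to the reported 13 ms average, not the measured breakdown of Table I:

```python
def fits_frame_budget(stage_times_ms, target_fps=60, reserve_ms=3.0):
    """Check whether the lighting stages, plus a reserve for game logic
    and physics, fit inside the per-frame budget implied by target_fps.

    Returns (fits, slack_ms) where slack_ms is the budget left after
    the lighting stages alone.
    """
    budget_ms = 1000.0 / target_fps  # 60 fps -> ~16.7 ms per frame
    total_ms = sum(stage_times_ms.values())
    return total_ms + reserve_ms <= budget_ms, budget_ms - total_ms

# Hypothetical stage breakdown summing to ~13 ms:
stages = {"shadow_maps": 9.5, "clustering": 0.2, "vpl_sampling": 0.1,
          "interleaved_shading": 2.5, "denoise": 0.7}
ok, slack = fits_frame_budget(stages)
```

Under these assumed numbers the configuration fits, leaving a few milliseconds of slack for the rest of the application.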

To enable typically aesthetically pleasing effects such as color bleeding, at least 256 second-bounce VPLs are needed for any cluster configuration in the scene tested. Visual perception metrics (HDR-VDP-2) suggest that, for the Sponza scene, 8 clusters are enough to plausibly emulate indirect lighting at frame times below 16 milliseconds. An obvious drawback of our approach is that we are unable to reproduce soft shadows and self-occlusion natively, so orthogonal methods are required for such effects. Finally, the visual quality of SSVP is directly tied to the number of clusters and second bounces generated; for this reason, increasing the first-bounce count offers no further benefit in quality, because no visibility computations are performed for each new VPL introduced. In our experiments 256 first-bounce VPLs proved good enough for our measurements of cluster stability, but

(a) SSVP (One Bounce) (b) SSVP (Two Bounces) (c) Path-Tracing (Ground Truth)

Fig. 4. Difference between one (a) and two (b) bounces of light rendered by our technique when compared to the ground-truth (c).

(a) 8 clusters (64 2nd-Bounces) (b) 8 clusters (256 2nd-Bounces) (c) 8 clusters (1024 2nd-Bounces)

(d) 16 clusters (64 2nd-Bounces) (e) 16 clusters (256 2nd-Bounces) (f) 16 clusters (1024 2nd-Bounces)

(g) 32 clusters (64 2nd-Bounces) (h) 32 clusters (256 2nd-Bounces) (i) 32 clusters (1024 2nd-Bounces)

Fig. 5. Sponza rendered from the same viewpoint for different cluster configurations. To compute the number of second-bounce VPLs sampled per cluster, divide the total number of VPLs by the number of clusters.

since our method allows custom parametrization, the reader is encouraged to experiment according to their own needs.

VI. CONCLUSION AND FUTURE WORK

With SSVP we extended VPL clusterization techniques to reproduce diffuse indirect illumination for up to two bounces of light under real-time requirements. Given the hard requirements for global illumination in game-like scenarios, we arrived at a method that produces plausible images under the required time constraints, with temporal coherence for moving cameras and reasonable results for dynamic lights. However, despite mitigating the costs of shading with many light sources, interleaved shading introduces structured noise in the final image. Since removing this noise proves harder than removing typical pseudorandom white noise, we believe new approaches to shading with many lights could be explored, following the research of [25] on Monte Carlo methods for VPL selection and on distributing noise in a more visually pleasant manner.

ACKNOWLEDGMENT

Marcio C. F. Macedo is financially supported by the Postdoctoral National Program of the Coordination for the Improvement of Higher Education Personnel (PNPD/CAPES), grant number 88882.306277/2018-01. The authors would also like to thank the National Council for Scientific and Technological Development for the financial support.

REFERENCES

[1] J. T. Kajiya, “The rendering equation,” in Proceedings of SIGGRAPH. ACM, 1986, pp. 143–150.

[2] H. W. Jensen, “Global illumination using photon maps,” in Rendering Techniques ’96. Springer, 1996, pp. 21–30.

Fig. 6. HDR-VDP-2 metric applied to images generated with 4, 8 and 16 clusters (Figures 6a, 6b and 6c respectively) in comparison to 32 clusters in Figure 6d. Figures 6e, 6f and 6g show the perceptual error metric for 4, 8 and 16 clusters when compared to the base image with 32 clusters.

Fig. 7. The visibility map of a moving centroid changes with time under dynamic light sources, potentially causing temporally incoherent visual artifacts, as can be seen on the red curtain and the light blotch near the blue curtain on the right.

[3] A. Keller, “Instant Radiosity,” in Proceedings of SIGGRAPH. ACM, 1997, pp. 49–56.

[4] T. Ritschel, T. Grosch, M. H. Kim, H.-P. Seidel, C. Dachsbacher, and J. Kautz, “Imperfect shadow maps for efficient computation of indirect illumination,” in ACM Transactions on Graphics (TOG), vol. 27, no. 5. ACM, 2008, p. 129.

[5] M. McGuire and M. Mara, “Efficient GPU screen-space ray tracing,” Journal of Computer Graphics Techniques (JCGT), vol. 3, no. 4, pp. 73–85, 2014.

[6] T. Ritschel, T. Grosch, and H.-P. Seidel, “Approximating dynamic global illumination in image space,” in Proceedings of the I3D. ACM, 2009, pp. 75–82.

[7] W. Heidrich and H.-P. Seidel, “View-independent Environment Maps,” in Workshop on Graphics Hardware, 1998, pp. 39–45.

[8] C. Dachsbacher and M. Stamminger, “Reflective shadow maps,” in Proceedings of the I3D. ACM, 2005, pp. 203–231.

[9] C. Dachsbacher, J. Krivanek, M. Hasan, A. Arbree, B. Walter, and J. Novak, “Scalable realistic rendering with many-light methods,” in Computer Graphics Forum, vol. 33, no. 1. Wiley Online Library, 2014, pp. 88–104.

[10] C. Dachsbacher and M. Stamminger, “Splatting indirect illumination,” in Proceedings of the I3D. ACM, 2006, pp. 93–100.

[11] B. Segovia, J. C. Iehl, R. Mitanchey, and B. Peroche, “Non-interleaved deferred shading of interleaved sample patterns,” in Graphics Hardware, 2006, pp. 53–60.

[12] M. Hasan, J. Krivanek, B. Walter, and K. Bala, “Virtual spherical lights for many-light rendering of glossy scenes,” in ACM Transactions on Graphics (TOG), vol. 28, no. 5. ACM, 2009, p. 143.

[13] Z. Dong, T. Grosch, T. Ritschel, J. Kautz, and H.-P. Seidel, “Real-time Indirect Illumination with Clustered Visibility,” in Proceedings of the VMV, vol. 9, 2009, pp. 187–196.

[14] Y. Tokuyoshi, “Virtual Spherical Gaussian Lights for Real-time Glossy Indirect Illumination,” in Computer Graphics Forum, vol. 34, no. 7. Wiley Online Library, 2015, pp. 89–98.

[15] P. Hedman, T. Karras, and J. Lehtinen, “Sequential Monte Carlo instant radiosity,” in Proceedings of the I3D. ACM, 2016, pp. 121–128.

[16] M. Mara, M. McGuire, D. Nowrouzezahrai, and D. P. Luebke, “Deep G-buffers for stable global illumination approximation,” in High Performance Graphics, 2016, pp. 87–98.

[17] M. Deering, S. Winner, B. Schediwy, C. Duffy, and N. Hunt, “The triangle processor and normal vector shader: a VLSI system for high performance graphics,” in Proceedings of SIGGRAPH, vol. 22, no. 4. ACM, 1988, pp. 21–30.

[18] T.-T. Wong, W.-S. Luk, and P.-A. Heng, “Sampling with Hammersley and Halton points,” Journal of Graphics Tools, vol. 2, no. 2, pp. 9–24, 1997.

[19] M. Pharr, W. Jakob, and G. Humphreys, Physically Based Rendering: From Theory to Implementation. Morgan Kaufmann, 2016.

[20] E. Reinhard, M. Stark, P. Shirley, and J. Ferwerda, “Photographic tone reproduction for digital images,” in Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, 2002, pp. 267–276.

[21] M. McGuire, “Computer Graphics Archive,” July 2017. [Online]. Available: https://casual-effects.com/data

[22] W. Jakob, “Mitsuba renderer,” 2010, http://www.mitsuba-renderer.org.

[23] R. Mantiuk, K. J. Kim, A. G. Rempel, and W. Heidrich, “HDR-VDP-2: A calibrated visual metric for visibility and quality predictions in all luminance conditions,” ACM Transactions on Graphics (TOG), vol. 30, no. 4, pp. 1–14, 2011.

[24] D. Scherzer, L. Yang, O. Mattausch, D. Nehab, P. V. Sander, M. Wimmer, and E. Eisemann, “Temporal coherence methods in real-time rendering,” in Computer Graphics Forum, vol. 31, no. 8. Wiley Online Library, 2012, pp. 2378–2408.

[25] E. Heitz and L. Belcour, “Distributing Monte Carlo errors as a blue noise in screen space by permuting pixel seeds between frames,” in Computer Graphics Forum, vol. 38, no. 4. Wiley Online Library, 2019, pp. 149–158.