
University of Manchester

MEng Computer Science Third Year Project

Approximating Global Illumination in Real Time

John Gresty

supervised by Dr. Steve Pettifer

May 2016


Abstract

This paper describes the development and results of a system that aims to produce effects similar to global illumination techniques, but in real time. The work was undertaken as a third year computer science project at the University of Manchester. The application developed did not achieve these goals, but it did successfully implement some techniques that could be refined into a satisfying solution for a domain-specific task. With more time, deeper technical knowledge, or better tools, the task appears achievable, which is adequate cause for further work in this area.


Contents

1 Context
2 Development
    2.1 Optimisations
    2.2 Voxelise scene
    2.3 Process voxels
    2.4 Render voxels
    2.5 Extras
3 Evaluation
    3.1 Benchmarks
    3.2 Verification of correctness
    3.3 Problems
4 Reflection and Conclusion
Appendix A: Full Testing System Specifications

1 Context

There has been a striving for photorealism in computer generated graphics for as long as computers have been able to produce images. The problem is very hard: to perfectly recreate an image as if it were taken by a real camera, each photon would have to be simulated flying around the scene and interacting with every atom it hits. Doing so is clearly infeasible, both in terms of computational time and the size of the data required. All computer generated images are therefore approximations, with varying degrees of accuracy depending on the technique. Depending on the purpose of the generated images, a rendering technique is chosen by trading its speed against the accuracy of the images it produces.

In a rendering sense, a real time application is one that can render images at a fast enough rate that someone can interact with it without being frustrated by how slowly the display updates. The target time varies by application, but is typically below about 33 milliseconds per image, or above 30 images rendered per second. To achieve this target, a typical real time application produces images by rasterising triangles. While this method is relatively fast, lighting calculations are only performed locally, which means that geometry other than the piece being processed has no effect on the final colour. For images that try to emulate real life this technique is not very accurate, as light should bounce off objects many times around the scene, and this indirect light contributes to the final colour at every pixel.

Given no time constraint, the most accurate way to render images is ray tracing. Rays simulating light are fired into the scene, and each ray bounces off any geometry it hits according to the type of surface defined for that piece of geometry. The surface type determines the new direction of the ray and the amount of colour it inherits from the surface. It is infeasible to simulate every photon of light at the scale needed for an exact recreation of a scene, so a limited number of rays are used, with a limited number of bounces per ray. The higher these numbers, the more accurate the final render, but at the cost of a longer time taken to render an image.

Most of the calculations in ray tracing test where rays intersect scene geometry. While there are techniques for speeding up these calculations, such as hierarchical bounding volumes [1], there are still more calculations to perform than is feasible at real time speeds on consumer level hardware. If the scene could be reduced to a form requiring far fewer intersection calculations, ray tracing could become feasible for real time applications. A paper [2] demonstrating this has been published, and its results show promise for achieving global illumination effects in real time with such techniques.

I hoped to create a program producing images more accurate than a typical real time application could, with the aim of results similar to pre-rendered images while still allowing interaction with the scene. I also wanted no pre-baked data in the scene: this allows fully dynamic scenes, in which any geometry, texture or light can be added, moved or deleted and an image can still be rendered without artifacts. The program was to target consumer level hardware, so that success could lead to a larger number of possible applications. By basing the system on the research from the Nvidia paper mentioned above [2], I believed these goals were obtainable in my own application.

2 Development

The main feature of the developed system is its use of voxels. A voxel is a cubic region of 3D space containing arbitrary data, and voxels can be organised into efficient structures for fast processing. The system I have developed uses voxels in a pipeline of three stages: generate voxels, process voxels, and finally render. In my implementation each stage is a well defined, independent module that can be swapped out for an alternative method, or for stages that perform extra work to support a different feature set.

Due to the embarrassingly parallel nature of graphics rendering, it was clear that the solution would need to run on a GPU for the best performance. The OpenCL library was considered first but ultimately rejected due to the lack of adequate tools for debugging the developed software. I elected instead to use OpenGL, making use of fast hardware for a task similar to the one it was designed for. Building the system on top of the OpenGL pipeline presented many challenges, particularly when debugging, as OpenGL offers very little feedback during execution.

2.1 Optimisations

Speed was considered in the design of the program from the early stages. There are a few techniques I used to keep overhead as low as possible and therefore increase speed. One such technique was to pack vertex data tightly into a small number of buffers to send to the GPU: I interleaved the positions, normals and texture coordinates of every object into a single buffer and calculated indices based on that single buffer structure. This should improve memory access patterns on the GPU, making drawing geometry somewhat faster than it would otherwise be. Secondly, I simplified the lighting model for the scene as much as possible; in fact a single directional light source is used for the entire scene. This greatly simplifies the shaders so that they execute faster, at only a minor cost in accuracy.
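The interleaving described above can be sketched as follows. This is an illustrative CPU-side sketch, not the project's actual code; the `Vertex` struct and `interleave` function are my own names for the idea of packing per-vertex attributes contiguously in one buffer.

```cpp
#include <vector>
#include <cstddef>

// One vertex with all attributes packed contiguously: position, normal and
// texture coordinate sit next to each other in memory, so a single buffer
// upload covers everything and GPU reads for one vertex stay local.
struct Vertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// Merge three separate attribute streams into one interleaved array.
std::vector<Vertex> interleave(const std::vector<float>& positions,
                               const std::vector<float>& normals,
                               const std::vector<float>& uvs) {
    std::size_t count = positions.size() / 3;
    std::vector<Vertex> out(count);
    for (std::size_t i = 0; i < count; ++i) {
        for (int c = 0; c < 3; ++c) {
            out[i].position[c] = positions[3 * i + c];
            out[i].normal[c]   = normals[3 * i + c];
        }
        out[i].uv[0] = uvs[2 * i];
        out[i].uv[1] = uvs[2 * i + 1];
    }
    return out;
}
```

The resulting array maps directly onto a single OpenGL buffer, with attribute offsets given by the member offsets within `Vertex`.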

There is an OpenGL extension called ARB_bindless_texture [3] which allows the use of bindless textures. These are textures held in a lookup table on the GPU which can be accessed from a shader through a handle rather than by binding the texture to a texture unit. This drastically reduces the time spent binding textures between draw calls, and thus the total frame time, whenever textures are used. As this system assumes all geometry has a texture applied, every frame uses textures, so bindless textures are always used to get the fastest result. While this extension is implemented by most modern graphics drivers, it is not part of core OpenGL and documentation is sparse.

2.2 Voxelise scene

For this implementation I used a technique for thin surface voxelisation described in OpenGL Insights, chapter 22 [4]. Each triangle is projected orthogonally down its most dominant axis, then its colour and normal data are written into a 3D texture based on its x and y position along with its depth. In OpenGL this is achieved with cameras aligned to each axis, selecting which one to use with a simple dot product between the triangle's normal and each axis.
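The dominant-axis selection amounts to finding the axis along which the triangle's projected area is largest, i.e. the component of its normal with the greatest magnitude. A minimal CPU-side sketch (the actual project performs this in a shader):

```cpp
#include <cmath>

// Select the axis to project a triangle down for thin surface voxelisation:
// the axis whose normal component has the greatest magnitude gives the
// largest projected area, and therefore the most fragments.
// Returns 0 for the x axis, 1 for y, 2 for z.
int dominantAxis(float nx, float ny, float nz) {
    float ax = std::fabs(nx), ay = std::fabs(ny), az = std::fabs(nz);
    if (ax >= ay && ax >= az) return 0;
    if (ay >= az) return 1;
    return 2;
}
```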

There was an issue with this thin surface voxelisation technique: the resulting voxels contained holes, which was particularly noticeable on small geometry. I found that this was caused by OpenGL taking a sample at the centre of each voxel, which could completely miss a triangle even if the triangle covered a large portion of the voxel. The conservative rasterisation approach of dilating each triangle by half a voxel width was tested, but it resulted in a lot of data being written to voxels that only overlapped triangles by a small amount and should have been discarded. Instead, I realised this was the same problem that causes aliasing in traditional rasterisation, so I employed an antialiasing technique. In this instance I elected to use multi-sample antialiasing with four samples per fragment, which resulted in a much more accurate voxel structure.

There are many cases in which more than one triangle maps onto a single voxel, and an average must then be taken to obtain accurate values for the data written to the voxel. There is no way to know in advance whether and where such cases will occur, so it is assumed that any number of triangles could map to any voxel. Under this assumption, a rolling average is used when writing any value, based on the value already written for that voxel. For obvious reasons this must all be done atomically.
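The rolling average can be sketched on the CPU as below; on the GPU the same update has to run inside an atomic compare-and-swap loop. Note that the running sample count here is my own illustrative addition; the report does not specify how the count is stored.

```cpp
#include <cstdint>

// Incremental (rolling) average of the colours of all triangles that map to
// one voxel: newMean = oldMean + (sample - oldMean) / n, where n is the
// number of samples seen so far. This avoids storing all samples and gives
// the same result regardless of how many triangles eventually arrive.
struct VoxelAvg {
    float r = 0, g = 0, b = 0;
    uint32_t count = 0;

    void add(float nr, float ng, float nb) {
        count += 1;
        r += (nr - r) / count;
        g += (ng - g) / count;
        b += (nb - b) / count;
    }
};
```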

2.3 Process voxels

The original plan was to process the voxels in some way to make reads into the structure much more efficient. However, this step was skipped due to time constraints, and the voxel structure is passed to the final stage as is.


2.4 Render voxels

My implementation has a few different options for this final stage, which can be toggled at run time. The first is a direct volume render assuming all voxels are opaque. A ray is marched from each pixel through the camera into the scene, sampling each voxel it intercepts until the sampled data is non-zero or the ray exits the voxel grid bounds. Once a ray intercepts a voxel, the voxel's colour is read and the pixel the ray was fired from is coloured with it; if the ray intercepts no geometry, the background colour is written to the pixel instead. As an extension, the normal value of each voxel can also be read and each pixel coloured using Gouraud shading, which gives a better resulting image.
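The marching loop can be illustrated with a CPU-side sketch over a dense grid. This is a simplified fixed-step march, not the project's shader code; a real implementation would typically use a DDA that steps exactly one voxel at a time.

```cpp
#include <vector>
#include <cstdint>

// March a ray through a size^3 voxel grid (one byte per voxel, 0 = empty).
// Coordinates are in voxel units. Advances in half-voxel steps until a
// non-zero voxel is hit (its value is returned) or the ray leaves the grid
// bounds (0 is returned, meaning the background colour should be used).
uint8_t marchRay(const std::vector<uint8_t>& grid, int size,
                 float ox, float oy, float oz,    // ray origin
                 float dx, float dy, float dz) {  // unit ray direction
    const float step = 0.5f;
    float x = ox, y = oy, z = oz;
    for (int i = 0; i < size * 4; ++i) {
        int ix = (int)x, iy = (int)y, iz = (int)z;
        if (ix < 0 || iy < 0 || iz < 0 || ix >= size || iy >= size || iz >= size)
            return 0;  // left the grid: background
        uint8_t v = grid[(iz * size + iy) * size + ix];
        if (v != 0) return v;  // hit an occupied voxel
        x += dx * step; y += dy * step; z += dz * step;
    }
    return 0;
}
```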

Using methods similar to the direct volume render, shadows are calculated by firing a shadow ray towards the light source from the point where each primary ray intercepts a voxel. If the shadow ray hits another non-zero voxel then the point is occluded and therefore in shadow; if the shadow ray exits the voxel grid bounds, I assume the corresponding voxel is directly illuminated by the scene's lighting model. When a point is found to be in shadow, the colour of its corresponding pixel is darkened accordingly.
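The shadow test is the same march specialised to an occlusion query; a CPU-side sketch (illustrative, not the project's code) follows. The one-voxel offset along the light direction is a common trick to avoid re-hitting the voxel the ray starts in.

```cpp
#include <vector>
#include <cstdint>

// From a primary-ray hit point, march toward a directional light. Any
// non-zero voxel encountered before leaving the grid means the point is
// occluded (in shadow); escaping the grid means it is directly lit.
bool inShadow(const std::vector<uint8_t>& grid, int size,
              float px, float py, float pz,    // hit point, voxel units
              float lx, float ly, float lz) {  // unit direction to light
    const float step = 0.5f;
    float x = px + lx, y = py + ly, z = pz + lz;  // skip the starting voxel
    for (int i = 0; i < size * 4; ++i) {
        int ix = (int)x, iy = (int)y, iz = (int)z;
        if (ix < 0 || iy < 0 || iz < 0 || ix >= size || iy >= size || iz >= size)
            return false;  // escaped the grid: directly illuminated
        if (grid[(iz * size + iy) * size + ix] != 0)
            return true;   // occluder found
        x += lx * step; y += ly * step; z += lz * step;
    }
    return false;
}
```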

Finally, I experimented with using a number of secondary rays to calculate some indirect illumination. These rays are fired in a random direction from the primary ray's interception point and continue until they intercept another voxel or leave the voxel grid bounds. On an interception, another shadow ray is fired towards the light source, and if the tested voxel is found to be directly illuminated then a portion of the secondary voxel's colour is accumulated into the final pixel colour. Several of these rays are fired per pixel, each in a different random direction, to approximate the indirect illumination of the scene. This is the same technique used in non-real-time ray traced rendering, albeit with only two bounces, a very small number of rays, and each ray heavily biased towards light sources.
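Generating the random secondary-ray direction can be sketched as hemisphere sampling around the surface normal. This illustrative version runs on the CPU with `std::mt19937` (rejection sampling of the unit ball, flipped into the normal's hemisphere), which is in fact the CPU-side generation suggested later in section 3.3; the names here are my own.

```cpp
#include <random>
#include <cmath>

struct Vec3 { float x, y, z; };

// Uniform direction in the hemisphere around normal n: sample the unit ball
// by rejection, normalise, then flip the direction if it points away from n.
Vec3 randomHemisphereDir(const Vec3& n, std::mt19937& rng) {
    std::uniform_real_distribution<float> u(-1.0f, 1.0f);
    for (;;) {
        Vec3 d = {u(rng), u(rng), u(rng)};
        float len2 = d.x * d.x + d.y * d.y + d.z * d.z;
        if (len2 < 1e-6f || len2 > 1.0f) continue;  // reject outside unit ball
        float inv = 1.0f / std::sqrt(len2);
        d.x *= inv; d.y *= inv; d.z *= inv;
        if (d.x * n.x + d.y * n.y + d.z * n.z < 0) {  // flip into hemisphere
            d.x = -d.x; d.y = -d.y; d.z = -d.z;
        }
        return d;
    }
}
```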

2.5 Extras

While testing with various scenes, I discovered that there is no real convention of scale between them. To better equip the program to deal with any scene regardless of its origin, it automatically adjusts several parameters based on how large the provided scene is. Parameters such as camera movement speed and clipping plane distances are calculated automatically at startup, ensuring a consistent experience without manually editing each scene to be the correct size for the program. Combined with a generic loader for obj files, this allows the application to use any scene given to it in the obj format.
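One way to derive such parameters is from the scene's bounding-box diagonal. The following sketch is illustrative: the scale constants are my own placeholders, not the values the project actually uses.

```cpp
#include <vector>
#include <algorithm>
#include <cmath>
#include <cstddef>

struct SceneParams { float cameraSpeed, nearPlane, farPlane; };

// Derive scale-dependent parameters from the scene's axis-aligned bounding
// box. Input is a flat list of x,y,z vertex positions.
SceneParams autoScale(const std::vector<float>& xyz) {
    float mn[3] = {xyz[0], xyz[1], xyz[2]};
    float mx[3] = {xyz[0], xyz[1], xyz[2]};
    for (std::size_t i = 0; i < xyz.size(); i += 3)
        for (int c = 0; c < 3; ++c) {
            mn[c] = std::min(mn[c], xyz[i + c]);
            mx[c] = std::max(mx[c], xyz[i + c]);
        }
    float dx = mx[0] - mn[0], dy = mx[1] - mn[1], dz = mx[2] - mn[2];
    float diagonal = std::sqrt(dx * dx + dy * dy + dz * dz);
    SceneParams p;
    p.cameraSpeed = diagonal * 0.1f;    // cross the scene in ~10 steps
    p.nearPlane   = diagonal * 0.001f;  // placeholder ratios
    p.farPlane    = diagonal * 2.0f;
    return p;
}
```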

Another feature conceptualised during development is the ability to reload shaders at runtime. As the program uses OpenGL, shaders are written in GLSL and compiled at run time. By separating the shaders into individual files and designing the program so that they could be reloaded at run time, I could make a change to any shader and quickly see the results without repeatedly closing and relaunching the program. While this feature required some refactoring of the shader loading code, its inclusion made developing the system much smoother and faster.


3 Evaluation

Figure 1: Example renders from the Sponza scene

Figure 1 shows the output of the various rendering styles the application produces. The top left is a typical rasterised render; the top right shows voxels represented as points; the bottom left is a direct volume render of the voxel grid; and the bottom right is the result of a very crude ray tracing algorithm. Each render was captured from the same position with a voxel grid of 256³ voxels.

3.1 Benchmarks

Unfortunately, due to an update to the development system, the debugging tool that would have been used to capture accurate frame timings ceased to function. A fix for the tool was unlikely within a reasonable time frame, so as an alternative the average time taken to render each frame was measured over one-second intervals. While this does provide an overall render time, a more detailed breakdown of how long each stage of the render takes would have been more useful. All readings use the standard 'Sponza' test scene on an Nvidia GTX 660 Ti at a resolution of 1920x720 with a voxel grid of 256³ voxels. The results of the benchmarks can be found in table 1.

Render Type                            Average Render Time
Baseline standard render               1.8 ms
Voxels as points                       20.9 ms
Direct volume rendering                9.2 ms
Simple ray tracing                     288.1 ms
Simple ray tracing, no voxel updates   285.0 ms

Table 1: Render times for various render techniques

The baseline render does not include generating a voxel structure; it is simply a typical rasterisation, shown for comparison. Voxels as points does include generating a voxel grid, and then renders a point, using GL_POINTS, at each voxel location. The majority of this render time is spent not on voxel generation but on rendering so many points. This can be seen in the direct volume rendering result, where the voxel grid is still generated each frame yet the frame time is much lower. The direct volume render is also casting rays into the scene, as described in section 2.4. The final two results come from bouncing rays off geometry in the scene, performing a much larger number of calculations per frame, hence the larger render time. The difference between the two ray traced results is that in the second the voxel grid is not updated; as the difference is only approximately 3 ms, it can be assumed that generating the voxel grid takes roughly this long.

The main bottleneck of the system is the large number of slow memory reads performed while marching rays over the voxel grid. In most scenes, the vast majority of these reads return zero as the ray traverses empty space. Although a simple working solution, a grid structure is not well suited to this application due to the number of memory accesses required. A structure such as a sparse octree would be more suitable, as large areas of empty space could be traversed in far fewer memory accesses, which would significantly decrease rendering time. Construction of such a structure could be implemented in the process voxels stage of the pipeline without any significant change to the rest of the program.
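As a minimal sketch of what the process voxels stage could do (simpler than a full sparse octree, which generalises the idea to multiple levels): build a coarse occupancy grid in which each cell records whether its block³ region of the fine grid contains any non-zero voxel, so a marching ray can skip a whole empty block with one read. The function below is illustrative, not from the project.

```cpp
#include <vector>
#include <cstdint>

// Build a coarse occupancy grid over a size^3 fine voxel grid. Each coarse
// cell covers a block^3 region and is 1 if any fine voxel inside it is
// non-zero. A ray marcher can consult the coarse grid first and skip whole
// empty blocks, cutting memory accesses over large empty regions.
std::vector<uint8_t> buildCoarseGrid(const std::vector<uint8_t>& fine,
                                     int size, int block) {
    int cs = size / block;  // coarse grid dimension (assumes block divides size)
    std::vector<uint8_t> coarse(cs * cs * cs, 0);
    for (int z = 0; z < size; ++z)
        for (int y = 0; y < size; ++y)
            for (int x = 0; x < size; ++x)
                if (fine[(z * size + y) * size + x] != 0)
                    coarse[((z / block) * cs + y / block) * cs + x / block] = 1;
    return coarse;
}
```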

3.2 Verification of correctness

Switching between the various rendering modes at run time while keeping the camera in the same position reveals that the geometry in every render lines up perfectly with every other image. As the voxelised scene aligns with the baseline image, the voxelisation mapping is accurate and correct. The second stage of the pipeline does nothing, so no testing is needed. Finally, the rendering phase presents some issues. All techniques produce an image resembling one produced by the standard rasterisation pipeline, which verifies that the direct volume rendering is correct. In addition, shadows appear where expected in relation to the occluding geometry and the light source, which verifies that the shadow calculations are also performed correctly. However, when comparing an image produced with the bounce lighting technique to a reference image produced with an external renderer, there are quite apparent differences between the images. I believe this is caused both by the limited number of rays used and by the directions in which secondary rays are fired. The small number of rays is required to get anywhere near real time speeds; more could be used if optimisations such as restructuring the voxels, as discussed above, freed up more time.

3.3 Problems

There is a problem with the current implementation in that OpenGL only supports atomic writes to textures where each coordinate holds a 32 bit unsigned value. As a single 3D texture stores the colour information, the available 32 bits are split into four 8 bit colour channels (RGBA). This would not normally be an issue, as 8 bit colour is standard for rendering purposes, but the voxel data is calculated with a rolling average over all triangles that map to each voxel. As it stands, the averaging causes rounding errors with only 8 bits available to work with, producing a flickering effect in the final image as triangles are rendered in different orders. There are some workarounds to this problem:

• Always render triangles in a fixed order to remove the flickering. This cannot be controlled using OpenGL, so it is not a viable option for this implementation.

• Use multiple textures for the colour data, which would reduce rounding errors by making more bits available, at the cost of a much greater memory footprint.

• Only voxelise once and use the same voxel structure for rendering each frame.

While the second workaround was considered, the third was chosen due to concerns about memory usage. Although this removes the flickering from the final render, it also removes the ability to have a fully dynamic scene. As a compromise, the workaround can be toggled via a key press at runtime.

In my implementation, the secondary ray direction is calculated with the noise function built into GLSL, seeded with the primary ray's pixel coordinates. This does not produce the desired results: the noise function is implemented as a hash, so it generates the same values every time for any given pixel. In fact there is no way to generate a truly random number inside a GLSL shader, so this approach was somewhat sensible, although the wrong one for the intended purpose. For a better result, random numbers could be generated on the CPU and then sampled by the GPU. This would incur a small run time penalty, as many random numbers would need to be generated each frame, along with some GPU memory usage, but the amount is trivial compared to the rest of the system.

I am also not confident that my implementation of the ray tracing algorithm is correct. As GLSL does not allow recursive function calls, an algorithm that is typically recursive had to be implemented as a complex loop. While I am confident I could have written a working recursive ray tracer, the loop-based one implemented cannot easily be verified. There are few to no ways of verifying the correctness of code written in GLSL, so the only way I know whether the code works is to visually inspect the final image, and even then, if there are problems, it is almost impossible to tell where they came from. If the secondary ray direction issue above were resolved and the image were still inaccurate, I could only assume my ray tracing implementation is incorrect, with no way of knowing for certain.
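The control-flow transformation in question can be illustrated in isolation. In this sketch the per-bounce hit logic is stubbed out behind a hypothetical `traceStep` callback (my own construction, not the project's code) to show only how the recursion becomes a loop carrying an attenuation factor.

```cpp
// A recursive ray tracer calls itself once per bounce; with a fixed bounce
// count and one continuation ray per bounce, the recursion flattens into a
// loop. 'attenuation' is the product of surface reflectances along the path,
// so later bounces contribute progressively less colour.
struct Hit {
    bool hit;       // did the ray hit anything?
    float colour;   // light gathered at the hit point (scalar for brevity)
    float reflect;  // fraction of light carried on to the next bounce
};

template <typename TraceStep>
float traceIterative(TraceStep traceStep, int maxBounces) {
    float colour = 0.0f;
    float attenuation = 1.0f;
    for (int bounce = 0; bounce < maxBounces; ++bounce) {
        Hit h = traceStep(bounce);
        if (!h.hit) break;                 // ray left the scene
        colour += attenuation * h.colour;  // accumulate this bounce
        attenuation *= h.reflect;
    }
    return colour;
}
```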

4 Reflection and Conclusion

While the project initially started out going well, it did not continue in such a manner. At the beginning of the project it was clear what I had to do and I knew exactly how to do it, so I could very quickly design and implement working code. Thanks to this fast progress I created most of the CPU-side program within the first few weeks, slightly ahead of plan.

However, when development reached the stage where vastly new techniques were required, progress slowed considerably. I did not have a clear grasp of what I needed to do at any given time in order to implement my high level designs. A lot of time was spent researching GPU programming and other related concepts, and a few non-functioning prototypes were built, but very little progress was made on the actual system.

When I believed I had completed the first stage of the pipeline, producing voxels, I then spent a long time trying to get a visualisation of the voxels working so that I could verify my code was correct. This process, which would normally be easy with a functional debugger on a CPU, proved extremely difficult on the GPU thanks to the very limited tools available. The only tool I found that works in my configuration is the OpenGL debugger by Nvidia. While it does allow inspection of textures, for 3D unsigned integer textures such as the ones the voxels are stored in, the only output it can produce is a memory dump. In the end I needed to develop my own system that rendered each voxel as a single point just to prove that my voxel generation was indeed correct. This entire step could have been skipped on a system with better tool support; if I ever need to develop complex GPU applications again, I now know that my configuration of hardware and software is not ideal for the task.

Whilst I believe this project taught me some unconventional techniques needed to implement this particular system, I do not believe these skills transfer well to other areas of programming. However, the process of learning unusual programming systems is a very useful skill that I have developed over the course of this project, and as such I am more confident about tackling any completely unknown programming problem that may arise in my future.


Bibliography

[1] J.T. Klosowski, M. Held, J.S.B. Mitchell, H. Sowizral, and K. Zikan. Efficient collision detection using bounding volume hierarchies of k-DOPs. IEEE Transactions on Visualization and Computer Graphics, 4(1):21–36, March 1998. doi: 10.1109/2945.675649. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=675649.

[2] Cyril Crassin, Fabrice Neyret, Miguel Sainz, Simon Green, and Elmar Eisemann. Interactive Indirect Illumination Using Voxel Cone Tracing. Computer Graphics Forum, 30(7):1921–1930, September 2011. doi: 10.1111/j.1467-8659.2011.02063.x. URL http://doi.wiley.com/10.1111/j.1467-8659.2011.02063.x.

[3] ARB_bindless_texture extension specification, June 2014. URL https://www.opengl.org/registry/specs/ARB/bindless_texture.txt.

[4] Cyril Crassin and Simon Green. Octree-Based Sparse Voxelization Using the GPU Hardware Rasterizer. In Patrick Cozzi and Christophe Riccio, editors, OpenGL Insights. CRC Press, 2012. URL http://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf.

Appendix A: Full Testing System Specifications

The benchmarks were recorded using the following hardware:

• Intel i5 3570k

• Nvidia GeForce GTX 660 Ti/PCIe/SSE2 2GB GDDR5

• 16GB DDR3 RAM

The following software configuration was also used:

• OpenGL version 4.5.0 Nvidia 364.19

• Linux 4.5.1-1-ARCH

• gcc-multilib 5.3.0 using flags -std=c++11 -O3

• GLFW 3.1.2
