Cg and Hardware Accelerated Shading
Cem Cebenoyan
NVIDIA CONFIDENTIAL
Overview
Cg Overview
Where we are in hardware today
Physical Simulation on GPU
GeForce FX / Cg Demos
Advanced hair and skin rendering in “Dawn”
Adaptive subdivision surfaces and ambient occlusion shading in “Ogre”
Procedural shading in “Time Machine”
Depth of field and post-processing effects in “Toys”
Order-independent transparency (OIT)
What is Cg?
A high level language for controlling parts of the graphics pipeline of modern GPUs
Today, this includes the vertex transformation and fragment processing units of the pipeline
Very C-like
Only simpler
Native support for vectors, matrices, dot-products, reflection vectors, etc.
Similar in scope to Renderman
But notably different to handle the way hardware accelerators work
Cg Pipeline Overview
Graphics Program Written in Cg (“C” for Graphics)
↓ Compiled & Optimized
Low-Level Graphics “Assembly Code”
Graphics Data Flow
Application → Vertex Program → Fragment Program → Framebuffer
The vertex and fragment stages each run a Cg program:

// Diffuse lighting
float d = dot(normalize(frag.N), normalize(frag.L));
if (d < 0) d = 0;
c = d * f4tex2D(t, frag.uv) * diffuse;
...
Graphics Hardware Today
Fully programmable vertex processing
Full IEEE 32-bit floating point processing
Native support for mul, dp3, dp4, rsq, pow, sin, cos...
Full support for branching, looping, subroutines
Fully programmable pixel processing
IEEE 32-bit, 16-bit (s10e5) math supported
Same native math ops as vertex, plus texture fetch, and derivative instructions
No branching, but >1000 instruction limit
Floating point textures / frame buffers
No blending / filtering yet
~500 MHz core clock
Physical Simulation
Simple cellular automata-like simulations are possible on NV20 class hardware (e.g. Game of Life, Greg James’ water simulation, Mark Harris’ CML work)
Use textures to represent physical quantities (e.g. displacement, velocity, force) on a regular grid
Multiple texture lookups allow access to neighbouring values
Pixel shader calculates new values, renders results back to texture
Each rendering pass draws a single quad, calculating next time step in simulation
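The texture-based scheme above can be sketched on the CPU (a hypothetical Python sketch, not the actual demo code): a 2D array stands in for the texture, the four neighbour reads stand in for the multiple texture lookups, and each call to `step` corresponds to one full-screen-quad rendering pass, here computing a simple diffusion-style update.

```python
def step(grid):
    """One simulation pass: move each cell toward its neighbour average."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]   # the "render target" texture
    for y in range(h):
        for x in range(w):
            # neighbour lookups; edges replicate the centre value
            left  = grid[y][x - 1] if x > 0     else grid[y][x]
            right = grid[y][x + 1] if x < w - 1 else grid[y][x]
            up    = grid[y - 1][x] if y > 0     else grid[y][x]
            down  = grid[y + 1][x] if y < h - 1 else grid[y][x]
            out[y][x] = 0.25 * (left + right + up + down)
    return out

grid = [[0.0] * 8 for _ in range(8)]
grid[4][4] = 1.0               # initial impulse
for _ in range(10):
    grid = step(grid)          # each pass = one full-screen quad on the GPU
```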
Physical Simulation
Problem: 8 bit precision on NV20 is not enough, causes drifting, stability problems
Float precision on NV30 allows GPU physics to match CPU accuracy
New fragment programming model (longer programs, flexible dependent texture reads) allows much more interesting simulations
Example: Cloth Simulation Shader
Uses Verlet integration (see: Jakobsen, GDC 2001)
Avoids storing explicit velocity
newx = x + (x – oldx)*damping + a*dt*dt
Not always accurate, but stable!
Store current and previous position of each particle in 2 RGB float textures
Fragment program calculates new position, writes result to float buffer
Copy float buffer back to texture for next iteration (could use render-to-texture instead)
Swap current and previous textures
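The Verlet step and the current/previous ping-pong above can be sketched on the CPU as follows (a minimal one-dimensional sketch with assumed parameter values, not the demo's code):

```python
def integrate(x, oldx, a, dt, damping):
    # newx = x + (x - oldx)*damping + a*dt*dt  (velocity is implicit)
    return x + (x - oldx) * damping + a * dt * dt

dt, damping, gravity = 0.01, 0.99, -9.8   # hypothetical values
curr, prev = 1.0, 1.0                     # particle at rest at height 1

for _ in range(100):
    new = integrate(curr, prev, gravity, dt, damping)
    prev, curr = curr, new                # swap current/previous "textures"
```

On the GPU the two scalars are two float textures, and the swap is just exchanging which texture is bound as input versus render target.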
Cloth Shader Demo
Cloth Simulation Shader
2 passes:
1. Perform integration
2. Apply constraints:
Floor constraint
Sphere constraint
Distance constraints between particles
Read back float frame buffer using glReadPixels
Draw particles and constraints
Cloth Simulation Cg Code (1st pass)
void Integrate(inout float3 x, float3 oldx, float3 a,
               float timestep2, float damping)
{
    x = x + damping*(x - oldx) + a*timestep2;
}

myFragout main(v2fconnector In,
               uniform texobjRECT x_tex,
               uniform texobjRECT ox_tex,
               uniform float timestep,
               uniform float damping,
               uniform float3 gravity)
{
    myFragout Out;
    float2 s = In.TEX0.xy;

    // get current and previous position
    float3 x = f3texRECT(x_tex, s);
    float3 oldx = f3texRECT(ox_tex, s);

    // move the particle
    Integrate(x, oldx, gravity, timestep*timestep, damping);

    Out.COL.xyz = x;
    return Out;
}
Cloth Simulation Cg Code (2nd pass)
// constrain particle to be a fixed distance from another particle
void DistanceConstraint(float3 x, inout float3 newx, float3 x2,
                        float restlength, float stiffness)
{
    float3 delta = x2 - x;
    float deltalength = length(delta);
    float diff = (deltalength - restlength) / deltalength;
    newx = newx + delta*stiffness*diff;
}

// constrain particle to be outside sphere
void SphereConstraint(inout float3 x, float3 center, float r)
{
    float3 delta = x - center;
    float dist = length(delta);
    if (dist < r) {
        x = center + delta*(r / dist);
    }
}

// constrain particle to be above floor
void FloorConstraint(inout float3 x, float level)
{
    if (x.y < level) {
        x.y = level;
    }
}
Cloth Simulation Cg Code (cont.)
myFragout main(v2fconnector In,
               uniform texobjRECT x_tex,
               uniform texobjRECT ox_tex,
               uniform float dist,
               uniform float stiffness)
{
    myFragout Out;
    float2 s = In.TEX0.xy;

    // get current position
    float3 x = f3texRECT(x_tex, s);

    // satisfy constraints
    FloorConstraint(x, 0.0f);
    SphereConstraint(x, float3(0.0, 2.0, 0.0), 1.0f);

    // get positions of neighbouring particles
    float3 x1 = f3texRECT(x_tex, s + float2(1.0, 0.0));
    float3 x2 = f3texRECT(x_tex, s + float2(-1.0, 0.0));
    float3 x3 = f3texRECT(x_tex, s + float2(0.0, 1.0));
    float3 x4 = f3texRECT(x_tex, s + float2(0.0, -1.0));

    // apply distance constraints
    float3 newx = x;
    if (s.x < 31) DistanceConstraint(x, newx, x1, dist, stiffness);
    if (s.x > 0)  DistanceConstraint(x, newx, x2, dist, stiffness);
    if (s.y < 31) DistanceConstraint(x, newx, x3, dist, stiffness);
    if (s.y > 0)  DistanceConstraint(x, newx, x4, dist, stiffness);

    Out.COL.xyz = newx;
    return Out;
}
Physical Simulation – Future Work
Limitation - only one destination buffer, can only modify position of one particle at a time
Could use pack instructions to store 2 vec4h (8 half floats) in 128 bit float buffer
Could also use additional textures to encode particle masses, stiffness, constraints between arbitrary particles (rigid bodies)
“float buffer to vertex array” extension offers possibility of directly interpreting results as geometry without any CPU intervention!
Collision detection with meshes is hard
Demos Introduction
We developed four demos for the launch of GeForce FX:
“Dawn”
“Toys”
“Time Machine”
“Ogre” (Spellcraft Studio)
Characters Look Better With Hair
Rendering Hair
Two options:
1) Volumetric (texture)
2) Geometric (lines)
We have used volumetric approximations (shells and fins) in the past (e.g. Wolfman demo)
Doesn’t work well for long hair
We considered using textured ribbons (popular in Japanese video games). Alpha sorting is a pain.
Performance of GeForce FX finally lets us render hair as geometry
Rendering Hair as Lines
Each hair strand is rendered as a line strip (2-20 vertices, depending on curvature)
Problem: lines are a minimum of 1 pixel thick, regardless of distance from camera
Not possible to change line width per vertex
Can use camera-facing triangle strips, but these require twice the number of vertices, and have aliasing problems
Anti-Aliasing
Two methods of anti-aliasing lines in OpenGL
GL_LINE_SMOOTH
High quality, but requires blending, sorting geometry
GL_MULTISAMPLE
Usually lower quality, but order independent
We used multisample anti-aliasing with “alpha to coverage” mode
By fading alpha to zero at the ends of hairs, coverage and apparent thickness decreases
“SAMPLE_ALPHA_TO_COVERAGE_ARB” is part of the ARB_multisample extension
Hair Without Antialiasing
Hair With Multisample Antialiasing
Hair Shading
Hair is lit with simple anisotropic shader (Heidrich and Seidel model)
Low specular exponent, dim highlight looks best
Black hair = no shadows!
Self-shadowing hair is hard
Deep shadow maps
Opacity shadow maps
Top of head is painted black to avoid skin showing through
We also had a very short hair style, which helps
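The Heidrich–Seidel anisotropic strand-lighting terms the hair shader is based on can be sketched as below (a CPU sketch; vectors are plain unit-length 3-tuples, and the low exponent follows the "dim highlight" note above):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def heidrich_seidel(T, L, V, exponent=8.0):
    """Strand lighting from tangent T, light L, view V (all unit length)."""
    lt, vt = dot(L, T), dot(V, T)
    sin_lt = math.sqrt(max(0.0, 1.0 - lt * lt))   # sin of angle to strand
    sin_vt = math.sqrt(max(0.0, 1.0 - vt * vt))
    diffuse = sin_lt
    specular = max(0.0, sin_lt * sin_vt - lt * vt) ** exponent
    return diffuse, specular

# Light perpendicular to the strand gives full diffuse:
d, s = heidrich_seidel(T=(1, 0, 0), L=(0, 1, 0), V=(0, 0, 1))
```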
Hair Styling is Important
Hair Styling
Difficult to position 50,000 individual curves by hand
Typical solution is to define a small number of control hairs, which are then interpolated across the surface to produce render hairs
We developed a custom tool for hair styling
Commercial hair applications have poor styling tools and are not designed for real time output
Hair Styling
Scalp is defined as a polygon mesh
Hairs are represented as cubic Bezier curves
Control hairs are defined at each vertex
Render hairs are interpolated across triangles using barycentric coordinates
Number of generated hairs is based on triangle area to maintain constant density
Can add noise to interpolated hairs to add variation
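The barycentric interpolation step above can be sketched as follows (a hypothetical Python sketch; control hairs are given as point lists rather than the Bezier curves the tool actually uses):

```python
def lerp_hair(h0, h1, h2, u, v):
    """Blend three control hairs with barycentric weights (u, v, 1-u-v)."""
    w = 1.0 - u - v
    return [tuple(u * a + v * b + w * c for a, b, c in zip(p0, p1, p2))
            for p0, p1, p2 in zip(h0, h1, h2)]

# Three control hairs at the triangle's corners (two points each):
h0 = [(0, 0, 0), (0, 1, 0)]
h1 = [(1, 0, 0), (1, 1, 0)]
h2 = [(0, 0, 1), (0, 1, 1)]

# Render hair at the triangle centroid:
mid = lerp_hair(h0, h1, h2, 1/3, 1/3)
```

Jittering (u, v) per render hair, with hair count proportional to triangle area, gives the constant density described above.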
Hair Styling Tool
Provides a simple UI for styling hair
Combing tools
Lengthen / shorten
Straighten / mess up
Uses a simple physics simulation based on Verlet integration (Jakobsen, GDC 2001)
Physics is run on control hairs only
Collision detection done with ellipsoids
Dawn Demo
Show demo
The Ogre Demo
A real-time preview of Spellcraft Studio’s in-production short movie “Yeah!”
Created in 3DStudio MAX
Used Character Studio for animation, plus Stitch plug-in for cloth simulation
Original movie was rendered in Brazil with global illumination
Available at: www.yeahthemovie.de
Our aim was to recreate the original as closely as possible, in real-time
What are Subdivision Surfaces?
A curved surface defined as the limit of repeated subdivision steps on a polygonal model
Subdivision rules create new vertices, edges, faces based on neighboring features
We used the Catmull-Clark subdivision scheme (as used by Pixar)
MAX, Maya, Softimage, Lightwave all support forms of subdivision surfaces
Realtime Adaptive Tessellation
Brute force subdivision is expensive
Generates lots of polygons where they aren’t needed
Number of polygons increases exponentially with each subdivision
Adaptive tessellation subdivides patches based on screen-space patch size test
Guaranteed crack-free
Generates normals and tangents on the fly
Culls off-screen and back-facing patches
CPU-based (uses SSE where possible)
Control Mesh vs. Subdivided Mesh
4,000 faces vs. 17,000 triangles
Control Mesh Detail
Subdivided Mesh Detail
Why Use Subdivision Surfaces?
Content
Characters were modeled with subdivision in mind (using 3DSMax “MeshSmooth/NURMS” modifier)
Scalability
We wanted the demo to be scalable to lower-end hardware
“Infinite” detail
Can zoom in forever without seeing hard edges
Animation compression
Just store low-res control mesh for each frame
May be accelerated on future GPUs
Disadvantages of Realtime Subdivision
CPU intensive
But we might as well use the CPU for something!
View dependent
Requires re-tessellation for shadow map passes
Mesh topology changes from frame to frame
Makes motion blur difficult
Ambient Occlusion Shading
Helps simulate the global illumination “look” of the original movie
Self occlusion is the degree to which an object shadows itself
“How much of the sky can I see from this point?”
Simulates a large spherical light surrounding the scene
Popular in production rendering – Pearl Harbor (ILM), Stuart Little 2 (Sony)
Occlusion
How To Calculate Occlusion
Shoot rays from surface in random directions over the hemisphere (centered around the normal)
The percentage of rays that hit something is the occlusion amount
Can also keep track of average of un-occluded directions – “bent normal”
Some RenderMan-compliant renderers (e.g. Entropy) have a built-in occlusion() function that will do this
We can’t trace rays using graphics hardware (yet)
So we pre-calculate it!
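The ray-casting computation above can be sketched on the CPU as follows (a hypothetical Python sketch; the scene here is a single occluding half-space standing in for the mesh a real baker would trace against):

```python
import math, random

def random_hemisphere_dir(normal):
    """Rejection-sample a unit direction in the hemisphere around `normal`."""
    while True:
        d = (random.uniform(-1, 1), random.uniform(-1, 1), random.uniform(-1, 1))
        n2 = sum(c * c for c in d)
        if 1e-6 < n2 <= 1.0:
            d = tuple(c / math.sqrt(n2) for c in d)
            if sum(a * b for a, b in zip(d, normal)) > 0:
                return d

def occlusion(point, normal, hits_scene, n_rays=128):
    """Fraction of hemisphere rays that hit something (the occlusion amount)."""
    hit = sum(1 for _ in range(n_rays)
              if hits_scene(point, random_hemisphere_dir(normal)))
    return hit / n_rays

# A wall toward +x occludes roughly half the hemisphere above the floor:
random.seed(0)
wall = lambda p, d: d[0] > 0      # ray heads toward +x -> blocked
occ = occlusion((0, 0, 0), (0, 1, 0), wall)
```

Averaging the un-occluded directions instead of counting hits yields the "bent normal" mentioned above.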
Occlusion Baking Tool
Uses ray-tracing engine to calculate occlusion values for each vertex in control mesh
We used 128 rays / vertex
Stored as floating point scalar for each vertex and each frame of the animation
Calculation took around 5 hours for 1000 frames
Subdivision code interpolates occlusion values using cubic interpolation
Used as ambient term in shader
Ogre Demo
Show demo
Procedural Shading in Time Machine
Goals for the Time Machine demo
Overview of effects
Metallic Paint
Wood
Chrome
Techniques used
Faux-BRDF reflection
Reveal and dXdT maps
Normal and DuDv scaling
Dynamic Bump mapping
Performance Issues
Summary
Why do Time Machine?
GPUs are much more programmable
Thanks to generalized dependent texturing, more active textures (16 on GeForce FX) and (for our purposes) unlimited blend operations, high-quality animation is possible per-pixel
GeForce FX has >2x the performance of GeForce4 Ti
Executing lots of per-pixel operations isn’t just possible; it can be done in real time.
Previous per-pixel animation was limited
Animated textures
PDE / CA effects (see Mark Harris’ talk at GDC)
Goal : Full-scene per-pixel animation
Why do Time Machine? (continued)
Neglected pick-up trucks demonstrate a wide variety of surface effects, with intricate transitions and boundaries
Paint oxidizing, bleaching and rusting
Vinyl cracking
Wood splintering and fading
And more…
Not possible with just per-vertex animation!
Time Machine Effects : Paint
Effects: specular color shift, oxidation, bubbling, rusting
60 pixel shader instructions, 11 textures
Paint textures:
Paint Color
Rust LUT
Shadow Map
Spotlight Mask
Light Rust Color*
Deep Rust Color*
Ambient Light*
Bubble Height*
Reveal Time*
New Environment*
Old Environment*
(* = artist created)
Effects (cont’d) : Wood, Chrome, Glass
Wood fades and cracks: 23 instructions, 8 textures
Chrome welts and corrodes: 31 instructions, 6 textures
Headlights fog: 24 instructions, 4 textures
Procedural or Not?
Procedural shading normally replaces textures with functions of several variables.
Time Machine uses textures liberally.
The only parameter to our shaders is time.
However, turning everything into math is expensive
Time Machine’s solution
Give artist direct control (textures) over final image, use functions to control transitions
Techniques : Faux-BRDF Reflection
Many automotive paints exhibit a color shift as a function of the light and viewer directions.
This effect has been approximated with analytic BRDFs (Lafortune’s cosine lobes)
And measured by Cornell University’s graphics lab
BRDF factorization [McCool, Rusinkiewicz] is one method to use this data on graphics hardware
Efficient representation with multiple 2D textures
Closely approximates the original BRDFs
But not necessarily the most efficient method for automotive paint, and not artist-controllable.
Reflection intensity is uninteresting (largely Blinn)
Rotated/projected axes hard to visualize
Techniques : Faux-BRDF Reflection 2
Our solution: project BRDF values onto a single 2D texture, and factor out the intensity
Compute intensity in real-time, using (N.H)^s
Texture varies slowly, so it can be low-res (64x64).
Anti-aliasing texture fixes laser noise at grazing angles
For automotive paints, N.L and N.H work well for axes.
Not physically accurate, but fast and high-quality.
Easy for artists to tweak.
(Images: DuPont Cayman lacquer and Mystique lacquer)
Techniques : Reveal and dXdT maps
Artists do not want to paint hundreds of frames of animation for a surface transition (e.g., paint->rust)
Ultimately, effect is just a conditional:
if (time > n) color = rust; else color = paint;
Or an interpolation between a start and end point
paint = interpolate(paint, bleach, s*(time-n));
So all intermediate values can be generated.
For continuous effects, use dXdT (velocity) maps
Can be stored in alpha in a DXT5 texture.
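The reveal-map idea above can be sketched as follows (a hypothetical Python sketch: per pixel, a stored reveal time `n` and a rate `s` turn the global time into a blend factor, so the artist paints only the start and end states):

```python
def clamp01(x):
    return max(0.0, min(1.0, x))

def lerp(a, b, t):
    return a + (b - a) * t

def shade(paint, bleach, reveal_time, rate, time):
    """paint = interpolate(paint, bleach, s*(time - n)), clamped to [0, 1]."""
    t = clamp01(rate * (time - reveal_time))
    return lerp(paint, bleach, t)

# Before the reveal time the surface is pure paint...
early = shade(paint=0.8, bleach=0.2, reveal_time=5.0, rate=0.5, time=2.0)
# ...and well after it has fully transitioned to the bleached value.
late = shade(paint=0.8, bleach=0.2, reveal_time=5.0, rate=0.5, time=9.0)
```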
Performance Concerns
Executing large shaders is expensive.
First rule of optimization: Keep inner loops tight
Shaders are the inner loop, run >1M times per frame.
But graphics cards have many parallel units
Vertex, fragment, and texture units
Modern GPUs do a great job of hiding texture latency
Bandwidth is unimportant in long shaders
Time Machine runs at virtually the same framerate on a 500/500 GeForce FX as it does on a 500/400 or 500/550
So not using textures is wasting performance!
Performance Concerns…
What makes a good texture?
Saves math operations
8 (RGBA) or 16 (HILO) bit precision sufficient
Depends on a limited number of variables
Textures we used
Interpolating between light and dark rust layers
Required computing the difference between light and dark layers’ reveal maps, and expanding to [0..1].
Function was dependent on current and reveal time.
Used to blend two texture maps
Performance Concerns…
Textures Used, continued…
Surround Maps
Recomputing the normal requires knowing the heights of 4 texels (s-1,t), (s+1,t), (s,t+1) and (s,t-1)
Each height is only one 8-bit component
Instead of 4 dependent fetches, we can pack all four into one: S(s,t) = [H(s-1,t), H(s+1,t), H(s,t-1), H(s,t+1)]
Saved 4 math ops and 3 texture fetches + shuffle logic
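The surround-map idea can be sketched on the CPU as follows (a hypothetical Python sketch with an analytic height function standing in for the height texture; the normal's slopes fall out of the four neighbour heights as central differences):

```python
import math

def height(s, t):
    # hypothetical height field standing in for the 8-bit height texture
    return 0.1 * math.sin(s) * math.cos(t)

def surround_fetch(s, t):
    # S(s,t) = [H(s-1,t), H(s+1,t), H(s,t-1), H(s,t+1)] in a single texel
    return (height(s - 1, t), height(s + 1, t),
            height(s, t - 1), height(s, t + 1))

def bump_normal(s, t, bump_scale=1.0):
    """Normal from one surround-map fetch via central differences."""
    hl, hr, hb, ht = surround_fetch(s, t)
    dx = (hr - hl) * 0.5 * bump_scale
    dy = (ht - hb) * 0.5 * bump_scale
    n = (-dx, -dy, 1.0)
    inv = 1.0 / math.sqrt(sum(c * c for c in n))
    return tuple(c * inv for c in n)

n = bump_normal(0.0, 0.0)
```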
Time Machine demo
Show demo
Toys Demo - Simple Depth of Field
Render scene to color and depth textures
Generate mipmaps for color texture
Render full screen quad with “simpledof” shader:
Depth = tex(depthtex, texcoord)
Coc (circle of confusion) = abs(depth*scale + bias)
Color = txd(colortex, texcoord, (coc,0), (0,coc))
Scale and bias are derived from the camera:
Scale = (aperture * focaldistance * planeinfocus * (zfar – znear)) / ((planeinfocus – focaldistance) * znear * zfar)
Bias = (aperture * focaldistance * (znear – planeinfocus)) / ((planeinfocus – focaldistance) * znear)
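The circle-of-confusion math can be sketched on the CPU with hypothetical camera values as below. One assumption to flag: the bias denominator is taken as (planeinfocus – focaldistance), which is the value for which the CoC vanishes exactly at the plane in focus; `depth` is the depth-buffer value sampled from the depth texture.

```python
def coc_scale_bias(aperture, focaldistance, planeinfocus, znear, zfar):
    scale = (aperture * focaldistance * planeinfocus * (zfar - znear)) \
            / ((planeinfocus - focaldistance) * znear * zfar)
    bias = (aperture * focaldistance * (znear - planeinfocus)) \
           / ((planeinfocus - focaldistance) * znear)
    return scale, bias

def circle_of_confusion(depth, scale, bias):
    # coc = abs(depth*scale + bias), as in the shader above
    return abs(depth * scale + bias)

# hypothetical camera values
scale, bias = coc_scale_bias(aperture=0.5, focaldistance=0.1,
                             planeinfocus=10.0, znear=0.1, zfar=100.0)

# Depth-buffer value of a point exactly at the plane in focus:
d_focus = 100.0 * (10.0 - 0.1) / (10.0 * (100.0 - 0.1))
```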
Artifacts: Bilinear Interpolation/Magnification
Bilinear artifacts in extreme back- and near-ground
Solution: multiple jittered samples
Even without jittering, a 4 or 5 sample rotated grid pattern brings smaller artifacts under control
Larger artifacts need jittered samples, and more of them
Then it’s just a tradeoff between noise from the jittering and bilinear interpolation artifacts
(and of course the quality/performance tradeoff with number of samples)
Noise vs. Interpolation Artifacts
(Images: with noise vs. without noise)
Artifacts: Depth Discontinuities
Near-ground (blurry) pixels don’t properly blend out over top of mid-ground (sharp) pixels
Easy solution: Cheat!
Either don’t let objects get too far in front of the plane in focus, or blur everything a little more when they do – soft edges help hide this fairly well.
Depth Discontinuities
Fun With Color Matrices
Since we’re already rendering to a full-screen texture, it’s easy to muck with the final image.
Operations are just rotations / scales in RGB space
Color (hue) shift
Saturation
Brightness
Contrast
These are all matrices, so compose them together, and apply them as 3 dot products in the shader
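The composition above can be sketched as follows (a pure-Python sketch with standard Rec. 601 luminance weights assumed; no GPU specifics):

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def saturation(s):
    # Blend toward the luminance axis; s=1 is identity, s=0 is grayscale.
    lr, lg, lb = 0.299, 0.587, 0.114
    return [[lr + s * (1 - lr), lg * (1 - s),      lb * (1 - s)],
            [lr * (1 - s),      lg + s * (1 - lg), lb * (1 - s)],
            [lr * (1 - s),      lg * (1 - s),      lb + s * (1 - lb)]]

def brightness(b):
    return [[b, 0, 0], [0, b, 0], [0, 0, b]]

def apply(m, rgb):
    # The three dot products the fragment shader would execute.
    return tuple(sum(m[i][j] * rgb[j] for j in range(3)) for i in range(3))

m = matmul(brightness(1.2), saturation(0.0))   # grayscale, 20% brighter
gray = apply(m, (1.0, 0.0, 0.0))               # pure red in, gray out
```

Composing all operations into one matrix ahead of time keeps the per-pixel cost at three dot products regardless of how many adjustments are stacked.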
Original Image
Colorshifted Image
Black and White Image
Toys Demo
Show demo
Order Independent Transparency
Why is correct transparency hard?
Depth peeling
Two depth buffers
Enter the shadow map
Precision/invariance issues
Depth replace texture shader
Blending the layers
Other applications
Good Transparency vs. Bad Transparency
Can’t just glEnable(GL_BLEND)…
(Images: with OIT vs. without OIT)
Why is correct transparency hard?
Most hardware does object-order rendering
Correct transparency requires sorted traversal
Have to render polygons in sorted order
Not very convenient
Polygons can’t intersect
Lots of extra application work
Especially difficult for dynamic scene databases
Depth Peeling
The algorithm uses an “implicit sort” to extract multiple depth layers
The first render pass finds the front-most fragment color/depth
Each successive render pass extracts the fragment color/depth for the next-nearest fragment on a per-pixel basis
Use dual depth buffers to compare previous nearest fragment with current
Second “depth buffer” used for comparison (read only) from texture [more on this later]
(Images: depth layers 0 to 3)
Cross-section view of depth peeling
(Images: three frames over a depth axis from 0 to 1, showing layers 0, 1 and 2)
Depth peeling strips away depth layers with each successive pass. The frames above show the frontmost (leftmost) surfaces as bold black lines, hidden surfaces as thin black lines, and “peeled away” surfaces as light grey lines.
Dual Depth Buffer Pseudo-code
for (i = 0; i < num_passes; i++)
{
    clear color buffer

    depth unit 0:
        if (i == 0) { disable depth test }
        else        { enable depth test }
        bind depth buffer (i % 2)
        disable depth writes        /* read-only depth test */
        set depth func to GREATER

    depth unit 1:
        bind depth buffer ((i+1) % 2)
        clear depth buffer
        enable depth writes
        enable depth test
        set depth func to LESS

    render scene
    save color buffer RGBA as layer i
}
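The peeling loop can be sketched on the CPU for a single pixel as follows (a hypothetical Python sketch: each fragment is (depth, color), and each pass keeps the nearest fragment that is strictly deeper than the last peeled layer):

```python
def peel_layers(fragments, num_passes):
    """Extract up to num_passes depth layers for one pixel."""
    layers, last_depth = [], -1.0
    for _ in range(num_passes):
        # depth unit 0's GREATER test against the previous layer:
        survivors = [f for f in fragments if f[0] > last_depth]
        if not survivors:
            break
        # depth unit 1's LESS test: keep the nearest survivor
        nearest = min(survivors, key=lambda f: f[0])
        layers.append(nearest)
        last_depth = nearest[0]
    return layers

frags = [(0.7, "green"), (0.2, "red"), (0.5, "blue")]
layers = peel_layers(frags, num_passes=4)   # red, then blue, then green
```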
Implementation
There is no “dual depth buffer” extension to OpenGL, so what can we do?
Just need one depth test with writeable depth buffer – the other can be read-only
Shadow mapping is a read-only depth test!
Depth test can have an arbitrary camera location
Other interesting uses for clip volumes
Fast copies make this proposition reasonable
Copies will be unnecessary in the future…
Precision / Invariance issues
Using shadow mapping hardware introduces precision and invariance issues
Depth rasterization usually just needs to match output depth buffer precision, and requires no perspective correction
Texture hardware requires perspective correction and projection at high precision
Making things match would be difficult without the DEPTH_REPLACE texture shader
Computes with texture hardware at texture precision
Solves invariance problems at some extra expense
Will be cheaper in the future…
(Images: 1, 2, 3 and 4 layers)
Compositing
Each time we peel, we capture the RGBA, then as a final step, we blend all the layers together from back to front
Opaque fragments completely overwrite previous transparent ones
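The back-to-front blend above can be sketched as a standard "over" composite (a Python sketch; layers are (rgb, alpha) pairs, and an alpha of 1.0 makes an opaque layer overwrite everything behind it, as noted above):

```python
def composite(layers, background):
    """Blend peeled RGBA layers back to front over a background color."""
    color = background
    for rgb, alpha in reversed(layers):   # back to front
        color = tuple(a * alpha + c * (1 - alpha)
                      for a, c in zip(rgb, color))
    return color

layers = [((1.0, 0.0, 0.0), 0.5),   # front: half-transparent red
          ((0.0, 0.0, 1.0), 1.0)]   # back: opaque blue
out = composite(layers, background=(0.0, 0.0, 0.0))
```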
Conclusions
Results are nice!
Get correct transparency without invasive changes to internal data structures
Can be “bolted on” to existing CAD/CAM apps
Requires n scene traversals for n correctly sorted depths
n = 4 is often quite satisfactory (see previous slide)
Shadow maps are for more than shadows!
Questions?
http://developer.nvidia.com
http://developer.nvidia.com/cg/
http://www.cgshaders.org/