Wednesday 21 January 2009

Deferred Shading, AA, alpha blending. DirectX 9

Soon after I finished my DX10 demo, Arseny Kapoulkine gave me an idea of how to implement the same functionality using DirectX 9.

In DX9 it is impossible to create a multi-sampled G-buffer and access its separate samples independently in a pixel shader. To use a multi-sampled render target as a texture, you need to resolve it, and you have no control over that process. So I had to create a super-sampled G-buffer instead: a buffer with both width and height multiplied by 2, which gives us SSAAx4.
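To make the layout concrete, here is a minimal sketch of how a screen pixel and a sample index could map to a texel in the double-size buffer. The function names and the sample ordering inside each 2x2 block are assumptions for illustration, not taken from the demo sources.

```python
def gbuffer_size(width, height, factor=2):
    """Size of the super-sampled buffer; factor 2 gives 4 samples per pixel."""
    return (width * factor, height * factor)


def sample_texel(px, py, sample):
    """Map screen pixel (px, py) and sample index 0..3 to a texel
    in a G-buffer that is twice the screen width and height."""
    if not 0 <= sample < 4:
        raise ValueError("sample must be in 0..3")
    # Assumed ordering: samples run left to right, then top to bottom.
    return (2 * px + sample % 2, 2 * py + sample // 2)
```

Each screen pixel then owns exactly four distinct texels, which play the role of the four samples.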

If you remember, the DX10 technique used SampleMask to select the samples in which the data of transparent objects is stored. This time we will use the stencil buffer instead.
Using alpha test and a simple shader that draws a grid in the following pattern at a per-pixel step (image is magnified), we fill the stencil buffer with the values 1, 2, 4 and 8:
[Image: grid shader]
This step is performed only once, since the grid doesn't change.
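As a rough CPU-side illustration of that stencil fill, the sketch below assigns one bit per sample inside every 2x2 block. Which corner of the block gets which value is an assumption, since only the set of values (1, 2, 4, 8) is given here.

```python
def stencil_grid(width, height):
    """Build per-texel stencil values for a super-sampled buffer.
    Every 2x2 block receives the values 1, 2, 4 and 8, one bit per sample."""
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            # Assumed sample order within the block: left to right, top to bottom.
            sample = (y % 2) * 2 + (x % 2)
            row.append(1 << sample)
        grid.append(row)
    return grid
```

Because each value is a distinct power of two, a single bit of the stencil value identifies the sample.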

Now, using only the stencil reference value, we can render a surface to a single sample of our improvised super-sampled buffer. And if we ever need to render it to more than one sample, the stencil mask will come in handy.
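The selection logic can be sketched on the CPU as follows. Direct3D 9 compares (ref & mask) against (stored & mask) with the chosen comparison function. The particular choice of EQUAL for one sample and NOTEQUAL with a zero reference for several samples is an assumption about how the mask could be used, not necessarily what the demo does.

```python
import operator


def stencil_pass(ref, mask, stored, func):
    """Direct3D 9 style stencil test: (ref & mask) func (stored & mask)."""
    return func(ref & mask, stored & mask)


def covered_samples(ref, mask, func):
    """Return which of the 4 samples (stencil values 1, 2, 4, 8) pass the test."""
    return [s for s in range(4) if stencil_pass(ref, mask, 1 << s, func)]


# One sample: compare for equality against that sample's stencil value.
single = covered_samples(ref=4, mask=0xFF, func=operator.eq)

# Several samples: pass wherever the masked stencil value is non-zero.
several = covered_samples(ref=0, mask=0b0101, func=operator.ne)
```

Here `single` selects only sample 2, and `several` selects samples 0 and 2.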

Volumetric lighting was removed from the DX9 demo because it generated too many instructions (it could still be computed, at the cost of rendering fewer lights per pass).

Another optimization uses the hardware early stencil or early depth test (you can switch between them with the 'O' key).
Before rendering the full-screen quad with the shader that computes lighting, we run a different shader that, together with alpha test and the stencil (or depth) buffer, masks the pixels that don't require per-sample lighting and can live fine with per-pixel computation.
The masking shader checks whether the values of the 4 samples in a 2x2 block are equal, and if they are (within some epsilon), a simpler shader can be run, with a small quality cost and a large performance boost.
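The check itself is simple. Below is a minimal sketch of the classification, assuming each sample is a tuple of G-buffer values (for example a normal and a depth). The epsilon value and the function names are placeholders, not values from the demo.

```python
def block_is_uniform(samples, eps=1e-3):
    """Return True when all 4 samples of a 2x2 block match within eps.
    Each sample is a tuple of G-buffer values (e.g. normal and depth)."""
    first = samples[0]
    return all(
        abs(a - b) <= eps
        for other in samples[1:]
        for a, b in zip(first, other)
    )


def classify_pixels(blocks, eps=1e-3):
    """Split pixel indices into per-pixel and per-sample lighting groups."""
    per_pixel, per_sample = [], []
    for index, samples in enumerate(blocks):
        if block_is_uniform(samples, eps):
            per_pixel.append(index)
        else:
            per_sample.append(index)
    return per_pixel, per_sample
```

In the real pipeline the result of this test ends up in the stencil or depth buffer, so that the hardware can reject pixels before the expensive shader runs.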

The same trick was added to the DX10 demo, and with a multi-sampled buffer there is no quality loss at all.

On GPUs where early stencil test is not available (GF6, GF7), you have to switch to early depth test.
To run the demo you need GeForce6***+ or RadeonX***+.

Download .rar with sources, 341kb.

Some screenshots:

Demo controls:
WASD\Arrow keys - movement. Mouse - camera orientation.
'P' key - turn early stencil\depth optimization on and off. (Default: on)
'G' key - turn transparent surface rendering on and off. (Default: on)
'O' key - switch between early stencil and depth tests. (Default: stencil test)

Some results (minimum is for transparent surface occupying full screen, and the maximum is when none of the transparent surfaces are seen).
7600 GS, GT: 2-4 fps
8600 GT: 2-55 fps
9600 GT: 10-200 fps

X1950: 36-91 fps
HD 2600: 25-70 fps
HD 4850: 110-232 fps

On NVIDIA hardware, the shader that computes lighting at per-sample frequency runs much slower than expected, even though it has only 4 times more instructions than the simple one. I expected a slowdown of 4 times at most. The reason for this is still unknown. If the fps drops dramatically, I recommend turning off transparent surface rendering and leaving only deferred shading and antialiasing on.

Wednesday 14 January 2009

Deferred Shading, AA, alpha-blending.

While playing GTA IV I couldn't help but notice the use of Alpha to Coverage. For the last few years I've seen this feature used mainly to create smooth edges for alpha-tested polygons when FSAA is enabled [1]. In the game, however, it is used for fading objects in and out, and for some of the transparent objects as well. As far as I know, their engine uses deferred shading, so this method looked rather interesting and gave me some ideas for implementing the features mentioned in the title, which don't get along well with deferred rendering.

So how did I do this? I've seen some nice demos from Humus that show transparent objects in deferred rendering using a method similar to depth peeling. To implement something like this, we need to be able to access every single sample in our multisampled G-buffer, so I opened the DX help along with tutorials 01 and 02, and started to write a demo under DX10, where such features are available.

My first idea was that, during the G-buffer fill, we would explicitly set SampleMask for transparent objects so that the number of active samples, relative to the overall sample count, roughly corresponds to the alpha value (so with AAx8 it is possible to have 8 levels of transparency). Straight away, I ran into problems with transparent objects placed in multiple layers. So I had to manually hard-code sample masks with randomly distributed active samples to make the result look right.
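A minimal sketch of that first idea: convert alpha to a number of active samples and build a bit mask from a fixed sample ordering. The ordering below is an arbitrary stand-in for the hand-picked masks and is not taken from the demo sources.

```python
def alpha_to_sample_count(alpha, samples=8):
    """Number of active samples that approximates alpha (clamped to 0..1)."""
    alpha = min(max(alpha, 0.0), 1.0)
    return int(alpha * samples + 0.5)


def sample_mask(alpha, order=(0, 5, 2, 7, 4, 1, 6, 3)):
    """Build a bit mask enabling the first n samples of a fixed ordering.
    The ordering is a hypothetical placeholder for hand-tuned masks."""
    n = alpha_to_sample_count(alpha, len(order))
    mask = 0
    for s in order[:n]:
        mask |= 1 << s
    return mask
```

With 8 samples, an alpha of 0.5 enables 4 of them, which is where the limit of 8 transparency levels comes from.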

In the shader where lighting is computed, for every sample I calculated two-sided diffuse lighting (using the alpha value to linearly interpolate between the sides), specular and simplified volumetric lighting. The weight of every sample is 1/sample count. At this point I realized that I could set different weights for different samples, and that it would then be possible to write the information of each transparent surface into a single sample of the G-buffer. I used a simple formula to convert pixel alpha to sample weight, because the two are not equivalent parameters.
This new method didn't solve the order-independent transparency problem, but it made things much easier: it is now possible to set AAx4 and still render 4 layers of transparent geometry, and the alpha value isn't restricted to 8 levels any more.
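The effect of per-sample weights can be sketched as a weighted resolve. The weight assignment below (a transparent sample weighs its alpha, opaque samples share the remainder equally) is purely an assumption for illustration; the actual alpha-to-weight formula used in the demo is not given here.

```python
def sample_weights(alphas):
    """Hypothetical weights: a transparent sample weighs its alpha,
    opaque samples (alpha is None) share whatever is left equally."""
    used = sum(a for a in alphas if a is not None)
    opaque_count = sum(1 for a in alphas if a is None)
    rest = max(0.0, 1.0 - used)
    share = rest / opaque_count if opaque_count else 0.0
    return [share if a is None else a for a in alphas]


def resolve(colors, weights):
    """Weighted average of per-sample RGB colors."""
    total = sum(weights)
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(
        sum(w * c[i] for c, w in zip(colors, weights)) / total
        for i in range(3)
    )


glass, wall = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
colors = [glass, wall, wall, wall]           # one sample holds the glass
equal = resolve(colors, [0.25] * 4)          # fixed 1/sample-count weights
weighted = resolve(colors, sample_weights([0.5, None, None, None]))
```

With equal weights, a single sample can only contribute 25% of the pixel; with a per-sample weight, the same single sample can represent a 50% transparent surface.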

As a result, I have a demo with sources that you can run if you have a video card with DX10 support and the Vista\Windows7 operating system:
Download, 335kb.

Some screenshots:

It's just a demo and doesn't pretend to be usable in real-world applications (at least for today). The multisampled G-buffer uses a lot of memory and requires a lot of bandwidth.
If the demo speed is satisfactory, you can raise the LIGHTS constant in the "Shaders\LightPS.txt" file up to 32.

1 AntiAliasing with Transparency
2 Stencil-Routed K-Buffer
3 Volumetric lighting demo