Masked Occlusion Culling

Defines

QUICK_MASK: Configure the algorithm used for updating and merging hierarchical z buffer entries. If QUICK_MASK is defined to 1, use the algorithm from the paper “Masked Software Occlusion Culling”, which has good balance between performance and low leakage. If QUICK_MASK is defined to 0, use the algorithm from “Masked Depth Culling for Graphics Hardware” which has less leakage, but also lower performance.

USE_D3D: Configures the library for use with Direct3D (default) or OpenGL rendering. This changes whether the screen space Y axis points downwards (D3D) or upwards (OGL), and is primarily important in combination with the PRECISE_COVERAGE define, where this is important to ensure correct rounding and tie-breaker behaviour. It also affects the ScissorRect screen space coordinates and the memory layout of the buffer returned by ComputePixelDepthBuffer().

PRECISE_COVERAGE: Define PRECISE_COVERAGE to 1 to more closely match GPU rasterization rules. The increased precision comes at a cost of slightly lower performance.

ENABLE_STATS: Define ENABLE_STATS to 1 to gather various statistics during occlusion culling. Can be used for profiling and debugging. Note that enabling this function will reduce performance significantly.

MASKED_TARGET_SIMD_SSE

MOC_VIRTUAL

MOC_PURE

MOC_OVERRIDE

MOC_SINGLE_IMPLEMENTATION
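The configuration defines above are compile-time switches. The sketch below pins them in one place, assuming they are set before the header is included and match the values the library itself was compiled with. The values are illustrative only; apart from USE_D3D, whose default is stated above, they are not claimed to be the library's defaults.

```cpp
// Illustrative configuration: example values, which must match the values
// used when the library itself is compiled (assumption).
#ifndef QUICK_MASK
#define QUICK_MASK 1       // 1: "Masked Software Occlusion Culling" update/merge
#endif
#ifndef USE_D3D
#define USE_D3D 1          // 1: D3D conventions (screen-space Y points down)
#endif
#ifndef PRECISE_COVERAGE
#define PRECISE_COVERAGE 1 // 1: closer match to GPU rasterization rules
#endif
#ifndef ENABLE_STATS
#define ENABLE_STATS 0     // 1: gather statistics (significantly slower)
#endif

// Catch typos such as defining a switch to 2 at compile time.
static_assert(QUICK_MASK == 0 || QUICK_MASK == 1, "QUICK_MASK must be 0 or 1");
static_assert(USE_D3D == 0 || USE_D3D == 1, "USE_D3D must be 0 or 1");
static_assert(PRECISE_COVERAGE == 0 || PRECISE_COVERAGE == 1, "PRECISE_COVERAGE must be 0 or 1");
static_assert(ENABLE_STATS == 0 || ENABLE_STATS == 1, "ENABLE_STATS must be 0 or 1");
```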

class MaskedOcclusionCulling

#include <dag_maskedOcclusionCulling.h>

Public Types

enum Implementation

Values:

enumerator SSE2

enumerator SSE41

enumerator AVX2

enumerator NEON

enum BackfaceWinding

Values:

enumerator BACKFACE_NONE

enumerator BACKFACE_CW

enumerator BACKFACE_CCW

enum CullingResult

Values:

enumerator VISIBLE

enumerator OCCLUDED

enumerator VIEW_CULLED

enum ClipPlanes

Values:

enumerator CLIP_PLANE_NONE

enumerator CLIP_PLANE_NEAR

enumerator CLIP_PLANE_LEFT

enumerator CLIP_PLANE_RIGHT

enumerator CLIP_PLANE_BOTTOM

enumerator CLIP_PLANE_TOP

enumerator CLIP_PLANE_SIDES

enumerator CLIP_PLANE_ALL

typedef void *(*pfnAlignedAlloc)(size_t alignment, size_t size)

typedef void (*pfnAlignedFree)(void *ptr)

Public Functions

virtual void SetResolution(unsigned int width, unsigned int height) = 0

Sets the resolution of the hierarchical depth buffer. This function will re-allocate the current depth buffer (if present). The contents of the buffer are undefined until ClearBuffer() is called.

Parameters:

width – The width of the buffer in pixels, must be a multiple of 8
height – The height of the buffer in pixels, must be a multiple of 4
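Because SetResolution() requires a width that is a multiple of 8 and a height that is a multiple of 4, a caller deriving the occlusion buffer size from a window size has to round it. A minimal sketch that rounds up; RoundUpToMultiple, MocResolution and ValidMocResolution are hypothetical helpers, not library functions, and rounding down is an equally valid choice.

```cpp
// Hypothetical helper: rounds value up to the nearest multiple of m (m > 0).
constexpr unsigned int RoundUpToMultiple(unsigned int value, unsigned int m)
{
    return (value + m - 1) / m * m;
}

struct MocResolution
{
    unsigned int width, height;
};

// Produces a size accepted by SetResolution(): width % 8 == 0, height % 4 == 0.
constexpr MocResolution ValidMocResolution(unsigned int desiredW, unsigned int desiredH)
{
    return {RoundUpToMultiple(desiredW, 8), RoundUpToMultiple(desiredH, 4)};
}
```

For example, a 250x141 request becomes 256x144, which can then be passed to SetResolution() and followed by ClearBuffer().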

virtual void GetResolution(unsigned int &width, unsigned int &height) const = 0

Gets the resolution of the hierarchical depth buffer.

Parameters:

width – Output: The width of the buffer in pixels
height – Output: The height of the buffer in pixels

virtual void ComputeBinWidthHeight(unsigned int nBinsW, unsigned int nBinsH, unsigned int &outBinWidth, unsigned int &outBinHeight) = 0

Computes the width and height of a single bin when the screen is divided into nBinsW x nBinsH bins.

Parameters:

nBinsW – Number of bins along the horizontal axis (columns); the screen is divided into nBinsW x nBinsH rectangular bins.
nBinsH – Number of bins along the vertical axis (rows); the screen is divided into nBinsW x nBinsH rectangular bins.
outBinWidth – Output: The width of a single bin in pixels (except for the rightmost bin, whose width is extended to the resolution width)
outBinHeight – Output: The height of a single bin in pixels (except for the bottommost bin, whose height is extended to the resolution height)
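The bin sizes can be reproduced from the rule given under RenderTrilist(): regular bins are rounded down to a multiple of 32x8 pixels, and the last column and row absorb the remainder. A sketch under the assumption that ComputeBinWidthHeight() follows that same rule; BinLayout and ComputeBinLayout are hypothetical names.

```cpp
// Hypothetical helper mirroring the documented bin-size formula.
struct BinLayout
{
    unsigned int binWidth, binHeight;              // regular bins
    unsigned int lastColBinWidth, lastRowBinHeight; // extended to the screen edge
};

inline BinLayout ComputeBinLayout(unsigned int width, unsigned int height,
                                  unsigned int nBinsW, unsigned int nBinsH)
{
    BinLayout b{};
    // Round each regular bin down to a multiple of 32 (x) and 8 (y) pixels.
    b.binWidth = (width / nBinsW) - (width / nBinsW) % 32;
    b.binHeight = (height / nBinsH) - (height / nBinsH) % 8;
    // The last column/row covers whatever is left of the resolution.
    b.lastColBinWidth = width - (nBinsW - 1) * b.binWidth;
    b.lastRowBinHeight = height - (nBinsH - 1) * b.binHeight;
    return b;
}
```

At 1920x1080 with 4x2 bins this gives 480x536 regular bins, with a last row 544 pixels tall.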

virtual void SetNearClipPlane(float nearDist) = 0

Sets the distance for the near clipping plane. Default is nearDist = 0.

Parameters: nearDist – The distance to the near clipping plane, given as clip space w

virtual float GetNearClipPlane() const = 0: Gets the distance for the near clipping plane.

virtual void ClearBuffer() = 0: Clears the hierarchical depth buffer.

virtual void MergeBuffer(MaskedOcclusionCulling *BufferB) = 0: Merge a second hierarchical depth buffer into the main buffer.

virtual CullingResult RenderTriangles(const float *inVtx, const unsigned short *inTris, int nTris, const float *modelToClipMatrix = nullptr, BackfaceWinding bfWinding = BACKFACE_CW, ClipPlanes clipPlaneMask = CLIP_PLANE_ALL) = 0

Renders a mesh of occluder triangles and updates the hierarchical z buffer with conservative depth values.

This function is optimized for vertex layouts with stride 16 and y and w offsets of 4 and 12 bytes, respectively.

Parameters:

inVtx – Pointer to an array of input vertices; it should point to the x component of the first vertex. The input vertices are given as (x,y,w) coordinates in clip space.
inTris – Pointer to an array of vertex indices. Each triangle is created from three indices fetched consecutively from the array.
nTris – The number of triangles to render (inTris must contain at least 3*nTris entries).
modelToClipMatrix – All vertices are transformed by this matrix before projection. If nullptr is passed, the transform step is skipped.
bfWinding – Sets the triangle winding order considered back-facing; must be one of BACKFACE_NONE, BACKFACE_CW and BACKFACE_CCW. Back-facing triangles are culled and will not be rasterized. Use BACKFACE_NONE to disable culling for double-sided geometry.
clipPlaneMask – A mask indicating which clip planes the triangle clipper should consider. It can be used as an optimization if your application can determine (for example, during culling) that a group of triangles does not intersect a certain frustum plane. However, setting an incorrect mask may cause out-of-bounds memory accesses.
vtxLayout – This overload takes no vtxLayout argument; lay vertices out as described above (16-byte stride, y and w at byte offsets 4 and 12). For best performance, store position data as compactly in memory as possible.

Returns:

Will return VIEW_CULLED if all triangles are either outside the frustum or backface culled, returns VISIBLE otherwise.
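A vertex struct matching the layout RenderTriangles() is optimized for (16-byte stride, y at byte offset 4, w at byte offset 12) can be checked at compile time. This sketch assumes the fourth float slot not named in the (x,y,w) description is z, which here only serves as padding; ClipVertex and the quad data are hypothetical.

```cpp
#include <cstddef>

// One clip-space vertex: x at offset 0, y at 4, z at 8 (unused padding here), w at 12.
struct ClipVertex
{
    float x, y, z, w;
};

static_assert(sizeof(ClipVertex) == 16, "stride must be 16 bytes");
static_assert(offsetof(ClipVertex, y) == 4, "y must be at byte offset 4");
static_assert(offsetof(ClipVertex, w) == 12, "w must be at byte offset 12");

// A clip-space quad as two triangles; inTris needs 3 * nTris indices.
const ClipVertex kQuadVerts[4] = {
    {-0.5f, -0.5f, 0.0f, 2.0f},
    { 0.5f, -0.5f, 0.0f, 2.0f},
    { 0.5f,  0.5f, 0.0f, 2.0f},
    {-0.5f,  0.5f, 0.0f, 2.0f},
};
const unsigned short kQuadIndices[6] = {0, 1, 2, 0, 2, 3};
const int kQuadTris = sizeof(kQuadIndices) / (3 * sizeof(kQuadIndices[0]));

// Usage (assuming a MaskedOcclusionCulling *moc created elsewhere):
//   moc->RenderTriangles(&kQuadVerts[0].x, kQuadIndices, kQuadTris,
//                        nullptr, MaskedOcclusionCulling::BACKFACE_NONE);
```

BACKFACE_NONE is used in the usage line so the example does not depend on which winding the quad happens to have after projection.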

virtual CullingResult TestRect(float xmin, float ymin, float xmax, float ymax, float wmin) const = 0

Occlusion query for a rectangle with a given depth. The rectangle is given in normalized device coordinates where (x,y) coordinates between [-1,1] map to the visible screen area. The query uses a GREATER_EQUAL (reversed) depth test meaning that depth values equal to the contents of the depth buffer are counted as visible.

Parameters:

xmin – NDC coordinate of the left side of the rectangle.
ymin – NDC coordinate of the bottom side of the rectangle.
xmax – NDC coordinate of the right side of the rectangle.
ymax – NDC coordinate of the top side of the rectangle.
wmin – Clip space W coordinate for the rectangle.

Returns:

The query will return VISIBLE if the rectangle may be visible, OCCLUDED if the rectangle is occluded by a previously rendered object, or VIEW_CULLED if the rectangle is outside the view frustum.
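A common way to feed TestRect() is to project the corners of an object's bounding volume and take the enclosing NDC rectangle together with the smallest w, so the whole rectangle is tested at its nearest possible depth. A sketch assuming every corner has w > 0; volumes crossing the near plane need separate handling (for example, treating them as visible). NdcRect and ComputeNdcRect are hypothetical.

```cpp
#include <algorithm>
#include <cfloat>

struct NdcRect
{
    float xmin, ymin, xmax, ymax, wmin;
};

// clipCorners: nCorners clip-space points (x, y, z, w), all with w > 0.
inline NdcRect ComputeNdcRect(const float (*clipCorners)[4], int nCorners)
{
    NdcRect r{FLT_MAX, FLT_MAX, -FLT_MAX, -FLT_MAX, FLT_MAX};
    for (int i = 0; i < nCorners; ++i)
    {
        const float x = clipCorners[i][0], y = clipCorners[i][1], w = clipCorners[i][3];
        // Perspective divide to NDC, then grow the rectangle.
        r.xmin = std::min(r.xmin, x / w);
        r.xmax = std::max(r.xmax, x / w);
        r.ymin = std::min(r.ymin, y / w);
        r.ymax = std::max(r.ymax, y / w);
        // Nearest corner gives a conservative depth for the whole rectangle.
        r.wmin = std::min(r.wmin, w);
    }
    return r;
}

// Usage: moc->TestRect(r.xmin, r.ymin, r.xmax, r.ymax, r.wmin);
```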

virtual CullingResult TestTriangles(const float *inVtx, const unsigned short *inTris, int nTris, const float *modelToClipMatrix = nullptr, BackfaceWinding bfWinding = BACKFACE_CW, ClipPlanes clipPlaneMask = CLIP_PLANE_ALL) = 0

This function is similar to RenderTriangles(), but performs an occlusion query instead and does not update the hierarchical z buffer. The query uses a GREATER_EQUAL (reversed) depth test meaning that depth values equal to the contents of the depth buffer are counted as visible.

This function is optimized for vertex layouts with stride 16 and y and w offsets of 4 and 12 bytes, respectively.

Parameters:

inVtx – Pointer to an array of input vertices; it should point to the x component of the first vertex. The input vertices are given as (x,y,w) coordinates in clip space.
inTris – Pointer to an array of vertex indices. Each triangle is created from three indices fetched consecutively from the array.
nTris – The number of triangles to test (inTris must contain at least 3*nTris entries).
modelToClipMatrix – All vertices are transformed by this matrix before projection. If nullptr is passed, the transform step is skipped.
bfWinding – Sets the triangle winding order considered back-facing; must be one of BACKFACE_NONE, BACKFACE_CW and BACKFACE_CCW. Back-facing triangles are culled and will not be occlusion tested. Use BACKFACE_NONE to disable culling for double-sided geometry.
clipPlaneMask – A mask indicating which clip planes the triangle clipper should consider. It can be used as an optimization if your application can determine (for example, during culling) that a group of triangles does not intersect a certain frustum plane. However, setting an incorrect mask may cause out-of-bounds memory accesses.
vtxLayout – This overload takes no vtxLayout argument; lay vertices out as described above (16-byte stride, y and w at byte offsets 4 and 12). For best performance, store position data as compactly in memory as possible.

Returns:

The query will return VISIBLE if the triangle mesh may be visible, OCCLUDED if the mesh is occluded by a previously rendered object, or VIEW_CULLED if all triangles are entirely outside the view frustum or backface culled.
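The GREATER_EQUAL (reversed) test used by TestRect() and TestTriangles() can be illustrated with a scalar comparison. This sketch assumes depth is stored so that larger values are closer to the camera, using 1/w as the example mapping; it shows only the comparison rule, not the library's SIMD code, and ReversedDepth and PassesGreaterEqual are hypothetical names.

```cpp
// Assumed reversed-depth mapping: nearer points (smaller w) get larger values.
inline float ReversedDepth(float w) { return 1.0f / w; }

// A query sample counts as visible if it is at least as close as the stored
// occluder depth; equal depths are treated as visible.
inline bool PassesGreaterEqual(float queryDepth, float storedDepth)
{
    return queryDepth >= storedDepth;
}
```

With an occluder at w = 2, a query at w = 2 passes (tie), a query at w = 1 passes (closer), and a query at w = 4 fails (farther, so occluded).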

virtual void BinTriangles(const float *inVtx, const unsigned short *inTris, int nTris, TriList *triLists, unsigned int nBinsW, unsigned int nBinsH, const float *modelToClipMatrix = nullptr, BackfaceWinding bfWinding = BACKFACE_CW, ClipPlanes clipPlaneMask = CLIP_PLANE_ALL) = 0

Performs input assembly, clipping, projection and triangle setup, and writes triangles to the screen space bins they overlap. This function can be used to distribute work for threading (see the CullingThreadpool class for an example).

Parameters:

inVtx – Pointer to an array of input vertices; it should point to the x component of the first vertex. The input vertices are given as (x,y,w) coordinates in clip space.
inTris – Pointer to an array of vertex indices. Each triangle is created from three indices fetched consecutively from the array.
nTris – The number of triangles to bin (inTris must contain at least 3*nTris entries).
triLists – Pointer to an array of TriList objects, one per bin. If a triangle overlaps a bin, it is written to the corresponding TriList. Note that this method appends triangles to the current list; to start writing from the beginning of the list, set triList.mTriIdx = 0.
nBinsW – Number of bins along the horizontal axis (columns); the screen is divided into nBinsW x nBinsH rectangular bins.
nBinsH – Number of bins along the vertical axis (rows); the screen is divided into nBinsW x nBinsH rectangular bins.
modelToClipMatrix – All vertices are transformed by this matrix before projection. If nullptr is passed, the transform step is skipped.
bfWinding – Sets the triangle winding order considered back-facing; must be one of BACKFACE_NONE, BACKFACE_CW and BACKFACE_CCW. Back-facing triangles are culled and will not be binned or rasterized. Use BACKFACE_NONE to disable culling for double-sided geometry.
clipPlaneMask – A mask indicating which clip planes the triangle clipper should consider. It can be used as an optimization if your application can determine (for example, during culling) that a group of triangles does not intersect a certain frustum plane. However, setting an incorrect mask may cause out-of-bounds memory accesses.
vtxLayout – This overload takes no vtxLayout argument. For best performance, store position data as compactly in memory as possible.
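BinTriangles() writes each triangle to every bin it overlaps. The mapping can be illustrated with a screen-space bounding box: convert its pixel extent to bin indices, letting pixels past the last regular bin fall into the wider last column or row. This is a conceptual sketch only; OverlappedBins and BinRange are hypothetical, the bin sizes are assumed to follow the rule given under RenderTrilist(), and the library may test overlap more precisely than a bounding box.

```cpp
#include <algorithm>

struct BinRange
{
    unsigned int firstX, lastX, firstY, lastY; // inclusive bin indices
};

// bbMin/bbMax: inclusive pixel bounds of a triangle's screen-space bounding box.
inline BinRange OverlappedBins(int bbMinX, int bbMinY, int bbMaxX, int bbMaxY,
                               unsigned int binWidth, unsigned int binHeight,
                               unsigned int nBinsW, unsigned int nBinsH)
{
    // Clamp negative coordinates to 0 and indices past the grid to the last bin.
    auto toBin = [](int p, unsigned int size, unsigned int nBins) {
        unsigned int idx = p < 0 ? 0u : static_cast<unsigned int>(p) / size;
        return std::min(idx, nBins - 1);
    };
    return {toBin(bbMinX, binWidth, nBinsW), toBin(bbMaxX, binWidth, nBinsW),
            toBin(bbMinY, binHeight, nBinsH), toBin(bbMaxY, binHeight, nBinsH)};
}
```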

virtual void RenderTrilist(const TriList &triList, const ScissorRect *scissor) = 0

Renders all occluder triangles in a trilist. This function can be used in combination with BinTriangles() to create a threaded (binning) rasterizer. The bins can be processed independently by different threads without risking writes to overlapping memory regions.

Parameters:

triList – A triangle list, filled using BinTriangles(), that is to be rendered.
scissor – A scissor box limiting the rendering region to the bin. Due to implementation constraints, the size of each bin must be a multiple of 32x8 pixels. For a render target with (width, height) resolution and (nBinsW, nBinsH) bins, the size of a bin is:
    binWidth = (width / nBinsW) - (width / nBinsW) % 32;
    binHeight = (height / nBinsH) - (height / nBinsH) % 8;
The last column and row of bins have a different size:
    lastColBinWidth = width - (nBinsW - 1) * binWidth;
    lastRowBinHeight = height - (nBinsH - 1) * binHeight;
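Applying the bin-size formula to every bin yields one scissor box per bin. In the sketch below, Scissor is a hypothetical stand-in with the same members as ScissorRect (min inclusive, max exclusive), and the row-by-row bin index (y * nBinsW + x) is an assumption; match whatever indexing you use for the triLists passed to BinTriangles().

```cpp
#include <cstddef>
#include <vector>

// Stand-in mirroring ScissorRect's members.
struct Scissor
{
    int mMinX, mMinY, mMaxX, mMaxY;
};

inline std::vector<Scissor> MakeBinScissors(int width, int height, int nBinsW, int nBinsH)
{
    const int binWidth = (width / nBinsW) - (width / nBinsW) % 32;
    const int binHeight = (height / nBinsH) - (height / nBinsH) % 8;
    std::vector<Scissor> rects;
    rects.reserve(static_cast<std::size_t>(nBinsW) * nBinsH);
    for (int by = 0; by < nBinsH; ++by)
        for (int bx = 0; bx < nBinsW; ++bx)
        {
            Scissor s;
            s.mMinX = bx * binWidth;
            s.mMinY = by * binHeight;
            // The last column/row extends to the edge of the render target.
            s.mMaxX = (bx == nBinsW - 1) ? width : s.mMinX + binWidth;
            s.mMaxY = (by == nBinsH - 1) ? height : s.mMinY + binHeight;
            rects.push_back(s);
        }
    return rects;
}
```

Each worker would then call RenderTrilist() for its bin with the matching rectangle converted to a ScissorRect.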

virtual void ComputePixelDepthBuffer(float *depthData, bool flipY) = 0

Creates a per-pixel depth buffer from the hierarchical z buffer representation. Intended for visualizing the hierarchical depth buffer for debugging. The buffer is written in scanline order, from the top to bottom (D3D) or bottom to top (OGL) of the surface. See the USE_D3D define.

Parameters: depthData – Pointer to memory where the per-pixel depth data is written. Must hold storage for at least width*height elements, as set by SetResolution().
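For debugging, the float buffer written by ComputePixelDepthBuffer() can be turned into an 8-bit grayscale image. This sketch normalizes by the finite minimum and maximum depth and, when the rows arrive bottom to top (the OGL order described above), flips them so row 0 is the top of the image. DepthToGray is hypothetical, and how flipY interacts with the row order is not covered here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// depth: width*height floats in scanline order; topDown: true for D3D row order.
inline std::vector<std::uint8_t> DepthToGray(const float *depth, int width, int height, bool topDown)
{
    // Find the finite depth range so the image uses the full 0..255 scale.
    float lo = INFINITY, hi = -INFINITY;
    for (int i = 0; i < width * height; ++i)
        if (std::isfinite(depth[i]))
        {
            lo = std::min(lo, depth[i]);
            hi = std::max(hi, depth[i]);
        }
    const float range = (hi > lo) ? (hi - lo) : 1.0f;

    std::vector<std::uint8_t> gray(static_cast<std::size_t>(width) * height, 0);
    for (int y = 0; y < height; ++y)
    {
        const int srcRow = topDown ? y : (height - 1 - y); // flip bottom-to-top rows
        for (int x = 0; x < width; ++x)
        {
            const float d = depth[srcRow * width + x];
            const float t = std::isfinite(d) ? (d - lo) / range : 0.0f;
            gray[static_cast<std::size_t>(y) * width + x] = static_cast<std::uint8_t>(t * 255.0f + 0.5f);
        }
    }
    return gray;
}
```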

virtual OcclusionCullingStatistics GetStatistics() = 0: Fetches occlusion culling statistics; returns zeroes unless ENABLE_STATS is defined to 1. The statistics can be used for profiling or debugging.

virtual Implementation GetImplementation() = 0: Returns the implementation (CPU instruction set) version of this object.

inline void GetAllocFreeCallback(pfnAlignedAlloc &allocCallback, pfnAlignedFree &freeCallback): Gets the memory allocation and free callbacks used by this object.

virtual void CombinePixelDepthBuffer2W(float *depthData, int w, int h) = 0

virtual void DecodePixelDepthBuffer2W(float *depthData, int w, int h) = 0

virtual void mergeOcclusions(MaskedOcclusionCulling **another_occl, uint32_t occl_count, uint32_t first_tile, uint32_t last_tile) = 0

virtual void mergeOcclusionsZmin(MaskedOcclusionCulling **another_occl, uint32_t occl_count, uint32_t first_tile, uint32_t last_tile) = 0

virtual uint32_t getTilesCount() const = 0
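mergeOcclusions() and mergeOcclusionsZmin() take a tile range, and getTilesCount() reports how many tiles there are, which suggests splitting the merge across workers. A hedged sketch of such a split: this reference does not state whether last_tile is inclusive, so SplitTiles (a hypothetical helper) returns half-open ranges and leaves the conversion to the caller. Whether concurrent calls on disjoint ranges are safe is likewise an assumption to verify.

```cpp
#include <cstdint>
#include <vector>

struct TileRange
{
    std::uint32_t first, end; // half-open: [first, end)
};

// Splits tileCount tiles into at most nChunks contiguous, non-empty ranges.
inline std::vector<TileRange> SplitTiles(std::uint32_t tileCount, std::uint32_t nChunks)
{
    std::vector<TileRange> out;
    if (nChunks == 0)
        return out;
    for (std::uint32_t i = 0; i < nChunks; ++i)
    {
        // 64-bit intermediate avoids overflow for large tile counts.
        const auto first = static_cast<std::uint32_t>(std::uint64_t(tileCount) * i / nChunks);
        const auto end = static_cast<std::uint32_t>(std::uint64_t(tileCount) * (i + 1) / nChunks);
        if (end > first)
            out.push_back({first, end});
    }
    return out;
}
```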

Public Static Functions

static MaskedOcclusionCulling *Create(): Creates a new object with default state, no z buffer attached/allocated.

static MaskedOcclusionCulling *Create(Implementation RequestedSIMD, pfnAlignedAlloc alignedAlloc, pfnAlignedFree alignedFree)

Creates a new object with default state, no z buffer attached/allocated.

Parameters:

alignedAlloc – Pointer to a callback function used when allocating memory
alignedFree – Pointer to a callback function used when freeing memory
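Callbacks passed to Create() must match the pfnAlignedAlloc and pfnAlignedFree typedefs. A portable sketch built on malloc: it over-allocates, aligns the returned pointer, and stores the original pointer just before the aligned block so the free callback can recover it. MyAlignedAlloc and MyAlignedFree are hypothetical names, and the alignment is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Same signatures as the typedefs in this reference.
typedef void *(*pfnAlignedAlloc)(std::size_t alignment, std::size_t size);
typedef void (*pfnAlignedFree)(void *ptr);

static void *MyAlignedAlloc(std::size_t alignment, std::size_t size)
{
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);
    // Room for the payload, alignment slack, and the stashed raw pointer.
    void *raw = std::malloc(size + alignment + sizeof(void *));
    if (!raw)
        return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
    std::uintptr_t aligned = (start + alignment - 1) & ~(std::uintptr_t(alignment) - 1);
    reinterpret_cast<void **>(aligned)[-1] = raw; // remember what malloc returned
    return reinterpret_cast<void *>(aligned);
}

static void MyAlignedFree(void *ptr)
{
    if (ptr)
        std::free(reinterpret_cast<void **>(ptr)[-1]);
}

// Usage (assuming the header is included):
//   MaskedOcclusionCulling *moc = MaskedOcclusionCulling::Create(
//       MaskedOcclusionCulling::SSE41, MyAlignedAlloc, MyAlignedFree);
//   ...
//   MaskedOcclusionCulling::Destroy(moc);
```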

static void Destroy(MaskedOcclusionCulling *moc): Destroys an object and frees the z buffer memory. The destructor is protected, so the delete operator cannot be used; call this function instead to free the memory.
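Because objects must be released through Destroy(), std::unique_ptr needs a custom deleter to manage them. The sketch demonstrates the pattern on FakeMoc, a hypothetical stand-in with a protected destructor and static Create()/Destroy(); with the real class, substitute MaskedOcclusionCulling and MaskedOcclusionCulling::Destroy.

```cpp
#include <memory>

// Stand-in with the same ownership rules: protected destructor, static Destroy().
class FakeMoc
{
public:
    static FakeMoc *Create()
    {
        ++sLive;
        return new FakeMoc();
    }
    static void Destroy(FakeMoc *p)
    {
        if (!p)
            return;
        --sLive;
        delete p; // allowed here: member functions can access the protected destructor
    }
    static int sLive; // number of live objects, for demonstration only

protected:
    ~FakeMoc() = default;
};
int FakeMoc::sLive = 0;

// Deleter that routes destruction through Destroy() instead of delete.
struct FakeMocDeleter
{
    void operator()(FakeMoc *p) const { FakeMoc::Destroy(p); }
};
using FakeMocPtr = std::unique_ptr<FakeMoc, FakeMocDeleter>;
```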

static void TransformVertices(const float *mtx, const float *inVtx, float *xfVtx, unsigned int nVtx)

Utility function for transforming vertices and outputting them in an (x,y,z,w) format suitable for the occluder rasterization and occludee testing functions.

Parameters:

mtx – Pointer to matrix data. The matrix should be column major for post-multiplication (OGL) and row major for pre-multiplication (DX), consistent with OpenGL / DirectX behavior.
inVtx – Pointer to an array of input vertices. The input vertices are given as (x,y,z) coordinates; the w component is assumed to be 1.0.
xfVtx – Pointer to an array to store transformed vertices. The transformed vertices are always stored as an array of structs (AoS), with (x,y,z,w) packed in memory.
nVtx – Number of vertices to transform.
vtxLayout – This overload takes no vtxLayout argument. For best performance, store position data as compactly in memory as possible.
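Both matrix conventions named above put the same values at the same 16 memory slots, with elements 12 to 15 holding the translation column. A scalar reference under that layout is useful for checking results. It assumes tightly packed (x,y,z) input, which is an assumption rather than something this reference states; TransformVerticesRef is a hypothetical helper, not the library's implementation.

```cpp
// Scalar reference: transforms nVtx points (x,y,z, implicit w = 1) into packed
// (x,y,z,w) output. mtx[c * 4 + r] is row r, column c of a column-major matrix,
// which is the same memory layout as a row-major matrix used with v * M.
inline void TransformVerticesRef(const float *mtx, const float *inVtx, float *xfVtx, unsigned int nVtx)
{
    for (unsigned int i = 0; i < nVtx; ++i)
    {
        const float x = inVtx[3 * i + 0], y = inVtx[3 * i + 1], z = inVtx[3 * i + 2];
        for (int r = 0; r < 4; ++r)
            xfVtx[4 * i + r] = mtx[0 + r] * x + mtx[4 + r] * y + mtx[8 + r] * z + mtx[12 + r];
    }
}
```

With a pure translation by (1,2,3), the point (1,1,1) maps to (2,3,4,1).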

Protected Functions

inline virtual ~MaskedOcclusionCulling()

Protected Attributes

pfnAlignedAlloc mAlignedAllocCallback

pfnAlignedFree mAlignedFreeCallback

mutable OcclusionCullingStatistics mStats

struct OcclusionCullingStatistics

#include <dag_maskedOcclusionCulling.h>

Statistics that can be gathered during occluder rendering and visibility testing to aid debugging and profiling. Must be enabled by defining ENABLE_STATS to 1.

Public Members

struct MaskedOcclusionCulling::OcclusionCullingStatistics mOccluders

long long mNumProcessedTriangles: Number of occluder triangles processed in total.

long long mNumRasterizedTriangles: Number of occluder triangles passing view frustum and backface culling.

long long mNumTilesTraversed: Number of tiles traversed by the rasterizer.

long long mNumTilesUpdated: Number of tiles where the hierarchical z buffer was updated.

long long mNumTilesMerged: Number of tiles where the hierarchical z buffer was merged.

struct MaskedOcclusionCulling::OcclusionCullingStatistics mOccludees

long long mNumProcessedRectangles: Number of rects processed (TestRect()).

long long mNumProcessedTriangles: Number of occludee triangles processed (TestTriangles()).

long long mNumRasterizedTriangles: Number of occludee triangles passing view frustum and backface culling.

long long mNumTilesTraversed: Number of tiles traversed by the triangle and rect rasterizers.
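Raw counters are easier to compare as ratios, for example how many occluder triangles survive culling and how many traversed tiles actually change the depth buffer. OccluderStats is a hypothetical mirror of the occluder counters used here, so the sketch compiles without the header; with the real struct, read the fields from GetStatistics().mOccluders.

```cpp
// Hypothetical mirror of the occluder counters used below.
struct OccluderStats
{
    long long mNumProcessedTriangles;
    long long mNumRasterizedTriangles;
    long long mNumTilesTraversed;
    long long mNumTilesUpdated;
};

// Fraction of processed occluder triangles that passed frustum and backface
// culling; 0 when nothing was counted (e.g. statistics disabled).
inline double RasterizedFraction(const OccluderStats &s)
{
    return s.mNumProcessedTriangles
               ? double(s.mNumRasterizedTriangles) / double(s.mNumProcessedTriangles)
               : 0.0;
}

// Fraction of traversed tiles whose hierarchical z buffer was updated.
inline double TileUpdateFraction(const OccluderStats &s)
{
    return s.mNumTilesTraversed ? double(s.mNumTilesUpdated) / double(s.mNumTilesTraversed) : 0.0;
}
```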

struct ScissorRect

#include <dag_maskedOcclusionCulling.h>

Used to control scissoring during rasterization. Note that we only provide coarse scissor support. The scissor box x coordinates must be a multiple of 32, and the y coordinates a multiple of 8. Scissoring is mainly meant as a means of enabling binning (sort middle) rasterizers in case application developers want to use that approach for multithreading.

Public Functions

inline ScissorRect()

inline ScissorRect(int minX, int minY, int maxX, int maxY)

Public Members

int mMinX = 0: Screen space X coordinate for left side of scissor rect, inclusive and must be a multiple of 32

int mMinY = 0: Screen space Y coordinate for bottom side of scissor rect, inclusive and must be a multiple of 8

int mMaxX = 0: Screen space X coordinate for right side of scissor rect, exclusive and must be a multiple of 32

int mMaxY = 0: Screen space Y coordinate for top side of scissor rect, exclusive and must be a multiple of 8

struct TriList

#include <dag_maskedOcclusionCulling.h>

Used to specify storage area for a binlist, containing triangles. This struct is used for binning and multithreading. The host application is responsible for allocating memory for the binlists.

Public Members

unsigned int mNumTriangles: Maximum number of triangles that may be stored in mPtr.

unsigned int mTriIdx: Index of the next triangle to be written; reset it to 0 before calling BinTriangles() to start from the beginning of the list.

float *mPtr: Scratchpad buffer allocated by the host application.