Commit ee12c73

Task mesh rendering article WIP.
1 parent efe4226 commit ee12c73

1 file changed

+76
-0
lines changed
@@ -0,0 +1,76 @@
1+
**_Task and Mesh Shaders: A Practical Guide (Slang)_**
2+
[William Gunawan](https://www.williscool.com)
3+
Written on 2025/12/04
4+
5+
# Introduction
6+
7+
Mesh shaders represent a fundamental shift in GPU rendering pipelines.
8+
Unlike traditional vertex shaders that process vertices individually, mesh shaders adopt a compute-like programming model with explicit thread dispatch and shared memory access.
9+
10+
This article will demonstrate a practical task/mesh shader implementation in Vulkan with Slang, including:
11+
- Basic mesh shader pipelines (with and without task shaders)
12+
- GPU-driven frustum and meshlet backface culling
13+
- Integration with indirect draw workflows
14+
15+
For a comprehensive explanation of the mesh shader model, refer to NVIDIA's [Introduction to Mesh Shaders](https://developer.nvidia.com/blog/introduction-turing-mesh-shaders/) and AMD's [Mesh Shader Guide](https://gpuopen.com/learn/mesh_shaders/mesh_shaders-from_vertex_shader_to_mesh_shader/).
16+
17+
I also provide a small benchmark where I compare the performance of traditional rendering against task and mesh shaders.
18+
The most notable finding is that task and mesh shaders help in two ways: a higher cache hit rate during draw calls, and finer culling granularity, which reduces the amount of geometry that reaches the rasterizer in the first place.
19+
The benchmark is available on [my website](../../technical/task-mesh-benchmarking/task-mesh-benchmarking.md.html).
20+
21+
## Terminology
22+
23+
**Task Shader / Amplification Shader**
24+
25+
An optional pre-processing stage that determines which mesh shader workgroups to spawn. Performs coarse culling (e.g., per-meshlet frustum culling) before mesh shading.
26+
Typically dispatched with 32-128 threads per workgroup to evaluate multiple meshlets in parallel.
27+
`DispatchMesh` launches the mesh shader workgroups. It is issued once per task workgroup, from uniform control flow, after the threads have synchronized on their group-shared results.
28+
Called "Amplification Shader" in DirectX 12.
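
The evaluate-then-dispatch pattern described above can be illustrated on the CPU. The following is a minimal C++ sketch, not shader code: `TaskPayload`, `runTaskWorkgroup`, and the `visible` array are hypothetical names introduced for illustration, and the sequential loop stands in for threads that would run in parallel on the GPU.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical payload passed from one task workgroup to the mesh
// workgroups it launches: a compacted list of surviving meshlet indices.
struct TaskPayload {
    std::vector<uint32_t> meshletIndices;
};

// CPU emulation of ONE task workgroup (assumption: one meshlet per thread).
// visible[i] stands in for the culling result of global meshlet i.
// Returns the number of mesh workgroups this task workgroup would launch.
uint32_t runTaskWorkgroup(uint32_t groupIndex, uint32_t threadsPerGroup,
                          const std::vector<bool>& visible,
                          TaskPayload& payload) {
    payload.meshletIndices.clear();

    // Phase 1: each "thread" tests its meshlet and appends it if it survives.
    // On the GPU this append would typically use group-shared memory.
    for (uint32_t thread = 0; thread < threadsPerGroup; ++thread) {
        const uint32_t meshlet = groupIndex * threadsPerGroup + thread;
        if (meshlet < visible.size() && visible[meshlet]) {
            payload.meshletIndices.push_back(meshlet);
        }
    }

    // Phase 2: after the group has synchronized, a single dispatch is made
    // for the whole workgroup, sized by the number of survivors.
    return static_cast<uint32_t>(payload.meshletIndices.size());
}
```

Each launched mesh workgroup would then look up its meshlet through the payload rather than through its own workgroup index, which is what lets culled meshlets cost nothing downstream.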
29+
30+
**Mesh Shader**
31+
32+
Generates primitives and vertices for rasterization.
33+
Replaces the traditional vertex/geometry shader stages.
34+
Outputs a variable number of triangles per workgroup, up to hardware limits that vary by vendor (typically 256 vertices and 256 triangles).
35+
Though mesh shaders are not limited to triangles (you can output other primitives), triangles will be the focus of this article.
36+
37+
**Meshlet**
38+
39+
A small cluster of vertices and triangles, typically 32-64 vertices and 64-124 triangles. See why in the tips section of this [article](https://developer.nvidia.com/blog/using-mesh-shaders-for-professional-graphics/).
40+
Meshlets are the atomic unit processed by mesh shaders. They are sized to fit within GPU shared memory and to improve cache locality.
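
To make the size budget concrete, here is a minimal C++ sketch of a meshlet descriptor. The `Meshlet` layout, the `kMaxVertices`/`kMaxTriangles` constants, and both helper functions are assumptions chosen for illustration (the limits are simply taken from the upper end of the ranges quoted above); the article's actual data layout may differ.

```cpp
#include <cstdint>

// Hypothetical meshlet descriptor: offsets into shared buffers plus counts.
struct Meshlet {
    uint32_t vertexOffset;    // first entry in a meshlet-vertex index buffer
    uint32_t triangleOffset;  // first entry in a meshlet-triangle buffer
    uint32_t vertexCount;     // vertices referenced by this meshlet
    uint32_t triangleCount;   // triangles in this meshlet
};

// Assumed budget, taken from the upper end of the typical ranges.
constexpr uint32_t kMaxVertices = 64;
constexpr uint32_t kMaxTriangles = 124;

// True if the meshlet respects the assumed per-meshlet budget.
constexpr bool fitsBudget(const Meshlet& m) {
    return m.vertexCount <= kMaxVertices && m.triangleCount <= kMaxTriangles;
}

// Number of loop iterations each thread needs so that a workgroup of
// threadsPerGroup threads covers itemCount vertices or triangles.
constexpr uint32_t itemsPerThread(uint32_t itemCount, uint32_t threadsPerGroup) {
    return (itemCount + threadsPerGroup - 1) / threadsPerGroup;
}
```

For example, with a 32-thread mesh workgroup, a full meshlet under these assumed limits needs 2 iterations per thread for its vertices and 4 for its triangles.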
41+
42+
**Thread / Invocation**
43+
44+
A single execution instance within a thread group.
45+
Threads within a thread group can cooperate via shared memory and barriers.
46+
47+
**Thread Group / Workgroup**
48+
49+
A collection of threads dispatched together, sharing local memory and synchronization primitives.
50+
In task shaders, one workgroup typically evaluates multiple meshlets (often one per thread) and emits mesh shader workgroups for visible ones.
51+
In mesh shaders, one workgroup processes exactly one meshlet.
52+
53+
**Draw Indirect**
54+
55+
A rendering technique where draw commands (vertex count, instance count, offsets) are read from GPU buffers rather than CPU-provided parameters, enabling GPU-driven culling without CPU synchronization.
56+
Draw indirect is also available for task/mesh dispatches through `vkCmdDrawMeshTasksIndirectEXT` and `vkCmdDrawMeshTasksIndirectCountEXT`.
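
As a rough illustration of what such an indirect command contains, the C++ sketch below mirrors the three workgroup-count fields of `VkDrawMeshTasksIndirectCommandEXT` in a local struct so it compiles without the Vulkan headers. `makeCommand` and its parameters are hypothetical, and the code assumes each task workgroup handles a fixed number of meshlets along X only.

```cpp
#include <cstdint>

// Local mirror of the three-field layout of VkDrawMeshTasksIndirectCommandEXT.
// Real code would use the struct from the Vulkan headers instead.
struct DrawMeshTasksIndirectCommand {
    uint32_t groupCountX;
    uint32_t groupCountY;
    uint32_t groupCountZ;
};

// Hypothetical helper: number of task workgroups needed to cover
// meshletCount meshlets when each workgroup evaluates meshletsPerTaskGroup.
DrawMeshTasksIndirectCommand makeCommand(uint32_t meshletCount,
                                         uint32_t meshletsPerTaskGroup) {
    const uint32_t groups =
        (meshletCount + meshletsPerTaskGroup - 1) / meshletsPerTaskGroup;
    return DrawMeshTasksIndirectCommand{groups, 1u, 1u};
}
```

In a GPU-driven setup the same three values would be written into the indirect buffer by a compute pass rather than by the CPU, which is the point of the indirect path.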
57+
58+
**Cone Culling / Meshlet Backface Culling**
59+
60+
Conservative culling of meshlets whose normal cone indicates that every contained triangle faces away from the camera.
61+
This is distinct from the traditional per-triangle backface culling done by the rasterizer: it operates at meshlet granularity in the task shader, so invisible geometry is never processed at all.
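
One common formulation of the cone test compares the direction from the camera to the cone apex against the cone axis. The C++ sketch below shows that formulation under stated assumptions: `Vec3`, `coneCulled`, and the parameter names are hypothetical, `coneAxis` is assumed to be unit length and to point along the meshlet's average normal, and `coneCutoff` is assumed to be a precomputed threshold for the meshlet. It is not necessarily the exact test this article ends up using.

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
};

inline Vec3 sub(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns true when the whole meshlet can be treated as backfacing.
// Equivalent to: dot(normalize(coneApex - cameraPos), coneAxis) >= coneCutoff,
// written without the division so a zero-length view vector is handled.
bool coneCulled(Vec3 coneApex, Vec3 coneAxis, float coneCutoff, Vec3 cameraPos) {
    const Vec3 view = sub(coneApex, cameraPos);
    const float len = std::sqrt(dot(view, view));
    if (len == 0.0f) {
        return false;  // camera sits on the apex: stay conservative, keep it
    }
    return dot(view, coneAxis) >= coneCutoff * len;
}
```

With the camera at (0, 0, 5) and the apex at the origin, a meshlet whose axis is (0, 0, -1) points away from the camera and is rejected, while one whose axis is (0, 0, 1) faces the camera and is kept.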
62+
63+
64+
# Data Preparation
65+
66+
# Basic Mesh Shader Pipeline
67+
68+
# Adding Task Shaders
69+
70+
# Culling
71+
72+
# Indirect Integration
73+
74+
# Conclusion
75+
76+
<!-- Markdeep: --><style class="fallback">body{visibility:hidden;white-space:pre;font-family:monospace}</style><script src="markdeep.min.js" charset="utf-8"></script><script src="https://morgan3d.github.io/markdeep/latest/markdeep.min.js?" charset="utf-8"></script><script>window.alreadyProcessedMarkdeep||(document.body.style.visibility="visible")</script>