**_Task and Mesh Shaders: A Practical Guide (Slang)_**
[William Gunawan](https://www.williscool.com)
Written on 2025/12/04

# Introduction

Mesh shaders represent a fundamental shift in GPU rendering pipelines.
Unlike traditional vertex shaders that process vertices individually, mesh shaders adopt a compute-like programming model with explicit thread dispatch and shared memory access.

This article will demonstrate a practical task/mesh shader implementation in Vulkan with Slang, including:
- Basic mesh shader pipelines (with and without task shaders)
- GPU-driven frustum and meshlet backface culling
- Integration with indirect draw workflows

For a comprehensive explanation of the mesh shader model, refer to NVIDIA's [Introduction to Mesh Shaders](https://developer.nvidia.com/blog/introduction-turing-mesh-shaders/) and AMD's [Mesh Shader Guide](https://gpuopen.com/learn/mesh_shaders/mesh_shaders-from_vertex_shader_to_mesh_shader/).

I also provide a small benchmark where I compare the performance of traditional rendering against task and mesh shaders.
The most notable finding is that task and mesh shaders provide benefits in two ways: an improved cache hit rate during draw calls, and finer culling granularity, which reduces the number of vertices rasterized outright.
It is available on [my website](../../technical/task-mesh-benchmarking/task-mesh-benchmarking.md.html).

## Terminology

**Task Shader / Amplification Shader**

An optional pre-processing stage that determines which mesh shader workgroups to spawn. Performs coarse culling (e.g., per-meshlet frustum culling) before mesh shading.
Typically dispatched with 32-128 threads per workgroup to evaluate multiple meshlets in parallel.
While every thread can call `DispatchMesh` to launch mesh shader workgroups, only one call per workgroup is needed, made after a group-shared memory barrier.
Called "Amplification Shader" in DirectX 12.
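
The flow described above (one thread per meshlet, a shared survivor list, then a single dispatch) can be sketched on the CPU. This is an illustrative C++ emulation, not the article's shader: `TaskPayload`, `runTaskWorkgroup`, and the group size of 64 are hypothetical names and values chosen for the example, and the sequential loop stands in for threads that would run in parallel and append through a group-shared atomic counter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed task workgroup size; the text above cites 32-128 threads.
constexpr uint32_t kTaskGroupSize = 64;

// Hypothetical payload passed from the task stage to the mesh stage.
struct TaskPayload {
    uint32_t meshletIndices[kTaskGroupSize]; // compacted list of surviving meshlets
};

// CPU emulation of one task workgroup. visible[i] stands in for the
// per-meshlet culling test that thread i would evaluate on the GPU.
// Returns the number of mesh workgroups the single dispatch would launch.
uint32_t runTaskWorkgroup(uint32_t firstMeshlet,
                          const std::vector<bool>& visible,
                          TaskPayload& payload) {
    uint32_t survivors = 0; // on the GPU: a group-shared counter bumped atomically
    for (uint32_t thread = 0; thread < kTaskGroupSize; ++thread) {
        const uint32_t meshlet = firstMeshlet + thread;
        if (meshlet >= visible.size()) break; // threads past the end do nothing
        if (visible[meshlet]) payload.meshletIndices[survivors++] = meshlet;
    }
    // On the GPU, a group barrier would sit here, followed by one
    // DispatchMesh(survivors, 1, 1, payload) for the whole workgroup.
    assert(survivors <= kTaskGroupSize);
    return survivors;
}
```

Each launched mesh workgroup would then read its meshlet index from the payload using its own group index, so culled meshlets never reach the mesh stage.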

**Mesh Shader**

Generates primitives and vertices for rasterization.
Replaces the traditional vertex/geometry shader stages.
Outputs a variable number of vertices and triangles per workgroup, up to a hardware limit that varies by vendor (typically 256 vertices and 256 triangles).
Though mesh shaders are not limited to triangles (you can output other primitives), triangles will be the focus of this article.

**Meshlet**

A small cluster of vertices and triangles, typically 32-64 vertices and 64-124 triangles. See why in the tips section of this [article](https://developer.nvidia.com/blog/using-mesh-shaders-for-professional-graphics/).
Meshlets are the atomic unit processed by mesh shaders, designed to fit within GPU shared memory and optimize cache locality.

**Thread / Invocation**

A single execution instance within a thread group.
Threads within a thread group can cooperate via shared memory and barriers.

**Thread Group / Workgroup**

A collection of threads dispatched together, sharing local memory and synchronization primitives.
In task shaders, one workgroup typically evaluates multiple meshlets (often one per thread) and emits mesh shader workgroups for visible ones.
In mesh shaders, one workgroup processes exactly one meshlet.

**Draw Indirect**

A rendering technique where draw commands (vertex count, instance count, offsets) are read from GPU buffers rather than CPU-provided parameters, enabling GPU-driven culling without CPU synchronization.
Draw indirect is also available for task/mesh dispatches through `vkCmdDrawMeshTasksIndirectEXT` and `vkCmdDrawMeshTasksIndirectCountEXT`.
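
For mesh task dispatches, each indirect command holds three workgroup counts instead of vertex and instance counts. The C++ sketch below mirrors the layout of Vulkan's `VkDrawMeshTasksIndirectCommandEXT` in a local struct so it compiles without the Vulkan headers. `makeMeshTasksCommand` is a hypothetical helper, and the assumption that one task workgroup covers a fixed number of meshlets is made for this example only. In a GPU-driven pipeline a compute pass would typically write these values into the buffer; the CPU function only shows the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Same three fields, in the same order, as VkDrawMeshTasksIndirectCommandEXT.
struct DrawMeshTasksIndirectCommand {
    uint32_t groupCountX;
    uint32_t groupCountY;
    uint32_t groupCountZ;
};

// Hypothetical helper: one task workgroup per `meshletsPerTaskGroup` meshlets,
// rounded up so a partially filled last group is still dispatched.
DrawMeshTasksIndirectCommand makeMeshTasksCommand(uint32_t meshletCount,
                                                  uint32_t meshletsPerTaskGroup) {
    assert(meshletsPerTaskGroup > 0);
    const uint32_t fullGroups = meshletCount / meshletsPerTaskGroup;
    const uint32_t remainder = meshletCount % meshletsPerTaskGroup;
    const uint32_t groups = fullGroups + (remainder != 0 ? 1u : 0u);
    return {groups, 1u, 1u};
}
```

With 130 meshlets and 64 meshlets per task workgroup, this yields a `groupCountX` of 3; the threads in the last group that fall past the final meshlet have to be skipped inside the shader.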

**Cone Culling / Meshlet Backface Culling**

Conservative culling of meshlets whose cone normal indicates all contained triangles face away from the camera.
This is not the same as traditional per-triangle backface culling in the rasterizer: it operates at meshlet granularity in the task shader, so invisible geometry is never processed at all.
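
One common way to express the cone test, following the formulation documented by meshoptimizer, compares the direction from the camera to the cone apex against the cone axis. The C++ sketch below assumes that formulation, which is not necessarily the exact test used in this article's shaders. `MeshletCone`, its fields, and `coneCulled` are hypothetical names, and the cutoff is assumed to be precomputed offline when the meshlets are built.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    assert(len > 0.0f); // the camera must not sit exactly on the apex
    return {v.x / len, v.y / len, v.z / len};
}

// Hypothetical per-meshlet cone data, assumed to be built offline.
struct MeshletCone {
    Vec3 apex;    // point the cone opens from
    Vec3 axis;    // unit vector along the meshlet's average facing direction
    float cutoff; // precomputed threshold on the dot product
};

// Returns true when every triangle in the meshlet faces away from the
// camera, so the task shader can skip the meshlet entirely.
bool coneCulled(const MeshletCone& cone, Vec3 cameraPos) {
    const Vec3 view = normalize(sub(cone.apex, cameraPos));
    return dot(view, cone.axis) >= cone.cutoff;
}
```

For a meshlet facing +Z with its apex at the origin and a cutoff of 0.5, a camera at (0, 0, 5) keeps the meshlet, while a camera at (0, 0, -5) looks at its back side and culls it.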