rstrace is a Rust implementation of strace for x86 Linux. It allows the user to trace system calls of a process or command.
Unlike strace, it can introspect NVIDIA CUDA system calls.
Note
Currently, only x86 Linux is supported. aarch64 support is planned, but macOS support is out of scope.
curl -LsSf https://rstrace.xyz/install.sh | sh
cargo install rstrace
rstrace ls /tmp/
To get a quick overview, use rstrace --help
Usage: rstrace [OPTIONS] [ARGS]...
Arguments:
  [ARGS]...  Arguments for the program to trace. e.g. 'ls /tmp/'
Options:
  -o, --output <OUTPUT>  send trace output to FILE instead of stderr
  -t, --timestamp...     Print absolute timestamp. -tt includes microseconds, -ttt uses UNIX timestamps
  -c, --summary-only     Count time, calls, and errors for each syscall and report summary
  -C, --summary          like -c, but also print the regular output
  -j, --summary-json     Count time, calls, and errors for each syscall and report summary in JSON format
      --tef              Emit Trace Event Format (TEF) trace data as output
      --verbose          Output human readable information about CUDA ioctls.
      --cuda             Enable CUDA ioctl sniffing. [Requires 'cuda_sniff' feature]
      --cuda-only        Enable CUDA ioctl sniffing and disable all other output. [Requires 'cuda_sniff' feature]
  -p, --attach <PID>     Attach to the process with the process ID pid and begin tracing.
  -f, --follow-forks     Trace child processes as they are created by currently traced processes as a result of the fork(2), vfork(2) and clone(2) system calls.
      --color            Enable colored output (default)
  -h, --help             Print help
  -V, --version          Print version
cuda_sniff is an extension to rstrace that allows the user to trace CUDA API calls. It is based on
https://github.com/geohot/cuda_ioctl_sniffer by George Hotz.
gVisor has an alternative implementation called ioctl_sniffer, which uses LD_PRELOAD to intercept calls,
unlike rstrace, which uses ptrace.
rstrace can emit basic tracing data which can be loaded into the Google Perfetto trace viewer.
e.g. rstrace --output in-kernel.tef.json --tef python userspace.py
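To make the format concrete, here is a minimal sketch of the kind of JSON that the Trace Event Format describes and that Perfetto can load. It is not rstrace's actual output, which may use different fields or event phases. The field names (`name`, `ph`, `ts`, `dur`, `pid`, `tid`) come from the public Trace Event Format; the `complete_event` helper, the `syscall` category, the PID, and the sample timings are hypothetical and chosen only for illustration.

```python
import json


def complete_event(name, start_us, dur_us, pid, tid):
    """Build one Trace Event Format 'complete' event (phase "X")."""
    return {
        "name": name,      # label shown on the slice in the viewer
        "cat": "syscall",  # hypothetical category label
        "ph": "X",         # "X" = complete event: a start time plus a duration
        "ts": start_us,    # start timestamp, in microseconds
        "dur": dur_us,     # duration, in microseconds
        "pid": pid,
        "tid": tid,
    }


# Hypothetical samples: (syscall name, start in microseconds, duration in microseconds)
samples = [("pwrite64", 0, 12), ("fsync", 12, 850), ("pwrite64", 862, 10)]
events = [complete_event(n, ts, dur, pid=1234, tid=1234) for n, ts, dur in samples]

# The JSON object form of the format wraps the event list in "traceEvents".
trace_json = json.dumps({"traceEvents": events}, indent=2)
print(trace_json)
```

Saving the printed JSON to a file and opening it in a trace viewer should show three slices on a single track, with the `fsync` slice much wider than the two writes.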
[Figures: trace views of a syscall-dominant and a userspace-dominant workload]
Syscall-heavy code snippet
import os, time
deadline = time.monotonic() + 5
fd = os.open("tmp.bin", os.O_CREAT | os.O_RDWR, 0o600)
buf = b"\0" * 4096
while time.monotonic() < deadline:
    os.pwrite(fd, buf, 0)
    os.fsync(fd)
os.close(fd)
Userspace code snippet
use std::time::{Duration, Instant};
fn main() {
    let start = Instant::now();
    let mut x: u64 = 0;
    while start.elapsed() < Duration::from_secs(5) {
        for _ in 0..1_000_000 {
            // LCG-style math to burn CPU; wrapping avoids UB
            x = x.wrapping_mul(1664525).wrapping_add(1013904223);
        }
    }
    println!("{}", x); // prevents optimizing away the loop
}
- https://github.com/bpftrace/bpftrace - Don't use rstrace/strace in production. Where feasible, use bpftrace, which is a more powerful tracing tool with much lower overhead.
