Fuzzing: Consider adding R as a target to Google's OSS-Fuzz

Google maintains a GitHub repository where open source projects can contribute build configurations and fuzz harnesses, so that their code is fuzzed continuously on Google infrastructure.

https://github.com/google/oss-fuzz

Claude-generated proposal below.

---

# Proposal: Adding R as a Fuzz Target in OSS-Fuzz

## Background

[OSS-Fuzz](https://github.com/google/oss-fuzz) is Google's continuous fuzzing
infrastructure for open source software. It currently supports C/C++, Go, Java,
JavaScript, Python, Rust, Swift, and Ruby, covering over 1,300 projects. R is
not yet among them.

This document proposes adding R as a fuzz target, outlines the approach, and
identifies candidate attack surfaces for initial fuzz targets.

## How OSS-Fuzz Works

Each project in oss-fuzz provides three files:

| File | Purpose |
|---|---|
| `project.yaml` | Metadata: language, contacts, sanitizers, fuzzing engines |
| `Dockerfile` | Build environment: base image, dependencies, source checkout |
| `build.sh` | Compilation script: builds fuzz target binaries, places them in `$OUT` |

### The contract

The build script receives environment variables (`$CC`, `$CXX`, `$CFLAGS`,
`$CXXFLAGS`, `$LIB_FUZZING_ENGINE`, `$SANITIZER`, `$OUT`) and must produce
standalone executables in the `$OUT` directory. Each executable must export the
`LLVMFuzzerTestOneInput` symbol -- that is how oss-fuzz discovers fuzz targets.

After the build completes:

1. The infrastructure scans `$OUT` for executables containing
   `LLVMFuzzerTestOneInput`.
2. The contents of `$OUT` are archived and uploaded to Google Cloud Storage.
   Everything else (source, build intermediates) is discarded.
3. [ClusterFuzz](https://google.github.io/clusterfuzz/) downloads the archive
   into a fresh runner container and continuously fuzzes each target using the
   configured engines (libfuzzer, AFL++, honggfuzz) and sanitizers (ASan, MSan,
   UBSan).
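
As a rough illustration of the contract described above, the smallest possible fuzz target is a C file that defines `LLVMFuzzerTestOneInput` and hands the bytes to the code under test. The `paren_balance` function here is a hypothetical stand-in, not part of R or oss-fuzz; the fuzzing engine supplies `main`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code under test: net count of '(' minus ')' in the input. */
static int paren_balance(const uint8_t *data, size_t size) {
    int depth = 0;
    for (size_t i = 0; i < size; i++) {
        if (data[i] == '(') depth++;
        else if (data[i] == ')') depth--;
    }
    return depth;
}

/* Entry point the fuzzing engine calls once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)paren_balance(data, size);
    return 0; /* inputs that do not crash return 0 */
}
```

Compiled with `$CC $CFLAGS` and linked with `$LIB_FUZZING_ENGINE`, a file like this would yield one executable in `$OUT` that the infrastructure can discover.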

Supporting files can be placed alongside each target:

- `<target>_seed_corpus.zip` -- initial inputs to bootstrap the fuzzer
- `<target>.dict` -- fuzzing dictionary of interesting tokens
- `<target>.options` -- runtime configuration (e.g., memory limits)

## Precedent: How CPython and Ruby Are Fuzzed

R is a C-based interpreter, so the most relevant precedents are CPython and
Ruby -- both of which are fuzzed by compiling the interpreter from source with
sanitizers and linking C/C++ fuzz harnesses against it.

### CPython (`projects/cpython3/`)

- Declares `language: c++` in `project.yaml` (for libfuzzer linkage).
- Builds CPython from source with `--with-address-sanitizer` etc.
- Fuzz harnesses live upstream in CPython's source tree
  (`Modules/_xxtestfuzz/fuzzer.c`). A single C file uses preprocessor macros
  (`-D _Py_FUZZ_ONE -D _Py_FUZZ_<target>`) to select which target to compile.
- The build script iterates over `fuzz_tests.txt`, compiles one binary per
  target, and drops them into `$OUT`.
- Supports ASan, MSan, UBSan, and all three fuzzing engines.
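
The macro-based target selection described above can be sketched in a few lines of C. This is only loosely modeled on that idea: the macro names (`FUZZ_ONE`, `FUZZ_TARGET_parse`, `FUZZ_TARGET_unserialize`) and the two target functions are hypothetical, and this is not CPython's actual source.

```c
#include <stddef.h>
#include <stdint.h>

/* Call counters so the selection logic can be observed in a test. */
static int parse_calls = 0;
static int unserialize_calls = 0;

/* Hypothetical per-target bodies; real ones would call into the interpreter. */
static int fuzz_parse(const uint8_t *data, size_t size) {
    (void)data; (void)size;
    parse_calls++;
    return 0;
}

static int fuzz_unserialize(const uint8_t *data, size_t size) {
    (void)data; (void)size;
    unserialize_calls++;
    return 0;
}

/*
 * Building with -D FUZZ_ONE -D FUZZ_TARGET_parse keeps only the parse target.
 * Without FUZZ_ONE, every target runs on each input.
 */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    int rv = 0;
#if !defined(FUZZ_ONE) || defined(FUZZ_TARGET_parse)
    rv |= fuzz_parse(data, size);
#endif
#if !defined(FUZZ_ONE) || defined(FUZZ_TARGET_unserialize)
    rv |= fuzz_unserialize(data, size);
#endif
    return rv;
}
```

The appeal of this layout is that one source file and one build loop produce any number of single-purpose binaries, each with its own corpus.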

### Ruby (`projects/ruby/`)

- Also declares `language: c++`.
- Downloads a stable Ruby release as a "baseruby" (built without sanitizers),
  then builds the target Ruby from source with `--enable-static` and sanitizers.
- Fuzz harnesses are C++ files in the oss-fuzz project directory (e.g.,
  `fuzz_regex.cpp`, `fuzz_ruby_parser.cpp`), compiled and linked against
  `libruby-static.a`.
- Supports ASan, UBSan, and all three engines.

Neither project required changes to oss-fuzz infrastructure: both use the
standard `base-builder` image and declare themselves as C++ projects.

## Proposed Approach for R

Follow the same pattern: build R from source with sanitizers, write C fuzz
harnesses, and link them against R's static library. No oss-fuzz infrastructure
changes are required.

### Project structure

```
projects/r/
  project.yaml
  Dockerfile
  build.sh
  fuzz_parse.c        # example fuzz target
  fuzz_serialize.c    # example fuzz target
  ...
  fuzz_parse.options  # per-target fuzzer options (e.g., max_len, timeout)
```

### `project.yaml`

```yaml
homepage: "https://www.r-project.org/"
language: c++
primary_contact: "<contact>@<domain>"
auto_ccs:
  - "<cc1>@<domain>"
sanitizers:
  - address
  - undefined
fuzzing_engines:
  - libfuzzer
  - afl
  - honggfuzz
main_repo: "https://svn.r-project.org/R/trunk/"
```

### `Dockerfile` (sketch)

```dockerfile
FROM gcr.io/oss-fuzz-base/base-builder

RUN apt-get update && apt-get install -y \
    gfortran \
    libreadline-dev \
    libx11-dev \
    libxt-dev \
    libcurl4-openssl-dev \
    libbz2-dev \
    liblzma-dev \
    libpcre2-dev \
    zlib1g-dev \
    libjpeg-dev \
    libpng-dev \
    libtiff-dev \
    libcairo2-dev \
    texinfo

# Clone R source
RUN svn checkout https://svn.r-project.org/R/trunk/ $SRC/r-source \
    --non-interactive --trust-server-cert-failures=unknown-ca

COPY *.sh *.c *.options $SRC/
```

### `build.sh` (sketch)

```bash
#!/bin/bash -eu

export ASAN_OPTIONS="detect_leaks=0"

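cd "$SRC/r-source"

# Configure R with sanitizers and a static libR.a.
# Assumption: libR is controlled by R-specific switches
# (--enable-R-static-lib, --enable-R-shlib) rather than the generic
# --enable-static/--disable-shared; confirm with ./configure --help.
./configure \
    --prefix="$WORK/r-install" \
    --disable-R-shlib \
    --enable-R-static-lib \
    --without-recommended-packages \
    --with-x=no \
    CC="$CC" \
    CXX="$CXX" \
    CFLAGS="$CFLAGS" \
    CXXFLAGS="$CXXFLAGS"

make -j"$(nproc)"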

R_BUILD_DIR="$SRC/r-source"
INC_R="-I${R_BUILD_DIR}/include"
LIBS_R="${R_BUILD_DIR}/lib/libR.a"  # path may vary

# Build each fuzz target
for fuzzer_src in $SRC/fuzz_*.c; do
    fuzzer=$(basename "$fuzzer_src" .c)

    $CC $CFLAGS $INC_R -c "$fuzzer_src" -o "$WORK/${fuzzer}.o"
    $CXX $CXXFLAGS $LIB_FUZZING_ENGINE "$WORK/${fuzzer}.o" \
        $LIBS_R -lm -lpthread -ldl -lreadline -lpcre2-8 -llzma -lbz2 -lz \
        -o "$OUT/${fuzzer}"

    if [ -f "$SRC/${fuzzer}.options" ]; then
        cp "$SRC/${fuzzer}.options" "$OUT/${fuzzer}.options"
    fi
done
```

### Example fuzz target: `fuzz_parse.c` (sketch)

```c
#include <Rinternals.h>
#include <R.h>
#include <Rembedded.h>
#include <R_ext/Parse.h>   /* R_ParseVector, ParseStatus */
#include <stdint.h>
#include <stdlib.h>        /* malloc, free */
#include <string.h>

/*
 * LLVMFuzzerInitialize is called once by the fuzzing engine before the
 * loop begins. We use it to bootstrap the embedded R session -- this is
 * expensive and only needs to happen once.
 */
int LLVMFuzzerInitialize(int *argc, char ***argv) {
    char *r_argv[] = {"R", "--no-save", "--no-restore", "--silent"};
    Rf_initEmbeddedR(4, r_argv);
    return 0;
}

/*
 * R signals errors (including parse errors) via longjmp, which would
 * crash the fuzzer process. We wrap all R calls inside R_ToplevelExec,
 * which sets up a protected context and returns FALSE if R longjmps.
 */
static void do_parse(void *data) {
    const char *buf = (const char *)data;
    SEXP str = PROTECT(Rf_mkString(buf));
    ParseStatus status;
    SEXP parsed = R_ParseVector(str, -1, &status, R_NilValue);
    UNPROTECT(1);
    (void)parsed;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* Null-terminate the input for R's parser */
    char *buf = (char *)malloc(size + 1);
    if (!buf) return 0;
    memcpy(buf, data, size);
    buf[size] = '\0';

    R_ToplevelExec(do_parse, (void *)buf);

    free(buf);
    return 0;
}
```
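
To make the reason for the `R_ToplevelExec` wrapper concrete, here is a self-contained toy that mimics the same control flow with plain `setjmp`/`longjmp`. Everything prefixed `toy_` is hypothetical and stands in for R internals; it only illustrates the pattern of catching a non-local jump so that the fuzz entry point always returns normally.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the interpreter's top-level context. */
static jmp_buf toy_toplevel;

struct toy_input {
    const uint8_t *data;
    size_t size;
};

/* Stand-in for an interpreter error: never returns to its caller. */
static void toy_error(void) {
    longjmp(toy_toplevel, 1);
}

/* Toy "parser" that signals an error on any '!' byte. */
static void toy_parse(void *arg) {
    const struct toy_input *in = (const struct toy_input *)arg;
    for (size_t i = 0; i < in->size; i++) {
        if (in->data[i] == '!') toy_error();
    }
}

/* Same shape as R_ToplevelExec: 1 if fn returned normally, 0 if it jumped. */
static int toy_toplevel_exec(void (*fn)(void *), void *arg) {
    if (setjmp(toy_toplevel) != 0) return 0;
    fn(arg);
    return 1;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    struct toy_input in = { data, size };
    (void)toy_toplevel_exec(toy_parse, &in);
    return 0; /* an "error" in the code under test is not a fuzzer crash */
}
```

Without the wrapper, the jump would have no valid target and the process would die on every malformed input, which would drown real memory-safety findings in noise.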

## Candidate Fuzz Targets

These are R's C-level entry points most likely to benefit from fuzzing:

| Target | R API | What it exercises |
|---|---|---|
| Parser | `R_ParseVector` | R source code parsing |
| Serialization | `R_unserialize` | Deserialization of `.rds`/`.RData` files |
| Regex | `do_grep`, `do_gsub` (internal, `src/main/grep.c`), PCRE2/TRE wrappers | Pattern matching in `grep`, `gsub` |
| String encoding | `Rf_translateCharUTF8`, `mkCharLenCE` | Encoding conversions |
| Connection I/O | `R_ReadConnection` | Reading from arbitrary input streams |
| Numeric coercion | `Rf_coerceVector` | Type coercion (character -> numeric, etc.) |

Starting with `R_ParseVector` and `R_unserialize` would cover two of the
highest-value surfaces: parsing untrusted R code and deserializing untrusted
binary data.
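
A deserialization harness has to feed the fuzz bytes to the decoder without ever reading past the end of the buffer. R's public stream API (`R_InitInPStream` in `Rinternals.h`) is callback-based, so one plausible building block is a bounded in-memory cursor like the one below. The `fuzz_cursor` type and its functions are hypothetical helpers, not R API; wiring them into R's callbacks is left open.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical read cursor over the fuzzer-provided buffer. */
typedef struct {
    const uint8_t *data;
    size_t size;
    size_t pos;
} fuzz_cursor;

/* Read one byte; returns -1 once the input is exhausted. */
static int cursor_getc(fuzz_cursor *c) {
    if (c->pos >= c->size) return -1;
    return c->data[c->pos++];
}

/*
 * Copy exactly n bytes into out. Returns 0 on success, or -1 (copying
 * nothing and leaving the position unchanged) if fewer than n bytes remain.
 */
static int cursor_read(fuzz_cursor *c, void *out, size_t n) {
    if (n > c->size - c->pos) return -1;
    if (n > 0) memcpy(out, c->data + c->pos, n);
    c->pos += n;
    return 0;
}
```

In a real harness, a short read would presumably be turned into an R error inside the protected context, so that truncated inputs are rejected cleanly rather than read out of bounds.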

## Open Questions

1. **Static linking**: R's build system does not produce a static `libR.a` by
   default. R's `configure` has an R-specific `--enable-R-static-lib` switch
   that looks like the right starting point, but whether it works with the
   sanitizer flags needs investigation; failing that, we may need to build and
   link against individual object files. The Ruby project faced a similar
   challenge and solved it with `--with-static-linked-ext`.

2. **Fortran dependencies**: R's core uses Fortran (BLAS/LAPACK). We need to
   confirm that `gfortran`'s runtime (`libgfortran`) works correctly under
   ASan/UBSan. If not, we may need to use a reference BLAS/LAPACK written in C.

3. **Embedded R initialization**: `Rf_initEmbeddedR` does significant setup.
   We use `LLVMFuzzerInitialize` (called once by the engine before the fuzzing
   loop) to pay this cost up front. We need to ensure this is compatible with
   the sanitizer environment and doesn't produce excessive false positives.

4. **Upstream vs. oss-fuzz**: Should fuzz harnesses live upstream in R's source
   tree (like CPython) or in the oss-fuzz project directory (like Ruby)? Hosting
   them upstream makes maintenance easier long-term but requires buy-in from
   R-core.

5. **longjmp error handling**: R signals errors (including parse errors) via
   `longjmp`, which would crash the fuzzer process if uncaught. All fuzz
   targets must wrap R calls in `R_ToplevelExec`, which sets up a protected
   context, catches the jump, and returns `FALSE` on error. This is already
   reflected in the example harness above.

6. **MSan support**: MemorySanitizer requires all linked code to be
   instrumented. Given R's Fortran dependencies, MSan may not be feasible
   initially. Of the two precedents, CPython supports MSan while Ruby (as
   listed above) runs only ASan and UBSan, and R's dependency chain is more
   complex than either.

## Next Steps

1. Prototype the `Dockerfile` and `build.sh` locally using oss-fuzz's
   `helper.py` (`python infra/helper.py build_fuzzers r`).
2. Get `fuzz_parse` and `fuzz_serialize` working under ASan + libfuzzer.
3. Submit a PR to oss-fuzz and iterate with Google's oss-fuzz team on review.
4. Expand fuzz targets based on initial findings.


