Google maintains a GitHub repository where open source projects can contribute the build files and harnesses used to fuzz them on Google infrastructure.
https://github.com/google/oss-fuzz
Claude-generated proposal below.
Proposal: Adding R as a Fuzz Target in OSS-Fuzz
Background
OSS-Fuzz is Google's continuous fuzzing
infrastructure for open source software. It currently supports C/C++, Go, Java,
JavaScript, Python, Rust, Swift, and Ruby, covering over 1,300 projects. R is
not yet among them.
This document proposes adding R as a fuzz target, outlines the approach, and
identifies candidate attack surfaces for initial fuzz targets.
How OSS-Fuzz Works
Each project in oss-fuzz provides three files:
| File | Purpose |
|---|---|
| project.yaml | Metadata: language, contacts, sanitizers, fuzzing engines |
| Dockerfile | Build environment: base image, dependencies, source checkout |
| build.sh | Compilation script: builds fuzz target binaries, places them in $OUT |
The contract
The build script receives environment variables ($CC, $CXX, $CFLAGS,
$CXXFLAGS, $LIB_FUZZING_ENGINE, $SANITIZER, $OUT) and must produce
standalone executables in the $OUT directory. Each executable must export the
LLVMFuzzerTestOneInput symbol -- that is how oss-fuzz discovers fuzz targets.
After the build completes:
- The infrastructure scans $OUT for executables containing LLVMFuzzerTestOneInput.
- The contents of $OUT are archived and uploaded to Google Cloud Storage. Everything else (source, build intermediates) is discarded.
- ClusterFuzz downloads the archive into a fresh runner container and continuously fuzzes each target using the configured engines (libfuzzer, AFL++, honggfuzz) and sanitizers (ASan, MSan, UBSan).
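To make the contract concrete, here is a minimal, self-contained sketch of a target that exports LLVMFuzzerTestOneInput. The function under test, max_paren_depth, is a toy invented for this illustration and has nothing to do with R; only the entry point's name and signature come from the libFuzzer interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy code under test (hypothetical): returns the maximum nesting depth of
 * '(' ... ')' in the input, or -1 if the parentheses are unbalanced. */
int max_paren_depth(const uint8_t *data, size_t size) {
    int depth = 0, max_depth = 0;
    for (size_t i = 0; i < size; i++) {
        if (data[i] == '(') {
            depth++;
            if (depth > max_depth) max_depth = depth;
        } else if (data[i] == ')') {
            if (depth == 0) return -1;  /* closing without an opening */
            depth--;
        }
    }
    return depth == 0 ? max_depth : -1;
}

/* Entry point the fuzzing engine calls once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    int depth = max_paren_depth(data, size);
    /* Invariant: the depth can never exceed the number of input bytes. */
    assert(depth < 0 || (size_t)depth <= size);
    return 0;  /* 0 means the input was processed normally */
}
```

Linked against $LIB_FUZZING_ENGINE, a file like this would yield one executable in $OUT. A failed assert, like a sanitizer report, would be recorded as a crash for that input.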
Supporting files can be placed alongside each target:
- <target>_seed_corpus.zip -- initial inputs to bootstrap the fuzzer
- <target>.dict -- fuzzing dictionary of interesting tokens
- <target>.options -- runtime configuration (e.g., memory limits)
Precedent: How CPython and Ruby Are Fuzzed
R is a C-based interpreter, so the most relevant precedents are CPython and
Ruby -- both of which are fuzzed by compiling the interpreter from source with
sanitizers and linking C/C++ fuzz harnesses against it.
CPython (projects/cpython3/)
- Declares language: c++ in project.yaml (for libfuzzer linkage).
- Builds CPython from source with sanitizer configure flags such as --with-address-sanitizer.
- Fuzz harnesses live upstream in CPython's source tree (Modules/_xxtestfuzz/fuzzer.c). A single C file uses preprocessor macros (-D _Py_FUZZ_ONE -D _Py_FUZZ_<target>) to select which target to compile.
- The build script iterates over fuzz_tests.txt, compiles one binary per target, and drops them into $OUT.
- Supports ASan, MSan, UBSan, and all three fuzzing engines.
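The one-file, many-targets idea can be sketched in a few lines. This is a simplified illustration, not CPython's actual code: the FUZZ_TARGET macro and both target functions are hypothetical, and CPython's own macro scheme is more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two hypothetical targets that share one source file. */
int fuzz_count_digits(const uint8_t *data, size_t size) {
    int n = 0;
    for (size_t i = 0; i < size; i++)
        if (data[i] >= '0' && data[i] <= '9') n++;
    return n;
}

int fuzz_count_spaces(const uint8_t *data, size_t size) {
    int n = 0;
    for (size_t i = 0; i < size; i++)
        if (data[i] == ' ') n++;
    return n;
}

/* The build script picks one target per binary at compile time, e.g.
 *   $CC -DFUZZ_TARGET=fuzz_count_spaces ...
 * Without the flag, this sketch defaults to the digit counter. */
#ifndef FUZZ_TARGET
#define FUZZ_TARGET fuzz_count_digits
#endif

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    int n = FUZZ_TARGET(data, size);
    /* Sanity check: a count is never negative or larger than the input. */
    assert(n >= 0 && (size_t)n <= size);
    return 0;
}
```

A build loop would then compile this file once per target name, passing a different -D value each time and writing each binary to $OUT under the target's name.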
Ruby (projects/ruby/)
- Also declares language: c++.
- Downloads a stable Ruby release as a "baseruby" (built without sanitizers), then builds the target Ruby from source with --enable-static and sanitizers.
- Fuzz harnesses are C++ files in the oss-fuzz project directory (e.g., fuzz_regex.cpp, fuzz_ruby_parser.cpp), compiled and linked against libruby-static.a.
- Supports ASan, UBSan, and all three engines.
Both projects require no changes to oss-fuzz infrastructure -- they use the
standard base-builder image and declare themselves as C++ projects.
Proposed Approach for R
Follow the same pattern: build R from source with sanitizers, write C fuzz
harnesses, and link them against R's static library. No oss-fuzz infrastructure
changes are required.
Project structure
projects/r/
    project.yaml
    Dockerfile
    build.sh
    fuzz_parse.c        # example fuzz target
    fuzz_serialize.c    # example fuzz target
    ...
    r.options           # shared fuzzer options (e.g., max_len, timeout)
project.yaml
homepage: "https://www.r-project.org/"
language: c++
primary_contact: "<contact>@<domain>"
auto_ccs:
- "<cc1>@<domain>"
sanitizers:
- address
- undefined
fuzzing_engines:
- libfuzzer
- afl
- honggfuzz
main_repo: "https://svn.r-project.org/R/trunk/"
Dockerfile (sketch)
FROM gcr.io/oss-fuzz-base/base-builder
RUN apt-get update && apt-get install -y \
    gfortran \
    libreadline-dev \
    libx11-dev \
    libxt-dev \
    libcurl4-openssl-dev \
    libbz2-dev \
    liblzma-dev \
    libpcre2-dev \
    zlib1g-dev \
    libjpeg-dev \
    libpng-dev \
    libtiff-dev \
    libcairo2-dev \
    texinfo
# Clone R source
RUN svn checkout https://svn.r-project.org/R/trunk/ $SRC/r-source \
    --non-interactive --trust-server-cert-failures=unknown-ca
COPY *.sh *.c *.options $SRC/
build.sh (sketch)
#!/bin/bash -eu
export ASAN_OPTIONS="detect_leaks=0"
cd $SRC/r-source
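# Configure R with sanitizers, static linking.
# --enable-R-static-lib is R's own switch for building libR.a; whether
# --enable-static is also needed has not been tested yet.
./configure \
    --prefix="$WORK/r-install" \
    --disable-shared \
    --enable-static \
    --enable-R-static-lib \
    --without-recommended-packages \
    --with-x=no \
    CC="$CC" \
    CXX="$CXX" \
    CFLAGS="$CFLAGS" \
    CXXFLAGS="$CXXFLAGS"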
make -j$(nproc)
R_BUILD_DIR="$SRC/r-source"
INC_R="-I${R_BUILD_DIR}/include"
LIBS_R="${R_BUILD_DIR}/lib/libR.a" # path may vary
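# Build each fuzz target
for fuzzer_src in $SRC/fuzz_*.c; do
    fuzzer=$(basename "$fuzzer_src" .c)
    $CC $CFLAGS $INC_R -c "$fuzzer_src" -o "$WORK/${fuzzer}.o"
    $CXX $CXXFLAGS $LIB_FUZZING_ENGINE "$WORK/${fuzzer}.o" \
        $LIBS_R -lm -lpthread -ldl -lreadline -lpcre2-8 -llzma -lbz2 -lz \
        -o "$OUT/${fuzzer}"
    # A per-target options file wins; otherwise fall back to the shared
    # r.options, since oss-fuzz looks for <target>.options next to each binary.
    if [ -f "$SRC/${fuzzer}.options" ]; then
        cp "$SRC/${fuzzer}.options" "$OUT/${fuzzer}.options"
    elif [ -f "$SRC/r.options" ]; then
        cp "$SRC/r.options" "$OUT/${fuzzer}.options"
    fi
done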
Example fuzz target: fuzz_parse.c (sketch)
#include <Rinternals.h>
#include <R.h>
#include <Rembedded.h>
#include <R_ext/Parse.h>   /* R_ParseVector, ParseStatus */
#include <stdint.h>
#include <stdlib.h>        /* malloc, free */
#include <string.h>
/*
 * LLVMFuzzerInitialize is called once by the fuzzing engine before the
 * loop begins. We use it to bootstrap the embedded R session -- this is
 * expensive and only needs to happen once. Embedded R expects R_HOME to
 * be set in the environment before this call.
 */
int LLVMFuzzerInitialize(int *argc, char ***argv) {
    (void)argc;
    (void)argv;
    char *r_argv[] = {"R", "--no-save", "--no-restore", "--silent"};
    Rf_initEmbeddedR(4, r_argv);
    return 0;
}
/*
 * R signals errors (including parse errors) via longjmp, which would
 * crash the fuzzer process. We wrap all R calls inside R_ToplevelExec,
 * which sets up a protected context and returns FALSE if R longjmps.
 */
static void do_parse(void *data) {
    const char *buf = (const char *)data;
    SEXP str = PROTECT(Rf_mkString(buf));
    ParseStatus status;
    SEXP parsed = R_ParseVector(str, -1, &status, R_NilValue);
    UNPROTECT(1);
    (void)parsed;
}
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* Null-terminate the input for R's parser */
    char *buf = (char *)malloc(size + 1);
    if (!buf) return 0;
    memcpy(buf, data, size);
    buf[size] = '\0';
    R_ToplevelExec(do_parse, (void *)buf);
    free(buf);
    return 0;
}
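One detail of the harness above: Rf_mkString takes a C string, so an input with an interior NUL byte is silently truncated and the parser only sees the prefix. A possible refinement, sketched here with a hypothetical helper named input_to_cstring that does not depend on R, is to reject such inputs up front so the fuzzer does not spend time on byte strings that parse identically.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a NUL-terminated heap copy of the input, or
 * NULL if the input contains an interior NUL byte or allocation fails.
 * The caller owns the result and must free() it. */
char *input_to_cstring(const uint8_t *data, size_t size) {
    if (size > 0 && memchr(data, '\0', size) != NULL)
        return NULL;                      /* would be truncated as a C string */
    char *buf = (char *)malloc(size + 1);
    if (buf == NULL)
        return NULL;
    if (size > 0)
        memcpy(buf, data, size);
    buf[size] = '\0';
    assert(strlen(buf) == size);          /* nothing was cut off */
    return buf;
}
```

In the harness, LLVMFuzzerTestOneInput would call this helper instead of the inline malloc/memcpy and return 0 early when it yields NULL.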
Candidate Fuzz Targets
These are R's C-level entry points most likely to benefit from fuzzing:
| Target | R API | What it exercises |
|---|---|---|
| Parser | R_ParseVector | R source code parsing |
| Serialization | R_unserialize | Deserialization of .rds/.RData files |
| Regex | PCRE2 wrappers | Pattern matching in grep, gsub |
| String encoding | Rf_translateCharUTF8, mkCharLenCE, R_nchar | Encoding conversions and character counting |
| Connection I/O | R_ReadConnection | Reading from arbitrary input streams |
| Numeric coercion | Rf_coerceVector | Type coercion (character -> numeric, etc.) |
Starting with R_ParseVector and R_unserialize would cover two of the
highest-value surfaces: parsing untrusted R code and deserializing untrusted
binary data.
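For the serialization target, the harness has to present the fuzzer's byte buffer to R as an input stream. R's public serialization interface (R_InitInPStream followed by R_Unserialize) reads through two caller-supplied callbacks, one for single bytes and one for blocks. The sketch below covers only the R-independent part, an in-memory cursor. The names mem_reader, mem_read_char, and mem_read_bytes are hypothetical; a real harness would wrap them in callbacks with R's signatures and would probably signal an R error on a short read rather than return a count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over the fuzzer-provided buffer. */
typedef struct {
    const uint8_t *data;
    size_t size;
    size_t pos;
} mem_reader;

/* Read one byte; returns -1 at end of input (similar to getc's EOF). */
int mem_read_char(mem_reader *r) {
    if (r->pos >= r->size)
        return -1;
    return r->data[r->pos++];
}

/* Copy up to n bytes into out; returns the number of bytes actually copied,
 * which is smaller than n when the input runs out. Never reads past the end. */
size_t mem_read_bytes(mem_reader *r, void *out, size_t n) {
    assert(r->pos <= r->size);
    size_t left = r->size - r->pos;
    if (n > left)
        n = left;
    if (n > 0)
        memcpy(out, r->data + r->pos, n);
    r->pos += n;
    return n;
}
```

Bounding every read by the remaining length matters here: the fuzzer will routinely produce inputs whose headers claim more data than the buffer holds.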
Open Questions
- Static linking: R's build system may not produce a static libR.a by default. This needs investigation -- R's configure has its own --enable-R-static-lib option, which may be required instead of (or in addition to) --enable-static, or we may need to build and link against individual object files. The Ruby project faced a similar challenge and solved it with --with-static-linked-ext.
- Fortran dependencies: R's core uses Fortran (BLAS/LAPACK). We need to confirm that gfortran's runtime (libgfortran) works correctly under ASan/UBSan. If not, we may need to use a reference BLAS/LAPACK written in C.
- Embedded R initialization: Rf_initEmbeddedR does significant setup. We use LLVMFuzzerInitialize (called once by the engine before the fuzzing loop) to pay this cost up front. We need to ensure this is compatible with the sanitizer environment and doesn't produce excessive false positives.
- Upstream vs. oss-fuzz: Should fuzz harnesses live upstream in R's source tree (like CPython) or in the oss-fuzz project directory (like Ruby)? Hosting them upstream makes maintenance easier long-term but requires buy-in from R-core.
- longjmp error handling: R signals errors (including parse errors) via longjmp, which would crash the fuzzer process if uncaught. All fuzz targets must wrap R calls in R_ToplevelExec, which sets up a protected context, catches the jump, and returns FALSE on error instead of letting the longjmp propagate. This is already reflected in the example harness above.
- MSan support: MemorySanitizer requires all linked code to be instrumented. Given R's Fortran dependencies, MSan may not be feasible initially. CPython supports MSan (the Ruby project, as noted above, lists only ASan and UBSan), and R's dependency chain is more complex than either.
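The longjmp concern in the list above can be illustrated without R. The sketch below is a simplified stand-in, not R's implementation: toplevel_exec, raise_error, and parse_or_fail are hypothetical names, and the real R_ToplevelExec also manages R's context and protection stacks. It shows the control flow a harness relies on: an error raised deep inside the callee returns control to the wrapper, which reports failure instead of terminating the process.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Innermost active handler; NULL when no protected context is set up. */
static jmp_buf *current_handler = NULL;

/* Hypothetical stand-in for R's error(): jumps to the innermost handler. */
void raise_error(void) {
    assert(current_handler != NULL);
    longjmp(*current_handler, 1);
}

/* Run fn(data) in a protected context.
 * Returns 1 if fn returned normally, 0 if it raised an error. */
int toplevel_exec(void (*fn)(void *), void *data) {
    jmp_buf handler;
    jmp_buf *saved = current_handler;   /* restore on exit to allow nesting */
    current_handler = &handler;
    if (setjmp(handler) == 0) {
        fn(data);
        current_handler = saved;
        return 1;
    }
    /* Reached via longjmp from raise_error(). */
    current_handler = saved;
    return 0;
}

/* Example callee: fails on negative input, mimicking a parse error. */
void parse_or_fail(void *data) {
    int *value = (int *)data;
    if (*value < 0)
        raise_error();
    *value *= 2;
}
```

A harness built this way can keep calling the wrapper after a failure, which is what a long-running fuzzing loop requires.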
Next Steps
- Prototype the Dockerfile and build.sh locally using oss-fuzz's helper.py (python infra/helper.py build_fuzzers r).
- Get fuzz_parse and fuzz_serialize working under ASan + libfuzzer.
- Submit a PR to oss-fuzz and iterate with Google's oss-fuzz team on review.
- Expand fuzz targets based on initial findings.