Commit 8d6bbfb

AI Protocol Parsing (part 1): Introduce Wuffs based Json parser (#45641)
Design doc: https://docs.google.com/document/d/1V08HmDIJW6a0QGyqjQQhKW7yLEsT82cmf4an05_yUdk/edit?tab=t.0#heading=h.7r0qptd7eill

Introduces WuffsJsonCursor, a SAX-style JSON parser built on [Wuffs](https://github.com/google/wuffs). This first PR implements all Wuffs internals (token buffer management, coroutine suspension/resumption, escape decoding) and exposes a Handler callback interface.

This PR implements the JSON tokenizer and token dispatch; the Handler will be implemented in the next PR.

Signed-off-by: tyxia <tyxia@google.com>
1 parent 907f4df commit 8d6bbfb
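
The description mentions a SAX-style parser that dispatches tokens to a Handler callback interface. Since the Handler itself lands in a later PR, the following is only a minimal sketch of what such a dispatch loop could look like. Every name in it (`TokenKind`, `Token`, `Handler`, `dispatchTokens`, `ModelCapture`) is hypothetical and is not taken from WuffsJsonCursor; the tokens are supplied as a pre-built vector rather than produced by Wuffs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// All names below are hypothetical; they are not the WuffsJsonCursor API.
enum class TokenKind { kObjectStart, kObjectEnd, kArrayStart, kArrayEnd, kKey, kString, kNumber };

struct Token {
  TokenKind kind;
  std::string_view text;  // Key or value text; empty for structural tokens.
};

// SAX-style callbacks. Returning false asks the dispatcher to stop early.
class Handler {
 public:
  virtual ~Handler() = default;
  virtual bool onObjectStart() { return true; }
  virtual bool onObjectEnd() { return true; }
  virtual bool onArrayStart() { return true; }
  virtual bool onArrayEnd() { return true; }
  virtual bool onKey(std::string_view) { return true; }
  virtual bool onString(std::string_view) { return true; }
  virtual bool onNumber(std::string_view) { return true; }
};

// Forwards each token to the matching callback. Returns how many tokens were
// delivered before the handler asked to stop or the input ran out.
inline std::size_t dispatchTokens(const std::vector<Token>& tokens, Handler& handler) {
  std::size_t delivered = 0;
  for (const Token& token : tokens) {
    bool keep_going = true;
    switch (token.kind) {
      case TokenKind::kObjectStart: keep_going = handler.onObjectStart(); break;
      case TokenKind::kObjectEnd:   keep_going = handler.onObjectEnd(); break;
      case TokenKind::kArrayStart:  keep_going = handler.onArrayStart(); break;
      case TokenKind::kArrayEnd:    keep_going = handler.onArrayEnd(); break;
      case TokenKind::kKey:         keep_going = handler.onKey(token.text); break;
      case TokenKind::kString:      keep_going = handler.onString(token.text); break;
      case TokenKind::kNumber:      keep_going = handler.onNumber(token.text); break;
    }
    ++delivered;
    if (!keep_going) break;
  }
  return delivered;
}

// Example handler: copies the string value that follows a "model" key, then
// stops. Nothing else from the token stream is stored.
class ModelCapture : public Handler {
 public:
  bool onKey(std::string_view key) override {
    want_value_ = (key == "model");
    return true;
  }
  bool onString(std::string_view value) override {
    if (!want_value_) return true;
    model = std::string(value);
    return false;  // Field captured; no need to see the rest.
  }
  std::string model;

 private:
  bool want_value_ = false;
};

// Usage: the handler stops the dispatcher after the third token.
inline void dispatchDemo() {
  const std::vector<Token> tokens = {
      {TokenKind::kObjectStart, ""}, {TokenKind::kKey, "model"},
      {TokenKind::kString, "example-model"}, {TokenKind::kKey, "n"},
      {TokenKind::kNumber, "1"}, {TokenKind::kObjectEnd, ""}};
  ModelCapture handler;
  assert(dispatchTokens(tokens, handler) == 3);
  assert(handler.model == "example-model");
}
```

Letting callbacks return `bool` is one simple way to express early termination; a real interface might prefer a status type so the handler can also report why it stopped.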

12 files changed

Lines changed: 1611 additions & 0 deletions

bazel/deps.yaml

Lines changed: 10 additions & 0 deletions
@@ -557,6 +557,16 @@ wasmtime:
   license: "Apache-2.0"
   license_url: "https://github.com/bytecodealliance/wasmtime/blob/v{version}/LICENSE"

+wuffs:
+  project_name: "Wuffs"
+  project_desc: "Memory-safe, high-performance JSON (and other format) parser written in a memory-safe subset of C"
+  project_url: "https://github.com/google/wuffs-mirror-release-c"
+  release_date: "2024-09-14"
+  use_category:
+  - dataplane_ext
+  license: "Apache-2.0"
+  license_url: "https://github.com/google/wuffs-mirror-release-c/blob/v{version}/LICENSE"
+
 abseil_cpp:
   project_name: "Abseil"
   project_desc: "Open source collection of C++ libraries drawn from the most fundamental pieces of Google’s internal codebase"

bazel/repositories.bzl

Lines changed: 19 additions & 0 deletions
@@ -242,6 +242,7 @@ def envoy_dependencies(skip_targets = []):

     _libmaxminddb()
     _thrift()
+    _wuffs()

     external_http_archive("rules_license")
     external_http_archive("rules_pkg")
@@ -1072,3 +1073,21 @@ def _libmaxminddb():
         name = "libmaxminddb",
         build_file_content = LIBMAXMINDDB_BUILD_CONTENT,
     )
+
+def _wuffs():
+    external_http_archive(
+        name = "wuffs",
+        build_file_content = """
+cc_library(
+    name = "wuffs",
+    # Wuffs uses an amalgamated single-file distribution: wuffs-v0.4.c acts as
+    # a header (declarations only) when included without WUFFS_IMPLEMENTATION,
+    # and as a full implementation when WUFFS_IMPLEMENTATION is defined (done
+    # in exactly one TU: wuffs_impl.c). Listed as hdrs so dependent targets
+    # may include it.
+    textual_hdrs = ["release/c/wuffs-v0.4.c"],
+    visibility = ["//visibility:public"],
+    copts = ["-Wno-unused-function"],
+)
+""",
+    )

bazel/repository_locations.bzl

Lines changed: 9 additions & 0 deletions
@@ -749,6 +749,15 @@ REPOSITORY_LOCATIONS_SPEC = dict(
         strip_prefix = "cmake-{version}",
         urls = ["https://github.com/Kitware/CMake/releases/download/v{version}/cmake-{version}.tar.gz"],
     ),
+    wuffs = dict(
+        version = "0.4.0-alpha.9",
+        sha256 = "9ca4f5401a76be244362de8b39138f01f2456c444b03584703a9f1db90491ba6",
+        strip_prefix = "wuffs-mirror-release-c-{version}",
+        urls = ["https://github.com/google/wuffs-mirror-release-c/archive/refs/tags/v{version}.tar.gz"],
+        # Wuffs: memory-safe, high-performance JSON (and other format) parser.
+        # The amalgamated C file at release/c/wuffs-v0.4.c is both the header
+        # (declarations) and implementation (when WUFFS_IMPLEMENTATION is defined).
+    ),
 )

 def _compiled_protoc_deps(locations, versions):
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+Added a Wuffs-backed streaming JSON parser for AI protocol parsing (MCP, A2A, OpenAI, Anthropic, etc.) with the following properties:
+incremental chunk-by-chunk parsing that resumes across arbitrary HTTP body chunk boundaries without buffering the full body;
+token-based processing with no DOM allocation — heap usage is proportional only to fields the handler opts into capturing, not to the total body size;
+AI-native field extraction that inlines scalar fields (e.g. ``model``, ``method``, ``id``, ``params.name``) into small bounded strings and captures large fields (e.g. ``messages[]``, ``params.arguments``) as zero-copy byte-range references into the original body;
+and duplicate-key detection that rejects key-smuggling attacks (e.g. multiple ``"model"`` fields) with early mid-stream termination before the full body is consumed.
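
Two of the properties listed above, resuming across arbitrary chunk boundaries and rejecting duplicate keys mid-stream, can be illustrated with a small standalone sketch. It does not use Wuffs and is not the WuffsJsonCursor implementation: `TopLevelKeyScanner` and its methods are hypothetical names. It is also deliberately simplified. It does not validate JSON, and it compares raw key bytes, so keys that differ only in escape spelling (for example `\u0065` versus `e`) are treated as different, whereas the commit description says the real parser decodes escapes.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical sketch; not the WuffsJsonCursor API and not a JSON validator.
class TopLevelKeyScanner {
 public:
  // Consumes one body chunk. Returns false as soon as a duplicate top-level
  // key has been seen; the rest of that chunk and later chunks are ignored.
  bool feed(std::string_view chunk) {
    for (char c : chunk) {
      if (rejected_) break;
      step(c);
    }
    return !rejected_;
  }

  bool rejected() const { return rejected_; }
  const std::unordered_set<std::string>& keys() const { return seen_; }

 private:
  void step(char c) {
    if (in_string_) {
      stepInsideString(c);
      return;
    }
    switch (c) {
      case '"':
        in_string_ = true;
        // Only strings in key position of the top-level object are captured.
        capturing_ = (depth_ == 1 && top_is_object_ && expect_key_);
        if (depth_ == 1) expect_key_ = false;
        break;
      case '{':
      case '[':
        if (depth_ == 0) {
          top_is_object_ = (c == '{');
          expect_key_ = top_is_object_;
        }
        ++depth_;
        break;
      case '}':
      case ']':
        if (depth_ > 0) --depth_;
        break;
      case ',':
        if (depth_ == 1 && top_is_object_) expect_key_ = true;
        break;
      default:
        break;
    }
  }

  void stepInsideString(char c) {
    if (escaped_) {
      escaped_ = false;  // Escaped character: part of the string, even if '"'.
    } else if (c == '\\') {
      escaped_ = true;
    } else if (c == '"') {
      in_string_ = false;
      if (capturing_) finishKey();
      return;
    }
    if (capturing_) current_key_.push_back(c);
  }

  void finishKey() {
    capturing_ = false;
    if (!seen_.insert(current_key_).second) rejected_ = true;
    current_key_.clear();
  }

  // All state lives in members, so a chunk may end anywhere, even mid-key.
  int depth_ = 0;
  bool top_is_object_ = false;
  bool expect_key_ = false;
  bool in_string_ = false;
  bool escaped_ = false;
  bool capturing_ = false;
  bool rejected_ = false;
  std::string current_key_;
  std::unordered_set<std::string> seen_;
};

// Usage: the key "model" is split across two chunks, and the nested "model"
// inside messages[] is not a top-level key, so only the third chunk rejects.
inline void scannerDemo() {
  TopLevelKeyScanner scanner;
  assert(scanner.feed("{\"mod"));
  assert(scanner.feed("el\":\"a\",\"messages\":[{\"model\":1}],"));
  assert(!scanner.feed("\"model\":\"b\"}"));
}
```

The only memory that grows with the input here is the set of top-level key names; the body itself is never buffered, which mirrors the "no full-body buffering" property in spirit only.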
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+load("@rules_cc//cc:cc_library.bzl", "cc_library")
+load(
+    "//bazel:envoy_build_system.bzl",
+    "envoy_cc_library",
+    "envoy_package",
+)
+
+licenses(["notice"])  # Apache 2
+
+envoy_package()
+
+# wuffs_impl.c is a pure C translation unit (the Wuffs amalgamation).
+# It must NOT use envoy_cc_library because envoy_copts() adds C++-only flags
+# (-Woverloaded-virtual, -Wold-style-cast) that GCC rejects for C source files.
+cc_library(
+    name = "wuffs_impl",
+    srcs = ["wuffs_impl.c"],
+    copts = ["-Wno-unused-function"],
+    linkstatic = True,
+    deps = ["@wuffs"],
+)
+
+envoy_cc_library(
+    name = "wuffs_json_cursor_lib",
+    srcs = ["wuffs_json_cursor.cc"],
+    hdrs = ["wuffs_json_cursor.h"],
+    deps = [
+        ":wuffs_impl",
+        "@abseil-cpp//absl/base:nullability",
+        "@abseil-cpp//absl/container:flat_hash_set",
+        "@abseil-cpp//absl/status",
+        "@abseil-cpp//absl/strings",
+        "@wuffs",
+    ],
+)
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+// This file is the single compilation unit that provides the Wuffs library
+// implementation. Every other file that uses Wuffs includes wuffs-v0.4.c
+// without WUFFS_IMPLEMENTATION (declarations only).
+//
+// WUFFS_IMPLEMENTATION must be defined in exactly one translation unit.
+#define WUFFS_IMPLEMENTATION
+#include "release/c/wuffs-v0.4.c"
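
The "define the implementation macro in exactly one translation unit" convention used by wuffs_impl.c can be shown with a toy single-file library. This is a sketch under stated assumptions: `tiny_add`, `TINY_IMPLEMENTATION`, and `TINY_DECLARATIONS_SEEN` are made-up names standing in for the Wuffs symbols, and the library body is inlined here so the example compiles as one file instead of being pulled in with `#include`.

```cpp
#include <cassert>

// The one implementation translation unit defines the macro before including
// the library file, just as wuffs_impl.c does with WUFFS_IMPLEMENTATION.
#define TINY_IMPLEMENTATION

// ---- Begin: contents of a hypothetical single-file library "tiny.c" ----

// Declarations are always emitted, so every includer can use the file as a
// header. The guard makes repeated inclusion harmless.
#ifndef TINY_DECLARATIONS_SEEN
#define TINY_DECLARATIONS_SEEN
int tiny_add(int a, int b);
#endif  // TINY_DECLARATIONS_SEEN

// Definitions are emitted only where the includer opted in. If two
// translation units both defined TINY_IMPLEMENTATION, the linker would see
// two definitions of tiny_add and report a duplicate symbol.
#ifdef TINY_IMPLEMENTATION
int tiny_add(int a, int b) { return a + b; }
#endif  // TINY_IMPLEMENTATION

// ---- End: contents of "tiny.c" ----

// Usage: any translation unit that saw the declarations can call the function;
// the linker resolves it to the single definition above.
inline void tinyDemo() { assert(tiny_add(2, 3) == 5); }
```

Keeping the implementation in its own target also lets it be compiled with different flags from the code that uses it, which is the reason the BUILD file gives for building wuffs_impl.c with plain `cc_library`.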
