Commit 9af8f5b

Refactor header gen (#28)
* Update headergen
* Add changeset
* Fix linting issue
* Update docs
* Update tmp dir issue
* Update outputs for header gen
* Update docs with file directory changes
* Clarify primitive type table in c-proto-wire-mapping.md
  - Define W32/W64 as the --word-size flag (target platform ABI)
  - Note that plain char signedness is delegated to libclang at parse time
  - Clarify that fixed-size arrays use Vec for ergonomics but the decoder reads exactly N elements baked into the generated struct

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Handle failed disk writes
* Update docs
* Cargo fmt
* Update docs
* Fail loudly
* Revert to c based offsets
* Update reference doc
* Update tests
* Update to major
* Update docs
* Use i32:Max
* Update dockerfile with stale reference
* Add roundtrip tests
* Add example headers
* Update readme

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 6a27457 commit 9af8f5b

29 files changed

Lines changed: 4142 additions & 639 deletions
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+---
+"@spear-ai/webway": major
+---
+
+`header-gen` outputs now match `wsdl-gen` in contract and wire format (MOD-158).
+
+**Breaking changes:**
+- Output file renamed `structs.rs` → `messages.rs`
+- `mappers.rs` and `--out-mapping` CLI flag removed
+- `decode_raw` / `encode_raw` now use **C struct ABI layout** (libclang field offsets, includes alignment padding) — the only correct interpretation for binary messages produced by C programs
+
+**New capabilities:**
+- Generated structs carry `#[derive(prost::Message, Serialize, Deserialize)]`
+- Four codec methods on every struct: `decode_raw`, `decode` (with endianness prefix), `encode_raw`, `encoded_size`
+- C enums are now parsed and emitted as Rust `#[derive(prost::Enumeration)]` enums and proto3 `enum` blocks, following MOD-140 conventions (`{EnumName}Unspecified = 0`, sequential variants with `// c: N` comments, `{EnumName}Garbage = 2147483647`)
+- Proto output restructured to per-header shards at `types/headers/{stem}.proto` with `package spear.v1.{stem};` and cross-header imports (mirrors PR #24 for `wsdl-gen`)
+- `--out-proto` is now the proto tree root; shards are written under `types/headers/` within it
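
The enum convention described in this changeset can be sketched roughly as follows. This is not the generated output: it uses a plain `#[repr(i32)]` enum instead of `prost::Enumeration` so that it compiles without dependencies. The C enum, its enumerator values, the variant names other than the `Unspecified`/`Garbage` pair, and the `from_c` helper are all hypothetical. Mapping undeclared C values to `Garbage` is an assumption, not something the changeset states.

```rust
/// Hypothetical C input, used only for illustration:
///   enum FlightPhase { TAXI = 10, CLIMB = 20, CRUISE = 30 };
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(i32)]
pub enum FlightPhase {
    FlightPhaseUnspecified = 0,
    FlightPhaseTaxi = 1,   // c: 10
    FlightPhaseClimb = 2,  // c: 20
    FlightPhaseCruise = 3, // c: 30
    FlightPhaseGarbage = i32::MAX, // 2147483647
}

impl FlightPhase {
    /// Map a raw C enumerator value to its sequentially numbered variant.
    /// Assumption: values not declared in the header map to `Garbage`.
    pub fn from_c(raw: i32) -> Self {
        match raw {
            10 => Self::FlightPhaseTaxi,
            20 => Self::FlightPhaseClimb,
            30 => Self::FlightPhaseCruise,
            _ => Self::FlightPhaseGarbage,
        }
    }
}
```

The `// c: N` comments keep the original C values visible next to the sequential proto numbers, so a reader can trace a decoded variant back to the header.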
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+---
+"@spear-ai/webway": patch
+---
+
+`header-gen` now registers inline nested struct and enum definitions (MOD-164).
+
+Previously, types defined inline inside a parent struct (e.g. `struct Moon { ... } moon;` or `enum FlightPhase { ... } phase;`) were silently omitted from the registry, causing the emitter to generate decode calls referencing types that didn't exist. Now these nested types are discovered when their parent struct's fields are traversed and are registered with the same input-directory filter applied to top-level declarations.
+
+All three inline-nesting patterns are supported: single nested struct field, fixed-size array of nested structs, and nested enum field.

.cursor/rules/technical-decisions-mod-140.mdc

Lines changed: 6 additions & 6 deletions
@@ -30,10 +30,10 @@ Canonical source: [MOD-140 on Linear](https://linear.app/spear-ai/issue/MOD-140/

 | | Template |
 | -- | -- |
-| Type package (xsd) | `spear.v1.{schema_filename}` |
-| Type package (headers) | `spear.v1.{header_filename}` |
-| Type file (xsd) | `proto/types/xsd/{schema_filename}.proto` |
-| Type file (headers) | `proto/types/headers/{header_filename}.proto` |
+| Type package (xsd) | `spear.v1.{schema_relative_stem}` (e.g. `spear.v1.net.types`) |
+| Type package (headers) | `spear.v1.{header_relative_stem}` (e.g. `spear.v1.net.messages`) |
+| Type file (xsd) | `proto/types/xsd/{schema_relative_stem}.proto` (e.g. `proto/types/xsd/net/types.proto`) |
+| Type file (headers) | `proto/types/headers/{header_relative_stem}.proto` (e.g. `proto/types/headers/net/messages.proto`) |
 | Message package | `spear.message.v1.{header_type}.{payload_type}` |
 | Message file | `proto/messages/{header_type}/{payload_type}.proto` |
 | Schema registry subject | `spear.message.v1.{header_type}.{payload_type}` |
@@ -53,7 +53,7 @@ Two directories under `proto/`:

 **Package naming**

-- Types: `spear.v1.{schema_filename}` — derived mechanically from source file name, no manual config.
+- Types: `spear.v1.{relative_stem}` — derived mechanically from the source file's path relative to the input directory (e.g. `net/messages.h` → `spear.v1.net.messages`). Directory structure is preserved so same-named files in different subdirectories produce distinct packages. No manual config.
 - Messages: `spear.message.v1.{header_type}.{payload_type}` — transport (SLEMR/DCM) is excluded from package names. `spear` is used as the root over effort-specific names for long-term stability.

 **Message naming**
@@ -158,7 +158,7 @@ C code comparing against XSD integer values can migrate using the `// xsd: N` co

 **Versioning**

-Version lives in the package path only — `spear.v1.{schema_filename}` for types, `spear.message.v1.{header_type}.{payload_type}` for messages. No version in individual message or type names. A version bump happens at the `messages/` layer when consumers would need to change their code. Non-breaking changes (adding an optional field) do not require a version bump.
+Version lives in the package path only — `spear.v1.{relative_stem}` for types, `spear.message.v1.{header_type}.{payload_type}` for messages. No version in individual message or type names. A version bump happens at the `messages/` layer when consumers would need to change their code. Non-breaking changes (adding an optional field) do not require a version bump.

 **Proto syntax**
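
The relative-stem rule for header types can be illustrated with a small sketch. It assumes the stem is simply the path relative to the input directory with its extension removed, joined with `.` for the package and `/` for the file. The function names are hypothetical, and any sanitising of characters that are not valid in proto package names is left out.

```rust
use std::path::{Component, Path};

/// Collect the normal path segments of `relative` with the extension removed,
/// e.g. `net/messages.h` -> ["net", "messages"].
fn stem_segments(relative: &Path) -> Vec<String> {
    relative
        .with_extension("")
        .components()
        .filter_map(|c| match c {
            Component::Normal(seg) => Some(seg.to_string_lossy().into_owned()),
            _ => None, // skip `.`, `..`, root and prefix components
        })
        .collect()
}

/// Proto package for a header, following `spear.v1.{header_relative_stem}`.
pub fn header_type_package(relative: &Path) -> String {
    format!("spear.v1.{}", stem_segments(relative).join("."))
}

/// Proto file for a header, following
/// `proto/types/headers/{header_relative_stem}.proto`.
pub fn header_type_file(relative: &Path) -> String {
    format!(
        "proto/types/headers/{}.proto",
        stem_segments(relative).join("/")
    )
}
```

Because the directory segments are kept, `a/types.h` and `b/types.h` yield different packages, which is the collision the updated rule is meant to avoid.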

CLAUDE.md

Lines changed: 4 additions & 4 deletions
@@ -32,8 +32,8 @@ cargo test -p spear-lib
 # XSD → proto + Rust
 cargo run -p wsdl-gen -- --input schemas/synthetic --out-proto generated/proto --out-rust generated/rust

-# C headers → Rust structs + proto + mapping
-cargo run -p header-gen -- --input headers/ --include /usr/include --endian little --word-size 32 --define LINUX --out-rust generated/rust --out-proto generated/proto --out-mapping generated/mapping
+# C headers → Rust structs + proto (per-header shards under types/headers/)
+cargo run -p header-gen -- --input examples/headers/ --include /usr/include --endian little --word-size 32 --define LINUX --out-rust generated/rust --out-proto generated/proto
 ```

 ## Architecture
@@ -44,7 +44,7 @@ Spear is a **Data Normalization Gateway** — it decodes legacy binary messages

 ```text
 .xsd files → wsdl-gen → .proto + .rs (decode/encode methods)
-.h files → header-gen → Rust structs + .proto + mapping functions
+.h files → header-gen → messages.rs + types/headers/*.proto (per-header shards)

 Raw binary → spear-gateway → decoded output
 ↓ (Phase 3, planned)
@@ -56,7 +56,7 @@ Spear is a **Data Normalization Gateway** — it decodes legacy binary messages
 | Crate | Purpose |
 |---|---|
 | `wsdl-gen` | XSD → proto3 + Rust with `decode_raw`/`encode_raw`/`encoded_size` |
-| `header-gen` | C headers (via libclang) → Rust structs + proto + mapping fns |
+| `header-gen` | C headers (via libclang) → `messages.rs` + proto3 shards with `decode_raw`/`encode_raw`/`encoded_size` |
 | `spear-lib` | Runtime library: WSDL parser, ProtoEnvelope, Redpanda publisher |
 | `spear-gateway` | Decode pipeline: raw bytes → generated types → printed output |

Cargo.lock

Lines changed: 2 additions & 0 deletions
Some generated files are not rendered by default.

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
 # Inside the container:
 # wsdl-gen --input /workspace/xsds --out-proto /workspace/proto --out-rust /workspace/rust
 # header-gen --input /workspace/headers --endian little --word-size 32 \
-# --out-rust /workspace/rust --out-proto /workspace/proto --out-mapping /workspace/mapping
+# --out-rust /workspace/rust --out-proto /workspace/proto
 # cp /workspace/types.rs /spear/crates/spear-gateway/src/types.rs
 # # uncomment include!("types.rs") in main.rs, add decode call
 # cargo build --offline --release -p spear-gateway

README.md

Lines changed: 16 additions & 17 deletions
@@ -25,9 +25,10 @@ npm install
 │ └──► types.rs │
 │ (decode / decode_raw / encoded_size)│
 │ │
-│ .h files ──► header-gen ──► structs.rs (decode()) │
-│ ├──► messages.proto │
-│ ├──► mappers.rs │
+│ .h files ──► header-gen ──► messages.rs │
+│ │ (decode_raw/encode_raw/ │
+│ │ encoded_size) │
+│ ├──► types/headers/{stem}.proto │
 │ └──► review_report.txt │
 └─────────────────────────────────────────────────────────────┘

@@ -61,7 +62,7 @@ npm install
 | Crate | Type | Purpose |
 |---|---|---|
 | `wsdl-gen` | binary | Code generator: XSD → `.proto` + `.rs` with `decode_raw`/`encoded_size` |
-| `header-gen` | binary | Code generator: C headers → Rust structs + `.proto` + mapping functions |
+| `header-gen` | binary | Code generator: C headers → `messages.rs` + per-header `.proto` shards with `decode_raw`/`encode_raw`/`encoded_size` |
 | `spear-lib` | library | Runtime: WSDL parser, `ProtoEnvelope`, Redpanda publisher |
 | `spear-gateway` | binary | Decode pipeline: raw binary bytes → generated types → printed output |

@@ -106,11 +107,12 @@ XSD → proto3/Rust mapping rules.

 ## header-gen: C header → code generation

-Takes a directory of `.h` files and emits three files per struct:
+Takes a directory of `.h` files and emits:

-- `--out-rust` — Rust structs with `decode(bytes: &[u8])` (offset-based, configurable endianness) + `review_report.txt` for anything requiring manual review (bitfields, unions, unresolved types)
-- `--out-proto` — proto3 message definitions
-- `--out-mapping` — explicit `map_*()` functions from each Rust struct to its proto message
+- `--out-rust` → `messages.rs` with `prost::Message` + serde derives, and `decode_raw`/`decode`/`encode_raw`/`encoded_size` methods using the C struct binary layout (field positions and struct size come from libclang — matches the in-memory ABI layout produced by the C compiler, including alignment padding). Also writes `review_report.txt` for constructs requiring manual review (bitfields, unions, unresolved types).
+- `--out-proto` — per-header proto3 shards at `types/headers/{stem}.proto`, one per input `.h` file, with `package spear.v1.{stem};` and cross-header imports.
+
+See [docs/c-proto-wire-mapping.md](docs/c-proto-wire-mapping.md) for the full type mapping table and wire format details.

 **From the spear-dev container** (recommended — no system libclang install required):

@@ -123,29 +125,27 @@ podman exec -it spear-dev header-gen \
 --word-size 32 \
 --define LINUX \
 --out-rust /workspace/generated/rust \
---out-proto /workspace/generated/proto \
---out-mapping /workspace/generated/mapping
+--out-proto /workspace/generated/proto
 ```

 **From source** (requires Rust + libclang):

 ```bash
 cargo run -p header-gen -- \
---input headers/ \
+--input examples/headers/ \
 --include /usr/include \
 --endian little \
 --word-size 32 \
 --define LINUX \
 --out-rust generated/rust \
---out-proto generated/proto \
---out-mapping generated/mapping
+--out-proto generated/proto
 ```

-`--word-size` controls how `long`/`unsigned long` are mapped:
+`--word-size` is passed to clang as `-m32` (or left as native for 64-bit) so clang computes the correct ABI layout for the target. It also controls how `long`/`unsigned long` are mapped in the generated types:
 - `32` → `i32`/`u32` (LP32/ILP32 ABI)
 - `64` → `i64`/`u64` (LP64 ABI)

-`--endian` controls the decode method emitted (`from_le_bytes` vs `from_be_bytes`).
+`--endian` is passed to clang as `-mbig-endian` when set to `big`, so clang computes the correct target layout. It also controls the byte order used by `decode_raw`/`encode_raw` (passed as `same_endianness` to the `_codec` helpers).

 `--include PATH` (repeatable) adds an extra clang include search path. Use this for
 system headers or cross-compilation sysroots that your `--input` headers `#include`.
@@ -230,8 +230,7 @@ header-gen \
 --word-size 32 \
 --define LINUX \
 --out-rust /workspace/generated/rust \
---out-proto /workspace/generated/proto \
---out-mapping /workspace/generated/mapping
+--out-proto /workspace/generated/proto
 ```

 ### 4. Plug in generated types and rebuild
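
Before plugging generated types in, it may help to see roughly what an offset-based `decode_raw` does with the C ABI layout and the `--endian` setting. The sketch below is hand-written, not `header-gen` output. The `Sample` struct, its offsets, the `Endian` enum and the helper functions are all hypothetical, and the real generated code takes its offsets from libclang and routes byte order through the `_codec` helpers instead.

```rust
/// Hypothetical C input, laid out as a typical 32-bit ABI would place it:
///   struct Sample { char tag; int32_t value; uint16_t count; };
///   offsets: tag = 0, value = 4 (after 3 padding bytes), count = 8; size = 12
#[derive(Debug, PartialEq)]
pub struct Sample {
    pub tag: u32, // narrow C integers are widened here (an assumption)
    pub value: i32,
    pub count: u32,
}

#[derive(Clone, Copy)]
pub enum Endian {
    Little,
    Big,
}

const SAMPLE_SIZE: usize = 12;

fn read_i32(buf: &[u8], offset: usize, endian: Endian) -> i32 {
    let bytes = [buf[offset], buf[offset + 1], buf[offset + 2], buf[offset + 3]];
    match endian {
        Endian::Little => i32::from_le_bytes(bytes),
        Endian::Big => i32::from_be_bytes(bytes),
    }
}

fn read_u16(buf: &[u8], offset: usize, endian: Endian) -> u16 {
    let bytes = [buf[offset], buf[offset + 1]];
    match endian {
        Endian::Little => u16::from_le_bytes(bytes),
        Endian::Big => u16::from_be_bytes(bytes),
    }
}

impl Sample {
    /// Read each field at its ABI offset, skipping alignment padding.
    /// Returns `None` when the buffer is shorter than the C struct size.
    pub fn decode_raw(buf: &[u8], endian: Endian) -> Option<Self> {
        if buf.len() < SAMPLE_SIZE {
            return None;
        }
        Some(Self {
            tag: u32::from(buf[0]),
            value: read_i32(buf, 4, endian),
            count: u32::from(read_u16(buf, 8, endian)),
        })
    }
}
```

Reading at fixed offsets rather than packing fields back to back is what lets the decoder ignore the padding bytes a C compiler inserts between `tag` and `value`.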

crates/header-gen/Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,8 @@ clap = { workspace = true }
 anyhow = { workspace = true }
 clang = { workspace = true }
 clang-sys = { version = "1" }
+prost = { workspace = true }
+serde = { workspace = true }

 [dev-dependencies]
 tempfile = { workspace = true }

crates/header-gen/src/emitter/mapping.rs

Lines changed: 0 additions & 124 deletions
This file was deleted.

crates/header-gen/src/emitter/mod.rs

Lines changed: 0 additions & 1 deletion
@@ -1,4 +1,3 @@
-pub mod mapping;
 pub mod proto;
 pub mod rust_structs;
