-
Notifications
You must be signed in to change notification settings - Fork 3.9k
fix(yaml): do not treat --- or ... inside scalar values as document separators #25686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(yaml): do not treat --- or ... inside scalar values as document separators #25686
Conversation
…eparators The YAML parser was incorrectly treating '---' and '...' as document separators even when they appeared inside scalar values (e.g., 'name: some-text---'). This happened because the check for document separators only verified line_indent == .none but didn't check if we had already scanned content for the current scalar. Added ctx.str_builder.len() == 0 check to ensure document separators are only recognized at the start of a value, not in the middle. Fixes oven-sh#25660 Signed-off-by: lif <[email protected]>
WalkthroughAdds a line-start guard ( Changes
Possibly related PRs
Suggested reviewers
Pre-merge checks✅ Passed checks (4 passed)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (2)
🧰 Additional context used📓 Path-based instructions (7)test/**/*.test.{ts,js,jsx,tsx,mjs,cjs}📄 CodeRabbit inference engine (test/CLAUDE.md)
Files:
test/regression/issue/**/*.test.ts📄 CodeRabbit inference engine (test/CLAUDE.md)
Files:
**/*.test.ts?(x)📄 CodeRabbit inference engine (CLAUDE.md)
Files:
test/regression/issue/*.test.ts📄 CodeRabbit inference engine (CLAUDE.md)
Files:
test/**/*.test.ts?(x)📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/**/*.zig📄 CodeRabbit inference engine (src/CLAUDE.md)
Files:
**/*.zig📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (14)📓 Common learnings📚 Learning: 2025-11-24T18:36:59.706ZApplied to files:
📚 Learning: 2025-12-16T00:21:32.179ZApplied to files:
📚 Learning: 2025-11-24T18:37:30.259ZApplied to files:
📚 Learning: 2025-11-24T18:37:30.259ZApplied to files:
📚 Learning: 2025-10-19T02:44:46.354ZApplied to files:
📚 Learning: 2025-12-16T00:21:32.179ZApplied to files:
📚 Learning: 2025-12-16T00:21:32.179ZApplied to files:
📚 Learning: 2025-11-14T16:07:01.064ZApplied to files:
📚 Learning: 2025-11-24T18:37:30.259ZApplied to files:
📚 Learning: 2025-11-24T18:37:30.259ZApplied to files:
📚 Learning: 2025-09-07T08:20:47.215ZApplied to files:
📚 Learning: 2025-09-30T22:53:19.887ZApplied to files:
📚 Learning: 2025-11-24T18:37:30.259ZApplied to files:
🔇 Additional comments (14)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/interchange/yaml.zigtest/regression/issue/25660.test.ts
🧰 Additional context used
📓 Path-based instructions (7)
test/**/*.test.{ts,js,jsx,tsx,mjs,cjs}
📄 CodeRabbit inference engine (test/CLAUDE.md)
test/**/*.test.{ts,js,jsx,tsx,mjs,cjs}: Usebun:testwith files that end in*.test.{ts,js,jsx,tsx,mjs,cjs}
Do not write flaky tests. Never wait for time to pass in tests; always wait for the condition to be met instead of using an arbitrary amount of time
Never use hardcoded port numbers in tests. Always useport: 0to get a random port
Prefer concurrent tests over sequential tests usingtest.concurrentordescribe.concurrentwhen multiple tests spawn processes or write files, unless it's very difficult to make them concurrent
When spawning Bun processes in tests, usebunExeandbunEnvfromharnessto ensure the same build of Bun is used and debug logging is silenced
Use-eflag for single-file tests when spawning Bun processes
UsetempDir()from harness to create temporary directories with files for multi-file tests instead of creating files manually
Prefer async/await over callbacks in tests
When callbacks must be used and it's just a single callback, usePromise.withResolversto create a promise that can be resolved or rejected from a callback
Do not set a timeout on tests. Bun already has timeouts
UseBuffer.alloc(count, fill).toString()instead of'A'.repeat(count)to create repetitive strings in tests, as ''.repeat is very slow in debug JavaScriptCore builds
Usedescribeblocks for grouping related tests
Always useawait usingorusingto ensure proper resource cleanup in tests for APIs like Bun.listen, Bun.connect, Bun.spawn, Bun.serve, etc
Always check exit codes and test error scenarios in error tests
Usedescribe.each()for parameterized tests
UsetoMatchSnapshot()for snapshot testing
UsebeforeAll(),afterEach(),beforeEach()for setup/teardown in tests
Track resources (servers, clients) in arrays for cleanup inafterEach()
Files:
test/regression/issue/25660.test.ts
test/regression/issue/**/*.test.ts
📄 CodeRabbit inference engine (test/CLAUDE.md)
Regression tests for specific issues go in
/test/regression/issue/${issueNumber}.test.ts. Do not put tests without issue numbers in the regression directory
Files:
test/regression/issue/25660.test.ts
**/*.test.ts?(x)
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.test.ts?(x): Never usebun testdirectly - always usebun bd testto run tests with debug build changes
For single-file tests, prefer-eflag overtempDir
For multi-file tests, prefertempDirandBun.spawnover single-file tests
UsenormalizeBunSnapshotto normalize snapshot output of tests
Never write tests that check for 'panic', 'uncaught exception', or similar strings in test output
UsetempDirfromharnessto create temporary directories - do not usetmpdirSyncorfs.mkdtempSync
When spawning processes in tests, expect stdout before expecting exit code for more useful error messages on test failure
Do not write flaky tests - do not usesetTimeoutin tests; instead await the condition to be met
Verify tests fail withUSE_SYSTEM_BUN=1 bun test <file>and pass withbun bd test <file>- tests are invalid if they pass with USE_SYSTEM_BUN=1
Test files must end with.test.tsor.test.tsx
Avoid shell commands likefindorgrepin tests - use Bun's Glob and built-in tools instead
Files:
test/regression/issue/25660.test.ts
test/regression/issue/*.test.ts
📄 CodeRabbit inference engine (CLAUDE.md)
Place regression tests for specific GitHub issues in
test/regression/issue/${issueNumber}.test.tswith real issue numbers only
Files:
test/regression/issue/25660.test.ts
test/**/*.test.ts?(x)
📄 CodeRabbit inference engine (CLAUDE.md)
Always use
port: 0in tests - do not hardcode ports or use custom random port number functions
Files:
test/regression/issue/25660.test.ts
src/**/*.zig
📄 CodeRabbit inference engine (src/CLAUDE.md)
src/**/*.zig: Private fields in Zig are fully supported using the#prefix:struct { #foo: u32 };
Use decl literals in Zig for declaration initialization:const decl: Decl = .{ .binding = 0, .value = 0 };
Prefer@importat the bottom of the file (auto formatter will move them automatically)
Files:
src/interchange/yaml.zig
**/*.zig
📄 CodeRabbit inference engine (CLAUDE.md)
In Zig code, be careful with allocators and use defer for cleanup
Files:
src/interchange/yaml.zig
🧠 Learnings (12)
📓 Common learnings
Learnt from: cirospaciari
Repo: oven-sh/bun PR: 22946
File: test/js/sql/sql.test.ts:195-202
Timestamp: 2025-09-25T22:07:13.851Z
Learning: PR oven-sh/bun#22946: JSON/JSONB result parsing updates (e.g., returning parsed arrays instead of legacy strings) are out of scope for this PR; tests keep current expectations with a TODO. Handle parsing fixes in a separate PR.
📚 Learning: 2025-11-24T18:36:59.706Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: src/bun.js/bindings/v8/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:36:59.706Z
Learning: Applies to src/bun.js/bindings/v8/test/v8/v8.test.ts : Add corresponding test cases to test/v8/v8.test.ts using checkSameOutput() function to compare Node.js and Bun output
Applied to files:
test/regression/issue/25660.test.ts
📚 Learning: 2025-12-16T00:21:32.179Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-16T00:21:32.179Z
Learning: Applies to **/*.test.ts?(x) : Verify tests fail with `USE_SYSTEM_BUN=1 bun test <file>` and pass with `bun bd test <file>` - tests are invalid if they pass with USE_SYSTEM_BUN=1
Applied to files:
test/regression/issue/25660.test.ts
📚 Learning: 2025-11-24T18:37:30.259Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: test/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:37:30.259Z
Learning: Applies to test/regression/issue/**/*.test.ts : Regression tests for specific issues go in `/test/regression/issue/${issueNumber}.test.ts`. Do not put tests without issue numbers in the regression directory
Applied to files:
test/regression/issue/25660.test.ts
📚 Learning: 2025-12-16T00:21:32.179Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-16T00:21:32.179Z
Learning: Applies to test/regression/issue/*.test.ts : Place regression tests for specific GitHub issues in `test/regression/issue/${issueNumber}.test.ts` with real issue numbers only
Applied to files:
test/regression/issue/25660.test.ts
📚 Learning: 2025-11-24T18:37:30.259Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: test/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:37:30.259Z
Learning: Applies to test/**/*-fixture.ts : Test files that spawn Bun processes should end in `*-fixture.ts` to identify them as test fixtures and not tests themselves
Applied to files:
test/regression/issue/25660.test.ts
📚 Learning: 2025-12-16T00:21:32.179Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-16T00:21:32.179Z
Learning: Applies to **/*.test.ts?(x) : Use `normalizeBunSnapshot` to normalize snapshot output of tests
Applied to files:
test/regression/issue/25660.test.ts
📚 Learning: 2025-10-19T02:44:46.354Z
Learnt from: theshadow27
Repo: oven-sh/bun PR: 23798
File: packages/bun-otel/context-propagation.test.ts:1-1
Timestamp: 2025-10-19T02:44:46.354Z
Learning: In the Bun repository, standalone packages under packages/ (e.g., bun-vscode, bun-inspector-protocol, bun-plugin-yaml, bun-plugin-svelte, bun-debug-adapter-protocol, bun-otel) co-locate their tests with package source code using *.test.ts files. This follows standard npm/monorepo patterns. The test/ directory hierarchy (test/js/bun/, test/cli/, test/js/node/) is reserved for testing Bun's core runtime APIs and built-in functionality, not standalone packages.
Applied to files:
test/regression/issue/25660.test.ts
📚 Learning: 2025-11-14T16:07:01.064Z
Learnt from: RiskyMH
Repo: oven-sh/bun PR: 24719
File: docs/bundler/executables.mdx:527-560
Timestamp: 2025-11-14T16:07:01.064Z
Learning: In the Bun repository, certain bundler features like compile with code splitting (--compile --splitting) are CLI-only and not supported in the Bun.build() JavaScript API. Tests for CLI-only features use backend: "cli" flag (e.g., test/bundler/bundler_compile_splitting.test.ts). The CompileBuildConfig interface correctly restricts these with splitting?: never;. When documenting CLI-only bundler features, add a note clarifying they're not available via the programmatic API.
Applied to files:
test/regression/issue/25660.test.ts
📚 Learning: 2025-11-24T18:37:30.259Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: test/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:37:30.259Z
Learning: Applies to test/**/*.test.{ts,js,jsx,tsx,mjs,cjs} : Use `bun:test` with files that end in `*.test.{ts,js,jsx,tsx,mjs,cjs}`
Applied to files:
test/regression/issue/25660.test.ts
📚 Learning: 2025-11-24T18:37:30.259Z
Learnt from: CR
Repo: oven-sh/bun PR: 0
File: test/CLAUDE.md:0-0
Timestamp: 2025-11-24T18:37:30.259Z
Learning: Applies to test/**/*.test.{ts,js,jsx,tsx,mjs,cjs} : Use `-e` flag for single-file tests when spawning Bun processes
Applied to files:
test/regression/issue/25660.test.ts
📚 Learning: 2025-09-07T08:20:47.215Z
Learnt from: RiskyMH
Repo: oven-sh/bun PR: 22258
File: src/cli/test_command.zig:1258-1281
Timestamp: 2025-09-07T08:20:47.215Z
Learning: For Bun's test line filtering feature, the parseFileLineArg function should only handle the specific cases of "file:line" and "file:line:col" formats. It should not try to be overly tolerant of other patterns, as components like ":col" or other non-numeric segments could legitimately be part of filenames. The current conservative approach that checks for numeric segments in expected positions is appropriate.
Applied to files:
test/regression/issue/25660.test.ts
🔇 Additional comments (3)
src/interchange/yaml.zig (2)
2291-2294: LGTM! The fix correctly prevents false document separator detection.The added guard
ctx.str_builder.len() == 0ensures that---is only recognized as a document separator when:
- No scalar content has been accumulated yet
- We're at the start of a line (
self.line_indent == .none)- The sequence is followed by whitespace or EOF
This correctly handles cases like
name: some-text---where---is part of the scalar value.
2310-2313: Consistent fix applied for document end marker.The same guard condition is correctly applied to the
...document end marker, ensuring symmetrical behavior with the---fix.test/regression/issue/25660.test.ts (1)
1-65: Well-structured regression test with good coverage.The test file correctly:
- Uses
bun:testas required- Follows the naming convention
test/regression/issue/${issueNumber}.test.ts- Covers the main fix scenario plus several edge cases
- Tests both
---and...markers in various positions- Verifies that legitimate document separators still work correctly (line 34-46)
|
@dylan-conway can you review? you have more context on YAML than me |
dylan-conway
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will break top level plain scalars (unquoted values). For example ... will be included in the value of
hello
...when it should be recognized as a document end marker.
I suggest adding the var nl = false; pattern that's used in scanSingleQuotedScalar and scanDoubleQuotedScalar. It's a little messy but works well
Replace the str_builder.len() == 0 check with a proper nl (newline) flag to correctly detect document separators (--- and ...). The previous approach would fail for top-level plain scalars like: ```yaml hello ... ``` Where `...` should be recognized as a document end marker but wasn't because str_builder already contained "hello". The nl flag pattern (already used in scanSingleQuotedScalar and scanDoubleQuotedScalar) correctly tracks whether we're at the start of a new line, allowing proper detection of: - `name: text---more` → `---` not at line start, not a separator - `hello\n...` → `...` at line start, is a separator 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Add test cases that verify the fix handles Dylan's feedback correctly: - Top-level plain scalar followed by `...` document end marker - Top-level plain scalar followed by `---` document separator These tests ensure the nl flag approach correctly recognizes document markers at line start even when str_builder already has content. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
f5bc5ab to
9dc9895
Compare
|
Thanks for the feedback! I've updated the implementation to use the The fix now tracks whether we're at the start of a new line, matching the approach in Also added test cases for the top-level plain scalar scenario you mentioned:
|
| '\n', '\r' => true, | ||
| else => false, | ||
| }; | ||
| if (line_start and self.line_indent == .none and self.remainStartsWith("---") and self.isAnyOrEofAt(" \t\n\r", 3)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I think I see the bug! isAnyOrEofAt returns false instead of true if the position out of bounds
fn isAnyOrEofAt(self: *const @This(), values: []const enc.unit(), n: usize) bool {
const pos = self.pos.add(n);
if (pos.isLessThan(self.input.len)) {
return std.mem.indexOfScalar(enc.unit(), values, self.input[pos.cast()]) != null;
}
+ return true;
- return false;
}
Summary
Test plan
Fixes #25660