Problem
GetParserForFile() in internal/languages/interface.go:163 iterates over a map[string]LanguageParser. Go map iteration order is unspecified and randomized at runtime, so when multiple parsers match the same file, which one wins is unpredictable and can change from run to run. This causes two conflicts:
- OpenAPI vs JSON: An openapi.json file matches both OpenAPIParser.CanParse() (content-based check) and JSONParser.CanParse() (extension-only). If JSON wins the map race, --include-openapi is bypassed and OpenAPI-specific extraction is lost.
- Scraper-markdown vs Markdown: A .scraped.md file matches both ScraperMarkdownParser.CanParse() and MarkdownParser.CanParse() (both extension-only). If Markdown wins, --include-scraper-markdown is bypassed.
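A minimal sketch of the OpenAPI vs JSON conflict, assuming a trimmed-down LanguageParser whose CanParse takes the path and content directly (the real signature may differ). The lowercase parser types and the getParserForFile/matchingParsers helpers are hypothetical stand-ins, not the code in interface.go. Both parsers accept openapi.json, so a first-match loop over a map can return either one.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// LanguageParser is a hypothetical stand-in for the real interface. Content
// is passed in directly so the sketch needs no filesystem access.
type LanguageParser interface {
	CanParse(path string, content []byte) bool
}

// jsonParser claims any .json file (extension-only, like today's JSONParser).
type jsonParser struct{}

func (jsonParser) CanParse(path string, _ []byte) bool {
	return filepath.Ext(path) == ".json"
}

// openAPIParser claims files whose content carries an OpenAPI/Swagger marker.
type openAPIParser struct{}

func (openAPIParser) CanParse(_ string, content []byte) bool {
	s := string(content)
	return strings.Contains(s, `"openapi":`) || strings.Contains(s, `"swagger":`)
}

// getParserForFile returns the first parser, in map iteration order, that
// accepts the file. Go randomizes that order, so with two matching parsers
// either one may be returned.
func getParserForFile(parsers map[string]LanguageParser, path string, content []byte) (string, bool) {
	for name, p := range parsers {
		if p.CanParse(path, content) {
			return name, true
		}
	}
	return "", false
}

// matchingParsers lists every parser that accepts the file, sorted by name.
func matchingParsers(parsers map[string]LanguageParser, path string, content []byte) []string {
	var names []string
	for name, p := range parsers {
		if p.CanParse(path, content) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	parsers := map[string]LanguageParser{"json": jsonParser{}, "openapi": openAPIParser{}}
	content := []byte(`{"openapi": "3.0.0", "info": {"title": "demo"}}`)

	fmt.Println("matching:", matchingParsers(parsers, "openapi.json", content))

	// Repeat the lookup and tally the winners; both parsers typically show up.
	wins := map[string]int{}
	for i := 0; i < 1000; i++ {
		name, _ := getParserForFile(parsers, "openapi.json", content)
		wins[name]++
	}
	fmt.Println("winners over 1000 lookups:", wins)
}
```

The tally is not guaranteed to be split on any given run, which is exactly the problem: the outcome depends on the runtime's map ordering rather than on anything the user configured.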
Existing precedent
The YAML parser already solves this correctly: its CanParse() reads file content and returns false for OpenAPI/Swagger specs (yaml_parser.go:31-42). The JSON and Markdown parsers don't have equivalent guards.
Proposed fix
Apply the same "yield" pattern to the generic parsers:
- json_parser.go: In CanParse(), for .json files, read content and return false if it contains "openapi": or "swagger": markers (same logic as yaml_parser.go).
- markdown_parser.go: In CanParse(), return false for .scraped.md files (pure extension check, no content reading needed).
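A sketch of the two guards, assuming CanParse receives the file content as a second argument (an assumption for testability; if the real method only gets a path, the content would have to be read from disk, for example with os.ReadFile, as the YAML parser's content check implies). JSONParser and MarkdownParser here are stand-ins and hasOpenAPIMarker is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// hasOpenAPIMarker reports whether JSON content contains an "openapi": or
// "swagger": key. This is the plain substring check proposed above; it would
// also match a non-spec JSON file that happens to contain either key.
func hasOpenAPIMarker(content []byte) bool {
	s := string(content)
	return strings.Contains(s, `"openapi":`) || strings.Contains(s, `"swagger":`)
}

// JSONParser is a stand-in for the real type; only CanParse is sketched.
type JSONParser struct{}

// CanParse claims .json files but yields OpenAPI/Swagger specs to the
// OpenAPI parser.
func (JSONParser) CanParse(path string, content []byte) bool {
	if filepath.Ext(path) != ".json" {
		return false
	}
	return !hasOpenAPIMarker(content)
}

// MarkdownParser is a stand-in for the real type; only CanParse is sketched.
type MarkdownParser struct{}

// CanParse claims .md files but yields .scraped.md files to the
// scraper-markdown parser. An extension check alone is not enough, because
// filepath.Ext("page.scraped.md") returns ".md".
func (MarkdownParser) CanParse(path string, _ []byte) bool {
	if strings.HasSuffix(path, ".scraped.md") {
		return false
	}
	return filepath.Ext(path) == ".md"
}

func main() {
	j := JSONParser{}
	fmt.Println(j.CanParse("openapi.json", []byte(`{"openapi": "3.1.0"}`))) // false
	fmt.Println(j.CanParse("swagger.json", []byte(`{"swagger": "2.0"}`)))   // false
	fmt.Println(j.CanParse("package.json", []byte(`{"name": "demo"}`)))     // true

	m := MarkdownParser{}
	fmt.Println(m.CanParse("docs/page.scraped.md", nil)) // false
	fmt.Println(m.CanParse("README.md", nil))            // true
}
```

The Markdown guard never touches file content, so it adds no I/O; only the JSON guard pays for a read, and only for .json files.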
No changes needed to GetParserForFile() or the registry data structure. Both binaries (internal/cli/ingest.go and cmd/ingest-codebase/main.go) benefit automatically since the fix is in the parser layer.
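A sketch of why the lookup itself can stay as-is: once the generic parsers yield, each sample file is claimed by exactly one parser, so the map's iteration order no longer affects the result. All four parser types and the matches helper are hypothetical stand-ins, CanParse again takes content directly, and the OpenAPI check is limited to JSON for brevity. A check like this could also serve as a regression test.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// LanguageParser is a hypothetical stand-in for the real interface.
type LanguageParser interface {
	CanParse(path string, content []byte) bool
}

func hasOpenAPIMarker(c []byte) bool {
	s := string(c)
	return strings.Contains(s, `"openapi":`) || strings.Contains(s, `"swagger":`)
}

type openAPIParser struct{}

func (openAPIParser) CanParse(p string, c []byte) bool {
	return filepath.Ext(p) == ".json" && hasOpenAPIMarker(c)
}

// jsonParser yields OpenAPI/Swagger specs.
type jsonParser struct{}

func (jsonParser) CanParse(p string, c []byte) bool {
	return filepath.Ext(p) == ".json" && !hasOpenAPIMarker(c)
}

type scraperMarkdownParser struct{}

func (scraperMarkdownParser) CanParse(p string, _ []byte) bool {
	return strings.HasSuffix(p, ".scraped.md")
}

// markdownParser yields .scraped.md files.
type markdownParser struct{}

func (markdownParser) CanParse(p string, _ []byte) bool {
	return filepath.Ext(p) == ".md" && !strings.HasSuffix(p, ".scraped.md")
}

// matches returns every parser that accepts the file, sorted by name.
// With the yield guards in place it should hold at most one entry.
func matches(parsers map[string]LanguageParser, path string, content []byte) []string {
	var names []string
	for name, p := range parsers {
		if p.CanParse(path, content) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	parsers := map[string]LanguageParser{
		"openapi":          openAPIParser{},
		"json":             jsonParser{},
		"scraper-markdown": scraperMarkdownParser{},
		"markdown":         markdownParser{},
	}
	files := []struct{ path, content string }{
		{"openapi.json", `{"openapi": "3.0.0"}`},
		{"package.json", `{"name": "demo"}`},
		{"page.scraped.md", "# Scraped"},
		{"README.md", "# Readme"},
	}
	for _, f := range files {
		fmt.Println(f.path, "->", matches(parsers, f.path, []byte(f.content)))
	}
}
```

Running it prints one parser per file (openapi, json, scraper-markdown, markdown), which is the invariant that makes first-match-over-a-map safe again.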
Found by
Codex review during PR work on #294 follow-up (adding missing language flags to internal/cli/ingest.go).