Parser precedence: OpenAPI and scraper-markdown files may be handled by the wrong parser #312

@diegovogel

Description

Problem

GetParserForFile() in internal/languages/interface.go:163 iterates over a map[string]LanguageParser and returns the first parser that matches. Go does not specify map iteration order (the runtime deliberately randomizes it), so when multiple parsers match the same file, which one wins is unpredictable. This causes two conflicts:

  1. OpenAPI vs JSON: An openapi.json file matches both OpenAPIParser.CanParse() (content-based check) and JSONParser.CanParse() (extension-only). If JSONParser happens to be checked first, --include-openapi is bypassed and OpenAPI-specific extraction is lost.

  2. Scraper-markdown vs Markdown: A .scraped.md file matches both ScraperMarkdownParser.CanParse() and MarkdownParser.CanParse() (both extension-only). If MarkdownParser happens to be checked first, --include-scraper-markdown is bypassed.
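To make the failure mode concrete, here is a minimal, self-contained Go model of a first-match lookup over a map. Everything in it is hypothetical: LanguageParser is trimmed down to Name() and CanParse(), CanParse receives the file content as an argument (the real parsers read it themselves, per the issue), and getParserForFile, jsonParser and openAPIParser are simplified stand-ins rather than the project's code. Because both parsers accept openapi.json, repeated lookups can return either one.

```go
package main

import (
	"fmt"
	"strings"
)

// LanguageParser is a hypothetical, trimmed-down stand-in for the real
// interface in internal/languages. Content is passed in for simplicity.
type LanguageParser interface {
	Name() string
	CanParse(path string, content []byte) bool
}

// jsonParser models an extension-only check: it claims every .json file.
type jsonParser struct{}

func (jsonParser) Name() string { return "json" }

func (jsonParser) CanParse(path string, _ []byte) bool {
	return strings.HasSuffix(path, ".json")
}

// openAPIParser models a content-based check for OpenAPI/Swagger markers.
type openAPIParser struct{}

func (openAPIParser) Name() string { return "openapi" }

func (openAPIParser) CanParse(_ string, content []byte) bool {
	s := string(content)
	return strings.Contains(s, `"openapi":`) || strings.Contains(s, `"swagger":`)
}

// getParserForFile returns the first parser whose CanParse accepts the file.
// Go leaves map iteration order unspecified, so when two parsers both
// accept a file, which one is returned can differ between calls.
func getParserForFile(registry map[string]LanguageParser, path string, content []byte) LanguageParser {
	for _, p := range registry {
		if p.CanParse(path, content) {
			return p
		}
	}
	return nil
}

func main() {
	registry := map[string]LanguageParser{
		"json":    jsonParser{},
		"openapi": openAPIParser{},
	}
	spec := []byte(`{"openapi": "3.0.0", "info": {"title": "demo"}}`)

	// Count how often each parser wins across repeated lookups.
	wins := map[string]int{}
	for i := 0; i < 1000; i++ {
		wins[getParserForFile(registry, "openapi.json", spec).Name()]++
	}
	fmt.Println(wins)
}
```

Running it typically reports both names as winners, which is exactly the unpredictability described above.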

Existing precedent

The YAML parser already solves this correctly: its CanParse() reads file content and returns false for OpenAPI/Swagger specs (yaml_parser.go:31-42). The JSON and Markdown parsers don't have equivalent guards.

Proposed fix

Apply the same "yield" pattern to the generic parsers:

  • json_parser.go: In CanParse(), for .json files, read the content and return false if it contains "openapi": or "swagger": markers (the JSON equivalent of the check in yaml_parser.go).
  • markdown_parser.go: In CanParse(), return false for .scraped.md files (pure extension check, no content reading needed).
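A minimal sketch of the two proposed guards, under the simplifying assumption that CanParse receives the content as an argument (the real method may read the file itself and may have a different signature). The type names mirror the issue, but the bodies are illustrative: the JSON check is the plain substring test proposed above, so it would also fire on a nested "openapi": key and would miss `"openapi" :` written with extra whitespace, and the Markdown check assumes .md is the only extension MarkdownParser claims.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeOpenAPIJSON applies the marker test proposed in the issue:
// "openapi": (OpenAPI 3.x) or "swagger": (Swagger 2.0). A plain substring
// check is a simplification and can also match nested keys.
func looksLikeOpenAPIJSON(content []byte) bool {
	s := string(content)
	return strings.Contains(s, `"openapi":`) || strings.Contains(s, `"swagger":`)
}

// JSONParser is a hypothetical sketch; the real CanParse may read the file itself.
type JSONParser struct{}

// CanParse accepts .json files but yields OpenAPI/Swagger specs so that
// OpenAPIParser handles them regardless of registry iteration order.
func (JSONParser) CanParse(path string, content []byte) bool {
	if !strings.HasSuffix(path, ".json") {
		return false
	}
	return !looksLikeOpenAPIJSON(content)
}

// MarkdownParser is a hypothetical sketch assuming .md is its only extension.
type MarkdownParser struct{}

// CanParse yields .scraped.md files to ScraperMarkdownParser with a pure
// extension check; no content needs to be read.
func (MarkdownParser) CanParse(path string, _ []byte) bool {
	if strings.HasSuffix(path, ".scraped.md") {
		return false
	}
	return strings.HasSuffix(path, ".md")
}

func main() {
	openapiSpec := []byte(`{"openapi": "3.1.0", "paths": {}}`)
	pkg := []byte(`{"name": "demo", "version": "1.0.0"}`)

	fmt.Println("openapi.json:", JSONParser{}.CanParse("openapi.json", openapiSpec))       // false
	fmt.Println("package.json:", JSONParser{}.CanParse("package.json", pkg))               // true
	fmt.Println("page.scraped.md:", MarkdownParser{}.CanParse("page.scraped.md", nil))     // false
	fmt.Println("README.md:", MarkdownParser{}.CanParse("README.md", nil))                 // true
}
```

Each generic parser only narrows what it accepts; the specialized parsers are left untouched.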

No changes needed to GetParserForFile() or the registry data structure. Both binaries (internal/cli/ingest.go and cmd/ingest-codebase/main.go) benefit automatically since the fix is in the parser layer.
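Why GetParserForFile() can stay as is: once the generic parsers yield, at most one parser accepts any given file, so the order in which the map is iterated stops mattering. The sketch below checks that invariant on a hypothetical four-entry registry; the parser names, the content argument, and modeling the OpenAPI check as ".json plus a marker" are all simplifying assumptions, not the project's actual registry.

```go
package main

import (
	"fmt"
	"strings"
)

// parser is a hypothetical, simplified model: a name plus a CanParse
// predicate that receives the path and already-read content.
type parser struct {
	name     string
	canParse func(path string, content []byte) bool
}

func isOpenAPI(content []byte) bool {
	s := string(content)
	return strings.Contains(s, `"openapi":`) || strings.Contains(s, `"swagger":`)
}

// registry keeps the unchanged map-based design; only the generic
// parsers' predicates carry the new "yield" guards.
var registry = map[string]parser{
	"openapi": {"openapi", func(p string, c []byte) bool { return strings.HasSuffix(p, ".json") && isOpenAPI(c) }},
	"json":    {"json", func(p string, c []byte) bool { return strings.HasSuffix(p, ".json") && !isOpenAPI(c) }},
	"scraper": {"scraper-markdown", func(p string, _ []byte) bool { return strings.HasSuffix(p, ".scraped.md") }},
	"markdown": {"markdown", func(p string, _ []byte) bool {
		return strings.HasSuffix(p, ".md") && !strings.HasSuffix(p, ".scraped.md")
	}},
}

// matches returns the names of every parser that accepts the file.
func matches(path string, content []byte) []string {
	var names []string
	for _, p := range registry {
		if p.canParse(path, content) {
			names = append(names, p.name)
		}
	}
	return names
}

// getParserForFile is the unchanged first-match loop. With at most one
// possible match, map iteration order no longer affects the result.
func getParserForFile(path string, content []byte) string {
	for _, p := range registry {
		if p.canParse(path, content) {
			return p.name
		}
	}
	return ""
}

func main() {
	files := []struct {
		path    string
		content string
	}{
		{"openapi.json", `{"openapi": "3.0.0"}`},
		{"package.json", `{"name": "demo"}`},
		{"page.scraped.md", "# Scraped"},
		{"README.md", "# Readme"},
	}
	for _, f := range files {
		fmt.Printf("%-16s -> %v\n", f.path, matches(f.path, []byte(f.content)))
	}
}
```

Each file prints exactly one parser name, so any iteration order of the map produces the same winner.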

Found by

A Codex review during work on the #294 follow-up PR (adding missing language flags to internal/cli/ingest.go).
