All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Downloader now honours the host OS trust store by default (#125). Manifest and bundle downloads from `github.com/kreuzberg-dev/tree-sitter-language-pack/releases/...` previously used ureq 3.x's default rustls agent, which trusts only the bundled Mozilla webpki roots and ignores the platform store. On Linux/WSL2 hosts where GitHub HTTPS traffic is presented with a chain rooted in a locally trusted (corp / private) CA, and where `curl`, `pip`, and `git` all succeed against the same URL via the OS trust store, first-use parser downloads failed with `DownloadError: ... io: invalid peer certificate: UnknownIssuer`. The downloader now constructs a configured `ureq::Agent` with `RootCerts::PlatformVerifier` by default (via `rustls-platform-verifier`), matching the behaviour of every other host-trust-aware HTTP client on the system. Set `TREE_SITTER_LANGUAGE_PACK_TLS_ROOTS=webpki` to opt back into ureq's bundled Mozilla roots; set `TREE_SITTER_LANGUAGE_PACK_TLS_ROOTS=platform` to make the default explicit. Affects every binding (Python, Node.js, Ruby, PHP, Go, Java, C#, Elixir, WASM, Dart, Swift, Zig, Kotlin-Android) because the fix lives entirely in the shared `ts-pack-core` Rust crate. (`crates/ts-pack-core/src/{download.rs,pack_config.rs}`, workspace `Cargo.toml`)
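
The root-store selection described above can be sketched as follows. This is an illustration only, not the crate's code: the real logic lives in Rust, and `resolve_tls_roots` is a hypothetical name. Exact-match parsing and rejecting unrecognised values are assumptions; the changelog specifies only the `webpki` and `platform` values and the `platform` default.

```python
import os

ENV_VAR = "TREE_SITTER_LANGUAGE_PACK_TLS_ROOTS"
VALID_TLS_ROOTS = ("platform", "webpki")


def resolve_tls_roots(environ=None):
    """Return "platform" (the default) or "webpki" based on the env var."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "")
    if raw == "":
        # Unset or empty: use the host OS trust store.
        return "platform"
    if raw in VALID_TLS_ROOTS:
        return raw
    # Assumption: unknown values are an error rather than a silent fallback.
    raise ValueError(f"{ENV_VAR} must be one of {VALID_TLS_ROOTS}, got {raw!r}")
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to exercise without touching the process environment.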
- `wolfram` grammar dropped from the language pack. `tree-sitter-wolfram` produces glibc heap corruption (`free(): invalid next size`) when parsing trivial input under serial test execution on Linux; macOS allocator silently tolerated the corruption. The entire upstream ecosystem is unmaintained: canonical `bostick/tree-sitter-wolfram` was last touched 2021-11-11 with 3 stars, and every known fork (`LumaKernel`, `LoganAMorrison`, `JuanG970`, `jakassebaum`) ships the same `LANGUAGE_VERSION 13` parser tables and is inactive. Rather than fork-and-maintain a Wolfram grammar in-house for marginal demand, the entry is removed from `language_definitions.json`, all CI `TSLP_LANGUAGES` lists, the smoke fixture, the e2e harness, the docs, and the README ecosystem listings. Total supported grammar count drops from 306 to 305, which matches the long-standing "305 languages" marketing copy (previously off-by-one due to the broken wolfram entry).
- Regenerated all alef-managed surfaces (per-binding READMEs, API reference docs, generated bindings, e2e tests) and the script-managed `docs/languages.md` + `_supported_languages.py` to reflect the 305-grammar count. `scripts/generate_grammar_table.py` default output path corrected from `docs/supported-languages.md` to the canonical nav-referenced `docs/languages.md`; Taskfile `docs:generate:languages` `generates:` field updated to match.
- Four new language bindings via alef 0.16.6, taking total binding count from 10 to 14:
  - Dart / Flutter: `dart pub add tree_sitter_language_pack`. Built with flutter_rust_bridge for isolate-safe Future APIs.
  - Kotlin (Android): `dev.kreuzberg.tslp:tslp-android` AAR on Maven Central. JNI-based with per-ABI native libraries (arm64-v8a, armeabi-v7a, x86_64, x86). JVM Kotlin users continue to consume the canonical Java / Panama-FFM package.
  - Swift: `TreeSitterLanguagePack` via SwiftPM. swift-bridge for macOS, iOS, and Linux.
  - Zig: `zig fetch --save <tarball-url>` from GitHub Releases. Direct C FFI via `@cImport`.
- Two new Rust binding crates: `tree-sitter-language-pack-dart` (FRB bridge) and `tree-sitter-language-pack-swift` (swift-bridge).
- Hand-written `crates/ts-pack-core-jni` Rust crate exporting `Java_...` JNI symbols for the Kotlin-Android binding (excluded from the default workspace build because it cross-compiles via `cargo ndk`).
- Per-language CI workflows: `ci-zig.yaml`, `ci-swift.yaml`, `ci-dart.yaml`, plus a combined `ci-mobile.yaml` covering Android cross-compile + iOS cargo check.
- Publish jobs for pub.dev (`publish-pub`), Swift Package Index (`publish-swift`), Zig (`publish-zig` → GitHub Release tarball), and Maven Central kotlin-android (`publish-kotlin-android`).
- E2E fixture coverage for: language alias resolution (`shell` → `bash`) via `has_language` / `get_language` / `get_parser` (3 fixtures); `download` edge cases, namely empty list, multiple-language, and unknown-language error path (3 fixtures); error-handling for 120KB sources and `get_language("")` (2 fixtures); and TypeScript function parsing (1 fixture). Brings fixture count from 403 to 412, covering 100% of the public `download`, `get_*`, and `has_language` surface across all 10 language bindings.
- Node: `getLanguage(name)` now returns a real `tree-sitter` `Language` that `new Parser().setLanguage(lang)` accepts at runtime. The previous capsule shim used `napi::bindgen_prelude::External::new` (rejected by `node-tree-sitter`'s `UnwrapLanguage`), wrote the External to `__parser`, and did not type-tag the value. Adopts alef v0.15.49 where the napi capsule codegen emits raw `napi_create_external` + `napi_type_tag_object` and reads `property_name` / `type_tag` from `[crates.node.capsule_types]`.
- Python: `PackConfig` and `ProcessConfig` type hints now resolve to the `.options` dataclasses, fixing `mypy --strict` errors at every `init(...)` / `process(...)` call site (adopts alef #72).
- Python: restore `SupportedLanguage` as `Literal[...]` of all 306 grammars at `tree_sitter_language_pack.SupportedLanguage`. The symbol was dropped during the alef 0.15.x codegen migration and re-importing it raised `ImportError` in 1.8.0 (#121).
- Python: `get_parser("python").parse(b"...")` returns a real `tree_sitter.Tree` again instead of raising `AttributeError`. `get_parser` / `get_language` now return native `tree_sitter.Parser` / `tree_sitter.Language` instances via PyO3 capsule pass-through (alef v0.15.39 wires `capsule_types` through `gen_bindings`) (#121).
- CI pinned to Node 22 LTS across all workflows. `tree-sitter@0.25.0` (the `tree-sitter` npm package) ships a `binding.cc` written against pre-C++20 stdlib (no `std::ranges`, `concept`, `requires`) and fails to compile against Node 24/26's V8 headers. Node 22 is the latest supported runtime until upstream `node-tree-sitter` updates its `cflags_cc` or ships prebuilds.
- CPD pre-commit hook and `packages/java/pom.xml` `maven-pmd-plugin` minimum-tokens bumped from 100 → 250: alef's java codegen emits ~200-token `try`/`catch` cleanup blocks on `DownloadManager` / `LanguageRegistry`. Refactoring the codegen to share a helper is tracked separately.
- macOS x86_64 native binaries across all polyglot bindings (Python wheels, npm napi, Ruby gem, Maven JAR, NuGet, C FFI, Go FFI, libts-pack bottle) — restores Intel Mac coverage that was missing under the alef 0.11 transition
- Real Homebrew bottle protocol for both `ts-pack` (CLI) and `libts-pack` (FFI library) via `brew install --build-bottle` + `brew bottle --json`, replacing the prior synthetic tarball approach. Eight bottles per release across `arm64_sequoia`, `sequoia`, `arm64_linux`, `x86_64_linux`. `brew install` now pours instead of source-building
- `libts-pack` Homebrew formula bundling tree-sitter language pack as a C library (headers + dylib/so + static archive)
- Python sdist published to PyPI alongside the existing platform wheels
- E2E fixtures covering Kotlin package + class structure (`kotlin_package_class_intel.json`), Java package declarations (`java_package_intel.json`), and a process call exercising the typed `extractions` map (`process_with_extractions.json`)
- Migrated to alef 0.15.x (Jinja-based codegen) for all polyglot bindings — Python, TypeScript, Ruby, Go, Java, C#, Elixir, PHP, WASM
- WASM now ships the `--target nodejs` build to npm so consumers no longer hit the bundler-only `import * from "env"` failure on `require()`
- WASM coverage scoped to a curated 32-language subset to fit the 16 GB GitHub runner during builds
- Intel: emit `StructureKind::Module` for Kotlin `package_header` and Java `package_declaration` so callers can build fully-qualified names for JVM languages (#112)
- Intel: resolve structure names via a fallback chain (`name` field → `type_identifier` → `identifier` → `scoped_identifier`) so Kotlin classes and Java/Kotlin packages no longer surface with null names (#111)
- Java: ship `natives/{rid}/` entries inside the published JAR. `actions/download-artifact` produces nested artifact paths, and the previous staging loop preserved them, so every platform hit `UnsatisfiedLinkError` on load. Flatten via `find` and add presence / `jar tf` guard steps so the regression cannot ship silently again (#114)
- Bindings: surface `extractions` as a typed `Map<String, ExtractionPattern>` / `Map<String, PatternResult>` across Java, Python, Go, TypeScript, Ruby, PHP, C#, Elixir, FFI, and WASM (was `Optional<String>` on Java, blocking pattern extractions through the high-level API). Driven by the alef 0.12.4 codegen fix for `AHashMap`-typed fields (#115)
- C#: strip duplicate `{` lines emitted by alef 0.14.33 codegen so generated `.cs` files compile
- Ruby: regenerated `native.rb` no longer recurses into itself via `define_singleton_method`; magnus codegen now skips re-export when binding name matches the native module method
- Node: `index.js` now contains real platform-dispatch logic so `require()` resolves the correct `.darwin-arm64.node` / `.linux-x64-gnu.node` / etc. instead of failing on the un-suffixed bundle name
- WASM: drop bundler-only output, removing spurious `'env'` module imports that broke `require()` from Node consumers
- Maven JAR previously missed `linux-x86_64` natives because of stage-loop path mishandling; flatten artifact downloads and add a `jar tf` guard
- Hex.pm `metadata.config` size limit: exclude the parser sources tarball from the package
- PHP: fix broken `crates/ts-pack-php/README.md` links in root `README.md`; path moved to `packages/php/README.md` after alef migration (#106)
- PHP: fix `.task/php.yml` `build`, `build:dev`, and `clean` tasks pointing to removed `crates/ts-pack-php/`; corrected to `crates/ts-pack-core-php/` (#106)
- PHP: align `packages/php/composer.json` and `packages/php/README.md` package name to canonical Packagist vendor slug (`kreuzberg/` not `kreuzberg-dev/`) (#106)
- PHP: document `mlocati/php-extension-installer` prerequisite in install docs and correct minimum PHP version to 8.4+ (#106)
- Go: regenerate stale `binding.go` with current alef generator
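
The structure-name fallback chain from #111 can be illustrated with a minimal sketch. `FakeNode` and `resolve_name` are hypothetical stand-ins, not the crate's types, and searching only direct children is an assumption; the changelog specifies only the order `name` field → `type_identifier` → `identifier` → `scoped_identifier`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in for a syntax-tree node."""
    kind: str
    text: str = ""
    fields: Dict[str, "FakeNode"] = field(default_factory=dict)
    children: List["FakeNode"] = field(default_factory=list)


# Node kinds tried, in order, when the grammar exposes no `name` field.
FALLBACK_KINDS = ("type_identifier", "identifier", "scoped_identifier")


def resolve_name(node: FakeNode) -> Optional[str]:
    """Return the structure name, or None if no link in the chain matches."""
    named = node.fields.get("name")
    if named is not None:
        return named.text
    for kind in FALLBACK_KINDS:
        for child in node.children:
            if child.kind == kind:
                return child.text
    return None
```

Iterating kinds in the outer loop means a `type_identifier` child wins over an `identifier` child regardless of their position among the children.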
- Migrate to alef polyglot binding generator: all language bindings (Python, TypeScript, Ruby, Go, Java, C#, Elixir, PHP, WASM) are now generated from a single `alef.toml` configuration
- `Default`, `Hash`, `PartialEq`, `Eq` derives on all public types
- 18 new e2e test fixtures closing testing gaps across all binding languages
- Consolidated CI: 12 language-specific workflows merged into a single `ci.yaml`
- Registry-mode e2e test apps under `test_apps/` (generated via `alef e2e generate --registry`)
- Public API locked down with `pub(crate)`: only functions and types that were in the pre-alef Python bindings are exported; internal modules (`json_utils`, `intel` submodules, `config`, `definitions`) are no longer public
- Workspace lints applied to all binding crates (`clippy::all = "deny"`, `unsafe_code = "deny"`)
- `test_apps/` moved from `tests/test_apps/` to project root
- `available_languages()`, `has_language()`, and `language_count()` now register the download cache directory before querying the registry, which fixes empty results when using the `download` feature (#90)
- `process()` auto-downloads missing parsers instead of returning `LanguageNotFound` (#94)
- C# task references updated from `.sln` to `.csproj`
- Maven version plugin pinned to exclude alpha/beta/RC versions
- Docker CI: `uv run` changed to `uv run --no-project` to avoid triggering root `pyproject.toml` build
- Ruby CI: removed stale `working-directory` that pointed to wrong path
- Go: fix FFI build defaults; add `TSLP_LINK_MODE` and `TSLP_LANGUAGES` env vars to Go task (#102)
- Go: fix CGO `LDFLAGS` paths; point to workspace `target/release/` instead of crate-local path (#102)
- Go: remove duplicate forward declarations from `ffi.go` (already in `ts_pack.h`) (#102)
- Go: fix README examples with proper error handling and correct API signatures (`Init`, `Download`) (#102)
- FFI: add extra libs dir from `cache_dir()` to registry on creation (#102)
- Docs: fix textlint pre-commit hook; add `additional_dependencies` for all textlint plugins (#102)
- Compile bundled grammars with `-fno-strict-aliasing` to prevent undefined behavior (#100)
- Update dependencies across lockfiles
- Regenerate READMEs for 1.6.1 version bump (#101)
- Go: move package root from `packages/go/v1/` to `packages/go/` so the Go module proxy can resolve `go.mod` at the correct path; `go get github.com/kreuzberg-dev/tree-sitter-language-pack/packages/go` now works (#97)
- Go: fix CGO `SRCDIR`-relative include/lib paths (one fewer `../` after directory restructure)
- Remove `features = ["all"]` from e2e Rust test `Cargo.toml`; use `download` feature for runtime parser fetching
- Remove 305 `lang-*` features to unblock crates.io publish (300 feature limit)
- Regenerate READMEs for v1.6.0, fix Windows query cache test flake
- Bump `rustls-webpki` to patch RUSTSEC-2026-0098 and RUSTSEC-2026-0099 (#99)
- Fix MIME type inference in core build by embedding `language_definitions.json` in crate
- Update dependencies across Python, Node.js, PHP, and Rust lockfiles
- Replace feature group docs with `download` / `TSLP_LANGUAGES` documentation in READMEs
- Thread-local parser cache in `parse_string()`: avoids re-creating parsers on repeated calls for the same language
- Two-level compiled query cache (thread-local + global) in `run_query()`: avoids recompiling tree-sitter queries
- `parse_with_language()` internal API for callers that already have a `Language` object
- Pre-computed capture names in `CompiledExtraction`: avoids rebuilding on every extraction call
- Go `type_spec` declarations extracted as symbols with correct `SymbolKind` (struct, interface, type)
- Dedicated "Download Parsers" section in quickstart docs covering CLI, programmatic APIs, groups, Docker/CI, and config files
- Tests for parser cache reuse, query cache sharing across threads, cursor byte-range isolation, and capture name correctness
- `compiled_query()` now propagates `Error::LockPoisoned` instead of silently ignoring poisoned `RwLock`
- `QueryCursor` byte-range no longer leaks between patterns when reusing the cursor in `extract_from_tree()`
- Replaced `std::collections::HashMap` with `ahash::AHashMap` in parser cache for consistency
- Redundant `get_language()` call removed from `parse_string()` hot path; only called on cache miss
- `CompiledExtraction::extract()` and `intel::parse_source()` now use the thread-local parser cache
- `QueryCursor` reused across patterns within a single `extract_from_tree()` call
- Unnecessary `String` allocation removed from `node_types.contains()` check in chunking
- All 305 `lang-*` Cargo features and group features (`all`, `web`, `systems`, `scripting`, `data`, `jvm`, `functional`, `wasm`): language selection is now via `TSLP_LANGUAGES` env var at build time; the `download` feature (default) fetches parsers at runtime
- 57 new permissively-licensed grammars — 305 languages total
- abl, c3, cel, cfml, chuck, cst, dhall, elvish, gap, gdshader, glimmer, gnuplot, gotmpl, gowork, gpg, hjson, hocon, hoon, htmldjango, jai, javadoc, json5, kcl, mlir, nasm, norg_meta, ocamllex, openscad, phpdoc, poe_filter, prql, rasi, razor, rbs, roc, rtf, slang, smalltalk, sml, snakemake, souffle, sourcepawn, sql_bigquery, stan, superhtml, sway, systemverilog, tact, tera, typespec, typoscript, vhs, vrl, wgsl_bevy, x86asm, ziggy, ziggy_schema
- CI license validation job in `ci-validate.yaml`: blocks PRs that introduce non-permissive (GPL/AGPL/LGPL/MPL) grammars
- `less` grammar: regenerated parser from ABI 11 to ABI 14 (was incompatible with tree-sitter 0.26)
- `corn` smoke fixture: replaced invalid `"x"` snippet with valid corn syntax
- Include `language_definitions.json` in the published crate so `build.rs` can find extension mappings, ambiguity data, and C symbol overrides when installed from crates.io
- Updated dependencies across all language ecosystems
- Expose `detect_language` in Python public API (#85)
- PHP extension name corrected to `ts-pack-php` (hyphens)
- All language snippet READMEs and documentation corrected
- Removed automated grammar updates workflow
- `C_SYMBOL_OVERRIDES` table now includes ALL languages from `language_definitions.json`, not just compiled ones; fixes download and loading of `csharp`, `vb`, `embeddedtemplate`, `nushell` from PyPI/npm/RubyGems packages
- `downloaded_languages()` returns canonical names (`csharp`) instead of c_symbol names (`c_sharp`)
- Elixir NIF publish: upload both hyphen and underscore artifact names so RustlerPrecompiled can find them
- Elixir NIF 2.17 packaging: fix stale variable names from dual-name refactor
- Ruby comprehensive test: remove `JSON.parse` on native Hash return from `process()`
- Go comprehensive test: access flat `ProcessResult` fields directly (no `metadata` wrapper)
- Homebrew bottle and PHP PIE packages now included in release artifacts
- Dependency updates across all language ecosystems
- `rustler_precompiled` updated to 0.9.0 (Elixir)
- Dynamic parser loading for languages with `c_symbol` overrides (`csharp`, `vb`, `embeddedtemplate`, `nushell`): build was naming libraries with the raw name but runtime loader expected the `c_symbol` name (#80)
- Go E2E generator: unused `tspack` import in non-process test files
- Elixir: add missing `extract/2` and `validate_extraction/1` NIF declarations
- PHP E2E generator: use double-quoted strings for source code so `\n` is interpreted correctly
- Nim grammar: switch from abandoned `paranim/tree-sitter-nim` (ABI v11) to `aMOPel/tree-sitter-nim` (MIT, ABI v14)
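
The `c_symbol` mismatch behind #80 comes down to mapping a canonical language name to the name used in the grammar's exported C symbol, and back. The sketch below is hypothetical: the function names are illustrative, and only the `csharp` → `c_sharp` pair is taken from this changelog, so the other overrides (`vb`, `embeddedtemplate`, `nushell`) are deliberately left out.

```python
# Canonical language name -> name used in the exported C symbol.
# Only the csharp entry is documented in this changelog.
C_SYMBOL_OVERRIDES = {"csharp": "c_sharp"}

# Reverse table, so listings can report canonical names.
CANONICAL_NAMES = {sym: lang for lang, sym in C_SYMBOL_OVERRIDES.items()}


def resolve_c_symbol(language: str) -> str:
    """Name the build and the loader must agree on for this language."""
    return C_SYMBOL_OVERRIDES.get(language, language)


def entry_point_symbol(language: str) -> str:
    """Exported grammar function, following the tree_sitter_<name> convention."""
    return "tree_sitter_" + resolve_c_symbol(language)


def canonical_name(c_symbol: str) -> str:
    """Map a c_symbol name back to the canonical language name."""
    return CANONICAL_NAMES.get(c_symbol, c_symbol)
```

Routing both the build step and the runtime loader through one lookup is what keeps the library name and the expected symbol from drifting apart.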
- Smoke test fixtures for all `c_symbol` override languages (csharp, vb, embeddedtemplate, nushell)
- Dynamic-linking CI step in `ci-all-grammars.yaml` to catch `c_symbol` naming mismatches
- Ruby binding: `process()`, `extract()`, `validate_extraction()` now return native Ruby Hash instead of raw JSON string
- WASM binding: output keys now use camelCase (matching Node.js binding convention), input config accepts both camelCase and snake_case
- Go E2E generator: use typed `*ProcessResult` struct fields instead of invalid `json.Unmarshal` on non-string return
- Elixir CI: stage NIF with both hyphenated and underscored filenames to satisfy Rustler force-build check and `load_from` loader
- Extraction query API: run user-defined tree-sitter queries and get structured results
  - `extract_patterns()` / `extract()` across Python, Node.js, Rust, Ruby, Elixir, PHP, WASM, C FFI
  - `validate_extraction()` for config validation without execution
  - `CompiledExtraction` for pre-compiled query reuse (Rust)
  - `ProcessConfig.extractions` for combining custom queries with standard analysis
  - Types: `ExtractionConfig`, `ExtractionPattern`, `CaptureOutput`, `CaptureResult`, `MatchResult`, `PatternResult`, `ExtractionResult`
- Criterion benchmarks: 9 groups, 23 benchmarks across Python, TypeScript, Rust, Go
- Extraction queries guide and documentation across all API references
- E2E generator: `process_imports_contains_source` assertion uses contains instead of equality
- WASM: language list matches actual compiled features (30 languages)
- WASM: add missing `detectLanguageFromPath` and `detectLanguageFromExtension` exports
- PHP generator: null array handling in `process()` result assertions
- Elixir: RustlerPrecompiled `crate` field resolution with `load_from` override
- Predicate evaluation: remove redundant re-evaluation (tree-sitter 0.26 handles internally)
- Documentation: stale version numbers, incomplete API references, incorrect function signatures
- Java version requirement standardized to JDK 25+
- Nushell grammar `c_symbol` override: linker error `undefined symbol: tree_sitter_nushell`
- E2E generator calling `.as_deref()` on `String` type (compile error on CI)
- WASM build: gate `c_symbol_for` behind `dynamic-loading` / `download` features (dead code warning)
- Elixir publish: align RustlerPrecompiled `crate:` field with Cargo `[lib]` name (underscores, not hyphens)
- Elixir publish: add `--cfg` flag patch to publish workflow for Rustler 0.37.3 compatibility
- Python `without_gil()`: add `catch_unwind` to ensure GIL is reacquired on panic
- Text splitter: prevent zero-width chunks in pathological UTF-8 edge case
- Comment kind detection: handle `//!`, `/*!`, and `doc_comment` node types
- Import detection: restrict fallback to explicitly supported languages only
- Export detection: use field-based AST matching instead of fragile `text.contains()`
- Registry: `Arc<Vec<PathBuf>>` for extra lib dirs (avoids Vec clone per language lookup)
- Registry: `AHashSet<&str>` in `available_languages()` (avoids 248+ String allocations)
- `NodeInfo.kind` uses `Cow::Borrowed` (zero-copy from tree-sitter's `&'static str`)
- Python: `with_tree()` / `try_with_tree()` helpers replace 9 duplicate lock patterns
- Python: `without_gil()` helper replaces 5 duplicate GIL release patterns
- Core: `extension_ambiguity_json()` helper replaces duplicated JSON serialization in 4 bindings
- Chunking: `MetadataCollector` struct reduces function from 11 to 7 parameters
- FFI: 25 SAFETY comments added to unsafe blocks
- Docs: rewrite all 12 API references to match actual binding source code
- Docs: add JSON-LD structured data and Open Graph metadata for crawlers
- 49 new permissively-licensed grammars — 248 languages total
- angular, bass, blade, brightscript, circom, cooklang, corn, crystal, cue, cylc, desktop, djot, earthfile, ebnf, editorconfig, eds, eex, elsa, enforce, facility, faust, fidl, foam, forth, git_config, git_rebase, godot_resource, http, hurl, just, ledger, less, liquid, mojo, move, nickel, nginx, norg, nushell, promql, pug, ql, robot, teal, templ, tmux, todotxt, turtle, vimdoc, wolfram
- Grammar updater automation (`scripts/check_grammar_updates.py`) with weekly CI workflow
- Generated supported languages table (`docs/supported-languages.md`) integrated into docs CI
- Node.js NAPI exports: `detectLanguageFromExtension`, `detectLanguageFromPath`, `getHighlightsQuery`, `extensionAmbiguity`
- E2E `process` test category with `process()` API coverage across all 11 language bindings
- Download/load filename mismatch for languages with c_symbol overrides (csharp, embeddedtemplate, vb) — fixes #80
- E2E fixture system: merged stale `intel/` and `metadata/` directories into unified `process/` category
- TypeScript and WASM e2e generators now use camelCase for metrics keys
- Docker CI grammar fixture updated to include all languages
- Elixir publish workflow: checksum file verification, increased retry timeout
- Missing Node.js `index.js` exports for detection and query functions
- Renamed e2e fixture assertions from `intel_*` / `meta_*` to `process_*`
- All documentation and package descriptions updated to reflect 248 languages
- New language: `al` (AL / Business Central), 198 languages total
- Grammar license linter (`scripts/lint_grammar_licenses.py`, `task lint:licenses`) verifies all grammars use permissive licenses
- Permissive license policy documented in CONTRIBUTING.md, docs, and README
- Replace `nim` grammar (alaviss, MPL-2.0 copyleft) with paranim/tree-sitter-nim (MIT)
- Replace `prolog` grammar (codeberg foxy, AGPL-3.0 copyleft) with Rukiza/tree-sitter-prolog (ISC)
- Docs: align mkdocs config with kreuzberg branding; mermaid diagrams now render (fixes #81)
- Dynamic loader: resolve `c_symbol` overrides for csharp, embeddedtemplate, and vb so `get_language()` works for dynamically loaded grammars (fixes #80)
- E2E generator: enable all ProcessConfig features (structure, imports, exports, comments, docstrings, symbols, diagnostics) for intel tests so diagnostics assertions pass
- 23 new smoke test fixtures for languages missing coverage: asciidoc, awk, batch, caddy, cedar, cedarschema, csharp, devicetree, diff, dot, embeddedtemplate, idris, jinja2, jq, lean, pkl, postscript, prolog, rescript, ssh_config, textproto, tlaplus, vb, wit, zsh
- CI workflow (`ci-all-grammars.yaml`) that tests all 197 grammars end-to-end, preventing regressions like #80
- `rust:e2e:all-grammars` task for running the full grammar suite locally
- Elixir NIF: fix Rustler crate name mismatch (`ts_pack_elixir` → `ts-pack-elixir`) causing compilation failure
- Rust crate publish: embed query file contents at build time instead of using `include_str!` with relative paths that break in the cargo package tarball
- WASM build: ahash uses compile-time-rng instead of runtime-rng (avoids getrandom on wasm32)
- Docker/static build: add `c_symbol` override for grammars with non-standard C symbol names (csharp, vb, embeddedtemplate)
- Unused imports when `dynamic-loading` feature disabled (WASM builds)
- Python sdist: `.pyi` and `py.typed` now included in both wheel and sdist
- C# build: add missing `ExtensionAmbiguityResult` model class
- Set `generate: true` for csharp, vb, embeddedtemplate grammars
- Switch from `std::HashMap`/`HashSet` to `ahash::AHashMap`/`AHashSet` for faster hashing in registry
- 20 new languages from arborium: asciidoc, awk, caddy, cedar, cedarschema, devicetree, dot, idris, jinja2, jq, lean, postscript, prolog, rescript, ssh_config, textproto, tlaplus, vb, wasm-interface-types, zsh (197 total)
- Centralized extension-to-language mapping: `sources/language_definitions.json` is the single source of truth for 239 file extensions across 197 languages
- Build-time code generation: `build.rs` generates extension lookup with strict validation (panics on duplicates, non-ASCII, uppercase, dots)
- `detect_language_from_content(content)`: shebang-based language detection (`#!/usr/bin/env python3` → "python")
- `extension_ambiguity(ext)`: query whether a file extension is ambiguous (e.g. `.m` → objc with matlab alternative)
- Highlight query bundling: `get_highlights_query(lang)`, `get_injections_query(lang)`, `get_locals_query(lang)`; embed `.scm` queries at build time
- `ambiguous` field in `language_definitions.json` for declaring known extension ambiguities
- E2E test fixtures and generators for detect-language, ambiguity, and highlights across all 11 language targets
- New APIs exposed in all bindings: Python, Node.js, Ruby, WASM, Elixir, PHP, C FFI, Go, C#
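
Shebang-based detection of the kind `detect_language_from_content` performs can be sketched as below. This is an illustration under assumptions, not the library's implementation: `detect_from_shebang` is a hypothetical name, the interpreter table is illustrative (only `python3` → "python" comes from this changelog), and skipping `env` flags is an assumption.

```python
import os.path
from typing import Optional

# Illustrative interpreter table; only python3 -> python is documented above.
INTERPRETERS = {
    "python": "python",
    "python3": "python",
    "bash": "bash",
    "node": "javascript",
}


def detect_from_shebang(content: str) -> Optional[str]:
    """Guess a language from the first line's shebang, or return None."""
    first_line = content.split("\n", 1)[0].strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    program = os.path.basename(parts[0])
    if program == "env":
        # `#!/usr/bin/env python3`: the interpreter is env's first non-flag argument.
        args = [p for p in parts[1:] if not p.startswith("-")]
        if not args:
            return None
        program = os.path.basename(args[0])
    return INTERPRETERS.get(program)
```

For example, `detect_from_shebang("#!/usr/bin/env python3\nprint(1)\n")` returns `"python"`, while content without a shebang returns `None`.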
- `LanguageRegistry` uses `Arc<RwLock<Vec<PathBuf>>>` for interior mutability; no more global `RwLock` wrapper, eliminates lock poisoning risk
- `ProcessConfig.language`: `String` → `Cow<'static, str>` (zero allocation for string literals)
- `NodeInfo.kind`, `QueryMatch.captures`: `String` → `Cow<'static, str>`
- `available_languages()` uses `HashSet` for O(1) dedup instead of O(n) Vec contains
- Chunking line counting uses precomputed newline table with binary search (O(log n) per chunk vs O(n))
- Added `memchr` dependency for fast byte scanning in text splitter and chunking
- Extension/ambiguity lookups generated from JSON at build time
- `clone_vendors.py` now copies `queries/` directories alongside `src/`
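
The newline-table approach to chunk line counting can be sketched as follows. The function names are hypothetical and the Rust code may differ in detail; the idea is to scan the source once for newline offsets, then answer each "which line is this byte on?" question with a binary search.

```python
from bisect import bisect_left
from typing import List, Tuple


def build_newline_table(source: bytes) -> List[int]:
    """Byte offsets of every b"\\n" in the source, in ascending order (one O(n) scan)."""
    return [i for i, b in enumerate(source) if b == 0x0A]


def line_of(newlines: List[int], byte_offset: int) -> int:
    """Zero-based line of a byte offset: the number of newlines strictly before it."""
    return bisect_left(newlines, byte_offset)


def line_range(newlines: List[int], start: int, end: int) -> Tuple[int, int]:
    """Zero-based (first, last) line covered by the half-open byte range [start, end)."""
    last_byte = max(start, end - 1)
    return line_of(newlines, start), line_of(newlines, last_byte)
```

Each chunk then costs two O(log n) lookups instead of a rescan of the text before it.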
- Strong types in all binding stubs: Python `.pyi` (TypedDicts), TypeScript `.d.ts` (interfaces), Ruby `.rbs` (record types), C# `Models.cs` (string enums replace `object`)
- Pre-existing registry test failures from global `RwLock` poisoning; test helpers now use local `LanguageRegistry::new()`
- Removed ambiguous `.os` (bsl) and `.cls` (apex/LaTeX conflict) extensions
- Docker: separated publish-docker workflow from main publish (180-minute timeout for multiplatform builds)
- Docker: publish-docker now triggers on `release` events and includes full smoke tests before push
- Test apps: all bindings now download languages before running tests (Ruby, Go, Elixir)
- Test apps: Rust test app adds parse_string validation tests
- Test apps: CLI smoke test adds chunking test
- Test apps: added Homebrew smoke test suite
- npm publish authentication and registry configuration
- Elixir NIF binary build and checksum generation
- Ruby CI and WASM build timeout
- Version sync across binding manifests
- tree-sitter-cobol grammar support
- MSVC build compatibility for cobol grammar
- Alpine Linux (musl) wheel platform tag support (PEP 656)
- Wheel file discovery in CI test action
- tree-sitter-bsl (1C:Enterprise) grammar support
- Updated all dependencies and relocked
- tree-sitter 0.25 support
- Dropped Python 3.9 support
- Adopted prek pre-commit workflow
- CI: cancel superseded workflow runs
- WASM (wast & wat) grammar support
- F# and F# signature grammar support
- tree-sitter-nim grammar support
- tree-sitter-ini grammar support
- Swift grammar update (trailing comma support)
- sdist build issues resolved
- GraphQL grammar support
- Kotlin grammar support (SAM conversions)
- Netlinx grammar support
- Swift grammar update (macros + copyable)
- Apex grammar support
- MSYS2 GCC build issues
- OCaml and OCaml Interface grammar support
- Markdown inline parser support
- Pinned elm and rust grammar versions
- Pinned tree-sitter-tcl to known-good revision
- ARM64 Linux CI builds
- Build issue resolved
- Windows DLL loading compatibility issues
- Windows compatibility and encoding issues for non-English locales
- PyCapsule-based language loading
- Protocol Buffers (proto) grammar support
- SPARQL grammar support
- Updated generation setup and build matrix
- Removed magik and swift grammars (temporarily)
- Version bump with dependency updates
- Added MANIFEST.in for sdist packaging
- Missing parsers in package data
- Initial release with 100+ tree-sitter language grammars
- Python package with pre-compiled parsers
- Multi-platform wheel builds (Linux, macOS, Windows)