
Replace VectorDB pickle caches with JSON #621

Draft
armorer-labs wants to merge 1 commit into anthropics:main from armorer-labs:codex/replace-vectordb-pickle-with-json

@armorer-labs

Summary

  • replace the evaluation VectorDB pickle caches with JSON caches
  • convert the checked-in classification and text-to-SQL vector DB cache files to JSON
  • update the matching guide notebook snippets so the cookbook examples use the same cache format
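A minimal sketch of the JSON round trip these bullets describe, assuming a cache made of an embedding list, a parallel metadata list, and a query-to-embedding cache. The function names `save_cache`/`load_cache` and the field names are hypothetical and are not taken from the cookbook's vectordb.py files.

```python
import json
import os
from typing import Any

# Hypothetical cache layout; the real field names in vectordb.py may differ.
Embeddings = list[list[float]]
Metadata = list[dict[str, Any]]
QueryCache = dict[str, list[float]]


def save_cache(path: str, embeddings: Embeddings, metadata: Metadata,
               query_cache: QueryCache) -> None:
    """Write the whole cache as a single JSON object."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    data = {"embeddings": embeddings, "metadata": metadata, "query_cache": query_cache}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)


def load_cache(path: str) -> tuple[Embeddings, Metadata, QueryCache]:
    """Read the cache back; json.load only builds plain data, it never runs code."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Treat a missing query cache as empty rather than failing.
    return data["embeddings"], data["metadata"], data.get("query_cache", {})
```

Python's `json` module writes floats using their shortest round-trip representation, so embedding values reload exactly; tuples, however, come back as lists.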

Why

These examples use the cache files only as local embedding stores, so JSON is sufficient for the stored data and avoids teaching pickle.load as the default copy-paste pattern (unpickling a file can execute arbitrary code, so it is unsafe for caches of unknown origin). Existing .pkl caches generated by earlier versions are not loaded automatically; users can regenerate them by rerunning the relevant example's setup. The checked-in caches used by the cookbook are converted in this PR.
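A sketch of the "regenerate rather than unpickle" behavior, assuming a hypothetical `load_or_build` helper and a caller-supplied `build` function that recomputes the embeddings. A leftover .pkl file is never opened, so the only path back to a cache is regeneration.

```python
import json
import os
from typing import Any, Callable


def load_or_build(json_path: str, build: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Return the JSON cache at json_path, building and saving it if it is missing.

    Any legacy .pkl cache next to it is deliberately ignored (never unpickled).
    """
    if os.path.exists(json_path):
        with open(json_path, encoding="utf-8") as f:
            return json.load(f)
    data = build()  # e.g. re-embed the dataset (hypothetical caller-supplied step)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data
```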

Related

Validation

  • python -m py_compile capabilities/text_to_sql/evaluation/vectordb.py capabilities/classification/evaluation/vectordb.py capabilities/retrieval_augmented_generation/evaluation/vectordb.py
  • git diff --cached --check
  • confirmed with rg (ripgrep) that no pickle references remain in the targeted classification, text-to-SQL, and RAG evaluation/example files
  • ran offline behavior checks with a fake Voyage client covering missing-cache generation, JSON save/reload, shipped cache loading, search results, cached-query reuse, and notebook code parsing
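The offline checks listed above could look roughly like this sketch. `FakeVoyageClient` only mimics the assumed shape of a Voyage client (an `embed(texts, ...)` method returning an object with an `.embeddings` list), and `TinyVectorDB` is a hypothetical minimal store, not the cookbook's VectorDB class. The toy embedding (string length and vowel count) keeps the example deterministic without network access.

```python
import json
import math
from dataclasses import dataclass


@dataclass
class FakeEmbeddingResult:
    embeddings: list[list[float]]


@dataclass
class FakeVoyageClient:
    """Offline stand-in for an embedding client; counts how often it is called."""
    calls: int = 0

    def embed(self, texts, model=None, input_type=None):
        self.calls += 1
        # Toy 2-d embedding: [length, vowel count]. Deterministic, no network.
        return FakeEmbeddingResult(
            [[float(len(t)), float(sum(c in "aeiou" for c in t.lower()))] for t in texts]
        )


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0  # avoid dividing by zero for empty strings
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class TinyVectorDB:
    """Hypothetical minimal store with a JSON-serializable query cache."""

    def __init__(self, client):
        self.client = client
        self.texts: list[str] = []
        self.embeddings: list[list[float]] = []
        self.query_cache: dict[str, list[float]] = {}

    def add(self, texts):
        self.texts.extend(texts)
        self.embeddings.extend(self.client.embed(texts).embeddings)

    def search(self, query, k=1):
        # Only call the client for queries not seen before (cached-query reuse).
        if query not in self.query_cache:
            self.query_cache[query] = self.client.embed([query]).embeddings[0]
        q = self.query_cache[query]
        ranked = sorted(range(len(self.texts)),
                        key=lambda i: cosine(q, self.embeddings[i]), reverse=True)
        return [self.texts[i] for i in ranked[:k]]

    def to_json(self) -> str:
        return json.dumps({"texts": self.texts, "embeddings": self.embeddings,
                           "query_cache": self.query_cache})

    @classmethod
    def from_json(cls, client, s: str) -> "TinyVectorDB":
        data = json.loads(s)
        db = cls(client)
        db.texts, db.embeddings, db.query_cache = (
            data["texts"], data["embeddings"], data["query_cache"])
        return db
```

Counting client calls is what makes the cached-query check observable: a repeated query, before or after a JSON save/reload, should not increase the count.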
