Feature/cache vs uncached loading times#352
Conversation
There was a problem hiding this comment.
✅ No Issues Found
X files reviewed | Confidence: 95% | Recommendation: Merge
Review Details
Files: .gitignore, .paige/library_test_progress.json, docs/LIBRARY_STATUS.md, docs/cached_xpath_library_load_times.csv, docs/uncached_xpath_library_load_times.csv, tools/measure_xpath_load_time.py, vibeprolog.py, vibeprolog/interpreter.py
Checked: Security, bugs, performance, error handling, cache invalidation
Summary: The PR implements AST caching to significantly improve library loading performance (5.51s cached vs 28.35s uncached for xpath library). The implementation properly handles cache invalidation using file mtimes and parser configuration signatures. Cache files are stored locally and gitignored. The changes expose some existing library issues (permission errors) that were previously masked by timeouts, but this is expected behavior.
There was a problem hiding this comment.
Code Review
This pull request introduces a caching mechanism for parsed ASTs to improve performance, along with a script to measure and compare cached vs. uncached loading times. The implementation adds a --clear-ast-cache flag to the CLI and a corresponding function to clear both in-memory and on-disk caches. The changes are well-implemented and the error handling for cache clearing is robust. I have one suggestion to improve the maintainability of the measurement script by reducing code duplication.
| uncached_elapsed = _run_xpath_load(repo_root, clear_cache=True) | ||
| _append_result(uncached_log, commit_hash, uncached_elapsed) | ||
| print(f"Commit {commit_hash} uncached load time: {uncached_elapsed:.2f}s") | ||
|
|
||
| cached_elapsed = _run_xpath_load(repo_root, clear_cache=False) | ||
| _append_result(cached_log, commit_hash, cached_elapsed) | ||
| print(f"Commit {commit_hash} cached load time: {cached_elapsed:.2f}s") |
There was a problem hiding this comment.
The logic for running and logging the uncached and cached load times is duplicated. To improve maintainability and adhere to the Don't Repeat Yourself (DRY) principle, you could extract this logic into a helper function. This would make the main function cleaner and reduce the chance of introducing inconsistencies if this logic needs to be changed in the future. The suggestion below introduces a nested helper function to encapsulate the repeated logic, which you could also choose to define at the module level.
| uncached_elapsed = _run_xpath_load(repo_root, clear_cache=True) | |
| _append_result(uncached_log, commit_hash, uncached_elapsed) | |
| print(f"Commit {commit_hash} uncached load time: {uncached_elapsed:.2f}s") | |
| cached_elapsed = _run_xpath_load(repo_root, clear_cache=False) | |
| _append_result(cached_log, commit_hash, cached_elapsed) | |
| print(f"Commit {commit_hash} cached load time: {cached_elapsed:.2f}s") | |
| def _run_and_log(clear_cache: bool, run_type: str, log_path: Path): | |
| """Run a timing measurement, log it, and print the result.""" | |
| elapsed = _run_xpath_load(repo_root, clear_cache=clear_cache) | |
| _append_result(log_path, commit_hash, elapsed) | |
| print(f"Commit {commit_hash} {run_type} load time: {elapsed:.2f}s") | |
| _run_and_log(clear_cache=True, run_type="uncached", log_path=uncached_log) | |
| _run_and_log(clear_cache=False, run_type="cached", log_path=cached_log) |
No description provided.