Add serialized parsing cache for shipped libraries#351
Conversation
There was a problem hiding this comment.
✅ No Issues Found
2 files reviewed | Confidence: 95% | Recommendation: Merge
Review Details
Files: tests/test_module_parsing_cache.py (4 new tests), vibeprolog/interpreter.py (new serialization methods)
Checked: Security, bugs, performance, error handling, cache invalidation logic
Summary: The PR adds on-disk caching for parsed shipped library modules using pickle serialization. Cache is validated by file mtime and parser configuration. Tests cover cache hits, invalidation, and behavior parity. Implementation is robust with proper error handling and security considerations.
There was a problem hiding this comment.
Code Review
This pull request introduces an on-disk cache for parsed library modules, which is a great performance enhancement. The implementation correctly uses file modification times and parser configuration to key the cache, ensuring correctness. The logic for loading from cache, invalidating it, and populating it on a miss seems sound. The new tests are thorough and cover cache hits, invalidation, and parity.
I've added a couple of comments on vibeprolog/interpreter.py:
- A high-severity security note about the use of
picklefor deserialization, which can be a vector for arbitrary code execution if cache files are located in untrusted directories. - A medium-severity suggestion to improve error handling when loading from the cache to avoid silently swallowing all exceptions, which will aid in debugging.
Overall, this is a solid contribution that should improve performance. Addressing the feedback will make it more robust and secure.
|
|
||
| try: | ||
| with cache_file.open("rb") as handle: | ||
| payload: SerializedParsedModule = pickle.load(handle) |
There was a problem hiding this comment.
Loading data with pickle.load from a file can lead to arbitrary code execution if the file's content can be controlled by an attacker. The cache files are stored in .vibe_parsed_cache directories within the library search paths. If any of these library paths are in a world-writable location (e.g., /tmp), it could expose a security vulnerability. Please ensure library paths are in trusted, permission-controlled locations. For greater security, consider using a safer serialization format like JSON, although this may require more work to serialize/deserialize the complex objects.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary
Testing
Codex Task