
Pickle format upgrade (2 -> 4) in FilesDB #236

Open
ermo wants to merge 1 commit into getsolus:main from ermo:ermo/eopkg-suggestions


@ermo (Contributor) commented May 3, 2026

Summary

The goal of this PR is to set the pickle protocol variable used by the filesdb shelve (FILESDB_PICKLE_PROTOCOL_VERSION) to upstream's pickle.DEFAULT_PROTOCOL, so that no refactoring is required for debugging output.

This makes it upstream's responsibility to ensure pickle format compatibility.
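As a minimal sketch of what the change amounts to, assuming filesdb.py opens its shelve roughly like this: only the FILESDB_PICKLE_PROTOCOL_VERSION name comes from this PR, while the `open_files_db` helper and its arguments are hypothetical.

```python
import pickle
import shelve

# Per this PR: follow upstream's default rather than pinning protocol 2.
# On Python 3.12 this resolves to 4.
FILESDB_PICKLE_PROTOCOL_VERSION = pickle.DEFAULT_PROTOCOL


def open_files_db(path, flag="c"):
    """Hypothetical helper: open the FilesDB shelve with the shared protocol constant.

    Every value written through the returned Shelf is pickled with
    FILESDB_PICKLE_PROTOCOL_VERSION; reads detect the protocol from each record.
    """
    return shelve.open(path, flag=flag, protocol=FILESDB_PICKLE_PROTOCOL_VERSION)
```

Used as a context manager (`with open_files_db(path) as db: ...`), the shelf is closed and flushed when the block exits.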

Test Plan

  • Benchmarked this branch against Joey's lmdb branch with no issues.
  • Joey said he saw a small speedup going from pickle protocol 2 (py2-compatible) to pickle protocol 4 (the py3.12 pickle.DEFAULT_PROTOCOL value):
[Image: Joey's benchmark results, pickle protocol 2 vs 4]
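Joey's numbers come from a full benchmark against the lmdb branch, which is not reproduced here. As a rough, hypothetical micro-benchmark of just the serialisation step, `timeit` can compare a dump/load round trip under both protocols. The payload shape (a path -> package mapping) is an assumption, and absolute numbers will vary by machine and Python version.

```python
import pickle
import timeit


def bench_protocols(payload, protocols=(2, 4), number=50):
    """Time `number` pickle.dumps/pickle.loads round trips per protocol.

    Returns {protocol: total_seconds}. This measures serialisation only,
    not dbm/lmdb I/O, so it is no substitute for an end-to-end benchmark.
    """
    results = {}
    for proto in protocols:
        # timeit runs the lambda within this iteration, so `proto` binds correctly.
        results[proto] = timeit.timeit(
            lambda: pickle.loads(pickle.dumps(payload, protocol=proto)),
            number=number,
        )
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for FilesDB contents: file path -> owning package.
    sample = {f"usr/share/doc/pkg{i}/README": f"pkg{i}" for i in range(10_000)}
    for proto, secs in bench_protocols(sample).items():
        print(f"protocol {proto}: {secs:.3f}s")
```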

@ermo force-pushed the ermo/eopkg-suggestions branch 2 times, most recently from dd3d7ca to 9904448, May 3, 2026 20:30
@EbonJaeger (Member) commented

So, after looking more into this and thinking about it for a while, I'm really not sure what we actually gain from this. It doesn't change the default behavior. That variable already wasn't used for anything else like logging.

@ermo marked this pull request as ready for review, May 4, 2026 12:24
@ermo (Contributor, Author) commented May 4, 2026

> So, after looking more into this and thinking about it for a while, I'm really not sure what we actually gain from this. It doesn't change the default behavior. That variable already wasn't used for anything else like logging.

The gain is the explicitness about what pickle protocol is used in lazydb relative to the filesdb implementation.

My proposal is to make this the lzma_mt "baseline" used by eopkg and ypkg until the rest of the optimisations/simplifications (py-lmdb, fetcher, multiprocessing) have landed.

The development + deployment plan I am proposing is here: https://gist.github.com/ermo/6da8178a1d678b5c5b31c1a5b86829bb

@ermo force-pushed the ermo/eopkg-suggestions branch from 9904448 to c120c38, May 4, 2026 12:44
@EbonJaeger (Member) commented

> The gain is the explicitness about what pickle protocol is used relative to the filesdb implementation.

It really doesn't, though. It's still just going to use the default version. The change literally does nothing vs not having it.

No need to be stuck on the py2-compatible version 2 pickle format now that the codebase is py3-only in Polaris.
@ermo force-pushed the ermo/eopkg-suggestions branch from c120c38 to e2c3c0c, May 6, 2026 18:04
@ermo (Contributor, Author) commented May 6, 2026

> The gain is the explicitness about what pickle protocol is used relative to the filesdb implementation.
>
> It really doesn't, though. It's still just going to use the default version. The change literally does nothing vs not having it.

The commits touching lazydb have been dropped; those changes were only about making the version explicit.

In this PR, pickle.DEFAULT_PROTOCOL in filesdb (not lazydb) resolves to 4 (the py3.12 pickle.DEFAULT_PROTOCOL value), an upgrade over the previous protocol 2 (the py2-compatible format).
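A related question for the 2 -> 4 upgrade is whether records already written with protocol 2 stay readable. A minimal sketch, using a dict as a stand-in for the dbm file (an assumption; the real store is on disk): the unpickler reads the protocol from each record's header, so old and new entries can coexist in one shelf. Whether eopkg rewrites or rebuilds an existing FilesDB on upgrade is not covered here. The converse does not hold: Python 2 cannot read protocol 4 records, which is why the py3-only codebase matters.

```python
import shelve


def mixed_protocol_store():
    """Simulate a store written with protocol 2 and later updated with protocol 4."""
    backing = {}  # stand-in for the on-disk dbm file (assumption)
    legacy = shelve.Shelf(backing, protocol=2)
    legacy["usr/bin/legacy"] = "legacy-pkg"  # hypothetical pre-upgrade entry
    upgraded = shelve.Shelf(backing, protocol=4)
    upgraded["usr/bin/fresh"] = "fresh-pkg"  # hypothetical post-upgrade entry
    return backing, upgraded


backing, shelf = mixed_protocol_store()
# Raw records carry different PROTO headers: b'\x80\x02' vs b'\x80\x04'.
print({key: raw[:2] for key, raw in backing.items()})
# The unpickler reads the protocol from each record, so both entries load.
print(shelf["usr/bin/legacy"], shelf["usr/bin/fresh"])
```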

Note also that, in Joey's benchmarks, protocol 4 was slightly faster than protocol 2:

[Image: Joey's benchmark results, pickle protocol 2 vs 4]

If you search for shelve.open in the full filesdb.py file, you get the following hits that reference the FILESDB_PICKLE_PROTOCOL_VERSION variable:

... and because it is managed by a context manager, that's the version that gets written as well.

@ermo changed the title from "Pickle format suggestions" to "Pickle format upgrade (2 -> 4) in FilesDB", May 6, 2026