Hello,
@ogrisel mentioned in this comment (#385 (comment)):
Anyway, having deterministic pickles is probably out of scope for cloudpickle so I would be in favor of closing this issue.
However, @ogrisel also opened a PR (#428), released as part of cloudpickle 2.0.0, that tried to address non-determinism caused by dictionary ordering.
I wanted to confirm the project's official stance on non-determinism, because I am still seeing non-deterministic pickles with cloudpickle 2.0.0.
Here is the pickletools.dis output for the pickle of a function on a first attempt:
...
2539: ( MARK
2540: \x8c SHORT_BINUNICODE 'correlation_matrix'
2560: \x94 MEMOIZE (as 115)
2561: \x8c SHORT_BINUNICODE 'drifts_annual'
2576: \x94 MEMOIZE (as 116)
2577: \x8c SHORT_BINUNICODE 'initial_prices'
2593: \x94 MEMOIZE (as 117)
2594: \x8c SHORT_BINUNICODE 'volatilities_annual'
2615: \x94 MEMOIZE (as 118)
2616: \x91 FROZENSET (MARK at 2539)
2617: \x94 MEMOIZE (as 119)
...
And here is the pickle of the same function on a second attempt:
...
2539: ( MARK
2540: \x8c SHORT_BINUNICODE 'volatilities_annual'
2561: \x94 MEMOIZE (as 115)
2562: \x8c SHORT_BINUNICODE 'initial_prices'
2578: \x94 MEMOIZE (as 116)
2579: \x8c SHORT_BINUNICODE 'drifts_annual'
2594: \x94 MEMOIZE (as 117)
2595: \x8c SHORT_BINUNICODE 'correlation_matrix'
2615: \x94 MEMOIZE (as 118)
2616: \x91 FROZENSET (MARK at 2539)
2617: \x94 MEMOIZE (as 119)
...
As you can see, the FROZENSET contains exactly the same entries, just in a different order.
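The shuffled entries all sit inside a FROZENSET opcode. One plausible cause, assumed here rather than confirmed for cloudpickle's internals, is that the iteration order of a set of strings depends on their hashes, and Python randomizes `str` hashes per process unless `PYTHONHASHSEED` is fixed. The sketch below uses the standard `pickle` module (not cloudpickle) to pickle a frozenset of the four names in fresh interpreters started with different seeds, then diffs the `pickletools.dis` listings. The helper names `pickle_in_fresh_process` and `disassemble` are hypothetical.

```python
import difflib
import io
import os
import pickle
import pickletools
import subprocess
import sys

# The four names that appear inside the FROZENSET opcode in the disassembly above.
NAMES = ["correlation_matrix", "drifts_annual", "initial_prices", "volatilities_annual"]

# Child script: build a frozenset of NAMES and print its protocol-4 pickle as hex.
CHILD = (
    "import pickle, sys\n"
    f"s = frozenset({NAMES!r})\n"
    "sys.stdout.write(pickle.dumps(s, protocol=4).hex())\n"
)


def pickle_in_fresh_process(hash_seed: str) -> bytes:
    """Pickle the frozenset in a new interpreter started with PYTHONHASHSEED=hash_seed."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return bytes.fromhex(result.stdout)


def disassemble(data: bytes) -> list:
    """Return the pickletools.dis listing of a pickle as a list of lines."""
    buf = io.StringIO()
    pickletools.dis(data, out=buf)
    return buf.getvalue().splitlines()


if __name__ == "__main__":
    first, second = pickle_in_fresh_process("1"), pickle_in_fresh_process("2")
    # Both payloads decode to the same set; only the element order in the stream may differ.
    print("same contents:", pickle.loads(first) == pickle.loads(second) == frozenset(NAMES))
    print("same bytes:", first == second)
    for line in difflib.unified_diff(disassemble(first), disassemble(second), lineterm=""):
        print(line)
```

With the same seed, the bytes are reproducible; with different seeds, the order of the SHORT_BINUNICODE entries before FROZENSET can change, which matches the pattern in the two listings above.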
This function is part of a large project, so unfortunately I can't produce a short test case right now.
Note that Kubeflow Pipelines implements caching by checking that the pickle of the function has not changed (there is also an option that does not use pickle, but it has its own problems). Because cloudpickle's output is non-deterministic, the cache is invalidated on every run, which makes that feature useless.
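The issue does not show how Kubeflow Pipelines actually derives its cache key, so the following is a hypothetical scheme in the same spirit: hash the pickled bytes with SHA-256 and treat the cache as valid while the digest is unchanged. It assumes the frozenset is the only unstable part and compares, across interpreters with different `PYTHONHASHSEED` values, a key built from a frozenset against one built from a sorted tuple of the same names. `cache_keys` is an illustrative helper, not part of cloudpickle or Kubeflow.

```python
import os
import subprocess
import sys

# Child script: compute two hypothetical cache keys (SHA-256 of the pickled bytes),
# one for a frozenset of names and one for the same names as a sorted tuple.
CHILD = """
import hashlib, pickle
names = ["correlation_matrix", "drifts_annual", "initial_prices", "volatilities_annual"]
as_set = pickle.dumps(frozenset(names), protocol=4)
as_sorted = pickle.dumps(tuple(sorted(names)), protocol=4)
print(hashlib.sha256(as_set).hexdigest(), hashlib.sha256(as_sorted).hexdigest())
"""


def cache_keys(hash_seed: str) -> tuple:
    """Return (frozenset_key, sorted_tuple_key) computed in a fresh interpreter."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    set_key, sorted_key = out.split()
    return set_key, sorted_key


if __name__ == "__main__":
    keys = [cache_keys(seed) for seed in ("0", "1", "2", "3", "4")]
    # The frozenset-based key may take several values; the sorted-tuple key takes one.
    print("distinct frozenset keys:", len({k[0] for k in keys}))
    print("distinct sorted-tuple keys:", len({k[1] for k in keys}))
```

This only illustrates why a byte-level cache key is fragile when a pickle contains hash-ordered containers; it is not a fix for cloudpickle itself.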
Thanks.