Conversation
Xylar had noted:
So, for the Spring release (tentatively planned as testing period beginning April 1, release on May 1), we'd want to support four versions of Python: 3.11, 3.12, 3.13, and 3.14.

Tom also noted:
---
Claude's response:

Looking at this GitHub Actions log for the zstash project running with Python 3.14, the main issue is a pickling error with SQLite cursor objects in multiprocessing.

**The Core Problem:** The tests are failing with a pickling error raised while serializing SQLite cursor objects.

**What's Happening:** The code is trying to pass SQLite cursor objects to worker processes via Python's `multiprocessing` module.

**Why This Affects Python 3.14:** Python 3.14 changed the default multiprocessing start method on Unix from `fork` to `forkserver`, which requires everything sent to workers to be picklable.

**The Failing Tests:** All failures are in parallel execution tests.

**The Fix:** The code needs to be updated to avoid passing SQLite cursor objects to worker processes. Instead, each worker should open its own database connection.

This is a compatibility issue introduced by Python 3.14's stricter multiprocessing requirements, not a bug in Python itself.
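For context, the failure mode and fix Claude describes can be shown in a standalone sketch (toy code, not from zstash; the file name and table name are made up): sqlite3 connections and cursors are not picklable, so under a pickling start method you pass plain data such as the database path and connect inside the worker.

```python
import multiprocessing
import sqlite3

def count_rows(db_path: str) -> None:
    # Each worker opens its own connection; a Connection or Cursor
    # passed from the parent would raise a pickling TypeError.
    con = sqlite3.connect(db_path)
    try:
        (n,) = con.execute("SELECT COUNT(*) FROM files").fetchone()
        print(f"{n} rows")
    finally:
        con.close()

if __name__ == "__main__":
    con = sqlite3.connect("index.db")
    con.execute("CREATE TABLE IF NOT EXISTS files (name TEXT)")
    con.commit()
    con.close()
    # Passing the *path* (a plain string) pickles fine.
    p = multiprocessing.Process(target=count_rows, args=("index.db",))
    p.start()
    p.join()
```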
---
This could end up being more involved. The second commit still fails. Claude's analysis:

Looking at this test failure log, the issue is clear: the tests are failing on Python 3.14 due to a multiprocessing serialization error. The key error appears multiple times: `TypeError: cannot pickle 'weakref.ReferenceType' object`, raised when serializing `zstash.parallel.PrintMonitor` and `zstash.parallel.ExtractWorker` objects. This happens specifically when trying to send these objects to the worker processes.

**Why this is happening:** Python 3.14 has changed how multiprocessing serializes objects.

**What changed in Python 3.14:** Python 3.14 made the default multiprocessing start method on Unix `forkserver`, which pickles objects for workers instead of letting them be inherited via `fork`.

**The fix:** The zstash code needs to be updated to handle Python 3.14's stricter serialization requirements. Options include having each worker create its own resources rather than receiving them from the main process.

The test is correctly detecting this incompatibility - it's not a test issue, but a genuine compatibility problem with the parallel processing code in Python 3.14.
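The start-method change is easy to observe directly. A standalone sketch (nothing zstash-specific; the explicit `fork` context is Unix-only and is a debugging aid, not a fix):

```python
import multiprocessing

if __name__ == "__main__":
    # On Linux, Python 3.13 and earlier report "fork" here; Python 3.14
    # defaults to "forkserver" (macOS and Windows default to "spawn").
    print(multiprocessing.get_start_method())

    # Requesting the old fork semantics explicitly can mask
    # serialization bugs rather than fix them.
    ctx = multiprocessing.get_context("fork")
    p = ctx.Process(target=print, args=("hello from the child process",))
    p.start()
    p.join()
```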
*Force-pushed from d619ff1 to 1f1e8e1.*
---
I went through quite a few iterations with Claude and don't feel much closer to a solution. It's good we caught this well ahead of the Spring release (when we want to add Python 3.14 support). Important note from Claude's summary below:

That is, Claude's full summary:

**Python 3.14 Multiprocessing Output Race Condition**

**The Problem:** Parallel extract tests pass on Python 3.11-3.13 but fail on Python 3.14 with output appearing in the wrong order, even though the code logic enforces correct sequencing.

**Why Python 3.11-3.13 Worked:** In Python 3.11-3.13, there was implicit synchronization that happened to preserve output ordering when multiple processes wrote to stdout:

This was accidental correctness - the code had a race condition, but Python's implementation details masked it.

**What Changed in Python 3.14:** Python 3.14 introduced several changes that exposed the race condition:

**Why Our Fixes Didn't Work:** We tried multiple approaches, all of which failed:

**The Fundamental Issue:** stdout is not atomic across processes in Python 3.14. Even when a single process (main) controls the printing order, the OS and Python's IO subsystem can still interleave the output when it's being captured/redirected. This is especially true when:

**Why This Is Hard To Fix:** The only true solution would be to avoid stdout entirely and use a different IPC mechanism (like writing to a file with locks, or using a proper message queue library). But this would be a significant architectural change for what is fundamentally a Python 3.14 platform issue, not a logic bug in the code.

**Recommendation:** Skip these specific parallel output-ordering tests on Python 3.14 until:

The actual functionality works fine - files are extracted correctly, in parallel, with proper synchronization. Only the output ordering verification fails.
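If the skip recommendation were adopted, it could look something like this sketch (unittest-style; the class name, test name, and message are hypothetical):

```python
import sys
import unittest

class TestParallelExtract(unittest.TestCase):
    @unittest.skipIf(
        sys.version_info >= (3, 14),
        "Output-ordering check races under Python 3.14 multiprocessing; "
        "extraction itself still works (see summary above).",
    )
    def test_extract_parallel_output_order(self):
        ...  # run a parallel extract and assert on the ordering of stdout
```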
---
Claude's description of the 3 commits from 2026-01-26 (4f172d5, 3f19ca5, 9f2692d) can be found below. The tests run via … I still need to run the rest of the test suite, though -- notably, I see the GitHub Actions have caught another issue.

Here's a concise commit message:

**Fix Python 3.14 multiprocessing output ordering race condition**

Python 3.14's improved parallelism (PEP 703 free-threading) exposed a race condition in parallel extraction where output could arrive out of order. Key changes:
Flow:
This ensures ordered output in Python 3.14 while maintaining the same performance as Python 3.11-3.13.
---
I've run the Chrysalis tests and they also pass using the latest commit. The only test I haven't run is the Globus authentication test, because it requires NERSC HPSS, which is down today (see NERSC Center Status). That said, there weren't any Globus changes in this PR, so I don't anticipate any issues there.
---
Review guide from Claude:

**Code Review Guide: Python 3.14 Support PR**

**Overview:** This PR adds Python 3.14 support to zstash. Python 3.14's changes to multiprocessing (PEP 703 free-threading, improved parallelism) exposed race conditions in output ordering that were masked in 3.11-3.13 by GIL-based implicit synchronization.

**✅ Low-Risk Changes (Quick Review)**

Version Updates - Straightforward:
Simple Bug Fix:
---
After some testing of the branch at various stages, Claude was able to generate this summary table:

Key insight from Case 6: Removing all the verbose debug logging reveals the actual problem - the test is capturing output from two consecutive extraction runs (both the first and the second).

Yes, the major refactor in Case 4 was necessary. Here's why:

The Root Problem (revealed by Case 6):

Why Cases 3, 5, and 6 all fail:

Why Case 4 succeeds:

Simply removing debug messages (Cases 5 & 6) only treated symptoms. The refactor in Case 4 addressed the architectural issue of how parallel workers coordinate their console output and/or how the test captures that output.

Conclusion: Yes, the major refactor was necessary because the problem wasn't just verbose logging - it was a fundamental issue with output ordering/capture in the parallel execution model.

Looking at the test file, I need to revise my previous answer. The test structure shows there are actually TWO separate extraction runs. Both runs call the same parallel extraction code.

Revised Understanding - The Root Problem:

But they appear interleaved: tar 0 (twice), then tar 1 (twice), then tar 2 (twice)... instead of all tar 0 mentions grouped together.

Why Cases 3, 5, and 6 fail:

Each run individually has the wrong ordering! The workers are processing tars concurrently but their output isn't properly serialized.

Why Case 4 succeeds:

Conclusion: Yes, the major refactor was necessary. The problem wasn't about capturing two runs - that's intentional test design. The problem was that each individual parallel extraction run wasn't maintaining sequential tar ordering in its output, and the refactor fixed that core parallelization/synchronization issue.
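As a side note, the claim that cross-process stdout ordering is unreliable is easy to reproduce in isolation with a toy script (unrelated to the zstash code paths):

```python
import multiprocessing
import sys

def chatter(tag: str) -> None:
    for i in range(5):
        # Each line is written and flushed separately; the relative
        # ordering of "A" and "B" lines across processes is unpredictable.
        sys.stdout.write(f"[{tag}] line {i}\n")
        sys.stdout.flush()

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=chatter, args=(t,)) for t in "AB"]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```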
---
It turns out that Case 3 was in fact simply failing because of the added debug lines. Two important lessons:
---
*Force-pushed from 9ac2940 to 976397c.*
---
I've updated the commits. I removed the most recent 4 commits and added 2 commits: 137c101 removes extra write lines, 976397c fixes tar extraction (this is the same change as the 4th commit that was removed). The Chrysalis tests are passing. Remaining action items:
---
*Force-pushed from 976397c to 3cdcdd3.*
*Force-pushed from 3cdcdd3 to 734f200.*
---
I was able to remove more extraneous changes by reviewing the point in the commit history where the debug lines began interfering with the ordering tests. The 3 commits now are:
Commits 1 & 3 are known to be necessary. If any extraneous changes remain, they'd be in commit 2.
---
Chrysalis tests pass as well.
---
I think I've removed the over-engineering aspects. Remaining action items:
Claude's summary guide:

**Summary: Python 3.14 Compatibility & Deadlock Fix**

**The Problem:** Two distinct issues blocking Python 3.14 support:

**The Solution**

**1. Database Connection Management**

Before: The main process passed a shared SQLite cursor to the workers. After: each worker opens its own connection when `cur` is `None`:

```python
# In extractFiles():
if cur is None:  # Worker process
    con = sqlite3.connect(get_db_filename(cache))
    cur = con.cursor()
    close_at_end = True
```

**2. Synchronization Simplification**

Before: Complex flow with multiple wait points. After: Single wait at start, then process entirely. Key changes:

**3. Defensive Measures**

**Implementation Details**
---
To confirm this further, I went back through my Claude conversations from November. Re-applying the diffs confirms the following as the debugging workflow:
To summarize, it appears everything in commit 2 here really was necessary to 1) get all tests passing on Python 3.14 and 2) not cause deadlocks.
```python
workers_to_tars: List[set] = [set() for _ in range(num_workers)]
# A min heap, of (work, worker_idx) tuples; work is the size of data
# that worker_idx needs to work on.
# We can efficiently get the worker with the least amount of work.
```
I suppose this is the root cause of the ordering issue -- we were assigning tars to the least busy workers. But 1) why did it work before Python 3.14? And 2) is the new round-robin approach going to impact efficiency?
Response from Claude, based on the diff:
Looking at your diff, I can address your three questions:
1) Is this over-engineered?
Yes, portions are over-engineered for what appears to be the core issue. The fundamental problem seems to be process synchronization during parallel extraction, but some of your changes add complexity that may not be necessary:
- Timeout handling with multiple fallback paths - You've added timeouts at several levels (wait_turn, print_all_contents, the main loop). This suggests you're treating symptoms rather than root causes.
- The complete rewrite of PrintMonitor - Switching from condition variables to a polling-based index system is actually simpler, not more complex, but the extensive timeout handling suggests the synchronization approach might still have issues.
- Delayed logging setup - Moving logging setup inside the tar processing loop adds complexity. This seems like a workaround for ordering issues rather than a fix.
2) Will round-robin reduce efficiency vs. heap?
Yes, potentially, but probably not significantly in practice.
The heap approach was trying to balance work across workers by assigning tars based on their size - giving the next largest tar to the worker with the least work. This optimizes for work balance.
Round-robin just cycles through workers regardless of tar size, which could lead to imbalanced workloads if tar sizes vary significantly (e.g., Worker 1 gets all small tars, Worker 2 gets all large tars).
However, the efficiency loss is likely minor because:
- Your ordering constraint requires processing tars sequentially for output anyway
- Workers that finish early just wait, so perfect load balancing doesn't help much
3) Why did the heap method pass ordering tests before?
This is the key question. The heap method probably passed because:
- Tests may have had uniform tar sizes - If all tars were similar sizes, heap and round-robin would produce similar assignments
- The synchronization logic worked correctly in the old Python version - The condition variable approach in the original `PrintMonitor` likely worked fine with Python 3.11-3.13's multiprocessing implementation
- Race conditions were rare/didn't manifest - The bugs may have existed but only triggered under specific timing conditions that Python 3.14's multiprocessing changes exposed
The real issue: Python 3.14 likely changed something in the multiprocessing implementation (perhaps how condition variables, queues, or manager proxies work) that broke the synchronization. Your changes work around this by:
- Using simpler primitives (index counter vs. queue + condition variable)
- Adding timeouts everywhere to prevent hangs
- Simplifying the work assignment to make ordering more predictable
Better approach: Instead of round-robin + extensive timeout handling, consider:
- Keeping round-robin (simpler is better given the ordering constraint)
- Investigating what specifically changed in Python 3.14's multiprocessing
- Potentially using a more robust synchronization primitive (like a `Barrier` or `Semaphore` instead of polling an index)
- Removing the timeout fallbacks once proper synchronization is achieved
The fact that you needed timeouts suggests the synchronization still isn't quite right - a well-designed synchronization mechanism shouldn't need timeouts to prevent deadlocks.
I've worked with Claude to reduce over-engineering further. Will add another commit.
Here are explanatory comments from Claude re: the PR as of these 5 diffs:

```
> git log --oneline | head -n 6
0ebd7a5 Restore comments and type annotations
f33ea80 Claude's changes to reduce over-engineering
734f200 Fixes to make Perlmutter tests pass
a54c842 Implementation changes to support Python 3.14
e315362 Add Python 3.14 support
880b9fd Optimize zstash update by using scandir (#420)
```
All tests pass on Perlmutter. I still need to check on Chrysalis, but based on previous iterations, I don't expect issues there.
EDIT: I've also tested on Chrysalis now.
Summary: The main theme is replacing complex load-balancing + synchronization with simple, predictable ordering. This eliminates deadlocks at the cost of potentially less optimal work distribution (but for most workloads, round-robin is fine).
```diff
  uses: actions/checkout@v3
- - name: Set up Python 3.13
+ - name: Set up Python 3.14
```
**Python 3.14 Support**

Files: .github/workflows/build_workflow.yml, conda/dev.yml, setup.cfg, setup.py

- Bumping Python version support from 3.13 to 3.14 across all configuration files
- Adding `fail-fast: false` to the build matrix so all Python versions get tested even if one fails
```python
import sqlite3
import sys
import tarfile
import time
```
**Import Changes**

File: zstash/extract.py

- Removed `heapq` import (was used for load balancing workers, no longer needed)
- Added `time` import for sleep delays in the parallel processing loop
```diff
- cur: sqlite3.Cursor,
  args: argparse.Namespace,
  multiprocess_worker: Optional[parallel.ExtractWorker] = None,
+ cur: Optional[sqlite3.Cursor] = None,
```
**Database Connection Fix for Parallel Workers**

Function: extractFiles() - signature change

- Swapped order of `multiprocess_worker` and `cur` parameters
- Made `cur` optional (defaults to `None`)
- Why: Each worker process needs its own database connection. SQLite connections can't be shared across processes. When `cur=None`, each worker opens its own connection and closes it when done.
```diff
  # For worker i, workers_to_tars[i] is a set of tars
  # that worker i will work on.
+ # Round-robin assignment for predictable ordering
```
**Simplified Work Distribution**

Function: multiprocess_extract() - worker assignment logic
- Old approach: Used a min-heap to assign tars to workers based on total file size (trying to balance workload)
- New approach: Simple round-robin assignment (tar 0 → worker 0, tar 1 → worker 1, etc.)
- Why the change: The heap-based approach was complex and could result in unpredictable processing order, leading to deadlocks when workers waited for their turn to print. Round-robin is predictable and prevents deadlocks. A sketch contrasting the two strategies follows below.
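For concreteness, here is a self-contained sketch of both assignment strategies (toy tar names and sizes; the real code tracks more state):

```python
import heapq
from typing import Dict, List

tar_sizes: Dict[str, int] = {"000000.tar": 900, "000001.tar": 100, "000002.tar": 400}
num_workers = 2

# Old approach (sketch): min-heap of (assigned_work, worker_idx), so each
# tar goes to the currently least-loaded worker. Balanced, but which tars
# a worker gets depends on sizes, making processing order unpredictable.
heap = [(0, i) for i in range(num_workers)]
heapq.heapify(heap)
heap_assignment: List[List[str]] = [[] for _ in range(num_workers)]
for tar in sorted(tar_sizes):
    work, idx = heapq.heappop(heap)
    heap_assignment[idx].append(tar)
    heapq.heappush(heap, (work + tar_sizes[tar], idx))

# New approach (sketch): round-robin, so tar i always goes to
# worker i % num_workers, regardless of sizes.
rr_assignment: List[List[str]] = [[] for _ in range(num_workers)]
for i, tar in enumerate(sorted(tar_sizes)):
    rr_assignment[i % num_workers].append(tar)

print("heap:", heap_assignment)        # [['000000.tar'], ['000001.tar', '000002.tar']]
print("round-robin:", rr_assignment)   # [['000000.tar', '000002.tar'], ['000001.tar']]
```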
| """ | ||
|
|
||
| def __init__(self, tars_to_print: List[str], *args, **kwargs): | ||
| def __init__(self, tars_to_print: List[str], manager=None, *args, **kwargs): |
**Print Synchronization Redesign**

Class: PrintMonitor

- Old approach: Used a queue of tars and a condition variable with complex locking
- New approach: Uses a simple counter tracking position in an ordered list of tars
- Why: The condition variable approach had race conditions causing deadlocks. The counter-based approach is simpler: each tar has an index, and we just increment the counter when done.

Key methods (a sketch follows below):

- `wait_turn()`: Worker spins (with a small sleep) until the counter reaches its tar's index
- `done_enqueuing_output_for_tar()`: Increments the counter to let the next worker proceed
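A minimal sketch of this counter-based design (a hypothetical stand-in; the real PrintMonitor lives in zstash/parallel.py, and whether the shared counter is a Manager value is an assumption here):

```python
import multiprocessing
import time

class TurnCounter:
    """Toy stand-in for the counter-based PrintMonitor idea."""

    def __init__(self, tars_to_print, shared_index):
        # shared_index: e.g. multiprocessing.Manager().Value("i", 0),
        # visible to all worker processes.
        self.order = {tar: i for i, tar in enumerate(tars_to_print)}
        self.shared_index = shared_index

    def wait_turn(self, tar, poll=0.01, timeout=60.0):
        # Spin (with a small sleep) until the counter reaches this
        # tar's position in the ordered list.
        deadline = time.monotonic() + timeout
        while self.shared_index.value < self.order[tar]:
            if time.monotonic() > deadline:
                raise TimeoutError(f"gave up waiting on {tar}")
            time.sleep(poll)

    def done_enqueuing_output_for_tar(self, tar):
        # Advance the counter so the worker holding the next tar proceeds.
        self.shared_index.value = self.order[tar] + 1

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    monitor = TurnCounter(["000000.tar", "000001.tar"], manager.Value("i", 0))
    monitor.wait_turn("000000.tar")  # returns immediately
    monitor.done_enqueuing_output_for_tar("000000.tar")
    monitor.wait_turn("000001.tar")  # now it's tar 1's turn
```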
```diff
- # let the process know.
- # This is to synchronize the print statements.
+ # Wait for turn before processing this tar
```
**Extraction Loop Changes**

Function: extractFiles() - main loop

- Added `wait_turn()` call before processing each tar - ensures workers process tars in order
- Moved `print_all_contents()` to after marking the tar as done - ensures all output is flushed before advancing
- Added a sleep in the monitoring loop (`time.sleep(0.01)`) - prevents busy-waiting and gives workers CPU time
- Added an extra drain of the failure queue after processes finish - catches any failures that arrived just as processes were ending

A rough sketch of this monitoring pattern follows below.
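The sketch (hypothetical helper names; the real code does more bookkeeping than this):

```python
import queue
import time

def watch_workers(processes, failure_queue, tick=0.01):
    """Toy version of the monitoring pattern: poll for failures while
    workers run, then drain the queue once more after they exit."""
    failures = []
    while any(p.is_alive() for p in processes):
        try:
            failures.append(failure_queue.get_nowait())
        except queue.Empty:
            pass
        time.sleep(tick)  # avoid busy-waiting; give workers CPU time
    # Extra drain: catch failures enqueued just as processes were ending.
    while True:
        try:
            failures.append(failure_queue.get_nowait())
        except queue.Empty:
            break
    return failures
```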
```diff
  elif extract_this_file:
-     tar.extract(tarinfo)
+     if sys.version_info >= (3, 12):
```
**Python 3.12+ tar.extract() Security**

Function: extractFiles() - tar extraction

- Added a version check to use `filter="tar"` for Python 3.12+
- Why: Python 3.12 deprecated calling `tar.extract()` without a filter, due to security concerns about malicious tar files (see the sketch below)
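Presumably the gate looks roughly like this sketch (the function name is hypothetical; only the version check and filter usage are taken from the diff excerpt above):

```python
import sys
import tarfile

def extract_member(tar: tarfile.TarFile, tarinfo: tarfile.TarInfo, path: str = ".") -> None:
    if sys.version_info >= (3, 12):
        # PEP 706: the "tar" filter refuses absolute paths, members that
        # escape the destination directory, and other unsafe entries.
        tar.extract(tarinfo, path=path, filter="tar")
    else:
        # Older interpreters: extract without a filter, as before.
        tar.extract(tarinfo, path=path)
```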
---
This PR adds Python 3.14 support, along with implementation changes to keep the expected behavior under Python 3.14. Most importantly, there are large changes to the parallelism workflow, notably replacing the heap method with the round-robin method (which Claude claims won't impact performance). These changes were ultimately necessary to keep the output ordering tests passing and to avoid deadlocks. It seems that Python 3.14 has changed some multiprocessing handling. I've put explanations from Claude on the relevant code blocks above (see here).

@golaz -- Do you think this change is acceptable? It might be more efficient to go over this PR together at the next EZ discussion.

@chengzhuzhang @tomvothecoder @andrewdnolan -- cc'ing you all since this PR resolves the issue of supporting Python 3.14 for the upcoming E3SM Unified release. Please let me know if you have any suggestions/concerns. Thanks all!
| if [[ "${{ matrix.python-version }}" == "3.12" ]] || [[ "${{ matrix.python-version }}" == "3.13" ]] || [[ "${{ matrix.python-version }}" == "3.14" ]]; then | ||
| python -m ensurepip --upgrade || true | ||
| python -m pip install --upgrade --force-reinstall pip setuptools wheel | ||
| fi |
I'm pretty sure that if you include `setuptools` in the dev.yml then you don't need this block.
Example: https://github.com/xCDAT/xcdat/blob/8238fab6bbcbcc25f8dc67b4cbe753ab6ba7edfc/conda-env/dev.yml#L8-L9
Summary
Important: this should be merged to `main` after the production release of `zstash v1.5.0`.

Objectives:
Issue resolution:
Select one: This pull request is...
Small Change