Delete tars after --non-blocking completes
#383
Conversation
@TonyB9000 This is not urgent, but when you get a chance, can you try to test this PR as well? It builds on #380 (the first two commits of this PR), which is the higher-priority piece to test. This PR attempts to resolve #374. I'm able to fix the tar-deletion bug in a very small test case, but #374 is concerned with a larger run. In particular, considering that my bug fix is more or less focused on addressing final cleanup, I'm a little concerned about this exchange:
My test case: Chrysalis → Perlmutter
@forsyth2 I'll try to find time to test this. If I understand the use case, we are tarring up local files to create a remote archive, and we want to (automatically) delete the local tarfiles after successful transfer (unless "--keep" is supplied). (Side question: When I fetch a remote (e.g., NERSC) archive using "zstash check", and the remote tape system first extracts the tarfiles to NERSC disk (for Globus transfer), are the remote tarfiles on NERSC disk auto-deleted?)
Thanks, I might recommend testing #380 alone first.
Yes, the issue in #374 is that that's not happening. (And importantly, they should be deleted after successful transfer, not all at the very end).
Is that what happens? Shouldn't the Globus transfer go straight from NERSC HPSS to whatever machine you called the command from?
That makes sense, yes.
I honestly don't know. Since a tape drive "streams linearly" (while you can skip around and read parts of a disk over and over as it spins), I assumed that the tape spews content to a scratch space on disk, and Globus picks it up for transfer as completed segments are written. If Globus missed even a byte, it would have to tell the tape to completely stop, rewind, and try again to pick up the missing bytes. I could be wrong, and the tape may simply write to live memory (RAM), which Globus reads. It all depends on how fast the tape drive can dump its content (to memory or to disk), and how fast Globus can read it for network transfer. If the tape were very fast (able to spew data faster than the network can transfer it), then a disk could "buffer" the writes with more space than RAM could offer.

For the auth test (#380), I want to pull an archive from NERSC to LCRC. Does it need to be "very large" (>10 TB)? In the "olden days" we'd start a Globus transfer that might take 5+ days to complete (so auth/token expirations become an issue).
Hmm, we might need to ask NERSC support about that one. I'm not sure of the mechanics. Here's what Claude was able to deduce. Importantly:
Full AI response:

Based on the search results, I can now provide you with a detailed explanation of how Globus handles transfers from High Performance Storage System (HPSS) tape archives.

How Globus Handles HPSS Tape Archive Transfers

Your team member's understanding is largely correct. Here's how it actually works:

HPSS Architecture and Disk Cache
HPSS is a hierarchical storage management (HSM) system that uses HSM software to ingest data onto a high-performance disk cache and automatically migrate it to a very large enterprise tape subsystem for long-term retention. The disk cache serves two functions: it buffers data that has to go to tape, and it holds files for a certain period of time so users can access them more quickly.

The Transfer Process
When Globus transfers data from HPSS:

Tape Ordering and Performance Considerations
Your team member's concern about tape streaming is valid. If you are retrieving many (> 100) files from HPSS, you need to order your retrievals so that all files on a single tape will be retrieved in a single pass, in the order they are on the tape. If you're retrieving a large data set from HPSS with Globus, files should be read off in tape order to reduce tape wear and overall transfer time.

Important Limitations
Because of peculiarities of the Globus-HPSS interface, Globus is sometimes not the best tool to use for transfers to or from the HPSS Tape Archive. For instance, Globus transfers involving HPSS cannot resume interrupted transfers, so each minor network interruption means starting again from the beginning of the transfer. It also cannot split transfers into multiple streams (a powerful tool for increasing transfer speed) for transfers involving HPSS.

Buffer Sizes and Performance
With the new arrays, the HPSS disk cache can now retain 30 days' worth of data (roughly 1.5 PB), thus reducing I/O bottlenecks and improving users' productivity. The bigger your cache, the better, because users get their data sooner. The size of this disk cache is crucial for performance, as it determines how long recently accessed data remains available without requiring tape recalls.

So, to directly answer your question: Globus doesn't read directly from tape streams. Instead, it relies on HPSS's hierarchical storage management system, which stages requested files from tape to a high-performance disk cache, and then Globus reads from that disk cache. Your team member's intuition about the system "spewing content to a scratch space on disk" is essentially correct - that's exactly what the HPSS disk cache does.
Yes, I'd say so -- large enough at least to get past the point we'd expect a token expiration if using
Do you have an idea how long that is? Recently, Globus has been fetching about 3.3 TB/hour. So if a token expired in N hours, I would have to select more than 3.3N TB of data for transfer to trigger an expiration. Perhaps there is a way to specify a shorter expiration? (That would be useful for test purposes.)
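For a rough sense of scale, here is a minimal back-of-the-envelope sketch in Python; the candidate token lifetimes are purely illustrative assumptions, not confirmed Globus defaults:

```python
# Rough sizing: how much data must be selected so the transfer outlives
# the token. The rate comes from the comment above; the lifetimes are
# hypothetical values chosen only for illustration.
RATE_TB_PER_HOUR = 3.3

for lifetime_hours in (12, 24, 48):
    min_tb = RATE_TB_PER_HOUR * lifetime_hours
    print(f"Assumed lifetime {lifetime_hours} h -> select > {min_tb:.0f} TB")
```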
Hmm, I don't think that would be possible. If we could configure token expiration times, we wouldn't have this problem in the first place (i.e., we could just set them longer!). I guess one "hack" for that would be to revoke tokens manually mid-transfer:
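One possible way to script that with globus_sdk; the client ID and cached tokens below are placeholders, and this is a sketch of how a revocation could be done rather than an existing zstash workflow:

```python
import globus_sdk

# Placeholders: substitute the native-app client ID the tool registered
# and the access/refresh tokens it cached locally.
CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"
ACCESS_TOKEN = "cached-transfer-access-token"
REFRESH_TOKEN = "cached-transfer-refresh-token"

# Revoking the tokens mid-transfer should make the next API call that
# depends on them fail with an auth error, approximating an expiration.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_revoke_token(ACCESS_TOKEN)
auth_client.oauth2_revoke_token(REFRESH_TOKEN)
```

(If the tokens were issued to the Globus CLI, "globus logout" also revokes them; whether the Transfer service treats an explicit revocation the same as a natural expiration is exactly the question raised below.)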
I had considered that, but thought that maybe Globus might treat those things differently. If a token "naturally" expires during a transfer, the default assumption is that the user wanted the transfer to continue. If Globus "knows" that the user deliberately revoked a token mid-transfer, the assumption might be that they want to terminate the transfer. I suppose we could experiment with that one more easily (revoke a token 30 minutes into a 1-hour transfer, etc.). Another possibility (say the token expires after 12 hours): authenticate and conduct a small transfer, wait 11 hours, then initiate a 1-hour transfer. Still not conducive to rapid testing...
Summary
Objectives:
- Delete each tar after its Globus transfer completes when --non-blocking is set

Issue resolution:
- Resolves #374
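A minimal sketch of the behavior described by that objective, using globus_sdk; the function name, data structures, and polling details are illustrative assumptions, not zstash's actual implementation:

```python
import os
import globus_sdk

def delete_tars_when_done(transfer_client: globus_sdk.TransferClient,
                          pending: dict) -> None:
    """Poll submitted Globus tasks and delete each local tar as soon as
    its transfer has SUCCEEDED, rather than waiting until the very end.

    `pending` maps Globus task IDs to the local tar paths they carry.
    """
    while pending:
        for task_id, tar_path in list(pending.items()):
            status = transfer_client.get_task(task_id)["status"]
            if status == "SUCCEEDED":
                os.remove(tar_path)   # remote copy confirmed, safe to delete
                del pending[task_id]
            elif status == "FAILED":
                del pending[task_id]  # keep the tar on disk for a retry
        if pending:
            # task_wait blocks for up to `timeout` seconds on one task,
            # which doubles as a polling delay for the whole batch.
            transfer_client.task_wait(next(iter(pending)), timeout=60)
```

The key point from the discussion above is that each tar is removed as soon as its own transfer succeeds, rather than in a single cleanup pass at the end of the run.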
Select one: This pull request is...
Small Change