Delete tars after --non-blocking completes
#383
Conversation
@TonyB9000 This is not urgent, but when you get a chance, can you try to test this PR as well? It builds on #380 (the first two commits of this PR), which is the higher-priority piece to test. This PR attempts to resolve #374. I'm able to fix the tar-deletion bug in a very small test case, but #374 is concerned with a larger run. In particular, considering that my bug fix is more or less focused on addressing final cleanup, I'm a little concerned about this exchange:
My test case: Chrysalis → Perlmutter
@forsyth2 I'll try to find time to test this. If I understand the use case, we are tarring up local files to create a remote archive, and we want to (automatically) delete the local tarfiles after successful transfer (unless "--keep" is supplied). (Side question: When I fetch a remote (e.g., NERSC) archive using "zstash check", and the remote tape system first extracts the tarfiles to NERSC disk (for Globus transfer), are the remote tarfiles on NERSC disk auto-deleted?)
Thanks, I might recommend testing #380 alone first.
Yes, the issue in #374 is that that's not happening. (And importantly, they should be deleted after successful transfer, not all at the very end).
Is that what happens? Shouldn't the Globus transfer go straight from NERSC HPSS to whatever machine you called the command from?
That makes sense, yes.
I honestly don't know. Since a tape drive "streams linearly" (while you can skip around and read parts of a disk over and over as it spins), I assumed that the tape spews content to a scratch space on disk, and Globus picks it up for transfer as completed segments are written. If Globus missed even a byte, it would have to tell the tape to completely stop, rewind, and try again to pick up the missing bytes. I could be wrong, and the tape may simply write to live memory (RAM), which Globus reads. It all depends on how fast the tape drive can dump its content (to memory or to disk), and how fast Globus can read it for network transfer. If the tape were very fast (able to spew data faster than the network can transfer it), then a disk could "buffer" the writes with more space than RAM could offer.

For the auth test (#380), I want to pull an archive from NERSC to LCRC. Does it need to be "very large" (>10 TB)? In the "olden days" we'd start a Globus transfer that might take 5+ days to complete (so auth/token expirations become an issue).
Hmm, we might need to ask NERSC support about that one. I'm not sure of the mechanics. Here's what Claude was able to deduce. Importantly:
Full AI response:

Based on the search results, I can now provide you with a detailed explanation of how Globus handles transfers from High Performance Storage System (HPSS) tape archives.

How Globus Handles HPSS Tape Archive Transfers

Your team member's understanding is largely correct. Here's how it actually works:

HPSS Architecture and Disk Cache
HPSS is a hierarchical storage management (HSM) system that uses HSM software to ingest data onto a high-performance disk cache and automatically migrate it to a very large enterprise tape subsystem for long-term retention. The disk cache serves two functions: it buffers data that has to go to tape, and it holds files for a certain period of time so users can access them more quickly.

The Transfer Process
When Globus transfers data from HPSS:

Tape Ordering and Performance Considerations
Your team member's concern about tape streaming is valid. If you are retrieving many (> 100) files from HPSS, you need to order your retrievals so that all files on a single tape will be retrieved in a single pass, in the order they are on the tape. If you're retrieving a large data set from HPSS with Globus, files should be read off in tape order to reduce tape wear and overall transfer time.

Important Limitations
Because of peculiarities of the Globus-HPSS interface, Globus is sometimes not the best tool to use for transfers to or from the HPSS Tape Archive. For instance, Globus transfers involving HPSS cannot resume interrupted transfers, so each minor network interruption means starting again from the beginning of the transfer. It also cannot split transfers into multiple streams (a powerful tool for increasing transfer speed) for transfers involving HPSS.

Buffer Sizes and Performance
With the new arrays, the HPSS disk cache can now retain 30 days' worth of data (roughly 1.5 PB), thus reducing I/O bottlenecks and improving users' productivity. The bigger your cache, the better, because users get their data sooner. The size of this disk cache is crucial for performance, as it determines how long recently accessed data remains available without requiring tape recalls.

So, to directly answer your question: Globus doesn't read directly from tape streams. Instead, it relies on HPSS's hierarchical storage management system, which stages requested files from tape to a high-performance disk cache, and then Globus reads from that disk cache. Your team member's intuition about the system "spewing content to a scratch space on disk" is essentially correct - that's exactly what the HPSS disk cache does.
Yes, I'd say so -- large enough at least to get past the point we'd expect a token expiration if using
Do you have an idea how long that is? Recently, Globus has been fetching about 3.3 TB/hour. So if a token expired in N hours, I would have to select more than 3.3N TB of data for transfer to trigger an expiration. Perhaps there is a way to specify a shorter expiration? (That would be useful for test purposes.)
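For a rough sense of scale, here is a minimal back-of-the-envelope sketch in Python; the candidate token lifetimes are purely illustrative assumptions, not confirmed Globus defaults:

```python
# Rough sizing: how much data must be selected so the transfer outlives
# the token. The rate comes from the comment above; the lifetimes are
# hypothetical values chosen only for illustration.
RATE_TB_PER_HOUR = 3.3

for lifetime_hours in (12, 24, 48):
    min_tb = RATE_TB_PER_HOUR * lifetime_hours
    print(f"Assumed lifetime {lifetime_hours} h -> select > {min_tb:.0f} TB")
```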
Hmm, I don't think that would be possible. If we could configure token expiration times, we wouldn't have this problem in the first place (i.e., we could just set them longer!). I guess one "hack" for that would be to revoke tokens manually mid-transfer:
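One possible way to script that with globus_sdk; the client ID and cached tokens below are placeholders, and this is a sketch of how a revocation could be done rather than an existing zstash workflow:

```python
import globus_sdk

# Placeholders: substitute the native-app client ID the tool registered
# and the access/refresh tokens it cached locally.
CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"
ACCESS_TOKEN = "cached-transfer-access-token"
REFRESH_TOKEN = "cached-transfer-refresh-token"

# Revoking the tokens mid-transfer should make the next API call that
# depends on them fail with an auth error, approximating an expiration.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_revoke_token(ACCESS_TOKEN)
auth_client.oauth2_revoke_token(REFRESH_TOKEN)
```

(If the tokens were issued to the Globus CLI, "globus logout" also revokes them; whether the Transfer service treats an explicit revocation the same as a natural expiration is exactly the question raised below.)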
I had considered that, but thought that maybe Globus might treat those things differently. If a token "naturally" expires during a transfer, the default assumption is that the user wanted the transfer to continue. If Globus "knows" that the user deliberately revoked a token mid-transfer, the assumption might be that they want to terminate the transfer. I suppose we could experiment with that one more easily (revoke a token 30 minutes into a 1-hour transfer, etc.). Another possibility (say the token expires after 12 hours): authenticate and conduct a small transfer, wait 11 hours, then initiate a 1-hour transfer. Still not conducive to rapid testing...
Summary
Objectives:
- Delete each tar after its Globus transfer completes when --non-blocking is set

Issue resolution:
- Resolves #374
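A minimal sketch of the behavior described by that objective, using globus_sdk; the function name, data structures, and polling details are illustrative assumptions, not zstash's actual implementation:

```python
import os
import globus_sdk

def delete_tars_when_done(transfer_client: globus_sdk.TransferClient,
                          pending: dict) -> None:
    """Poll submitted Globus tasks and delete each local tar as soon as
    its transfer has SUCCEEDED, rather than waiting until the very end.

    `pending` maps Globus task IDs to the local tar paths they carry.
    """
    while pending:
        for task_id, tar_path in list(pending.items()):
            status = transfer_client.get_task(task_id)["status"]
            if status == "SUCCEEDED":
                os.remove(tar_path)   # remote copy confirmed, safe to delete
                del pending[task_id]
            elif status == "FAILED":
                del pending[task_id]  # keep the tar on disk for a retry
        if pending:
            # task_wait blocks for up to `timeout` seconds on one task,
            # which doubles as a polling delay for the whole batch.
            transfer_client.task_wait(next(iter(pending)), timeout=60)
```

The key point from the discussion above is that each tar is removed as soon as its own transfer succeeds, rather than in a single cleanup pass at the end of the run.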
Select one: This pull request is...
Small Change