Non block testing fix #363

TonyB9000 · 2025-02-21T16:10:00Z

Summary

Unifies the non-blocking zstash behavior between both "create" and "update" operations.

Addresses issue #361.

forsyth2

@TonyB9000 I left some initial review comments. I want to spend more time studying the code to understand how everything gets called/passed around though.

forsyth2 · 2025-02-21T21:46:45Z

zstash/create.py

    # Transfer to HPSS. Always keep a local copy.
    logger.debug(f"{ts_utc()}: calling hpss_put() for {get_db_filename(cache)}")
-    hpss_put(hpss, get_db_filename(cache), cache, keep=True)
+    hpss_put(hpss, get_db_filename(cache), cache, keep=args.keep)


This is specifically for archiving the database. I think we do want to always keep that, no?

Yes, I agree. That was a mistake. (But it always seems to remain in any case - a mystery)

forsyth2 · 2025-02-21T22:03:35Z

zstash/create.py

    # (zstash create)
    args: argparse.Namespace = parser.parse_args(sys.argv[2:])
-    if args.hpss and args.hpss.lower() == "none":
+    if not args.hpss or args.hpss.lower() == "none":


Parentheses just for clarity: if (not args.hpss) or (args.hpss.lower() == "none"):

`args.hpss`	`args.hpss.lower() == "none"`	`args.non_blocking`	original behavior	new behavior	change
T	T	T	`args.hpss = "none"`, `args.keep = True`	`args.hpss = "none"`, `args.keep = True`	N/A
T	T	F	`args.hpss = "none"`	`args.hpss = "none"`, `args.keep = True`	Sets `args.keep = True`
T	F	T	`args.keep = True`	Nothing	No longer sets `args.keep = True`
T	F	F	Nothing	Nothing	N/A
F	N/A	T	`args.keep = True`	`args.hpss = "none"`, `args.keep = True`	Sets `args.hpss = "none"`
F	N/A	F	Nothing	`args.hpss = "none"`, `args.keep = True`	Sets `args.hpss = "none"`, `args.keep = True`

Can you confirm these are the expected changes in behavior?

How did you arrive at the first two rows? Nothing in that code involves the status of "non-blocking".

Correct me if I'm wrong, but testing "if args.hpss" would only fail if the user included no "hpss" argument on the command line. That should be the same as "hpss=none" (unless some hidden config sets it elsewhere - I did not consider that).

In any case (to my knowledge), the only time we intend to FORCE "keep" is when hpss=none. According to the "help" text, there is nothing that "non-blocking" (True or False) does to affect "keep".

Thus, rows 3 and 4 should not be seeing "keep = True" if the user did not specify keep.

I was looking at the combined behavior of

    if args.hpss and args.hpss.lower() == "none":
        args.hpss = "none"
    if args.non_blocking:
        args.keep = True

becoming

    if not args.hpss or args.hpss.lower() == "none":
        args.hpss = "none"
        args.keep = True

only fail if the user included no "hpss" argument on the command line.

Correct, and I don't think that is possible because we set it as required:

    required.add_argument(
        "--hpss",
        type=str,
        help=(
            'path to storage on HPSS. Set to "none" for local archiving. It also can be a Globus URL, '
            'globus://<GLOBUS_ENDPOINT_UUID>/<PATH>. Names "alcf" and "nersc" are recognized as referring to the ALCF HPSS '
            "and NERSC HPSS endpoints, e.g. globus://nersc/~/my_archive."
        ),
        required=True,
    )

Thus, rows 3 and 4 should not be seeing "keep = True" if the user did not specify keep.

Ok, that makes sense.
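The table above can be reproduced mechanically. The following is a minimal sketch, not code from zstash: it copies the two `if` blocks quoted above into stand-alone functions and runs them over the same combinations. It assumes the user did not pass `--keep` (so `keep` starts as `False`), uses `None` to stand for "no `--hpss` value", and uses `"my_archive"` as a hypothetical non-"none" path.

```python
from itertools import product
from types import SimpleNamespace
from typing import List, Optional, Tuple


def original_logic(hpss: Optional[str], non_blocking: bool) -> SimpleNamespace:
    # Behavior before this PR, as quoted in the thread.
    args = SimpleNamespace(hpss=hpss, non_blocking=non_blocking, keep=False)
    if args.hpss and args.hpss.lower() == "none":
        args.hpss = "none"
    if args.non_blocking:
        args.keep = True
    return args


def new_logic(hpss: Optional[str], non_blocking: bool) -> SimpleNamespace:
    # Behavior proposed in this PR, as quoted in the thread.
    args = SimpleNamespace(hpss=hpss, non_blocking=non_blocking, keep=False)
    if (not args.hpss) or (args.hpss.lower() == "none"):
        args.hpss = "none"
        args.keep = True
    return args


def compare() -> List[Tuple]:
    # One row per (hpss, non_blocking) combination: inputs, then the
    # resulting (hpss, keep) under the original and the new logic.
    rows = []
    for hpss, non_blocking in product(["None", "my_archive", None], [True, False]):
        old = original_logic(hpss, non_blocking)
        new = new_logic(hpss, non_blocking)
        rows.append((hpss, non_blocking, (old.hpss, old.keep), (new.hpss, new.keep)))
    return rows
```

Printing each row of `compare()` gives the six rows of the table in the same order. Since `--hpss` is declared `required=True`, the two `None` rows should not be reachable from the command line.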

forsyth2 · 2025-02-21T22:09:59Z

zstash/globus.py

            return True
    return False

+gv_push = 0


Why gv_push? A more descriptive name might be better. Maybe tar_file_count?

True, but it was just a way for me to track things. We could change it.

I wanted a variable to track "actual transfer submitted" (pushed), as opposed to just submitted to our globus_transfer() function, which may just add it to a pending transfer and return. I'll make it "gv_tarfiles_pushed".

forsyth2 · 2025-02-21T22:11:11Z

zstash/globus.py

        )
    transfer_data.add_item(src_path, dst_path)
-    transfer_data["label"] = subdir_label + " " + filename
+    transfer_data["label"] = label


Note to self: label is defined to be exactly the same thing above already.

forsyth2 · 2025-02-21T22:14:22Z

zstash/hpss.py

+                    for src_path in prev_transfers:
+                        os.remove(src_path)
+                    prev_transfers = curr_transfers
+                    curr_transfers = list()


You can just use = [] instead of = list().

I used to do that - but was cautioned against it (don't recall why). I'd be happy either way.

Hmm interesting, I wonder why. = [] definitely seems more "pythonic" to me, as is echoed on https://stackoverflow.com/questions/5790860/whats-the-difference-between-and-vs-list-and-dict.
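For reference, a small sketch showing that the two spellings produce the same result. The `rotate` helper is hypothetical and only mirrors the reset step in the diff above; it is not a zstash function.

```python
from typing import List, Tuple


def rotate(prev_transfers: List[str], curr_transfers: List[str]) -> Tuple[List[str], List[str]]:
    # The current batch becomes the previous batch, and a fresh empty
    # list starts the next batch. `[]` and `list()` are interchangeable here.
    return curr_transfers, []


# Both spellings build a new, empty, independent list object.
literal_list = []
called_list = list()
assert literal_list == called_list
assert literal_list is not called_list
```

The literal form avoids a name lookup and a function call, and it cannot be affected by a local variable that happens to be named `list`.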

forsyth2 · 2025-02-21T22:15:45Z

zstash/update.py

    args: argparse.Namespace = parser.parse_args(sys.argv[2:])
-    if args.hpss and args.hpss.lower() == "none":
+
+    if not args.hpss or args.hpss.lower() == "none":


Parentheses, as in create, would be good: if (not args.hpss) or (args.hpss.lower() == "none"):

True. I was relying upon the default ("not" applies only to the very next operand). Also on short-circuit evaluation, where testing (A or B) never tests B when A is true, as it is unnecessary (useful when testing B might cause an exception).

I added the parentheses.

Also on short-circuit evaluation, where testing (A or B) never tests B when A is true, as it is unnecessary (useful when testing B might cause an exception).

Yes, the parentheses are only for human readers. They shouldn't affect the code at all.
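A short sketch of the short-circuit point, using hypothetical helper names. It shows why the order of the two operands matters when the value may be `None`, while the parentheses themselves do not change the result.

```python
from typing import Optional


def is_local_only(hpss: Optional[str]) -> bool:
    # `not hpss` is evaluated first. When it is True, `or` returns at once
    # and hpss.lower() is never called, so None or "" cannot raise here.
    return (not hpss) or (hpss.lower() == "none")


def is_local_only_unsafe(hpss: Optional[str]) -> bool:
    # Same operands in the opposite order: hpss.lower() runs first and
    # raises AttributeError when hpss is None.
    return (hpss.lower() == "none") or (not hpss)
```

For example, `is_local_only(None)` returns `True`, whereas `is_local_only_unsafe(None)` raises `AttributeError`.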

forsyth2 · 2025-02-21T22:15:57Z

zstash/update.py

+
+    if not args.hpss or args.hpss.lower() == "none":
        args.hpss = "none"
+        args.keep - True


Ah! That will make a difference! :) Good catch!

TonyB9000 · 2025-02-21T23:25:10Z

@forsyth2 Allow me to make some changes to address the clear mistakes above. Should take just a moment.

forsyth2 · 2025-03-04T01:58:26Z

Allow me to make some changes to address the clear mistakes above. Should take just a moment.

@TonyB9000 Can you push those changes?

I've also reviewed the code logic; this looks good to me, aside from the already suggested changes.

Following the logic of the lists of transferred tars

hpss_utils.add_files -> hpss.hpss_put -> hpss.hpss_transfer:

        if transfer_type == "put":
            if not keep:
                if (scheme != "globus") or (
                    globus_status == "SUCCEEDED"
                ):
                    # Note: This is intended to fulfill the default removal of successfully-transfered
                    # tar files when keep=False, irrespective of non-blocking status
                    logger.info(f"{ts_utc()}: DEBUG: deleting transfered files {prev_transfers}")
                    for src_path in prev_transfers:
                        os.remove(src_path)
                    prev_transfers = curr_transfers
                    curr_transfers = list()

Globus succeeded. We don't have to worry about these tars anymore; they've been transferred.
Delete them and reset the lists.

Earlier in hpss.hpss_transfer, we saw:

curr_transfers.append(file_path)

which is how curr_transfers builds up the list of tars currently being transferred.

Following the logic of `gv_push`

In globus.globus_transfer:

        # DEBUG: review accumulated items in TransferData
        logger.info(f"{ts_utc()}: TransferData: accumulated items:")
        attribs = transfer_data.__dict__
        for item in attribs["data"]["DATA"]:
            if item["DATA_TYPE"] == "transfer_item":
                gv_push += 1
                print(f"   (routine)  PUSHING (#{gv_push}) STORED source item: {item['source_path']}", flush=True)

Increment for every transfer_item we encounter.

In globus.globus_finalize:

    if transfer_data:
        # DEBUG: review accumulated items in TransferData
        logger.info(f"{ts_utc()}: FINAL TransferData: accumulated items:")
        attribs = transfer_data.__dict__
        for item in attribs["data"]["DATA"]:
            if item["DATA_TYPE"] == "transfer_item":
                gv_push += 1
                print(f"    (finalize) PUSHING ({gv_push}) source item: {item['source_path']}", flush=True)

        # SUBMIT new transfer here
        logger.info(f"{ts_utc()}: DIVING: Submit Transfer for {transfer_data['label']}")

Again, increment for every transfer_item we encounter.

gv_push is only ever incremented, never reset to 0. From Tony:

I wanted a variable to track "actual transfer submitted" (pushed), as opposed to just submitted to our globus_transfer() function, which may just add it to a pending transfer and return.

So, gv_push simply counts the number of transfer_items encountered throughout the entire run.
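As a rough illustration of that counting pattern (a sketch, not the code in zstash/globus.py): a module-level counter that is incremented once per `transfer_item` entry and never reset. The plain dictionary below is an assumption standing in for the `DATA` list read from `transfer_data` in the snippets above, and the names `tarfiles_pushed` and `count_pushed_items` are hypothetical.

```python
from typing import Any, Dict

# Module-level counter: it only grows over the life of the process.
tarfiles_pushed = 0


def count_pushed_items(transfer_doc: Dict[str, Any]) -> int:
    """Add one to the global counter for every transfer_item in the document."""
    global tarfiles_pushed
    for item in transfer_doc["DATA"]:
        if item["DATA_TYPE"] == "transfer_item":
            tarfiles_pushed += 1
    return tarfiles_pushed
```

Calling `count_pushed_items` twice on the same document counts its items twice, which matches the "only ever incremented" behavior described above.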

forsyth2 · 2025-03-04T02:07:48Z

We'll also need to fix the pre-commit check before merging.

forsyth2 · 2025-03-06T19:37:37Z

@TonyB9000 Can you please push those changes you mentioned? I can add a commit fixing the pre-commit checks. I'm hoping to merge this today, so I can make a new zstash release candidate. Thanks!

TonyB9000 · 2025-03-06T20:17:08Z

@forsyth2 I will get that done within the next hour. I've finally gotten "zstash check" to behave as expected. I made a small change to the "polling" frequency in the block/wait (so it does not fill the log with hundreds of announcements).

Low-disk condition was a factor in earlier failures. We should employ df-check logic to avoid unexpected out-of-disk-space conditions.

forsyth2 · 2025-03-06T20:23:15Z

@TonyB9000 Ok sounds good. Once you push that commit, I'll review the changes and push a commit to fix any pre-commit errors, and then make a zstash RC so @golaz can test in the next Unified RC.

TonyB9000 · 2025-03-06T20:45:50Z

@forsyth2 When I push my changes, I get the option:

The upstream branch of your current branch does not match
the name of your current branch.  To push to the upstream branch
on the remote, use

    git push origin HEAD:non-block-testing

To push to the branch of the same name on the remote, use

    git push origin HEAD

I thought I had pushed previously, but may have chosen the wrong option (so you did not see the changes?)

Which should I use? My local branch is named "non-block-testing-fix", but the remote is apparently "non-block-testing".

forsyth2 · 2025-03-06T20:53:07Z

The remote is named non-block-testing-fix too (at the top of this PR page). Try git push origin non-block-testing-fix

TonyB9000 · 2025-03-06T20:57:16Z

OK, that seemed to work.

forsyth2 · 2025-03-06T22:12:46Z

I added 734ea5c. I'm getting a couple errors on the unit tests though:

======================================================================
FAIL: testUpdateCacheHPSS (tests.test_update.TestUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/u1/f/forsyth/ez/zstash/tests/test_update.py", line 170, in testUpdateCacheHPSS
    self.helperUpdateCache("testUpdateCacheHPSS", HPSS_ARCHIVE)
  File "/global/u1/f/forsyth/ez/zstash/tests/test_update.py", line 142, in helperUpdateCache
    self.stop(error_message)
  File "/global/u1/f/forsyth/ez/zstash/tests/base.py", line 143, in stop
    self.fail(error_message)
AssertionError: The zstash cache does not contain expected files.
It has: ['index.db', '000001.tar', '000000.tar', '000002.tar', '000003.tar', '000004.tar']

======================================================================
FAIL: testUpdateKeepHPSS (tests.test_update.TestUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/u1/f/forsyth/ez/zstash/tests/test_update.py", line 163, in testUpdateKeepHPSS
    self.helperUpdateKeep("testUpdateKeepHPSS", HPSS_ARCHIVE)
  File "/global/u1/f/forsyth/ez/zstash/tests/test_update.py", line 112, in helperUpdateKeep
    self.stop(error_message)
  File "/global/u1/f/forsyth/ez/zstash/tests/base.py", line 143, in stop
    self.fail(error_message)
AssertionError: The zstash cache does not contain expected files.
It has: ['index.db', '000001.tar', '000000.tar', '000002.tar', '000003.tar', '000004.tar']

----------------------------------------------------------------------
Ran 8 tests in 33.143s

FAILED (failures=2)

forsyth2 · 2025-03-06T22:33:40Z

These errors don't appear on main.

pip install . && python -m unittest tests/test_*.py

only gives:

======================================================================
FAIL: testLs (tests.test_globus.TestGlobus)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/u1/f/forsyth/ez/zstash/tests/test_globus.py", line 182, in testLs
    self.helperLsGlobus(
  File "/global/u1/f/forsyth/ez/zstash/tests/test_globus.py", line 169, in helperLsGlobus
    self.create(use_hpss, zstash_path, cache=self.cache)
  File "/global/u1/f/forsyth/ez/zstash/tests/base.py", line 300, in create
    self.check_strings(cmd, output + err, expected_present, expected_absent)
  File "/global/u1/f/forsyth/ez/zstash/tests/base.py", line 187, in check_strings
    self.stop(error_message)
  File "/global/u1/f/forsyth/ez/zstash/tests/base.py", line 143, in stop
    self.fail(error_message)
AssertionError: Command=`zstash create --cache=zstash --hpss=globus://6c54cade-bde5-45c1-bdea-f4bd71dba2cc/~/zstash_test/ zstash_test`. Errors=['This was not supposed to be found, but was: ERROR.']

----------------------------------------------------------------------
Ran 69 tests in 415.188s

FAILED (failures=1)

I don't understand how that happened; main was passing the tests when I made zstash v1.4.4rc1.

Actually this error appears on this branch (non-block-testing-fix) too if I run all the tests:

======================================================================
FAIL: testLs (tests.test_globus.TestGlobus)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/u1/f/forsyth/ez/zstash/tests/test_globus.py", line 182, in testLs
    self.helperLsGlobus(
  File "/global/u1/f/forsyth/ez/zstash/tests/test_globus.py", line 169, in helperLsGlobus
    self.create(use_hpss, zstash_path, cache=self.cache)
  File "/global/u1/f/forsyth/ez/zstash/tests/base.py", line 300, in create
    self.check_strings(cmd, output + err, expected_present, expected_absent)
  File "/global/u1/f/forsyth/ez/zstash/tests/base.py", line 187, in check_strings
    self.stop(error_message)
  File "/global/u1/f/forsyth/ez/zstash/tests/base.py", line 143, in stop
    self.fail(error_message)
AssertionError: Command=`zstash create --cache=zstash --hpss=globus://6c54cade-bde5-45c1-bdea-f4bd71dba2cc/~/zstash_test/ zstash_test`. Errors=['This was not supposed to be found, but was: ERROR.']

======================================================================
FAIL: testUpdateCacheHPSS (tests.test_update.TestUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/u1/f/forsyth/ez/zstash/tests/test_update.py", line 170, in testUpdateCacheHPSS
    self.helperUpdateCache("testUpdateCacheHPSS", HPSS_ARCHIVE)
  File "/global/u1/f/forsyth/ez/zstash/tests/test_update.py", line 142, in helperUpdateCache
    self.stop(error_message)
  File "/global/u1/f/forsyth/ez/zstash/tests/base.py", line 143, in stop
    self.fail(error_message)
AssertionError: The zstash cache does not contain expected files.
It has: ['index.db', '000001.tar', '000000.tar', '000002.tar', '000003.tar', '000004.tar']

======================================================================
FAIL: testUpdateKeepHPSS (tests.test_update.TestUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/u1/f/forsyth/ez/zstash/tests/test_update.py", line 163, in testUpdateKeepHPSS
    self.helperUpdateKeep("testUpdateKeepHPSS", HPSS_ARCHIVE)
  File "/global/u1/f/forsyth/ez/zstash/tests/test_update.py", line 112, in helperUpdateKeep
    self.stop(error_message)
  File "/global/u1/f/forsyth/ez/zstash/tests/base.py", line 143, in stop
    self.fail(error_message)
AssertionError: The zstash cache does not contain expected files.
It has: ['index.db', '000001.tar', '000000.tar', '000002.tar', '000003.tar', '000004.tar']

----------------------------------------------------------------------
Ran 69 tests in 390.687s

FAILED (failures=3)

Ok, I'm going to try to debug this and maybe add some more testing (per #367). We can't make a new zstash RC at the moment.

TonyB9000 · 2025-03-06T22:44:30Z

@forsyth2 I have never run into that error (but I only tested "update") as follows:

zstash create --hpss <the_remote_path> FIRST-set-of-files

(wiped out all local FIRST-set-of-files AND index.db)

zstash update --hpss <the_remote_path> SECOND-set-of-files

(and verified that "remote" contains ALL the files applied)

TonyB9000 · 2025-03-06T22:49:52Z

@forsyth2 There was no overlap between FIRST and SECOND set of files, nor did I try to use the same file(name) with altered content. I was focused only upon the "non-blocking" behavior.

forsyth2 · 2025-03-07T19:25:37Z

I've been trying to play around with this, with no real success so far. A couple things:

The Globus test worked on main after I re-authenticated my NERSC endpoint, but it just hangs on this branch. Indeed, if I do the toy problem setup from "How do I run the Globus unit test?" (#329), it works on main, but hangs on this branch.
The update tests pass on main, but not on this branch:

Test name	Actual files in the cache	Expected files in the cache
testUpdateCacheHPSS	`['index.db', '000001.tar', '000000.tar', '000002.tar', '000003.tar', '000004.tar']`	`["index.db"]`
testUpdateKeepHPSS	`['index.db', '000001.tar', '000000.tar', '000002.tar', '000003.tar', '000004.tar']`	`["index.db", "000003.tar", "000004.tar", "000001.tar", "000002.tar"]`

So, the cache test is keeping files unnecessarily and the keep test is keeping 000000.tar unnecessarily (or the expected results need to be updated). (The differing order is fine; the compare function ignores order).

forsyth2 · 2025-03-07T20:38:41Z

@TonyB9000 For the cache test, I notice

        if transfer_type == "put":
            if not keep:
                if (scheme != "globus") or (
                    globus_status == "SUCCEEDED" and not non_blocking
                ):
                    os.remove(file_path)

becomes

        if transfer_type == "put":
            if not keep:
                if (scheme != "globus") or (globus_status == "SUCCEEDED"):
                    # Note: This is intended to fulfill the default removal of successfully-transfered
                    # tar files when keep=False, irrespective of non-blocking status
                    logger.debug(
                        f"{ts_utc()}: deleting transfered files {prev_transfers}"
                    )
                    for src_path in prev_transfers:
                        os.remove(src_path)
                    prev_transfers = curr_transfers
                    curr_transfers = list()

on this branch. That is, now we only remove prev_transfers, whereas before we were deleting the current file_path. This appears to pose a problem when we go to transfer the index.db, since keep is set to True for that file, meaning we never get around to removing prev_transfers.
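The effect described here can be seen in a simplified model. This is a sketch under stated assumptions, not the zstash implementation: files live in an in-memory set instead of on disk, every transfer is assumed to succeed immediately (the blocking, non-Globus case), and `TransferModel` and `run_archive` are hypothetical names.

```python
from typing import Iterable, List, Set


class TransferModel:
    """Toy model of the prev_transfers / curr_transfers bookkeeping."""

    def __init__(self) -> None:
        self.cache: Set[str] = set()
        self.prev_transfers: List[str] = []
        self.curr_transfers: List[str] = []

    def put(self, file_path: str, keep: bool) -> None:
        self.cache.add(file_path)  # the file sits in the cache before transfer
        self.curr_transfers.append(file_path)
        # Assumption: the transfer succeeds right away.
        if not keep:
            # Only the PREVIOUS batch is deleted, not the file just sent.
            for src_path in self.prev_transfers:
                self.cache.discard(src_path)
            self.prev_transfers = self.curr_transfers
            self.curr_transfers = []


def run_archive(tars: Iterable[str], keep: bool) -> Set[str]:
    model = TransferModel()
    for tar in tars:
        model.put(tar, keep=keep)
    # The database is always transferred with keep=True, so this final
    # call never reaches the deletion branch.
    model.put("index.db", keep=True)
    return model.cache
```

In this model, `run_archive(["000000.tar", "000001.tar"], keep=False)` leaves `000001.tar` next to `index.db`: the last tar of each run is never deleted because nothing rotates the lists after it. Whether this fully accounts for the failing tests is not established in the thread.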

TonyB9000 · 2025-03-07T20:39:59Z

@forsyth2 I recall complaining that "--keep" itself seemed to work (always keeping the cache tar files), but when omitted, the behavior was hard to understand - sometimes files would be kept irrespective of the flag. This was true both of "create" and "update". In particular, with non-blocking=True (where some transfers could involve multiple tarfiles at once), the "SUCCEEDED" reported when submitting a new tar-transfer did not indicate which files had previously been transferred successfully, so I could see no mechanism by which they could be removed.

In blocking mode, this is less a problem, as only ONE tar file is involved in any transfer.

Prior to this branch (and prior to the non-blocking fix, applied to create) tar-files would routinely remain, despite the absence of the "--keep" flag. I could not see a mechanism to conduct the removal reliably.

I wonder if the behavior involves "globus_finalize".

When you say "the cache test" (as opposed to the "keep" test), are you referring to when the user supplies a custom location for the local tar-files with "--cache "?

forsyth2 · 2025-03-07T20:46:08Z

When you say "the cache test" (as opposed to the "keep" test), are you referring to when the user supplies a custom location for the local tar-files with "--cache "?

Yes, I mean the automated test using https://github.com/E3SM-Project/zstash/blob/main/tests/test_update.py#L115 helperUpdateCache.

I wonder if the behavior involves "globus_finalize".

The Globus-specific test is the only automated test for the Globus functionality. That shouldn't be touched in these 2 tests.

forsyth2 · 2025-03-07T21:07:32Z

Ok, I've confirmed the issue isn't related to --cache (error still occurs if I do the same steps without that set).

TonyB9000 · 2025-03-07T21:29:28Z

I mention "globus finalize" because it invokes transfers just as the routine "hpss_transfer" does, but may handle the transfers (external to the globus functionality itself) differently.

I am unclear how the tests https://github.com/E3SM-Project/zstash/blob/main/tests/test_update.py#L115 test the new functionality properly. Nor do I understand how "expected behavior" aligns with what the help-text describes. The table:


    # option | Update | UpdateDryRun | UpdateKeep | UpdateCache | TestZstash.add_files (used in multiple tests)|
    # --hpss    |x|x|x|x|x|
    # --cache   | | | |x|b|
    # --dry-run | |x| | | |
    # --keep    | | |x| |b|
    # -v        | | | | |b|

does not distinguish blocking from non-blocking behaviors.

If the previous version "passed" these tests (properly removing the "expected" tar-files), I need to see where in the actual run codes (not these test drivers) the behavior is manifest.

forsyth2 · 2025-03-07T23:56:32Z

I added a commit (49fd87b) to debug/improve testing, but I've only run into more issues. I made a stand-alone script version of the unit test, and the script seems to work despite paralleling the unit test almost exactly. Unfortunately, I'm going to need to debug more.

I am unclear how the tests https://github.com/E3SM-Project/zstash/blob/main/tests/test_update.py#L115 test the new functionality properly.

Well, they were testing basic functionality and they shouldn't be broken by adding new functionality.

Nor do I understand how "expected behavior" aligns with what the help-text describes.

If keep isn't specified, we should be removing tars from the cache after they transfer. As mentioned above, that seems to work correctly in my stand-alone script version of the failing test, but not in the failing test itself.

The table [...] does not distinguish blocking from non-blocking behaviors.

The table is from the early days of zstash testing. The Globus functionality has significantly complicated testing (and the functional code itself) so much so that I'm seriously considering possible refactorings -- #370, #367/#369 -- to make it easier to understand. Basically, the non-blocking parameter isn't in that table because it's never tested in the unit tests (only stand-alone scripts we've used for testing).

If the previous version "passed" these tests (properly removing the "expected" tar-files), I need to see where in the actual run codes (not these test drivers) the behavior is manifest.

My answer if the unit test is failing appropriately: I believe the code change in #363 (comment) is what is causing this, but I can't be certain. This is another reason why I think a refactoring might be required -- since we always keep the index.db, we need some way to tell zstash to still remove prev_transfers if we're just dealing with index.db. There's a great deal of state that is hard to follow now. (We can't just delete prev_transfers if keep == True and file_name == "index.db" because it could very well have been the case that keep has been True all along).

My answer if the unit test itself is broken (i.e., my stand-alone script is correct): in this case, there'd be nothing of note in the run code itself.

forsyth2 · 2025-03-10T22:20:57Z

8050fb5 fixes the zstash update tests, but I'm still debugging the Globus test.

forsyth2 · 2025-03-10T23:52:43Z

f4a661c fixes the Globus test, but importantly I changed the polling interval back to what it was before. @TonyB9000 is this an acceptable change?

I'm also running into a new error when running all the unit tests, but not when I run the extract tests alone. I suspect this was introduced by the second-to-last commit (8050fb5).

======================================================================
FAIL: testExtractParallelHPSS (tests.test_extract_parallel.TestExtractParallel)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/u1/f/forsyth/ez/zstash/tests/test_extract_parallel.py", line 120, in testExtractParallelHPSS
    self.helperExtractParallel("testExtractParallelHPSS", HPSS_ARCHIVE)
  File "/global/u1/f/forsyth/ez/zstash/tests/test_extract_parallel.py", line 86, in helperExtractParallel
    self.stop(error_message)
  File "/global/u1/f/forsyth/ez/zstash/tests/base.py", line 143, in stop
    self.fail(error_message)
AssertionError: The tars were printed in this order: ['000000.tar', "'000000.tar']", '000000.tar', '000000.tar', '000000.tar', '000001.tar', "'000001.tar']", '000001.tar', '000001.tar', '000001.tar', "'000000.tar']", '000002.tar', "'000000.tar',", "'000002.tar']", '000002.tar', '000002.tar', '000002.tar', "'000001.tar']", '000003.tar', "'000001.tar',", "'000003.tar']", '000003.tar', '000003.tar', '000003.tar', '000004.tar', "'000004.tar']", '000004.tar', '000004.tar', '000004.tar']
When it should have been in this order: ["'000000.tar',", "'000000.tar']", "'000000.tar']", "'000001.tar',", "'000001.tar']", "'000001.tar']", "'000002.tar']", "'000003.tar']", "'000004.tar']", '000000.tar', '000000.tar', '000000.tar', '000000.tar', '000001.tar', '000001.tar', '000001.tar', '000001.tar', '000002.tar', '000002.tar', '000002.tar', '000002.tar', '000003.tar', '000003.tar', '000003.tar', '000003.tar', '000004.tar', '000004.tar', '000004.tar', '000004.tar']

forsyth2 · 2025-03-11T00:24:22Z

I can reproduce that error with pip install . && python -m unittest tests/test_extrac*.py (running just the parallel tests doesn't seem to cause the error, but running at least two test files does) up until removing 49fd87b. That is, 49fd87b caused this issue... but there are only test changes and logger changes in that commit.

forsyth2 · 2025-03-11T01:14:46Z

It turns out the failing test was relying on reading the tars in a certain order, so the extra logging statements messed that up. I just took those extra statements out -- the changes in 59fa442 are enough to get it passing.

forsyth2

@TonyB9000 I think this is good to merge, but before I merge it, a few questions (see associated comments on this review).

Also, per #363 (comment), is the change at f4a661c#diff-883c2a8c42588679fed46ac7b1d96a0497842c87848bcbf10eb4f1733d357d87 reverting the polling_interval an acceptable change?

forsyth2 · 2025-03-11T01:28:55Z

zstash/globus.py

    return False


+# TODO: What does gv stand for? Globus something? Global variable?


Global Variable. If I must use them, I like to label them as such.

Ok, I think I'm going to expand gv to global_variable then, so it's clear.

Good idea. That might discourage people from using them - should be a standard!

forsyth2 · 2025-03-11T01:29:39Z

zstash/globus.py

    last_task_id = None

    if transfer_data:
+        # DEBUG: review accumulated items in TransferData


This is just a note explaining this code block, right? Not a TODO that still needs to be addressed?

TonyB9000 · 2025-03-11T15:25:06Z

"test was relying on reading the tars in a certain order, so the extra logging statements messed that up" That is weird - but I'd choose to make the comparisons operate over sorted values rather than omit logging messages in general. Maybe these are unnecessary/unhelpful.

TonyB9000 · 2025-03-11T15:43:46Z

@forsyth2 "I changed the polling interval back to what it was before". Yes, that is OK. I was testing whether it was the cause of my seeing 120+ "success" messages in log output, which seemed to be merely reflecting that the polling interval had been reached.

I would like to refactor/merge both "globus_wait()" and "globus_block_wait()", once we have a solid sense of the desired behavior. I have seen various examples of using task_wait and they are often confusing regarding the relationship between "timeout" and "polling_interval". One behavior I want to avoid is hanging-forever if the transfer itself hangs (returns "ACTIVE" forever.) Hence the timeout-retries code. But then, how to make it large enough when some transfers can take days?
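To make the relationship between "timeout", "polling_interval" and retries concrete, here is a minimal, library-free sketch. It does not call the Globus SDK and is not the logic in globus_wait() or globus_block_wait(): `wait_for_task`, `get_status` and `max_retries` are hypothetical names, and only the status strings "ACTIVE" and "SUCCEEDED" come from this thread.

```python
import time
from typing import Callable


def wait_for_task(
    get_status: Callable[[], str],
    timeout_s: float,
    polling_interval_s: float,
    max_retries: int,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the task leaves ACTIVE or the retry budget runs out.

    Each retry waits at most timeout_s, checking every polling_interval_s,
    so the total wait is bounded by max_retries * timeout_s.
    """
    for _ in range(max_retries):
        waited = 0.0
        while waited < timeout_s:
            status = get_status()
            if status != "ACTIVE":
                return status  # e.g. "SUCCEEDED" or a failure state
            sleep(polling_interval_s)
            waited += polling_interval_s
    # Still ACTIVE after every retry: report that instead of hanging forever.
    return "ACTIVE"
```

With this shape, the polling interval only controls how often the status is checked (and therefore how often anything is logged), while `max_retries * timeout_s` is the knob that would need to grow for transfers that can take days.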

TonyB9000 · 2025-03-11T15:47:29Z

@forsyth2 I would like (eventually) to have (input) path be an added (optional) parameter for "update", rather than force the user to operate in the source-file directory. It is inconsistent with "create", where you can operate in directory X but load files from directory Y.

forsyth2 · 2025-03-11T17:04:29Z

"test was relying on reading the tars in a certain order, so the extra logging statements messed that up" That is weird - but I'd choose to make the comparisons operate over sorted values

The issue is that we're testing command line functions, not Python functions. So basically all the "unit" tests ("unit" in quotes because they rely on the system to run and are thus really integration tests) are just checking all output printed to the command line by a command. So, if there are log statements printing out more things, the unit tests can be fooled by earlier output.

"I changed the polling interval back to what it was before". Yes, that is OK.

Ok, great!

I would like to refactor/merge both "globus_wait()" and "globus_block_wait()"
I would like (eventually) to have (input) path be an added (optional) parameter for "update"

Yes, these issues + my comment above about "unit" tests + issues noted on #370 all point to a major refactor being needed. The codebase has become unwieldy to work with, with logic that is hard to follow & test.

Our team is going to have a meeting to plan out the next release once we get this Unified release done. I think as part of that we need to budget time for both 1) figuring out what a zstash refactor would even look like and 2) actually implementing that once decided.

And I think that this refactor design & implementation should be done in tandem with resolving #339 (we're going to need to be thinking about Globus fixes as part of the refactor anyway).

TonyB9000 · 2025-03-11T17:41:49Z

@forsyth2 On refactoring zstash/globus: Recall that I have a python workflow (dsm_manage_CMIP_production) that operates "CMIP-dataset-at-a-time" (conditionally zstash-extracting new native data from a local cache-archive when needed, and conditionally fetching a remote archive as needed, etc). This routine will "inherit" the credential-expiration issues of zstash/globus. I have striven to make my codes sufficiently "stateful" that an exit and restart can automatically pick up where it left off, to avoid unnecessarily redoing work. Just something to keep in mind.

I am thinking, any globus transfer that lasts more than 48 hours would certainly involve multiple tar-files, so if there were a way to track per-tar-file completion, the tool should be able to pick-up on a restart and transparently continue a broken set of transfers.
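One way such per-tar-file tracking could look, purely as a sketch of the idea: a small JSON state file that records which tars have finished, so a restarted run can skip them. Nothing like this exists in zstash as far as this thread shows; the file format and the names `load_completed`, `mark_completed` and `pending_tars` are all hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def load_completed(state_file: Path) -> Set[str]:
    # A missing state file simply means nothing has completed yet.
    if state_file.exists():
        return set(json.loads(state_file.read_text()))
    return set()


def mark_completed(state_file: Path, tar_name: str) -> None:
    # Record one finished tar; rewriting the whole file keeps the sketch short.
    completed = load_completed(state_file)
    completed.add(tar_name)
    state_file.write_text(json.dumps(sorted(completed)))


def pending_tars(all_tars: Iterable[str], state_file: Path) -> List[str]:
    # Tars still to transfer after a restart, in their original order.
    completed = load_completed(state_file)
    return [tar for tar in all_tars if tar not in completed]
```

A restarted run would call `pending_tars` first and submit only what it returns, calling `mark_completed` as each transfer is confirmed.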

Anthony Bartoletti added 2 commits February 20, 2025 18:15

addressed non-blocking behavior for both create and update, many logging additions for activity tracing

c1158ee

Reset maxsize to production value

86fbbd5

TonyB9000 requested a review from forsyth2 February 21, 2025 16:10

forsyth2 reviewed Feb 21, 2025

Anthony Bartoletti added 2 commits February 21, 2025 17:40

fixed wrong keep value and typo, renamed pushcount variable

09ac71d

added parentheses for logic clarity

ab7e6f9

adjust block wait polling interval

be08d8b

Clean up code

734ea5c

forsyth2 mentioned this pull request Mar 7, 2025

Refactor zstash #370

Draft

Add update tests

49fd87b

Update tests passing

8050fb5

Globus test passing

f4a661c

Remove unused variable

33ef848

Fix logging statements

59fa442

forsyth2 reviewed Mar 11, 2025

forsyth2 force-pushed the non-block-testing-fix branch from ad11dd9 to dd227ca Compare March 11, 2025 17:30

Rename gv to global_variable

78476eb

forsyth2 force-pushed the non-block-testing-fix branch from dd227ca to 78476eb Compare March 11, 2025 17:30

forsyth2 merged commit c96e591 into main Mar 11, 2025
3 checks passed

forsyth2 deleted the non-block-testing-fix branch March 11, 2025 17:32

forsyth2 mentioned this pull request Mar 17, 2025

[Bug]: zstash update --non-blocking not behaving as expected #361

Closed

This was referenced Jun 5, 2025

[Doc]: Remove "NOTE: zstash is currently always non-blocking." #375

Open

[Bug]: tar files are not deleted after successful globus transfer #374

Open

forsyth2 mentioned this pull request Oct 4, 2025

[Feature]: Improve zstash testing #385

Open

| `args.hpss` | `args.hpss.lower() == "none"` | `args.non_blocking` | original behavior | new behavior | change |
| --- | --- | --- | --- | --- | --- |
| T | T | T | `args.hpss = "none"`, `args.keep = True` | `args.hpss = "none"`, `args.keep = True` | N/A |
| T | T | F | `args.hpss = "none"` | `args.hpss = "none"`, `args.keep = True` | Sets `args.keep = True` |
| T | F | T | `args.keep = True` | Nothing | No longer sets `args.keep = True` |
| T | F | F | Nothing | Nothing | N/A |
| F | N/A | T | `args.keep = True` | `args.hpss = "none"`, `args.keep = True` | Sets `args.hpss = "none"` |
| F | N/A | F | Nothing | `args.hpss = "none"`, `args.keep = True` | Sets `args.hpss = "none"`, `args.keep = True` |
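
The table's two behavior columns can be written out as a short sketch. This is reconstructed only from the table and the two `if` conditions quoted in the review, not from the zstash source. The function names and the placement of the `non_blocking` check are assumptions, and the table itself was questioned in the thread.

```python
from typing import Optional, Tuple


def original_behavior(
    hpss: Optional[str], non_blocking: bool, keep: bool = False
) -> Tuple[Optional[str], bool]:
    # Assumed shape of the old logic, inferred from the table only.
    if hpss and hpss.lower() == "none":
        hpss = "none"
    if non_blocking:
        keep = True
    return hpss, keep


def new_behavior(
    hpss: Optional[str], non_blocking: bool, keep: bool = False
) -> Tuple[Optional[str], bool]:
    # New condition from the diff; non_blocking no longer affects keep here.
    if (not hpss) or (hpss.lower() == "none"):
        hpss = "none"
        keep = True
    return hpss, keep


# Walk the six rows of the table: (hpss value, non_blocking).
for hpss_value in ("NONE", "some/hpss/path", None):
    for non_blocking in (True, False):
        print(
            hpss_value,
            non_blocking,
            original_behavior(hpss_value, non_blocking),
            new_behavior(hpss_value, non_blocking),
        )
```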
