Skip to content

Conversation

@forsyth2
Copy link
Collaborator

@forsyth2 forsyth2 commented Jul 30, 2025

Summary

Objectives:

  • Isolate code changes from Refactor zstash #370 that specifically fix the Globus authentications, and build on those changes, if necessary.

Issue resolution:

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a small improvement: increment the minor version
  • a new feature: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

Big Change

  • To merge, I will use "Create a merge commit". That is, this change is large enough to require multiple units of work (i.e., it should be multiple commits).

1. Does this do what we want it to do?

Required:

  • Product Management: I have confirmed with the stakeholders that the objectives above are correct and complete.
  • Testing: I have added at least one automated test. Every objective above is represented in at least one test.
  • Testing: I have considered likely and/or severe edge cases and have included them in testing.

If applicable:

  • Testing: this pull request adds at least one new possible command line option. I have tested using this option with and without any other option that may interact with it.

2. Are the implementation details accurate & efficient?

Required:

  • Logic: I have visually inspected the entire pull request myself.
  • Logic: I have left GitHub comments highlighting important pieces of code logic. I have had these code blocks reviewed by at least one other team member.

If applicable:

  • Dependencies: This pull request introduces a new dependency. I have discussed this requirement with at least one other team member. The dependency is noted in zstash/conda, not just an import statement.

3. Is this well documented?

Required:

4. Is this code clean?

Required:

  • Readability: The code is as simple as possible and well-commented, such that a new team member could understand what's happening.
  • Pre-commit checks: All the pre-commits checks have passed.

If applicable:

  • Software architecture: I have discussed relevant trade-offs in design decisions with at least one other team member. It is unlikely that this pull request will increase tech debt.

@forsyth2 forsyth2 self-assigned this Jul 30, 2025
@forsyth2 forsyth2 added semver: bug Bug fix (will increment patch version) Globus Globus semver: small improvement Small improvement (will increment patch version) labels Jul 30, 2025
Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TonyB9000 I've isolated code specific to Globus auths from the refactor PR (#370) to here. (I still think the refactor is important for code clarity & future development, but it is perhaps most useful to pull out this bug fix into a separate PR)

My most recent run, using the code on this branch
cd ~/ez/zstash
lcrc_conda # My function to activate conda locally
rm -rf build
conda clean --all --y
conda env create -f conda/dev.yml -n zstash-339-20250730
conda activate zstash-339-20250730
pre-commit run --all-files
python -m pip install .

# Try 1: Using whatever state my auths happened to be in
cd ../
mkdir zstash_test339_20250730_try1
cd zstash_test339_20250730_try1
mkdir zstash_demo; echo 'file0 stuff' > zstash_demo/file0.txt
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_20250730 zstash_demo
# INFO: /home/ac.forsyth2/.globus-native-apps.cfg exists. If this file does not have the proper settings, it may cause a TransferAPIError (e.g., 'Token is not active', 'No credentials supplied')
# >>> Log into NERSC, paste auth code once
# Worked!


# Try 2: Completely clean slate: no globus cfg, no globus consents
rm ~/.globus-native-apps.cfg
cd ../
mkdir zstash_test339_20250730_try2
cd zstash_test339_20250730_try2
# globus.org > File Manager > select "LCRC Improv DTN", "NERSC Perlmutter"
# https://auth.globus.org/v2/web/consents > Manage Your Consents > Globus Endpoint Performance Monitoring > rescind all"
mkdir zstash_demo; echo 'file0 stuff' > zstash_demo/file0.txt
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_20250730 zstash_demo
# INFO: /home/ac.forsyth2/.globus-native-apps.cfg does not exist. zstash will need to prompt for authentications twice, and then you will need to re-run.
# >>> Log into Argonne, NERSC login prompt (but it remembers credentials), paste auth code
# Worked!

# Try 3: Try again with the new auth state
cd ../
mkdir zstash_test339_20250730_try3
cd zstash_test339_20250730_try3
mkdir zstash_demo; echo 'file0 stuff' > zstash_demo/file0.txt
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_20250730 zstash_demo
# INFO: /home/ac.forsyth2/.globus-native-apps.cfg exists. If this file does not have the proper settings, it may cause a TransferAPIError (e.g., 'Token is not active', 'No credentials supplied')
# >>> No prompt to log into Argonne/NERSC, paste auth code once
# Worked!

# Try 4: Try again from the same directory, as if try 3 was the "toy problem"
cd zstash_test339_20250730_try3
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_20250730 zstash_demo
# INFO: /home/ac.forsyth2/.globus-native-apps.cfg exists. If this file does not have the proper settings, it may cause a TransferAPIError (e.g., 'Token is not active', 'No credentials supplied')
# >>> No prompt to log into Argonne/NERSC, paste auth code once
# Worked! But still needed the auth code!!

Referring to the list of Globus problems on #339:

  1. Login to Globus web interface and activate end points -- This was always needed.
  2. Delete existing globus cfg file -- it really doesn't seem to matter if we delete this or not, the current code will always ask for exactly one auth code. It does look like it's almost never in the correct configuration, so maybe we can just always auto-delete it before each try.
  3. Start interactive zstash test transfer -- Now skippable
  4. Start a second interactive zstash transfer -- Now skippable
  5. Start long transfer -- Got better: we only ever have to paste the auth code once now and we never have to do a "toy problem" first before running what we really want to run. Got worse: we appear to always need an auth code (this will pose a problem for situations where we run a toy problem before running a script that calls zstash: the script would now hang waiting for an auth code). (Basically, now instead of entering an auth code 2 times or 0 times, it's always 1 time).
  6. Using Globus web interface, manually transfer zstash files that were not transferred due to token expiration. -- I don't think this is resolved...
  7. Repeat steps (2) to (4) above. Restart zstash archiving that stopped

@forsyth2
Copy link
Collaborator Author

(Basically, now instead of entering an auth code 2 times or 0 times, it's always 1 time).

Claude suggests:

This is actually a security feature - Globus wants explicit user consent for accessing specific endpoints, even if you have general transfer permissions.
The "always need auth code once per session" behavior is likely intentional for this type of endpoint-specific access pattern.

@TonyB9000
Copy link
Collaborator

TonyB9000 commented Jul 30, 2025

I'd want to ask Claude, in the context of process automation involving somewhat arbitrary demands for invoking new globus transfers in a long-running control process, does this Globus security feature (requiring explicit user consent for accessing specific endpoints in each session) make such automation impossible?

Alternately, is it feasible to set up an automated (synthetic human) interaction to facilitate automation.

@forsyth2 forsyth2 mentioned this pull request Aug 5, 2025
7 tasks
Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TonyB9000 Ok, I think I've finally got a working version! This actually allows users to:

  1. Authenticate on the first new run (no toy problem needed)
  2. Do future runs without any further authentications.

This can be replace a number of PRs/commits, and resolve a number of open issues:

  • This PR moves the changes from the prototype script of #382 into production code (i.e., the actual zstash code base).
  • This PR replaces the commits 9edaf6f and 747feb2 in #370. Again, I still want to merge the refactor to clean up the code base, but I think it makes sense to pull out these auth changes into their own PR so we can test with more modularity.
  • See inline comments for more issue/PR resolutions.

I still need to check how the existing unit tests run after all these changes, but my test of the problematic functionality has been successful.

Comment on lines +151 to +154
client.oauth2_start_flow(
requested_scopes=all_scopes,
refresh_tokens=True, # This is the key to persistent auth!
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This addresses #338 -- the oauth2_start_flow comes from https://globus-sdk-python.readthedocs.io/en/stable/examples/native_app.html#with-refresh-tokens.

Relatedly, I believe this is the Globus Flow that @rljacob mentioned earlier today.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this oauth2_start_flow may also resolve the (5) Start long transfer [...] Note that this transfer is limited to 48 hours due to Globus token expiration. and (6) Using Globus web interface, manually transfer zstash files that were not transferred due to token expiration. of #339.

(Previously, that is, on main, we were using native_client.login(no_local_server=True, refresh_tokens=True), so my hope is that the oauth2_start_flow is nicer to use).

But @TonyB9000 I think I'd need you to test on a long run to be sure that this is in fact the case.

Copy link
Collaborator

@TonyB9000 TonyB9000 Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@forsyth2 This is great. Hopefully (using the new CMIP6 production workflow) we only need to fetch an archive "as-needed", as opposed to fetching them all at once - so each fetch should last less than a day - I hope.

(I still need to understand how the last fetch (wlin's NERSC directory) failed. I'll try it again soon.)

from typing import Dict, List, Optional

from globus_sdk import (
NativeAppAuthClient,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we're using the Globus client instead of the fair_research_login one, which means 1) it will be easier to ask the Globus support team for help in the future, and 2) we can close #253 as irrelevant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is AWESOME. Excellent work Ryan.

Comment on lines +19 to +39
check_log_has()
{
local expected_grep="${1}"
local log_file="${2}"
grep "${expected_grep}" ${log_file}
if [ $? != 0 ]; then
echo "Expected grep '${expected_grep}' not found in ${log_file}. Test failed."
exit 2
fi
}

check_log_does_not_have()
{
local not_expected_grep="${1}"
local log_file="${2}"
grep "${not_expected_grep}" ${log_file}
if [ $? == 0 ]; then
echo "Not-expected grep '${expected_grep}' was found in ${log_file}. Test failed."
exit 2
fi
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried out a similar testing setup for #378. I think these bash tests with grep checks are a better testing method for zstash than the existing "unit" tests (which in reality are closer to integration tests, as they do quite a few system calls with tests/base.run_cmd...)

Comment on lines +53 to +57
if os.path.exists(GLOBUS_CFG):
logger.warning(
f"Globus CFG {GLOBUS_CFG} exists. This may be left over from earlier versions of zstash, and may cause issues. Consider deleting."
)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer need to worry about GLOBUS_CFG, resolving (2) Delete existing globus cfg file from #339

Comment on lines +133 to +141
if "refresh_token" in token_data:
logger.info("Found stored refresh token - using it")
# Create a simple auth client for the RefreshTokenAuthorizer
auth_client = NativeAppAuthClient(ZSTASH_CLIENT_ID)
transfer_authorizer = RefreshTokenAuthorizer(
refresh_token=token_data["refresh_token"], auth_client=auth_client
)
transfer_client = TransferClient(authorizer=transfer_authorizer)
return transfer_client
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This resolves (3) Start interactive zstash test transfer & (4) Start a second interactive zstash transfer, and from #339, as we no longer need to run a toy problem first. The only issue is that the very first run must be authenticated into. There's really no way of getting around that.

Comment on lines +260 to +270
if err.info.consent_required:
logger.error("Consent required - this suggests scope issues.")
logger.error(
"With proper scope handling, this block should not be reached."
)
logger.error(
"Please report this bug at https://github.com/E3SM-Project/zstash/issues, with details of what you were trying to do."
)
raise RuntimeError(
"Insufficient Globus consents - please report this bug"
) from err
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a check in this block, but it's a little unclear if/how it would be reached now, so I figured it would be best to just leave a debugging note here.

@forsyth2 forsyth2 marked this pull request as ready for review August 6, 2025 00:58
@forsyth2 forsyth2 requested a review from TonyB9000 August 6, 2025 00:58
@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 6, 2025

I still need to check how the existing unit tests run after all these changes

On Chrysalis:

python -m unittest tests/test_*.py

gives:

======================================================================
ERROR: test_globus (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_globus
Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.forsyth2/miniforge3/envs/zstash-simple-globus-script-newer-globus-20250805/lib/python3.9/unittest/loader.py", line 154, in loadTestsFromName
    module = __import__(module_name)
  File "/gpfs/fs1/home/ac.forsyth2/ez/zstash/tests/test_globus.py", line 8, in <module>
    from fair_research_login.client import NativeClient
ModuleNotFoundError: No module named 'fair_research_login'


----------------------------------------------------------------------
Ran 71 tests in 50.747s

FAILED (errors=1, skipped=32)

The 32 skipped tests are the HPSS-required versions of each of the tests, so those need to be checked on Perlmutter. The only test to actually fail is the Globus "unit" test, and that's because that test is set up in a rather cumbersome way, that essentially repeats work of globus.py... Perhaps we can consider deleting that test or converting it to one of the bash integration test scripts.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Aug 14, 2025

Latest commits are rebased off main (that is, they include the #378 merge)

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Sep 4, 2025

@TonyB9000 Regarding your latest email:

But I don’t quite understand why authentication for “create”, “update”, “check” and “extract” (and for that matter, “ls”) should be different (where Globus Endpoints are involved).

I was thinking the globus activation is usually handled on create/update but yes the authentication should indeed hold for all. I was just simulating the workflow from #339.

when I use “create”, I assume I must exercise “zstash create” on a local set of files, to a remote location.  Or do I?

I suppose you could globus transfer it to a different location on the same machine; the main point is that Globus is being used.

I am still unclear which token are under threat of expiration:

  • Authentication to Globus
  • Authentication to Source Endpoint
  • Authentication to Destination Endpoint

That, I'm not sure of. Potentially any of the three, I would assume.

If you could give me an explicit pattern to exercise, I can make it as “long running” as you like. 

Below, an example manual test on extremely small directory. If you could test something like this on something long-running enough we might expect a token expiration ("Note that this transfer is limited to 48 hours due to Globus token expiration." according to #339), I think that would be a sufficient test if "(6) Using Globus web interface, manually transfer zstash files that were not transferred due to token expiration." from #339 has been rendered obsolete.

cd ~/ez/zstash
git status
# Make sure there are no uncommitted changes
git checkout issue-339-globus
git log
# Top 2 commits:
# Fix auths and add test
# Better handle Globus authentications
lcrc_conda # My function to set up conda locally
rm -rf build
conda clean --all --y
conda env create -f conda/dev.yml -n zstash-issue-339-380-20250903
conda activate zstash-issue-339-380-20250903
pre-commit run --all-files
python -m pip install .

# To start fresh with Globus:
# 1. Log into endpoints (LCRC Improv DTN, NERSC Perlmutter) at globus.org: File Manager > Add the endpoints in the "Collection" fields
# 2. To start fresh, with no consents: https://auth.globus.org/v2/web/consents > Manage Your Consents > Globus Endpoint Performance Monitoring > rescind all"

mkdir /lcrc/group/e3sm/ac.forsyth2/zstash_testing/issue_339_380_v1
cd /lcrc/group/e3sm/ac.forsyth2/zstash_testing/issue_339_380_v1
mkdir zstash_demo; echo 'file0 stuff' > zstash_demo/file0.txt
# Use UUID for NERSC Perlmutter 
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_380_v1 zstash_demo
# globus_sdk.services.auth.errors.AuthAPIError: ('POST', 'https://auth.globus.org/v2/oauth2/token', None, 400, 'Error', '{"error":"invalid_grant"}')
ls /home/ac.forsyth2/.globus-native-apps.cfg
# ls: cannot access '/home/ac.forsyth2/.globus-native-apps.cfg': No such file or directory
ls /home/ac.forsyth2/.zstash_globus_tokens.json
# /home/ac.forsyth2/.zstash_globus_tokens.json
rm /home/ac.forsyth2/.zstash_globus_tokens.json
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_380_v1 zstash_demo
# One authentication prompt: go to URL, authenticate to LCRC & NERSC, paste code
mkdir ../issue_339_380_v2
cd ../issue_339_380_v2
zstash check --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_380_v1
# No authentication prompts
# Success
mkdir ../issue_339_380_v3
cd ../issue_339_380_v3

# What if we didn't have the authentications set up?
ls /home/ac.forsyth2/.zstash_globus_tokens.json
# /home/ac.forsyth2/.zstash_globus_tokens.json
rm /home/ac.forsyth2/.zstash_globus_tokens.json
zstash check --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_380_v1
# One authentication prompt: go to URL, paste code
# Success

Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test tests/scripts/globus_auth.bash confirms we only have to do one authentication now. I've also updated that script to replace the Globus unit test, which was becoming tech debt for two reasons -- 1) it essentially duplicated the activate function and other internal workings, meaning the actual code for those parts wasn't getting tested, and 2) it relied on an older version of the Client.

This PR addresses steps 2-4 of #339. That is, users will no longer have to handle ~/.globus-native-apps.cfg and run a "toy" problem first. Step 6 (the early token expiration) remains an open problem, however, and will need to be addressed in a future pull request.

@forsyth2 forsyth2 merged commit bc444ce into main Sep 16, 2025
3 checks passed
@forsyth2 forsyth2 deleted the issue-339-globus branch September 16, 2025 18:12
@TonyB9000
Copy link
Collaborator

@forsyth2 Will you issue a release soon? I keep building new environments (now with zstash 1.4.4) and would like at least to test "zstash check" in a stand-alone configuration (fetching a NERSC archive). It still serves to do the archive extraction flawlessly.

@forsyth2
Copy link
Collaborator Author

Will you issue a release soon?

That will be release candidate rc1 once we begin the Unified testing period at the beginning of October, and then eventually an actual release. In the meantime, you can use zstash from a dev environment rather than from Unified.

@forsyth2 forsyth2 mentioned this pull request Oct 18, 2025
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Globus Globus semver: bug Bug fix (will increment patch version) semver: small improvement Small improvement (will increment patch version)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants