Better handle Globus authentications #380
@TonyB9000 I've isolated code specific to Globus auths from the refactor PR (#370) to here. (I still think the refactor is important for code clarity & future development, but it is perhaps most useful to pull out this bug fix into a separate PR)
My most recent run, using the code on this branch
cd ~/ez/zstash
lcrc_conda # My function to activate conda locally
rm -rf build
conda clean --all --y
conda env create -f conda/dev.yml -n zstash-339-20250730
conda activate zstash-339-20250730
pre-commit run --all-files
python -m pip install .
# Try 1: Using whatever state my auths happened to be in
cd ../
mkdir zstash_test339_20250730_try1
cd zstash_test339_20250730_try1
mkdir zstash_demo; echo 'file0 stuff' > zstash_demo/file0.txt
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_20250730 zstash_demo
# INFO: /home/ac.forsyth2/.globus-native-apps.cfg exists. If this file does not have the proper settings, it may cause a TransferAPIError (e.g., 'Token is not active', 'No credentials supplied')
# >>> Log into NERSC, paste auth code once
# Worked!
# Try 2: Completely clean slate: no globus cfg, no globus consents
rm ~/.globus-native-apps.cfg
cd ../
mkdir zstash_test339_20250730_try2
cd zstash_test339_20250730_try2
# globus.org > File Manager > select "LCRC Improv DTN", "NERSC Perlmutter"
# https://auth.globus.org/v2/web/consents > Manage Your Consents > Globus Endpoint Performance Monitoring > rescind all
mkdir zstash_demo; echo 'file0 stuff' > zstash_demo/file0.txt
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_20250730 zstash_demo
# INFO: /home/ac.forsyth2/.globus-native-apps.cfg does not exist. zstash will need to prompt for authentications twice, and then you will need to re-run.
# >>> Log into Argonne, NERSC login prompt (but it remembers credentials), paste auth code
# Worked!
# Try 3: Try again with the new auth state
cd ../
mkdir zstash_test339_20250730_try3
cd zstash_test339_20250730_try3
mkdir zstash_demo; echo 'file0 stuff' > zstash_demo/file0.txt
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_20250730 zstash_demo
# INFO: /home/ac.forsyth2/.globus-native-apps.cfg exists. If this file does not have the proper settings, it may cause a TransferAPIError (e.g., 'Token is not active', 'No credentials supplied')
# >>> No prompt to log into Argonne/NERSC, paste auth code once
# Worked!
# Try 4: Try again from the same directory, as if try 3 was the "toy problem"
cd zstash_test339_20250730_try3
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_20250730 zstash_demo
# INFO: /home/ac.forsyth2/.globus-native-apps.cfg exists. If this file does not have the proper settings, it may cause a TransferAPIError (e.g., 'Token is not active', 'No credentials supplied')
# >>> No prompt to log into Argonne/NERSC, paste auth code once
# Worked! But still needed the auth code!!
Referring to the list of Globus problems on #339:
- Login to Globus web interface and activate end points -- This was always needed.
- Delete existing globus cfg file -- it really doesn't seem to matter if we delete this or not, the current code will always ask for exactly one auth code. It does look like it's almost never in the correct configuration, so maybe we can just always auto-delete it before each try.
- Start interactive zstash test transfer -- Now skippable
- Start a second interactive zstash transfer -- Now skippable
- Start long transfer -- Got better: we only ever have to paste the auth code once now and we never have to do a "toy problem" first before running what we really want to run. Got worse: we appear to always need an auth code (this will pose a problem for situations where we run a toy problem before running a script that calls zstash: the script would now hang waiting for an auth code). (Basically, now instead of entering an auth code 2 times or 0 times, it's always 1 time).
- Using Globus web interface, manually transfer zstash files that were not transferred due to token expiration. -- I don't think this is resolved...
- Repeat steps (2) to (4) above. Restart zstash archiving that stopped
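As a rough illustration of the "script would now hang" concern, a caller could fail fast when no terminal is attached instead of blocking on the prompt. This is only a sketch: `prompt_for_auth_code` is a hypothetical helper, not something in zstash, and the error message is made up.

```python
import sys


def prompt_for_auth_code(stdin=None, prompt=input):
    """Ask for a Globus auth code, but refuse to block in a non-interactive run.

    Hypothetical helper for illustration; zstash may handle this differently.
    """
    stdin = sys.stdin if stdin is None else stdin
    if not stdin.isatty():
        # In a batch job or piped script nobody can paste a code, so raise
        # instead of waiting forever on input().
        raise RuntimeError(
            "A Globus auth code is required, but no interactive terminal is attached."
        )
    return prompt("Paste the auth code here: ").strip()
```

With something like this, a wrapping script would exit with a clear error rather than sitting idle until someone notices.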
Claude suggests:
I'd want to ask Claude: in the context of process automation involving somewhat arbitrary demands for invoking new Globus transfers in a long-running control process, does this Globus security feature (requiring explicit user consent for accessing specific endpoints in each session) make such automation impossible? Alternatively, is it feasible to set up an automated (synthetic human) interaction to facilitate automation?
@TonyB9000 Ok, I think I've finally got a working version! This actually allows users to:
- Authenticate on the first new run (no toy problem needed)
- Do future runs without any further authentications.
This can replace a number of PRs/commits, and resolve a number of open issues:
- This PR moves the changes from the prototype script of #382 into production code (i.e., the actual zstash code base).
- This PR replaces the commits 9edaf6f and 747feb2 in #370. Again, I still want to merge the refactor to clean up the code base, but I think it makes sense to pull out these auth changes into their own PR so we can test with more modularity.
- See inline comments for more issue/PR resolutions.
I still need to check how the existing unit tests run after all these changes, but my test of the problematic functionality has been successful.
client.oauth2_start_flow(
    requested_scopes=all_scopes,
    refresh_tokens=True,  # This is the key to persistent auth!
)
This addresses #338 -- the oauth2_start_flow comes from https://globus-sdk-python.readthedocs.io/en/stable/examples/native_app.html#with-refresh-tokens.
Relatedly, I believe this is the Globus Flow that @rljacob mentioned earlier today.
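For readers who haven't seen the Native App flow, here is a minimal sketch of how the pieces fit together. It assumes `client` behaves like `globus_sdk.NativeAppAuthClient`; the function name `login_with_refresh_tokens` and the prompt text are made up for illustration, and this is not necessarily how zstash structures it.

```python
TRANSFER_RESOURCE_SERVER = "transfer.api.globus.org"


def login_with_refresh_tokens(client, scopes, prompt=input):
    """Run the Native App flow once and return the transfer token data.

    `client` is expected to behave like globus_sdk.NativeAppAuthClient.
    """
    # refresh_tokens=True asks Globus Auth for a refresh token in addition
    # to the short-lived access token.
    client.oauth2_start_flow(requested_scopes=scopes, refresh_tokens=True)
    print(f"Please log in at:\n{client.oauth2_get_authorize_url()}")
    auth_code = prompt("Paste the auth code here: ").strip()
    token_response = client.oauth2_exchange_code_for_tokens(auth_code)
    # by_resource_server maps each resource server to a dict of token fields.
    return token_response.by_resource_server[TRANSFER_RESOURCE_SERVER]


# Usage with the real SDK (needs globus-sdk installed; CLIENT_ID is a placeholder):
#   from globus_sdk import NativeAppAuthClient
#   from globus_sdk.scopes import TransferScopes
#   tokens = login_with_refresh_tokens(NativeAppAuthClient(CLIENT_ID), [TransferScopes.all])
```

Passing the client in as an argument keeps the sketch testable with a stand-in object, without touching the network.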
I believe this oauth2_start_flow may also resolve "(5) Start long transfer [...] Note that this transfer is limited to 48 hours due to Globus token expiration." and "(6) Using Globus web interface, manually transfer zstash files that were not transferred due to token expiration." of #339.
(Previously, that is, on main, we were using native_client.login(no_local_server=True, refresh_tokens=True), so my hope is that the oauth2_start_flow is nicer to use).
But @TonyB9000 I think I'd need you to test on a long run to be sure that this is in fact the case.
@forsyth2 This is great. Hopefully (using the new CMIP6 production workflow) we only need to fetch an archive "as-needed", as opposed to fetching them all at once - so each fetch should last less than a day - I hope.
(I still need to understand how the last fetch (wlin's NERSC directory) failed. I'll try it again soon.)
from typing import Dict, List, Optional

from globus_sdk import (
    NativeAppAuthClient,
Now we're using the Globus client instead of the fair_research_login one, which means 1) it will be easier to ask the Globus support team for help in the future, and 2) we can close #253 as irrelevant.
That is AWESOME. Excellent work Ryan.
check_log_has()
{
    local expected_grep="${1}"
    local log_file="${2}"
    grep "${expected_grep}" "${log_file}"
    if [ $? != 0 ]; then
        echo "Expected grep '${expected_grep}' not found in ${log_file}. Test failed."
        exit 2
    fi
}

check_log_does_not_have()
{
    local not_expected_grep="${1}"
    local log_file="${2}"
    grep "${not_expected_grep}" "${log_file}"
    if [ $? == 0 ]; then
        # Report the pattern that was actually searched for.
        echo "Not-expected grep '${not_expected_grep}' was found in ${log_file}. Test failed."
        exit 2
    fi
}
I tried out a similar testing setup for #378. I think these bash tests with grep checks are a better testing method for zstash than the existing "unit" tests (which in reality are closer to integration tests, as they do quite a few system calls with tests/base.run_cmd...)
if os.path.exists(GLOBUS_CFG):
    logger.warning(
        f"Globus CFG {GLOBUS_CFG} exists. This may be left over from earlier versions of zstash, and may cause issues. Consider deleting."
    )
We no longer need to worry about GLOBUS_CFG, resolving (2) Delete existing globus cfg file from #339
if "refresh_token" in token_data:
    logger.info("Found stored refresh token - using it")
    # Create a simple auth client for the RefreshTokenAuthorizer
    auth_client = NativeAppAuthClient(ZSTASH_CLIENT_ID)
    transfer_authorizer = RefreshTokenAuthorizer(
        refresh_token=token_data["refresh_token"], auth_client=auth_client
    )
    transfer_client = TransferClient(authorizer=transfer_authorizer)
    return transfer_client
This resolves (3) Start interactive zstash test transfer & (4) Start a second interactive zstash transfer from #339, as we no longer need to run a toy problem first. The only remaining requirement is that the very first run must be authenticated. There's really no way of getting around that.
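To make the "first run authenticates, later runs reuse the token" idea concrete, here is a small sketch of storing and reloading a refresh token. The helper names and the flat JSON layout are assumptions for illustration; the real format of ~/.zstash_globus_tokens.json may differ.

```python
import json
import os
from typing import Optional


def save_tokens(path: str, token_data: dict) -> None:
    """Write token data to `path`; a newly created file gets mode 0o600."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(token_data, f)


def load_refresh_token(path: str) -> Optional[str]:
    """Return the stored refresh token, or None if a fresh login is needed."""
    try:
        with open(path) as f:
            token_data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(token_data, dict):
        return None
    return token_data.get("refresh_token")
```

A caller would try `load_refresh_token` first and only fall back to the interactive login when it returns None.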
if err.info.consent_required:
    logger.error("Consent required - this suggests scope issues.")
    logger.error(
        "With proper scope handling, this block should not be reached."
    )
    logger.error(
        "Please report this bug at https://github.com/E3SM-Project/zstash/issues, with details of what you were trying to do."
    )
    raise RuntimeError(
        "Insufficient Globus consents - please report this bug"
    ) from err
I left a check in this block, but it's a little unclear if/how it would be reached now, so I figured it would be best to just leave a debugging note here.
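If this block ever is reached, it might help bug reports to include which scopes Globus says are missing. A possible sketch, assuming `err` looks like a `globus_sdk.GlobusAPIError` (whose `info.consent_required` exposes `required_scopes`); the helper name is hypothetical.

```python
def required_scopes_from_error(err):
    """Return the scopes Globus reports as missing, or [] if none are reported."""
    info = getattr(err, "info", None)
    consent = getattr(info, "consent_required", None)
    if not consent:
        # Either not a Globus API error, or not a consent problem.
        return []
    return list(consent.required_scopes or [])


# Possible use inside the existing handler:
#   except TransferAPIError as err:
#       logger.error(f"Missing scopes: {required_scopes_from_error(err)}")
```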
On Chrysalis: python -m unittest tests/test_*.py gives:
The 32 skipped tests are the HPSS-required versions of each of the tests, so those need to be checked on Perlmutter. The only test to actually fail is the Globus "unit" test, and that's because that test is set up in a rather cumbersome way that essentially repeats work of
620c622 to 33a7990
Latest commits are rebased off
@TonyB9000 Regarding your latest email:
I was thinking the globus activation is usually handled on create/update but yes the authentication should indeed hold for all. I was just simulating the workflow from #339.
I suppose you could globus transfer it to a different location on the same machine; the main point is that Globus is being used.
That, I'm not sure of. Potentially any of the three, I would assume.
Below is an example manual test on an extremely small directory. If you could test something like this on a run long enough that we might expect a token expiration ("Note that this transfer is limited to 48 hours due to Globus token expiration." according to #339), I think that would be a sufficient test of whether "(6) Using Globus web interface, manually transfer zstash files that were not transferred due to token expiration." from #339 has been rendered obsolete.
cd ~/ez/zstash
git status
# Make sure there are no uncommitted changes
git checkout issue-339-globus
git log
# Top 2 commits:
# Fix auths and add test
# Better handle Globus authentications
lcrc_conda # My function to set up conda locally
rm -rf build
conda clean --all --y
conda env create -f conda/dev.yml -n zstash-issue-339-380-20250903
conda activate zstash-issue-339-380-20250903
pre-commit run --all-files
python -m pip install .
# To start fresh with Globus:
# 1. Log into endpoints (LCRC Improv DTN, NERSC Perlmutter) at globus.org: File Manager > Add the endpoints in the "Collection" fields
# 2. To start fresh, with no consents: https://auth.globus.org/v2/web/consents > Manage Your Consents > Globus Endpoint Performance Monitoring > rescind all
mkdir /lcrc/group/e3sm/ac.forsyth2/zstash_testing/issue_339_380_v1
cd /lcrc/group/e3sm/ac.forsyth2/zstash_testing/issue_339_380_v1
mkdir zstash_demo; echo 'file0 stuff' > zstash_demo/file0.txt
# Use UUID for NERSC Perlmutter
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_380_v1 zstash_demo
# globus_sdk.services.auth.errors.AuthAPIError: ('POST', 'https://auth.globus.org/v2/oauth2/token', None, 400, 'Error', '{"error":"invalid_grant"}')
ls /home/ac.forsyth2/.globus-native-apps.cfg
# ls: cannot access '/home/ac.forsyth2/.globus-native-apps.cfg': No such file or directory
ls /home/ac.forsyth2/.zstash_globus_tokens.json
# /home/ac.forsyth2/.zstash_globus_tokens.json
rm /home/ac.forsyth2/.zstash_globus_tokens.json
zstash create --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_380_v1 zstash_demo
# One authentication prompt: go to URL, authenticate to LCRC & NERSC, paste code
mkdir ../issue_339_380_v2
cd ../issue_339_380_v2
zstash check --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_380_v1
# No authentication prompts
# Success
mkdir ../issue_339_380_v3
cd ../issue_339_380_v3
# What if we didn't have the authentications set up?
ls /home/ac.forsyth2/.zstash_globus_tokens.json
# /home/ac.forsyth2/.zstash_globus_tokens.json
rm /home/ac.forsyth2/.zstash_globus_tokens.json
zstash check --hpss=globus://6bdc7956-fc0f-4ad2-989c-7aa5ee643a79/global/homes/f/forsyth/zstash/tests/manual_run_issue339_380_v1
# One authentication prompt: go to URL, paste code
# Success
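In the run above, the first attempt hit invalid_grant and only worked after the stored token file was removed by hand. One way that recovery could be automated is sketched below; `run_with_token_recovery` is a hypothetical helper, and treating every invalid_grant as "stale stored tokens" is an assumption.

```python
import os


def run_with_token_recovery(action, token_path):
    """Call action(); on an invalid_grant error, drop stale tokens and retry once."""
    try:
        return action()
    except Exception as err:
        # The error body seen in the log above contains "invalid_grant".
        if "invalid_grant" not in str(err) or not os.path.exists(token_path):
            raise
        os.remove(token_path)
    # Retry outside the except block so a second failure is reported on its own.
    return action()
```

Here `action` would be whatever obtains the transfer client and starts the transfer; the retry would then go through the one-time interactive login.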
The test tests/scripts/globus_auth.bash confirms we only have to do one authentication now. I've also updated that script to replace the Globus unit test, which was becoming tech debt for two reasons -- 1) it essentially duplicated the activate function and other internal workings, meaning the actual code for those parts wasn't getting tested, and 2) it relied on an older version of the Client.
This PR addresses steps 2-4 of #339. That is, users will no longer have to handle ~/.globus-native-apps.cfg and run a "toy" problem first. Step 6 (the early token expiration) remains an open problem, however, and will need to be addressed in a future pull request.
@forsyth2 Will you issue a release soon? I keep building new environments (now with zstash 1.4.4) and would like at least to test "zstash check" in a stand-alone configuration (fetching a NERSC archive). It still serves to do the archive extraction flawlessly.
That will be release candidate
Summary
Objectives:
Issue resolution:
Select one: This pull request is...
Big Change
1. Does this do what we want it to do?
Required:
If applicable:
2. Are the implementation details accurate & efficient?
Required:
If applicable:
zstash/conda, not just an import statement.
3. Is this well documented?
Required:
4. Is this code clean?
Required:
If applicable: