Support wasb:// and wasbs:// #1663
There is also an open issue on the
Regarding fsspec/adlfs#493, is the protocol identical?
I am not sure, but I have been testing it with Azurite locally and it works as expected. I am going to try using it in the cloud.
@christophediprima Thanks for testing that, appreciate it. We also test against
My team and I have been testing it on Azure Blob Storage and we had no issues. What kind of tests do you have in mind?
Looks like we have a few ADLS integration tests against Azurite in Docker (iceberg-python/tests/io/test_fsspec.py, line 298 in b86d7d5). Perhaps we can extend these to include wasb and wasbs.
# Rationale for this change

Starting from version 20, PyArrow supports the ADLS filesystem. This PR adds PyArrow Azure support to PyIceberg.

PyArrow is the [default IO](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/__init__.py#L366-L369) for PyIceberg catalogs. In an Azure environment it handles a wider spectrum of auth strategies than Fsspec, including, for instance, [Managed Identities](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview). Also, prior to this PR #1663 (which is not merged yet) there was no support for wasb(s) with Fsspec.

See the corresponding issue for more details: #2112

# Are these changes tested?

Tests are added under tests/io/test_pyarrow.py.

# Are there any user-facing changes?

There are no API-breaking changes. Direct impact of the PR: the PyArrow FileIO in PyIceberg supports the Azure cloud environment. Examples of impact for end users:

- PyIceberg is usable in services with the Managed Identities auth strategy.
- PyIceberg is usable with wasb(s) schemes in Azure.

---------

Co-authored-by: Kevin Liu <[email protected]>
Co-authored-by: Kevin Liu <[email protected]>
Depends on fsspec/adlfs#493
I have a local change that parameterizes all the ADLS integration tests with abfs, abfss, wasb, and wasbs. It's currently failing; note the wrong path.
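For readers unfamiliar with how the four schemes map onto a storage path, here is a minimal sketch of the kind of normalization that has to go right for these tests to pass. It is not the code from this PR or from adlfs; `split_azure_location` and `AZURE_SCHEMES` are hypothetical names, and the sketch assumes the common `scheme://container@account-host/path` URI layout, with `scheme://container/path` as a fallback.

```python
from urllib.parse import urlparse

# The four schemes discussed in this thread.
AZURE_SCHEMES = ("abfs", "abfss", "wasb", "wasbs")


def split_azure_location(location: str) -> tuple[str, str, str]:
    """Split an Azure storage URI into (container, host, blob path).

    Hypothetical helper for illustration only; not part of PyIceberg or adlfs.
    """
    parsed = urlparse(location)
    if parsed.scheme not in AZURE_SCHEMES:
        raise ValueError(f"Unsupported scheme: {parsed.scheme!r}")
    # "container@account.blob.core.windows.net" -> ("container", "@", host)
    container, sep, host = parsed.netloc.rpartition("@")
    if not sep:
        # No "container@" prefix: treat the whole netloc as the container.
        container, host = parsed.netloc, ""
    return container, host, parsed.path.lstrip("/")


if __name__ == "__main__":
    for scheme in AZURE_SCHEMES:
        # Assumed endpoint suffixes: "dfs" for abfs(s), "blob" for wasb(s).
        suffix = "dfs" if scheme.startswith("abfs") else "blob"
        uri = f"{scheme}://warehouse@myaccount.{suffix}.core.windows.net/db/tbl/metadata.json"
        print(scheme, split_azure_location(uri))
```

Whatever the scheme, the container and blob path should come out the same; a test parameterized over all four schemes can check exactly that.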
Pushed the parameterized test here for reference. I changed all references to the protocol for ADLS to use the
Added the monkey patch solution here for reference. We can also wait for fsspec/adlfs#493 to land.
fsspec/adlfs#512 added the ability to override the protocol, but for older versions of adlfs we would still need to monkey patch.
Closes #2271 and #1606
This will work as soon as this is merged: fsspec/adlfs#493