-
Notifications
You must be signed in to change notification settings - Fork 735
Determine Azure directories more accurately using metadata and blob properties #6428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Determine Azure directories more accurately using metadata and blob properties #6428
Conversation
…roperties - Refactor AzFileAttributes to use the `hdi_isfolder` metadata key as the primary indicator for directories. - Fallback to blob name ending with '/' only if metadata is not present. - Add and update tests to cover edge cases for directory detection. - Improves compatibility with Azure Data Lake Storage Gen2 and hierarchical namespace scenarios. Signed-off-by: adamrtalbot <[email protected]>
✅ Deploy Preview for nextflow-docs-staging canceled.
|
- Improved logic to determine if a blob is a directory by checking for the "hdi_isfolder" metadata key and its value. - If the blob is a directory, set size to 0 and mark as directory. - If not, treat as file and set size and timestamps from blob properties. - This ensures compatibility with Azure's explicit directory markers and avoids misclassification. Signed-off-by: adamrtalbot <[email protected]>
Signed-off-by: adamrtalbot <[email protected]>
…adata Signed-off-by: adamrtalbot <[email protected]>
|
||
AzFileAttributes(String containerName, BlobItem item) { | ||
objectId = "/${containerName}/${item.name}" | ||
directory = item.name.endsWith('/') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bentsherman what will break if we incorrectly classify something as a file when it's a pseudo-directory? What's the impact of getting it wrong?
…tory bug (#6427) Add a testingest to document and demonstrate the bug in AzFileAttributes where any blobName ending with '/' is treated as a directory, regardless of actual Azure semantics. This test will help verify and prevent regressions when fixing the directory detection logic. Signed-off-by: adamrtalbot <[email protected]>
…directory - Only treat paths ending with a trailing slash as directories (e.g. `/container/dir/`). - Paths without a trailing slash (even if they exist as directories in Azure) are treated as files unless Azure metadata explicitly marks them as folders. - This fixes inconsistent behavior in Nextflow pipelines where `file("az://container/dir")` and `file("az://container/dir/")` were detected differently. - Updates `AzFileAttributes` logic and adds/updates tests to ensure correct directory detection for all path formats. Signed-off-by: adamrtalbot <[email protected]>
} | ||
} | ||
|
||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Azure API Calls Cause Performance Issues
The isDirectory()
method now makes multiple uncached Azure API calls (exists()
, getProperties()
, listBlobs()
) on every invocation when the directory
field is false. This causes a significant performance regression. It also has a logic flaw that can lead to incorrect directory detection and lacks exception handling for these network calls.
} | ||
} | ||
|
||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Trailing Slash Bug Affects Directory Detection
The AzPath
constructor's handling of trailing slashes causes isDirectory()
to return early, bypassing its new Azure-querying logic for directory detection. This also creates unreachable code in the AzFileAttributes
constructor and leads to conflicting test expectations regarding paths ending with a slash.
Additional Locations (4)
plugins/nf-azure/src/test/nextflow/cloud/azure/nio/AzPathTest.groovy#L43-L44
plugins/nf-azure/src/test/nextflow/cloud/azure/nio/AzPathTest.groovy#L65-L66
plugins/nf-azure/src/main/nextflow/cloud/azure/nio/AzFileAttributes.groovy#L136-L137
plugins/nf-azure/src/test/nextflow/cloud/azure/nio/AzPathTest.groovy#L161-L185
plugins/nf-azure/src/main/nextflow/cloud/azure/nio/AzFileAttributes.groovy
Show resolved
Hide resolved
Enhance the logic for determining if a blob represents a directory in Azure. Now considers both the `hdi_isfolder` metadata and the presence of child blobs, providing more accurate directory status detection. Updates related tests to reflect the improved behavior. Signed-off-by: adamrtalbot <[email protected]>
I have a branch that breaks out these functions into a separate AzDirectoryUtils class for DRY...too much complexity? |
hdi_isfolder
metadata key as the primary indicator for directories.hdi_isfolder
metadata is not present or blob does not exist, it will list blobs at path. If there are "sub blobs", it will determine it is a directory.Fixes #6427
Signed-off-by: adamrtalbot [email protected]
Hi! Thanks for contributing to Nextflow.
When submitting a Pull Request, please sign-off the DCO [1] to certify that you are the author of the contribution and you adhere to Nextflow's open source license [2] by adding a
Signed-off-by
line to the contribution commit message. See [3] for more details.Note
Cursor Bugbot is generating a summary for commit 4dfec66. Configure here.