Prevent Blobclient.uploadFromFile overuse of I/O resources #44570

Open
boje-stibo opened this issue Mar 10, 2025 · 2 comments
Labels: customer-reported, needs-team-attention, question, Storage

Comments

@boje-stibo

Query/Question
We have an in-house developed Java application utilizing the Azure SDK for uploading backups of our Cassandra database.

We recently removed a rate limiter from our application, as it led to thread leakage (TrackingID#2412170050002232).
As a result, we are now concerned about resource usage, in particular disk I/O and network bandwidth.

We are concerned that uploading backups could lead to performance issues for the Cassandra Database (located on the same servers as the application).

Is there a way to limit the resources used by the Azure SDK?

For additional context, we are using BlobClient.uploadFromFile(<file_path>, overwrite:true), and are not overriding any TransferOptions.

Files vary in size from a few GB to upwards of 4 TB.

Why is this not a Bug or a feature Request?
The upload is functional. We simply need advice on resource management.

Setup (please complete the following information if applicable):

  • OS: CentOS Linux
  • IDE: IntelliJ
  • Library/Libraries: com.azure:azure-storage-blob:12.25.4

Information Checklist
Kindly make sure that you have added all of the information above and checked off the required fields; otherwise we will treat the issue as an incomplete report

  • [ X] Query Added
  • [ X] Setup information Added
@github-actions bot added the customer-reported, needs-triage, and question labels Mar 10, 2025
@joshfree
Member

Hi @boje-stibo, thanks for reaching out. As you already know, you can rate limit on the client side with a popular RateLimiter library:

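// Note: RateLimitedBlobUploader and Resilience4jBlobUploader are not classes provided by
// Guava, resilience4j, or the Azure SDK. They stand for wrappers you would write yourself
// around each library's RateLimiter.

// With Guava
BlobClient blobClient = ...; // your initialized BlobClient
RateLimitedBlobUploader uploader = new RateLimitedBlobUploader(blobClient, 5.0); // 5 uploads per second
uploader.uploadFile("/path/to/file.txt");

// With resilience4j
Resilience4jBlobUploader r4jUploader = new Resilience4jBlobUploader(
    blobClient,
    10,                          // 10 uploads per period
    Duration.ofMinutes(1)        // per minute
);
r4jUploader.uploadFile("/path/to/file.txt");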

Someone from the Storage team will follow up shortly with other suggestions. Please note that if this is urgent, we recommend using an Azure support ticket rather than GitHub issues (see SUPPORT.md in the root of the repo)
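
A limit on uploads per period caps how often uploads start, not how fast bytes are read from disk, and the latter may matter more for files in the GB to TB range. One possible complement is to throttle the byte rate of the stream being read. The following is a minimal sketch using only the JDK. ThrottledInputStream is a hypothetical name, not part of the Azure SDK or of either library mentioned above, and how such a stream would be handed to the SDK (for example through a stream-based upload instead of uploadFromFile) is an assumption that is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;

/** Hypothetical helper: caps the average read rate of the wrapped stream. */
class ThrottledInputStream extends FilterInputStream {
    private final long bytesPerSecond;
    private final long startNanos = System.nanoTime();
    private long totalBytes = 0;

    ThrottledInputStream(InputStream in, long bytesPerSecond) {
        super(in);
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            throttle(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            throttle(n);
        }
        return n;
    }

    /** Sleeps until the bytes read so far fit within the configured average rate. */
    private void throttle(int justRead) throws IOException {
        totalBytes += justRead;
        // Double arithmetic avoids long overflow for multi-terabyte streams.
        long expectedNanos = (long) (totalBytes * (1e9 / bytesPerSecond));
        long sleepNanos = expectedNanos - (System.nanoTime() - startNanos);
        if (sleepNanos > 0) {
            try {
                Thread.sleep(sleepNanos / 1_000_000L, (int) (sleepNanos % 1_000_000L));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while throttling");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[64 * 1024]; // stand-in for file contents
        long start = System.nanoTime();
        long total = 0;
        try (InputStream in =
                 new ThrottledInputStream(new ByteArrayInputStream(payload), 128 * 1024)) {
            byte[] buf = new byte[8 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("read " + total + " bytes in about " + ms + " ms");
    }
}
```

The sketch limits the average rate only: a caller that reads with a large buffer still pulls one burst before the sleep, so a smaller read buffer gives smoother I/O.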

@joshfree added the Storage label and removed the needs-triage label Mar 10, 2025
@github-actions bot added the needs-team-attention label Mar 10, 2025
@joshfree changed the title from "Prevent overuse of resources" to "Prevent Blobclient.uploadFromFile overuse of I/O resources" Mar 10, 2025
@alzimmermsft
Member

Hi @boje-stibo, another option, which you loosely alluded to, is to create a ParallelTransferOptions with a reduced value for ParallelTransferOptions.setMaxConcurrency(Integer) and pass it to either BlobClient.uploadFromFile(String, ParallelTransferOptions, BlobHttpHeaders, Map<String, String>, AccessTier, BlobRequestConditions, Duration) (https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobClient.java#L509) or BlobClient.uploadFromFileWithResponse(BlobUploadFromFileOptions, Duration, Context) (https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/storage/azure-storage-blob/src/main/java/com/azure/storage/blob/BlobClient.java#L562).

By default, ParallelTransferOptions.maxConcurrency is set to 8, meaning up to 8 operations can run in parallel. Reducing this value limits the number of threads performing disk I/O and network operations at the same time, which reduces resource usage and contention with your Cassandra database.

Here's an example:

// Requires: import com.azure.storage.blob.models.ParallelTransferOptions;
// Setting the concurrency factor to 1; everything else can be null, as that emulates the same behavior as
// BlobClient.uploadFromFile("filePath", true)
blobClient.uploadFromFile("filePath", new ParallelTransferOptions().setMaxConcurrency(1), null, null, null, null, null);
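
Note that ParallelTransferOptions is passed per upload call, so the concurrency setting applies to one file at a time. If the application starts several uploadFromFile calls at once (an assumption, since the thread does not say), the total parallelism multiplies. A minimal sketch of capping the number of simultaneous file uploads with a JDK Semaphore follows. BoundedUploader is a hypothetical name, and the Consumer<String> is a stand-in for whatever code performs the real SDK call.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical helper: lets at most maxParallelFiles uploads run at the same time. */
class BoundedUploader {
    private final Semaphore permits;
    private final Consumer<String> uploadOneFile;

    BoundedUploader(int maxParallelFiles, Consumer<String> uploadOneFile) {
        this.permits = new Semaphore(maxParallelFiles);
        this.uploadOneFile = uploadOneFile;
    }

    /** Blocks until a permit is free, then runs the upload on the calling thread. */
    void upload(String filePath) throws InterruptedException {
        permits.acquire();
        try {
            uploadOneFile.accept(filePath);
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // Stand-in for the real SDK call: it only sleeps and records concurrency.
        BoundedUploader uploader = new BoundedUploader(2, path -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            running.decrementAndGet();
        });

        ExecutorService pool = Executors.newFixedThreadPool(6);
        for (String path : List.of("a", "b", "c", "d", "e", "f")) {
            pool.submit(() -> {
                uploader.upload(path);
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // With two permits, the recorded peak never exceeds 2.
        System.out.println("peak concurrent uploads: " + peak.get());
    }
}
```

With this in place, the upper bound on parallel operations is roughly the permit count multiplied by the maxConcurrency value used for each call.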
