Add S3 write support with hybrid storage for BP5 #4831
Open
eisenhauer wants to merge 14 commits into ornladios:master from
Conversation
Force-pushed from 945e589 to faced1c
pnorbert previously approved these changes on Feb 5, 2026
else
{
    // Use same transport for data as metadata
    m_DataTransportsParameters = m_IO.m_TransportsParameters;
Contributor
So the metadata is always stored locally?
Member
Author
In this implementation, yes. There's no reason we couldn't have a pure S3 version, though some things would probably get a little messier. For example, while actively writing, the metadata files would have to grow past 5 MiB before we could upload even a portion of them. That's no issue for a normal shutdown, but it would virtually eliminate the possibility of crash recovery. Still, it would be easy to provide a copy-to-S3 and/or copy-from-S3 utility, things like that.
Implement write support for FileAWSSDK transport using S3 multipart upload API. This enables writing data to S3-compatible storage.

Key features:
- Multipart upload with configurable part sizes (min_part_size, max_part_size)
- Zero-copy uploads using AWS SDK's PreallocatedStreamBuf
- Buffering for small writes to meet S3's 5 MB minimum part size
- Direct upload path for large writes to avoid unnecessary copies
- Proper cleanup of multipart uploads on close

Parameters:
- min_part_size: Minimum part size for uploads (default 5 MB, S3 minimum)
- max_part_size: Maximum part size for uploads (default 5 GB, S3 maximum)

Unit tests included (disabled by default, require S3-compatible endpoint):
- Basic write/read roundtrip
- Large file multipart upload (15 MB)
- Many small writes with buffer accumulation
- Mixed size writes
- Boundary condition writes
- Very small writes (1 KB chunks)
- Configurable part size

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
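The buffering strategy described in the commit above can be sketched roughly as follows. This is a minimal, self-contained model (not the actual FileAWSSDK code): small writes accumulate in a local buffer until one part's worth is available, a write that is already at least the minimum part size and arrives with an empty buffer is sent directly as its own part, and the final part flushed at close is allowed to be smaller than the minimum. The class name `PartBuffer` and the counter are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// S3 requires every non-final multipart-upload part to be >= 5 MiB.
constexpr std::size_t MinPartSize = 5 * 1024 * 1024;

// Hypothetical sketch of the write-buffering logic described in the PR.
class PartBuffer
{
public:
    std::size_t PartsUploaded = 0; // parts "sent" so far (stand-in for UploadPart calls)

    void Write(const char *data, std::size_t size)
    {
        if (m_Buffer.empty() && size >= MinPartSize)
        {
            // Direct path: a large write becomes its own part, no extra copy.
            ++PartsUploaded;
            return;
        }
        m_Buffer.insert(m_Buffer.end(), data, data + size);
        while (m_Buffer.size() >= MinPartSize)
        {
            // Flush exactly one part's worth; keep any remainder buffered.
            m_Buffer.erase(m_Buffer.begin(), m_Buffer.begin() + MinPartSize);
            ++PartsUploaded;
        }
    }

    void Close()
    {
        // The final part of a multipart upload may be smaller than MinPartSize.
        if (!m_Buffer.empty())
        {
            m_Buffer.clear();
            ++PartsUploaded;
        }
    }

private:
    std::vector<char> m_Buffer; // accumulates writes below one part in size
};
```

One hundred 1 KB writes stay buffered (no part sent); a subsequent 5 MiB write pushes the buffer over the threshold and flushes one part; `Close()` flushes the sub-minimum remainder as the final part.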
- BP5Reader/Writer: separate FilePool for metadata vs data files
- DataTransport parameter routes data files to alternate transport
- FileAWSSDK: fix path handling with bucket parameter
- ParseArgs: support quoted values for URLs with colons
- Fix transport cleanup order in DoClose to avoid mutex issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix S3_MIN_PART_SIZE: use 5 MiB (5*1024*1024), not 5 MB (5*1000*1000). S3 requires non-final parts to be at least 5 MiB, so the incorrect decimal value caused EntityTooSmall errors.
- Fix S3_MAX_PART_SIZE: use 5 GiB (5*1024*1024*1024) for consistency.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
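As a quick illustration of the MiB-vs-MB fix above: the decimal value 5,000,000 sits below S3's true floor of 5 MiB = 5,242,880 bytes, which is why parts sized by the old constant were rejected as EntityTooSmall. A sketch of the corrected constants (names taken from the commit message):

```cpp
#include <cstdint>

// Binary (MiB/GiB) units, matching S3's actual multipart limits.
constexpr std::uint64_t S3_MIN_PART_SIZE = 5ULL * 1024 * 1024;        // 5 MiB
constexpr std::uint64_t S3_MAX_PART_SIZE = 5ULL * 1024 * 1024 * 1024; // 5 GiB

static_assert(S3_MIN_PART_SIZE == 5242880, "5 MiB in bytes");
// The old decimal value fell short of the S3 minimum:
static_assert(5ULL * 1000 * 1000 < S3_MIN_PART_SIZE, "5 MB (decimal) is too small");
```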
This reverts commit 770e884.
Force-pushed from 8ec9200 to 7c2ca3d
modify awsdk cache setup and campaign reader s3 parameters to enable reading campaigns where data is on s3. It does not work yet for files in tar files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pnorbert approved these changes on Feb 13, 2026
S3 Multipart Upload
- Configurable part sizes (min_part_size, max_part_size)

Hybrid Storage (Local Metadata + S3 Data)
- DataTransport engine parameter to route data files to S3
- S3Endpoint, S3Bucket parameters

Other Fixes

Usage
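The usage details were lost in extraction; the following is a hedged sketch of how the hybrid setup described in this PR might be configured via an ADIOS2 XML config file. The parameter names (DataTransport, S3Endpoint, S3Bucket) are taken from the PR description above, but their exact values, the transport name "awssdk", and the endpoint/bucket shown are illustrative assumptions, not confirmed syntax.

```xml
<?xml version="1.0"?>
<adios-config>
  <io name="output">
    <engine type="BP5">
      <!-- Route data files to the S3 transport; metadata stays on local disk.
           Values below are placeholders for an S3-compatible endpoint. -->
      <parameter key="DataTransport" value="awssdk"/>
      <parameter key="S3Endpoint" value="http://localhost:9000"/>
      <parameter key="S3Bucket" value="mybucket"/>
    </engine>
  </io>
</adios-config>
```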