HADOOP-19256 Integrate PutIfNotExist functionality into S3A createFile() #7011
base: trunk
Conversation
```java
if (optionHeaders != null && optionHeaders.containsKey("If-None-Match")) {
  requestBuilder = CompleteMultipartUploadRequest.builder().bucket(bucket).key(destKey).uploadId(uploadId)
      .overrideConfiguration(override -> override.putHeader("If-None-Match", optionHeaders.get("If-None-Match")))
```
Is this not something that the SDK can provide to us?
The SDK has it: https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/model/PutObjectRequest.html#ifNoneMatch()
It has existed since version 2.27.9, and we are using 2.27.12 in pom.xml, so we should have this method.
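For illustration, a minimal sketch of what the SDK-native call could look like, assuming the ifNoneMatch() builder method from the linked Javadoc (the bucket/destKey names are carried over from the quoted snippet):

```java
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

// Sketch only: the SDK's built-in ifNoneMatch() replaces the raw header
// override. "*" asks S3 to fail the PUT if the object already exists.
PutObjectRequest request = PutObjectRequest.builder()
    .bucket(bucket)
    .key(destKey)
    .ifNoneMatch("*")   // available since AWS SDK for Java 2.27.9
    .build();
```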
This is also being discussed on the Jira.
We should update the SDK in a separate PR, and then do this, in my opinion.
+1 to using SDK provided ifNoneMatch() and other implementations.
💔 -1 overall
This message was automatically generated.
Thank you!
Overall this looks good; I've left some comments. I do think we should have a separate PR that upgrades the SDK first; see #6900 as an SDK upgrade example.
You should also add some documentation about this in index.md so people know how to enable it, and the exception that will be thrown, so calling applications can handle it.
And let's also wait for @steveloughran's review.
```java
Map<String, String> optionHeaders = builder.putOptions.getHeaders();
if (optionHeaders != null && optionHeaders.containsKey(IF_NONE_MATCH_HEADER)) {
  maybeModifiedPutIfAbsentRequest.overrideConfiguration(
```
As mentioned below as well, I think we should upgrade the SDK and then use the new .ifNoneMatch().
Also, I would recommend you move all of this logic into a new private method in this class.
Actually, I would move this logic to RequestFactoryImpl. But you will need to add a flag to the method signature, "isConditionalPutEnabled", and only add the If-None-Match header when isConditionalPutEnabled is true.
Basically, you only want to add If-None-Match
when the PUT is coming from S3ABlockOutputStream.close() and not from anywhere else.
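A rough sketch of what that could look like in RequestFactoryImpl; the flag name isConditionalPutEnabled comes from this comment, while the method shape and the bucket field are assumptions, not the actual Hadoop code:

```java
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

// Hypothetical factory method: the caller (intended to be only
// S3ABlockOutputStream.close()) decides whether the condition applies.
PutObjectRequest.Builder newPutObjectRequestBuilder(String key,
    long length, boolean isConditionalPutEnabled) {
  PutObjectRequest.Builder builder = PutObjectRequest.builder()
      .bucket(bucket)           // assumed field on the factory
      .key(key)
      .contentLength(length);
  if (isConditionalPutEnabled) {
    // fail the PUT with 412 Precondition Failed if the key already exists
    builder.ifNoneMatch("*");
  }
  return builder;
}
```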
As per the discussion on the Jira, we also need the ability to override the configured values of these params through the S3AFS implementation, so that clients can use the same S3AFS instance and opt into conditional writes/reads as required, rather than only allowing config to set the default behaviour for a given S3AFS instance. We might need to implement a rename() that can take such params. We might either need more follow-up work (in that case, each of these can be a sub-task) or have the primary PR cover all cases?
First, I am really excited about this! As far as application software is concerned, this is the most significant change since S3 went consistent. It allows people to implement algorithms relying on atomic-and-exclusive create() or rename() to guarantee they were able to exclusively write or rename a file to a destination. That is not something we have with the S3 committers.
I would like to see this enabled by default for creating files with overwrite=false and for renaming files.
For file creation, this will give create(path, overwrite=false) a lot of the true POSIX semantics, and allow us to strip the HEAD request off every such operation.
Rename will not be quite what people may expect, because directory rename is still non-atomic and vulnerable to a parallel directory rename to the same tree. People shouldn't do that. At least with S3 storage, renaming by copying is so slow that people tend to complain about the performance enough that they might suspect something is wrong.
Regarding this PR then:
- HADOOP-19221. "S3A: Unable to recover from failure of multipart block upload attempt" #6938 MUST come first. That is a fix for a V2 SDK regression rather than a feature, and it will be stepping on some of the same classes (S3ABlockOutputStream). Please review that so it can be merged in ASAP.
- The SDK update can go in early, not just for this but in the hope they have fixed some other outstanding issues.
I think we should be more ambitious here. What is done here is great in showing what is possible, and the initial test, but I would like to make the change more fundamental.
We need some option (I don't know what the name should be) to indicate that the header should be used when writing files with overwrite=false. It will be true by default. We can have separate options (or use this one) for rename() and for committing files. Having a single unified option is simpler, unless we can imagine a case where somebody would want to turn it off yet have it on for createFile.
- innerCreateFile() to always pass down the overwrite flag if set; this will then be processed in S3ABlockOutputStream.close().
- Error translation to map the relevant status code to IOEs. 412 currently translates to RemoteFileChangedException; copyFile will already raise it if the etag of the source object changes during the copy. Hopefully that IOE is a valid one to raise in this situation. We will have to check, and if not, come up with some strategy to address it.
- Documentation, including the concept and stack traces. The third-party page to add this as another option to disable.
- Path capabilities probes.
That should be it in terms of behavioural changes. Testing will be fun. The tests need to include:
- deletion of target during a multipart upload (just write to a __magic path)
- conflict between multipart uploads
- multipart / single part conflict
- multi rename race (how?)
- job committer conflict
It will be interesting to see how third-party stores handle this. This will be such a profound change for AWS S3 that I do not want to make it something which has to be turned on: people would have to set it, or even worse, forget to set it yet assume the behaviour is present.
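To make the application-facing contract concrete, here is a hedged sketch of how a caller might rely on exclusive create, assuming (as discussed above, still to be confirmed) that the 412 surfaces as RemoteFileChangedException:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.RemoteFileChangedException;

// Sketch: exclusive create from the application side. If another writer
// wins the race, the conditional PUT fails at close() with a 412.
void writeExclusively(FileSystem fs, Path path, byte[] payload)
    throws IOException {
  try (FSDataOutputStream out = fs.create(path, /* overwrite */ false)) {
    out.write(payload);
  } catch (RemoteFileChangedException e) {
    // Somebody else created the object first: back off, retry with a
    // different name, or read the winner's file.
  }
}
```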
Also: test this on Google GCS through the connector and see what failure exception it raises; have it translate to the same IOE.
@diljotgrewal upgrading an AWS SDK MUST be done as a standalone action. We have a whole section of the testing doc that covers this. Please create the JIRA and follow the steps; it is always a good experience to actually do this on the command line, and to look for unexpected log messages, performance regressions, etc. It's been fairly low pain recently, but the v2 SDK still has some sharp edges.
Sorry, I didn't intend to push those changes here. I've removed those commits. I did start a separate PR for the SDK update yesterday under the same JIRA. I'll create a new JIRA for the SDK update and update the PR.
FYI I'm working on a document covering how we can modify the output stream, rename code and more to deliver this more broadly. Will share soon.
+1, I too think the scope can be increased. Can we add this to fs.create() when overwrite is false, if that hasn't been considered? We could save HEAD S3 calls.
@shameersss1 exactly, which means that fs.create(path, false).close() is equivalent to other filesystems, though
it is not quite the same, as the failure is on line 3, not line 1. But the overall behaviour holds: if the write completes, then you created the file and nobody has overwritten it. File rename would be equivalent; directory rename not, though. I've been distracted by this: something about "latest SDKs keep telling us off for using the SDK" is complicating my life and makes me really reluctant to pick up a new SDK unless we are sure it fixes some longstanding issues. I don't see why everyone is working on features when significant bugs exist and every upgrade causes regressions in behaviour.
BlockOutputStream.hasCapability() to return true if the stream is going to do the check. This should be a new constant, defined such that it is the same when we do create(overwrite=false) and in FS.hasPathCapability(); that probe will be needed to allow clients to check for the feature being available before they ask for it. This is to make sure they can check what semantics are being offered before even trying to open the file.
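A sketch of that probe pattern, with a placeholder capability string since the actual constant was not settled here:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical probe sequence; "fs.s3a.capability.conditional.create"
// is a placeholder name, not an existing constant.
void probe(FileSystem fs, Path dest) throws IOException {
  String capability = "fs.s3a.capability.conditional.create";
  if (fs.hasPathCapability(dest, capability)) {
    try (FSDataOutputStream out = fs.create(dest, false)) {
      // the stream itself confirms it will enforce the check
      boolean enforced = out.hasCapability(capability);
    }
  }
}
```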
Needs an extra test to validate multipart upload behaviour if the file is there or a delete happens.
This could be done by writing to a magic path: easy to write, but the test needs to load the .pending file and commit it, which is fiddly.
How about we add another createFile() option to pass to S3ABlockOutputStream
to always trigger a multipart upload, just as magic committers do; a normal close() will complete the upload as it does with large files.
```java
} catch (Exception e) {
  Assert.assertEquals(RemoteFileChangedException.class, e.getClass());
  S3Exception s3Exception = (S3Exception) e.getCause();
```
I was going to point you at GenericTestUtils.assertExceptionContains(),
but it doesn't validate the cause type or return the value as a cast type the way intercept()
does. Time for a new assert there.
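For reference, a sketch of the intercept()-based form; LambdaTestUtils.intercept() returns the caught exception already typed, so the cause can then be asserted on (the 412 status-code check is illustrative):

```java
import static org.apache.hadoop.test.LambdaTestUtils.intercept;

import org.assertj.core.api.Assertions;
import software.amazon.awssdk.services.s3.model.S3Exception;

// intercept() fails the test unless the lambda throws the named type,
// and hands the exception back for further assertions.
RemoteFileChangedException ex = intercept(RemoteFileChangedException.class,
    () -> fs.create(testFile, false).close());
S3Exception cause = (S3Exception) ex.getCause();
Assertions.assertThat(cause.statusCode()).isEqualTo(412);
```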
Looked at this again (while I look at the follow-on work):
- Add a new flag conditionalCreate to org.apache.hadoop.fs.s3a.impl.CreateFileBuilder.CreateFileOptions
- CreateFileBuilder.build() to set the flag if the option fs.s3a.conditional.file.create is true
- Add a similar flag to S3ABlockOutputStream.BlockOutputStreamBuilder
- S3AFS.innerCreateFile() will be modified to set the BlockOutputStreamBuilder flag if it is set in CreateFileOptions

Note: it would be good to be able to strip out the HEAD check there, but there are risks. For further investigation (and perhaps a new switch to disable).
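An illustrative sketch of that wiring; the names follow the list above, but the statements are assumptions about the builders rather than verified code:

```java
// In CreateFileBuilder.build() (sketch): read the option into the flag.
boolean conditionalCreate =
    options.getBoolean("fs.s3a.conditional.file.create", false);

// In S3AFileSystem.innerCreateFile() (sketch): propagate to the stream
// builder via a hypothetical setter/getter pair.
blockOutputStreamBuilder.withConditionalCreate(
    createFileOptions.isConditionalCreate());

// In S3ABlockOutputStream.close() (sketch): only then request the header.
if (conditionalCreate) {
  putRequestBuilder.ifNoneMatch("*");
}
```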
@diljotgrewal if you can address the outstanding review comments (don't worry about the multipart testing for now), we can merge this into a new branch I'm planning to create for conditional write operations, so we can stabilise it over a few patches.
@steveloughran
@diljotgrewal except in the special case of "massive merge conflict requiring a rebase", can you just use merge commits once we are in the review phase. GitHub lets me review changes that have happened since my last review, but it cannot do this with forced pushes. This makes my life harder and reduces the frequency of me actually looking at patches. I will review later.
Sorry about the force push. I just did one last force push to undo my previous force push and split the changes back into the original commit, and added a separate commit where I addressed the feedback. I'll stick to merge commits from here on out. Hopefully splitting them back will help with the review. Thanks!
I've done a minor review.
I am about to create a feature branch for this which will have the latest updated SDK, so that you will have the new header stuff; this PR can go in on top of that, and we can do a couple more iterations and merge into trunk:
- once we are happy with the code
- once we are confident the SDK is not going to create more problems.

You will be able to validate the later SDK while we stick to the current one.
```java
fs.mkdirs(testFile.getParent());
// enough bytes for Multipart Upload
byte[] fileBytes = dataset(6 * _1MB, 'a', 'z' - 'a');
```
There are ways to do this with small files; this matters because a 6MB file is large enough that it'd have to become a scale test. Yes, it is a scale test when people are testing across the planet.
Do not worry about this right now, but be aware that before merging into trunk the large-file test will have to be moved to a subclass of S3AScaleTestBase, and this test replaced with something using a write to a magic path, or even better, another new create-file option to force multipart uploads always.
Designing for that makes me think that a follow-up to this should move to an enum set of CreateFile flags.
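A tiny sketch of that enum-set idea; the type and flag names are invented here for illustration only:

```java
import java.util.EnumSet;

// Hypothetical flag set replacing individual boolean createFile() options.
enum S3ACreateFlag {
  CONDITIONAL_CREATE,  // add If-None-Match: * on the final PUT
  FORCE_MULTIPART      // always upload via MPU, as the magic committer does
}

EnumSet<S3ACreateFlag> flags =
    EnumSet.of(S3ACreateFlag.CONDITIONAL_CREATE, S3ACreateFlag.FORCE_MULTIPART);
```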
Just did another quick review.
Can you resubmit this against the branch HADOOP-19256-s3-conditional-writes, which is where this work shall initially go.
@steveloughran just pushed a new set of changes. Would you prefer a new PR with
🎊 +1 overall
This message was automatically generated.
I've created a new branch for you, into which all work can be committed. But first, go through the comments on this one just so we can see that those existing comments have been addressed; then, without adding any new comments on this PR, we can commit to the feature branch and do all ongoing work as incremental changes on that branch itself.
You can do one for the SDK now, just to start things off.
Description of PR
This PR adds support for the S3 conditional write (put-if-absent) capability, which is now generally available: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/
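As an illustration of the intended use, a sketch of enabling the feature; the option name follows the review discussion (fs.s3a.conditional.file.create) and may not be final:

```java
import org.apache.hadoop.conf.Configuration;

// Enable conditional (put-if-absent) creation for the filesystem instance.
Configuration conf = new Configuration();
conf.setBoolean("fs.s3a.conditional.file.create", true);
```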
How was this patch tested?
We added integration tests within this PR which exercise both the MPU path and the regular PUT path, and confirm that we see a conflict if we try to mutate an already existing key.
For code changes:
- If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?