HDDS-13919. S3 Conditional Writes (PutObject) [2/2] - Reuse Atomic Rewrite at Commit Path#10023
Conversation
| throw new OMException("ETag mismatch", | ||
| OMException.ResultCodes.ETAG_MISMATCH); | ||
| } | ||
| } |
There was a problem hiding this comment.
it seems like the same logic ad in CommitRequest
There was a problem hiding this comment.
correct, and that's actually unnecessary as the reason I wrote in the description.
- Extract Generation: If ETag matches, extract existingKey.updateID.
- Create Open Key: Create open key entry with expectedDataGeneration = existingKey.updateID.
There was a problem hiding this comment.
Ok after thinking a bit more, I think using expectedDataGeneration ensure that only one PutObject can succeed even if two PutObject are using the same valid If-Match.
From the if-match semantic, if there are two concurrent clients that are calling PutObject with the same if-match, both should succeed since the ETag is the same although expectedDataGeneration are not the same.
So in the end, our implementation of if-match is more strict (not necessarily a bad thing). However, it might cause duplicated requests to be rejected. Let me know what you think.
There was a problem hiding this comment.
From the if-match semantic, if there are two concurrent clients that are calling PutObject with the same if-match, both should succeed since the ETag is the same although expectedDataGeneration are not the same.
This would only be true if the ETag of the object didn't change upon a successful (Conditional) PutObject.
There was a problem hiding this comment.
Yes, overall I'm OK with the current approach.
There was a problem hiding this comment.
However, it might cause duplicated requests to be rejected
let me think about which applications built on top of the AWS S3 spec would be affected by this stricter semantics.
There was a problem hiding this comment.
@ivandika3 The only thing I can think of is that the AWS behavior can suffer from the ABA problem if duplicate conditional PutObject requests are not rejected. In contrast, our version behaves more like a tagged pointer, which prevents the ABA scenario. So everything else stays the same; only concurrent duplicate or ABA-style requests gain a stronger safety guarantee.
There was a problem hiding this comment.
Good point on the ABA problem although it might be fine since the latest object is still back to the original content (i.e. A).
My current worry if a single client sends a PutObject, but retries it again (e.g. due to some network restriction, etc), the retry might fail. We can try to look into when checking our AWS S3 spec compatibility. Saw your https://github.com/peterxcli/ozone-s3-compatibility, great initiative!
There was a problem hiding this comment.
My current worry is that if a single client sends a PutObject but retries it again (e.g., due to some network restriction, etc.), the retry might fail. We can try to look into this when checking our AWS S3 spec compatibility.
Thanks @ivandika3, this prompted me to review the conditional put requests in the AWS spec, which mentions that both If-Match and If-None-Match requests can throw two types of exceptions: PreconditionFailed (412) and ConditionalRequestConflict (409). Applying this to our case, PreconditionFailed should be thrown if the validation is violated when running createKeyRequest, whereas ConditionalRequestConflict should be thrown during commitKeyRequest.
I've raised a Jira ticket for this. BTW, the PR is also ready:
- https://issues.apache.org/jira/browse/HDDS-14968
- HDDS-14968. Concurrent S3 Conditional PUT Commit Conflict Detection #10043
My current worry is that if a single client sends a PutObject but retries it again (e.g., due to some network restriction, etc.), the retry might fail.
I think upon a client retry, the execution goes through the createKey -> commitKey flow again. This stores the latest updateID of the key in the openKeyTable as expectedDataGeneration. Then, when it comes to the atomic rewrite key commit, the stored expectedDataGeneration matches the current updateID of the key in the keyTable.
So in conclusion, concurrent duplicate conditional requests might result in one of them failing, but if the duplicate requests are dependent on each other (i.e., causal), then it won't be a problem.
There was a problem hiding this comment.
Saw your [peterxcli/ozone-s3-compatibility](https://github.com/peterxcli/ozone-s3-compatibility?rgh-link-date=2026-04-03T12%3A56%3A30Z), great initiative!
Thanks @ivandika3 for recognizing the work!
I've noticed that Ceph's s3-tests might not be entirely accurate for true S3 compatibility, as some nuances still exist. It seems they mark certain tests with a fails_on_aws flag, which might be because they implemented the AWS S3 spec based on their own interpretation. What's worse is that they didn't do this consistently for all cases.
Take an example related to conditional requests: they send an If-None-Match request where the ETag is not an asterisk, whereas the AWS S3 spec explicitly states: "Expects the * character (asterisk)."
So, an additional review of that S3 compatibility suite is required; the actual compatibility numbers might not be as bad as we seem.
ivandika3
left a comment
There was a problem hiding this comment.
@peterxcli Thanks for the patch.
| throw new OMException("ETag mismatch", | ||
| OMException.ResultCodes.ETAG_MISMATCH); | ||
| } | ||
| } |
There was a problem hiding this comment.
Ok after thinking a bit more, I think using expectedDataGeneration ensure that only one PutObject can succeed even if two PutObject are using the same valid If-Match.
From the if-match semantic, if there are two concurrent clients that are calling PutObject with the same if-match, both should succeed since the ETag is the same although expectedDataGeneration are not the same.
So in the end, our implementation of if-match is more strict (not necessarily a bad thing). However, it might cause duplicated requests to be rejected. Let me know what you think.
|
Thanks @ivandika3 and @yandrey321 for the review! |
What changes were proposed in this pull request?
OM Create Phase
Validation is performed within the validateAndUpdateCache method to ensure atomicity within the Ratis state machine application.
Sorry I forgot to follow the step 4 and 5 in my previous PR #9815 ...
What is the link to the Apache JIRA
issues.apache.org/jira/browse/HDDS-13919
How was this patch tested?
existing tests