Conversation


@leif-scality leif-scality commented Sep 12, 2025

CLDSRV-744: add missing md5 checksum checks

validateChecksumsNoChunking returns a MissingChecksum error because some S3 methods (e.g. DeleteObjects) may require a checksum.

The DeleteObjects documentation only mentions that it requires an MD5, but in practice it requires either an MD5 or a checksum using one of the newer algorithms; the newer SDKs send CRC64NVME only. The requirement will be added after the new algorithms are implemented.

Support for the newer algorithms (CRC-64/NVME, CRC-32, CRC-32C, SHA-1, SHA-256) will be added inside validateChecksumsNoChunking in another PR.


bert-e commented Sep 12, 2025

Hello leif-scality,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.


bert-e commented Sep 12, 2025

Incorrect fix version

The Fix Version/s in issue CLDSRV-744 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.0.27

  • 9.1.1

Please check the Fix Version/s of CLDSRV-744, or the target
branch of this pull request.


codecov bot commented Sep 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.22%. Comparing base (0dfee33) to head (befeb4e).
⚠️ Report is 7 commits behind head on development/9.0.
✅ All tests successful. No failed tests found.

Additional details and impacted files


Files with missing lines Coverage Δ
lib/Config.js 79.42% <100.00%> (+0.19%) ⬆️
lib/api/api.js 89.69% <100.00%> (+0.27%) ⬆️
lib/api/apiUtils/integrity/validateChecksums.js 100.00% <100.00%> (ø)
lib/api/bucketPutCors.js 77.55% <ø> (-2.81%) ⬇️
lib/api/bucketPutReplication.js 91.89% <ø> (ø)
lib/api/multiObjectDelete.js 79.84% <ø> (-0.45%) ⬇️
lib/api/objectPutTagging.js 100.00% <ø> (ø)
@@                 Coverage Diff                 @@
##           development/9.0    #5941      +/-   ##
===================================================
+ Coverage            83.18%   83.22%   +0.03%     
===================================================
  Files                  188      189       +1     
  Lines                12105    12128      +23     
===================================================
+ Hits                 10070    10093      +23     
  Misses                2035     2035              
Flag Coverage Δ
ceph-backend-test 65.74% <86.48%> (+0.06%) ⬆️
file-ft-tests 66.13% <97.29%> (+0.08%) ⬆️
kmip-ft-tests 26.95% <56.75%> (+0.09%) ⬆️
mongo-v0-ft-tests 67.97% <97.29%> (+0.05%) ⬆️
mongo-v1-ft-tests 67.97% <97.29%> (+0.07%) ⬆️
multiple-backend 35.50% <83.78%> (+0.16%) ⬆️
quota-tests 32.22% <83.78%> (-0.75%) ⬇️
quota-tests-inflights 34.20% <83.78%> (+0.11%) ⬆️
unit 67.37% <100.00%> (+0.08%) ⬆️
utapi-v2-tests 33.36% <83.78%> (+0.14%) ⬆️

Flags with carried forward coverage won't be shown.


function validateChecksumsNoChunking(headers, body) {
    if (headers['content-md5']) {
        const md5 = crypto.createHash('md5').update(body, 'utf8').digest('base64');
Contributor

https://github.com/nodejs/node/blob/main/doc/changelogs/CHANGELOG_V20.md#crypto-implement-cryptohash

https://nodejs.org/docs/latest-v24.x/api/crypto.html#cryptohashalgorithm-data-options

The .hash function is still marked experimental (release candidate), but it could be interesting to see whether we get better performance for small bodies.

A utility for creating one-shot hash digests of data. It can be faster than the object-based crypto.createHash() when hashing a smaller amount of data (<= 5MB) that's readily available. If the data can be big or if it is streamed, it's still recommended to use crypto.createHash() instead.

Contributor Author

I could use .hash, but given that the impacted endpoints are configuration methods with low req/sec, I don't think we will see any difference.

Contributor

PutObjectRetention is today called thousands of times per second by some solutions like Veeam, and even if this reduces later it will remain high. I've seen platforms with 6,000 op/s on a 3-server architecture. The CPU impact could be significant; I suggest we properly test it during the RC phase.

Contributor Author
@leif-scality leif-scality Sep 17, 2025

I suggest we properly test it during RC phase

What is the procedure to track this?

Contributor Author

If the CPU cost is too high, the only alternative I see is to disable the check. But AWS does it, and looking at the body of the request it is fairly small, so it should be OK.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectRetention.html

Contributor Author

Added a variable in the config to disable the check in PutObjectRetention if needed

function bucketPutACL(authInfo, request, log, callback) {
    log.debug('processing request', { method: 'bucketPutACL' });

    const err = validateChecksumsNoChunking(request.headers, request.post);
Contributor

cf: https://scality.atlassian.net/wiki/spaces/OS/pages/3317235725/Checking+object+integrity+in+Amazon+S3?focusedCommentId=3389947938

Is this the right place to validate the checksum, or should we rather perform the other validations first, before doing CPU-intensive validation?

Contributor Author

I can move it down, but I'd rather keep it at the top like in the other methods. I don't see how calculating the checksums could be exploited by an attacker.

Contributor

There's another compelling reason to do the checksum earlier: we could avoid transcoding the string back to a Buffer just for the purpose of checksumming. I don't know for sure but I suspect the cost in CPU and memory of converting to a Buffer is not negligible, even compared to the cost of checksumming itself.

I am thinking we could precompute the MD5 checksum at the same time we aggregate the buffers, i.e. around post.push(chunk). But it would have to adapt to the hash algorithm(s) selected; I don't know for sure if we have that info at this point.

Another benefit could be that we could do generic validation against the headers for all request types, to reduce code duplication.

Contributor Author

Where are we transcoding the String back to a Buffer? The crypto hash.update takes a string and does no transcoding.

Not all endpoints handle the checksum in the same way, so I think we should handle them inside each endpoint and not in api.js.

@leif-scality leif-scality force-pushed the feature/CLDSRV-744-check-content-md5-2 branch from 8094468 to 840de91 Compare September 16, 2025 13:50
Comment on lines 4 to 5
MD5Mismatch: 'md5 mismatch',
MissingChecksum: 'missing checksum',
Contributor

Suggested change:
- MD5Mismatch: 'md5 mismatch',
- MissingChecksum: 'missing checksum',
+ MD5Mismatch: 'MD5Mismatch',
+ MissingChecksum: 'MissingChecksum',

I'd prefer the same string for key and value

Contributor Author

ok


const err = validateChecksumsNoChunking(request.headers, request.post);
if (err && err.error !== ChecksumError.MissingChecksum) {
    log.debug(err, { error: errors.BadDigest });
Contributor

I think the debug can be simplified into

Suggested change:
- log.debug(err, { error: errors.BadDigest });
+ log.debug(err.error, err.details);

as the BadDigest will show in the response already

Contributor Author

Does the BadDigest get logged after we exit the handler? If not, how will we know that we sent a BadRequest to the client?

Contributor

I'm not sure, but there might be some debug logs that print the error sent to the client.

Contributor Author

Before my changes we logged the BadDigest, so I'd rather keep it.

Contributor Author

removed the BadDigest

Comment on lines 478 to 481
const err = validateChecksumsNoChunking(request.headers, request.post);
if (err && err.error !== ChecksumError.MissingChecksum) {
    log.debug(err, { error: errors.BadDigest });
    return callback(errors.BadDigest);
Contributor

If we always have the same pattern, it would be better to just return the arsenal error from the function, or nothing, and call the callback if there is an error.
The debug log can be moved there too. Should it be an error-level log, since it makes the whole API call fail?

Contributor

Also, according to the doc:

Directory bucket - The Content-MD5 request header or an additional checksum request header (including x-amz-checksum-crc32, x-amz-checksum-crc32c, x-amz-checksum-sha1, or x-amz-checksum-sha256) is required for all Multi-Object Delete requests.

In the general case I think it's fine to ignore the MissingChecksum error, but here, I believe we want to enforce it, so return an error if the header is not set?

Contributor

I see we have

    it('should accept request when content-md5 header is missing', done => {

in the test, so I guess that was expected, but that looks non-standard. Given your implementation, there is then no need to return the MissingChecksum error; we can just return nothing, if it's fine not to validate anything when the header is not present. That should simplify the processing in the APIs...

Contributor Author
@leif-scality leif-scality Sep 17, 2025

We cannot require the checksum header yet because we don't support the other algorithms today, so this endpoint would break with the newer SDKs if we required it. After we implement the newer algorithms, we can return an error on MissingChecksum.

Contributor

Can we add a ticket number as a TODO so we do not forget anything when this is done later? As this conflicts with what is in the PR message, that will help reviewers understand the context until this is fixed.

Contributor Author

What is the conflict with the PR message?

I created ticket CLDSRV-747 to enable the requirement once we have the other checksums.

Comment on lines +18 to +21
return { error: ChecksumError.MD5Mismatch, details: { calculated: md5, expected: headers['content-md5'] } };
}
Contributor

We currently do nothing with this object when we handle errors. Should we just return the error and add a log in this function?

Contributor Author

I think the caller should do the logging.

In the next iteration I will add SHA256Mismatch, CRC32Mismatch, etc. I don't see why we would want to squash all the possible errors into BadDigest
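
A sketch of what that extended error map might look like (the extra names are hypothetical until the follow-up lands):

```javascript
// Hypothetical extension of the ChecksumError map for the next
// iteration: one distinct mismatch error per supported algorithm,
// with keys mirroring values as agreed earlier in this review.
const ChecksumError = {
    MD5Mismatch: 'MD5Mismatch',
    MissingChecksum: 'MissingChecksum',
    SHA1Mismatch: 'SHA1Mismatch',
    SHA256Mismatch: 'SHA256Mismatch',
    CRC32Mismatch: 'CRC32Mismatch',
    CRC32CMismatch: 'CRC32CMismatch',
    CRC64NVMEMismatch: 'CRC64NVMEMismatch',
};
```

Keeping the errors distinct lets the caller log which algorithm failed before mapping the failure to BadDigest in the response.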

Comment on lines 30 to 31
if (config.integrityChecks.PutObjectRetention)
{
Contributor

Suggested change:
- if (config.integrityChecks.PutObjectRetention)
- {
+ if (config.integrityChecks.PutObjectRetention) {


Contributor

The crypto hash.update takes a string and does no transcoding.

Are you sure of this? I assumed that it was always necessary to convert to a Buffer internally, since hashes operate on bytes, not characters (optimizations are possible in theory if the encoding used for hashing matches the native storage format of JS strings, but I'm not sure this is done in practice, and internal formats may not be UTF-8 anyway; e.g. Python uses 2 bytes per character). I also asked Gemini, and it seems to confirm my assumption; here's the first part of its answer:

When passing data to the Node.js crypto module's hash.update() function, using a raw Buffer is generally faster and more performant than using a string. 🚀 This performance difference stems from how Node.js handles these two data types.

Key Performance Differences
String Encoding Overhead: When you pass a string to hash.update(), Node.js must first convert that string into a sequence of bytes, or a Buffer. This conversion process requires additional CPU cycles and memory allocation to handle character encodings (like UTF-8, which is the default). For every string you pass, this encoding operation adds a small, but cumulatively significant, overhead.

Direct Binary Data: A Buffer, on the other hand, is already a direct representation of raw binary data. It's essentially a block of pre-allocated memory outside the V8 JavaScript engine's heap. This means when you pass a Buffer to hash.update(), the function can directly access the underlying bytes without any conversion step. It's already in the format the cryptographic hash function needs, leading to more efficient processing.
[...]

Contributor

Not all endpoints handle the checksum in the same way so I think we should handle them inside each endpoint and not in api.js

Do you know what the differences are in practice? I don't know about them, but maybe we could detect the exceptions in some way in the generic place? Just trying to see if we can do better; if it's indeed a pain to do it generically, no problem for me with doing it per call (but the performance point above may still apply).

Contributor Author

I added a commit that moves the check to api.js before the body gets transformed into a string.

Today the only non-standard endpoint is DeleteObjects: it is the only method that requires a checksum (the requirement won't be added in this PR because we are missing the other algorithms).

After searching, Node.js does indeed decode into a new buffer here; I thought it did it on the fly.

@leif-scality leif-scality force-pushed the feature/CLDSRV-744-check-content-md5-2 branch 2 times, most recently from 97ceccc to 9bdef3e Compare September 18, 2025 15:05
Contributor
@jonathan-gramain jonathan-gramain left a comment

the POC LGTM, I'll review the final PR when completed, thanks for doing the change!

lib/Config.js Outdated
};

if (config.integrityChecks) {
    if ('PutObjectRetention' in config.integrityChecks) {
Contributor

I'm thinking that it could be even simpler to configure a map of integrity checks to disable for all possible calls, and provide this possibility for other calls than putObjectRetention in case we need it. We would check the presence of a custom configuration with config.integrityChecks[apiCall] instead in the generic handler, and enable the checks by default.
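
A sketch of the suggested shape (the key name and helper are hypothetical; checks stay enabled unless the config explicitly opts an API call out):

```javascript
// Hypothetical per-API-call opt-out map: checks are enabled by
// default, and a config entry can disable them for a given call.
const config = { integrityChecks: { objectPutRetention: false } };

function integrityChecksEnabled(apiCall) {
    const custom = config.integrityChecks && config.integrityChecks[apiCall];
    return custom === undefined ? true : custom;
}

console.log(integrityChecksEnabled('objectPutRetention')); // logs false
console.log(integrityChecksEnabled('bucketPutACL'));       // logs true
```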

Contributor Author

done

@@ -1,5 +1,4 @@
const async = require('async');

Contributor

Can you clean up the added/removed lines that were left over after the latest refactor?

Contributor Author

done

});

api.callApiMethod(method, requestWithBody, response, log, err => {
    assert(!err, `Unexpected error for ${method} with good checksum: ${err}`);
Contributor

Suggested change:
- assert(!err, `Unexpected error for ${method} with good checksum: ${err}`);
+ assert.ifError(err, `Unexpected error for ${method} with good checksum: ${err}`);

Contributor Author

done

@leif-scality leif-scality force-pushed the feature/CLDSRV-744-check-content-md5-2 branch 2 times, most recently from 775732a to fb7f9d5 Compare September 22, 2025 15:42
@leif-scality
Contributor Author

ping

@leif-scality leif-scality force-pushed the feature/CLDSRV-744-check-content-md5-2 branch from fb7f9d5 to c63ab6d Compare September 23, 2025 16:11
@leif-scality
Contributor Author

ping


bert-e commented Sep 24, 2025

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

@leif-scality
Contributor Author

/create_integration_branches


bert-e commented Sep 24, 2025

Conflict

A conflict has been raised during the creation of
integration branch w/9.1/feature/CLDSRV-744-check-content-md5-2 with contents from feature/CLDSRV-744-check-content-md5-2
and development/9.1.

I have not created the integration branch.

Here are the steps to resolve this conflict:

 git fetch
 git checkout -B w/9.1/feature/CLDSRV-744-check-content-md5-2 origin/development/9.1
 git merge origin/feature/CLDSRV-744-check-content-md5-2
 # <intense conflict resolution>
 git commit
 git push -u origin w/9.1/feature/CLDSRV-744-check-content-md5-2

The following options are set: create_integration_branches

@leif-scality
Contributor Author

/create_integration_branches


bert-e commented Sep 24, 2025

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

The following options are set: create_integration_branches

@leif-scality
Contributor Author

/approve

@leif-scality
Contributor Author

ping


bert-e commented Sep 24, 2025

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/9.0

  • ✔️ development/9.1

The following branches will NOT be impacted:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve, create_integration_branches


bert-e commented Sep 24, 2025

Queue build failed

The corresponding build for the queue failed:

  • Checkout the status page.
  • Identify the failing build and review the logs.
  • If no issue is found, re-run the build.
  • If an issue is identified, checkout the steps below to remove
    the pull request from the queue for further analysis and maybe rebase/merge.
Remove the pull request from the queue
  • Add a /wait comment on this pull request.
  • Click on login on the status page.
  • Go into the manage page.
  • Find the option called Rebuild the queue and click on it.
    Bert-E will loop again on all pull requests to put the valid ones
    in the queue again, while skipping the one with the /wait comment.
  • Wait for the new queue to merge, then merge/rebase your pull request
    with the latest changes to then work on a proper fix.
  • Once the issue is fixed, delete the /wait comment and
    follow the usual process to merge the pull request.


bert-e commented Sep 24, 2025

I have successfully merged the changeset of this pull request
into targetted development branches:

  • ✔️ development/9.0

  • ✔️ development/9.1

The following branches have NOT changed:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8

Please check the status of the associated issue CLDSRV-744.

Goodbye leif-scality.

@bert-e bert-e merged commit 1d1c6c1 into development/9.0 Sep 24, 2025
74 of 76 checks passed
@bert-e bert-e deleted the feature/CLDSRV-744-check-content-md5-2 branch September 24, 2025 17:18