Default uploads to curl with pinned headers #747
Merged
stevevanhooser merged 3 commits into main on Apr 18, 2026
Pre-signed PUT uploads going through MATLAB's native HTTP client sometimes store objects in S3 with headers that don't match what curl sends, producing inconsistent Content-Encoding/Content-Type metadata on the objects and flaky downloads later. Flip the default to curl for consistency with ndi.cloud.api.files.getFile.
Align the implementation class with the wrapper's new curl-by-default behavior. Explicitly set Content-Type: application/octet-stream and Accept-Encoding: identity on the PUT so the metadata stored on the S3 object is consistent across clients. Add -f so stale signed URLs (403) or missing objects (404) fail loudly instead of being reported as a successful 200 OK upload.
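As a rough illustration of the command line this commit describes, here is a minimal Python sketch that assembles the curl arguments. The PR's actual MATLAB code is not shown in this conversation, so the function names and the use of `-T` to send the file body are assumptions; only `-f` and the two pinned headers come from the commit message.

```python
import subprocess


def build_curl_put_args(signed_url, file_path):
    """Build a curl argv for a pre-signed PUT with pinned headers (hypothetical helper)."""
    return [
        "curl",
        "-f",  # exit non-zero on HTTP >= 400 instead of treating 403/404 as success
        "-H", "Content-Type: application/octet-stream",  # pinned per the commit message
        "-H", "Accept-Encoding: identity",               # pinned per the commit message
        "-T", file_path,  # assumption: upload the file body with -T, which makes curl issue a PUT
        signed_url,
    ]


def put_file(signed_url, file_path):
    """Run the upload; raises CalledProcessError if curl exits non-zero (hypothetical helper)."""
    subprocess.run(build_curl_put_args(signed_url, file_path), check=True)
```

Building the argument list separately from running it keeps the pinned headers easy to inspect without touching the network.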
The non-bulk branch already forwarded options.useCurl to putFiles, but the bulk-upload branch called putFiles with no options, so it always took the (previously false) default. With the default now true this is cosmetic, but making the forward explicit keeps the caller's choice honored if useCurl is ever set to false here.
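The forwarding issue can be sketched in a few lines of Python. These functions are hypothetical stand-ins for putFiles and uploadSingleFile, not the repository's code; they only show why an un-forwarded option silently falls back to the callee's default.

```python
def put_files(paths, use_curl=True):
    """Stand-in for putFiles: report which transport each path would use."""
    transport = "curl" if use_curl else "native"
    return [(path, transport) for path in paths]


def upload_single_file(path, bulk=False, use_curl=True):
    """Stand-in for uploadSingleFile with the forward made explicit in both branches."""
    if bulk:
        # Before the fix, the equivalent call here passed no options,
        # so the callee's default won regardless of the caller's choice.
        return put_files([path], use_curl=use_curl)
    return put_files([path], use_curl=use_curl)
```

With the forward in place, a caller that passes `use_curl=False` gets the native path in both branches.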
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #747      +/-   ##
==========================================
- Coverage   32.24%   32.17%   -0.07%
==========================================
  Files         671      671
  Lines       29463    29466       +3
==========================================
- Hits         9500     9481      -19
- Misses      19963    19985      +22
stevewds approved these changes on Apr 18, 2026
Summary
- Flip the default of ndi.cloud.api.files.putFiles (and the underlying PutFiles implementation class) to useCurl = true, matching the curl-by-default we already landed on downloads.
- Explicitly set Content-Type: application/octet-stream and Accept-Encoding: identity so the object metadata stored by S3 is identical no matter which client issued the PUT. Add -f so stale signed URLs (403/404) fail loudly instead of being reported as a successful upload.
- Forward options.useCurl from uploadSingleFile's bulk-upload branch (previously only the non-bulk branch forwarded it, so bulk uploads silently used the MATLAB HTTP path).

Why
An audit of the upload paths showed several places still using MATLAB's native HTTP stack (zipForUpload, the uploadDocumentCollection batch, the uploadSingleFile bulk branch, and anything calling putFiles without useCurl). Different clients tag S3 objects with different headers, which is the most plausible explanation for the flaky "Invalid TAR file" errors on freshly uploaded .tgz artifacts: object metadata varies depending on which code path performed the upload. Standardizing on curl with pinned headers makes the stored metadata deterministic and matches the download side.

Test plan
- ndi.unittest.cloud.readIngested on Linux and Mac.
- ndi.symmetry.makeArtifacts.dataset.downloadIngested/testDownloadIngestedArtifacts on Linux.
- aws s3api head-object on a freshly uploaded .tgz to confirm ContentType: application/octet-stream and no ContentEncoding metadata.

https://claude.ai/code/session_01HMnM1qnDBgGdjSqSnjTfsV
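The head-object check in the test plan could be automated along these lines. This is a sketch, assuming the JSON printed by `aws s3api head-object` has been captured as a string; the function name and the sample payloads are illustrative and were not taken from a real bucket.

```python
import json


def check_head_object(head_json, expected_type="application/octet-stream"):
    """Return a list of metadata problems found in head-object JSON output (hypothetical helper)."""
    head = json.loads(head_json)
    problems = []
    content_type = head.get("ContentType")
    if content_type != expected_type:
        problems.append(f"ContentType is {content_type!r}, expected {expected_type!r}")
    if "ContentEncoding" in head:
        problems.append(f"unexpected ContentEncoding {head['ContentEncoding']!r}")
    return problems


# Illustrative payloads only.
good = '{"ContentType": "application/octet-stream", "ContentLength": 1024}'
bad = '{"ContentType": "binary/octet-stream", "ContentEncoding": "gzip"}'
```

An empty list means the object carries the pinned Content-Type and no Content-Encoding, which is the condition the test plan asks for.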