carlopi
Collaborator

@carlopi carlopi commented Sep 19, 2025

This is the same logic as #106, but to my mind clearer, and with a crafted test that should show the change in behaviour.

@samansmink, thanks for the feedback, but you'll have to review this one as well, sorry for the noise.


We currently always follow the same path when uploading to S3-like storage:

  • 1 POST to start upload
  • 1 or more PUT requests to upload buffers
  • 1 POST to finalize upload

While this is general and battle-tested, there is room for improvement: we could check the number of buffers to be uploaded, and if it is exactly 1, perform a single PUT instead (going from 3 requests over the network to a single one).
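
To make the request pattern concrete, here is a minimal sketch of the decision described above. It is not the httpfs implementation: `plan_upload_requests` and the request labels are hypothetical names used only for illustration, and rejecting a buffer count below 1 is an assumption of this sketch.

```python
from typing import List


def plan_upload_requests(num_buffers: int) -> List[str]:
    """Return the sequence of HTTP requests needed to upload a file
    that has been split into `num_buffers` buffers (hypothetical helper)."""
    if num_buffers < 1:
        # Assumption of this sketch: there is always at least one buffer.
        raise ValueError("expected at least one buffer")

    if num_buffers == 1:
        # Fast path: the whole file fits in one buffer, so one PUT is enough.
        return ["PUT object"]

    # General path: start the upload, send one PUT per buffer, then finalize.
    requests = ["POST start upload"]
    requests += [f"PUT part {i}" for i in range(1, num_buffers + 1)]
    requests.append("POST finalize upload")
    return requests


if __name__ == "__main__":
    for n in (1, 2, 3):
        plan = plan_upload_requests(n)
        print(f"{n} buffer(s): {len(plan)} request(s): {plan}")
```

With a single buffer the plan has 1 request rather than 3. For two or more buffers it is unchanged at `num_buffers + 2` requests.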

This is particularly significant for writes to data lakes in general and Iceberg in particular, because both JSON and AVRO files have to be uploaded (which means an INSERT INTO <table> VALUES () currently costs 10+ sequential network requests).

Next up: figuring out how to cut the HEAD request currently performed when opening a FileHandle, so that the upload could properly move to a single request.

For the reviewer(s): I am assuming we always upload a file in full, and never just a single part of an existing file. It would be much better to check that explicitly; I am unsure whether anything else is needed.

@carlopi carlopi changed the title from "One request upload" to "Single PUT when uploading single buffer-files to S3" Sep 19, 2025
@carlopi carlopi changed the title from "Single PUT when uploading single buffer-files to S3" to "Single PUT when uploading single buffer-files to S3 (version 2)" Sep 19, 2025
Contributor

@Tmonster Tmonster left a comment

LGTM! Thanks

@carlopi
Collaborator Author

carlopi commented Sep 22, 2025

I would also be in favour of merging this AND bumping httpfs in duckdb/duckdb so that it includes this too, but first we need to finish a conversation with Sam.

Basically: should this be available by default in 1.4.1, or later? Depending on the answer, there are different strategies to roll this out.
