
perf: parallel range-GET S3 downloads for large objects #225

Merged: joshfriend merged 1 commit into main from jfriend/parallel-s3-range-get on Mar 26, 2026

@joshfriend commented on Mar 24, 2026

Replace the single GetObject stream in S3.Open with parallel range-GET requests for objects larger than 32 MiB. Eight workers download chunks concurrently, and the chunks are reassembled in order through an io.Pipe. This multiplies S3 throughput for cold snapshot downloads: on staging hardware, observed throughput rose from ~100 MB/s with a single stream to 400+ MB/s with parallel connections.

@joshfriend requested a review from a team as a code owner on March 24, 2026 20:47
@joshfriend requested review from worstell and removed the request for a team on March 24, 2026 20:47

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: afbf08ad55

Two comment threads on internal/cache/s3_parallel_get.go (outdated)
@joshfriend force-pushed the jfriend/parallel-s3-range-get branch 3 times, most recently from 7ba1a28 to 3294383, on March 24, 2026 21:00
Four comment threads on internal/cache/s3_parallel_get.go (outdated)
@joshfriend force-pushed the jfriend/parallel-s3-range-get branch 3 times, most recently from 0bd3818 to 26dd9c5, on March 25, 2026 21:00

Replace the single GetObject stream in S3.Open with parallel range-GET
requests for objects larger than 32 MiB. Workers download chunks
concurrently via errgroup and reassemble them in order via io.Pipe.

All chunk requests are pinned to the ETag from the initial stat to
prevent corruption if the key is overwritten mid-read. An errgroup
with derived context ensures all workers are cancelled promptly on
any error or early consumer close.

@joshfriend force-pushed the jfriend/parallel-s3-range-get branch from 26dd9c5 to 8662246 on March 25, 2026 21:02
@joshfriend merged commit 07b6ee3 into main on Mar 26, 2026 (7 checks passed)
@joshfriend deleted the jfriend/parallel-s3-range-get branch on March 26, 2026 16:57
@joshfriend (author) commented:

Adding this parallel request strategy to minio directly: minio/minio-go#2216

4 participants