Conversation

@tritone tritone commented Dec 30, 2025

Complete rewrite of storage.MultiRangeDownloader. The new design should be more resilient to concurrency issues, deadlocks, retries, etc.
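
For context, a rough sketch of how the public MultiRangeDownloader surface is typically exercised follows. It assumes the existing API shape (NewGRPCClient, ObjectHandle.NewMultiRangeDownloader, Add taking an io.Writer, offset, length, and a completion callback, plus Wait and Close); exact signatures may differ from what is shown here, so treat this as illustrative rather than authoritative.

package main

import (
	"bytes"
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()

	// The MultiRangeDownloader requires the gRPC transport.
	client, err := storage.NewGRPCClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	obj := client.Bucket("my-bucket").Object("my-object")

	// Open a multi-range downloader for the object.
	mrd, err := obj.NewMultiRangeDownloader(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Request two ranges; each writes into its own in-memory buffer.
	// The callback signature shown here is assumed, not verified against this PR.
	var first, second bytes.Buffer
	mrd.Add(&first, 0, 1024, func(offset, length int64, err error) {
		if err != nil {
			log.Printf("range at %d failed: %v", offset, err)
		}
	})
	mrd.Add(&second, 4096, 1024, func(offset, length int64, err error) {
		if err != nil {
			log.Printf("range at %d failed: %v", offset, err)
		}
	})

	// Wait for all outstanding ranges, then close the downloader.
	mrd.Wait()
	if err := mrd.Close(); err != nil {
		log.Fatal(err)
	}
	fmt.Println(first.Len(), second.Len())
}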

@tritone tritone requested review from a team as code owners December 30, 2025 06:45
@product-auto-label bot added the api: storage label (Issues related to the Cloud Storage API) on Dec 30, 2025
@gemini-code-assist

Summary of Changes

Hello @tritone, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a substantial architectural overhaul to the MultiRangeDownloader component within the storage client. The changes aim to create a more robust and maintainable system by moving from a distributed goroutine-based implementation to a centralized, event-loop-driven manager. This new design is intended to make the multi-range download functionality more resilient to transient network issues, concurrency challenges, and potential deadlocks, ultimately leading to a more reliable and performant client.

Highlights

  • Core Refactor of MultiRangeDownloader: The storage.MultiRangeDownloader has been completely rewritten to improve resilience against concurrency issues, deadlocks, and retries. The new design centralizes state management and uses an event-loop architecture.
  • New Manager-Based Architecture: A new multiRangeDownloaderManager struct and associated command pattern (mrdAddCmd, mrdCloseCmd, etc.) have been introduced to orchestrate the download process, handling requests, responses, and stream lifecycle events; a generic, illustrative sketch of this event-loop pattern appears after this list.
  • Enhanced Stream Session Management: Individual gRPC bidirectional streams are now managed by bidiReadStreamSession instances, which include robust mechanisms for initial session establishment, sending requests, receiving responses, and handling stream redirections and retries.
  • Improved Error Handling and Retries: The refactor includes more sophisticated error handling, particularly for gRPC status codes and redirect errors, with automatic stream re-establishment and re-sending of pending ranges. Test cases were updated to reflect these new error behaviors.
  • Dependency Updates: Several Go module dependencies, including various AWS SDK components, gRPC, Prometheus, and OpenTelemetry, have been updated in go.work.sum.
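
To make the manager/command description above concrete (referenced in the second bullet), here is a minimal, generic sketch of an event-loop manager driven by a command channel. All names are illustrative and do not correspond to the PR's actual mrdAddCmd/mrdCloseCmd types; it only demonstrates the pattern of serializing every state change through a single goroutine, which is what makes lock-free shutdown and retry handling tractable.

package main

import (
	"context"
	"fmt"
)

// rangeReq describes one requested byte range (illustrative only).
type rangeReq struct {
	offset, length int64
	done           chan error
}

// command is anything the manager's event loop can process.
type command interface{ isCommand() }

type addCmd struct{ req *rangeReq }
type closeCmd struct{}

func (addCmd) isCommand()   {}
func (closeCmd) isCommand() {}

// manager owns all mutable state; only its run loop touches it, so no
// locks are needed and there is a single shutdown code path.
type manager struct {
	cmds    chan command
	pending []*rangeReq
}

func (m *manager) run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			m.failAll(ctx.Err())
			return
		case c := <-m.cmds:
			switch cmd := c.(type) {
			case addCmd:
				m.pending = append(m.pending, cmd.req)
				// A real implementation would send the range on the gRPC stream here.
			case closeCmd:
				m.failAll(fmt.Errorf("downloader closed"))
				return
			}
		}
	}
}

func (m *manager) failAll(err error) {
	for _, r := range m.pending {
		r.done <- err
	}
	m.pending = nil
}

func main() {
	m := &manager{cmds: make(chan command)}
	go m.run(context.Background())

	req := &rangeReq{offset: 0, length: 1024, done: make(chan error, 1)}
	m.cmds <- addCmd{req: req}
	m.cmds <- closeCmd{}
	fmt.Println(<-req.done)
}

In the actual PR, the same loop also multiplexes stream responses and lifecycle events (redirects, retries) alongside these commands, per the highlights above.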




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request presents a significant and well-structured refactoring of the MultiRangeDownloader. The new design, which utilizes an event loop (actor model), is an excellent approach for managing concurrency and state, resulting in more robust and comprehensible logic. The clear separation of concerns among the manager, sessions, and commands is commendable.

However, I have identified a critical issue where a blocking io.Writer could stall the entire downloader. Additionally, there are a couple of medium-severity issues concerning API design and code duplication that, if addressed, would further enhance the resilience and maintainability of this new implementation.


content := dataRange.GetChecksummedData().GetContent()
req.bytesWritten += int64(len(content))
_, err := req.output.Write(content)
Contributor

critical

The req.output.Write(content) call is executed within the manager's single-threaded event loop. If a user provides an io.Writer that blocks (e.g., writing to a slow network connection or a full pipe), it will stall the entire event loop. This prevents the processing of any other events, such as receiving more data from the gRPC stream, handling new Add requests, or closing the downloader. This can lead to performance degradation, deadlocks, and may cause the stream's receive buffer to fill up, creating backpressure on the server.

To resolve this, the Write operation should be performed asynchronously, outside of the event loop. A potential solution is to use a dedicated writer goroutine for each range request that consumes data from a channel. The event loop would then push data chunks to the appropriate channel without blocking.
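
A rough sketch of the kind of asynchronous writer being suggested here, assuming each range gets its own goroutine and a buffered chunk channel; the names are hypothetical and this is not code from the PR.

package main

import (
	"bytes"
	"fmt"
	"io"
)

// rangeWriter decouples a potentially blocking io.Writer from the event loop
// by draining a channel of chunks in its own goroutine.
type rangeWriter struct {
	chunks chan []byte
	done   chan error
}

func newRangeWriter(w io.Writer) *rangeWriter {
	rw := &rangeWriter{
		chunks: make(chan []byte, 16), // buffered so the event loop rarely blocks
		done:   make(chan error, 1),
	}
	go func() {
		for chunk := range rw.chunks {
			if _, err := w.Write(chunk); err != nil {
				rw.done <- err
				return
			}
		}
		rw.done <- nil
	}()
	return rw
}

// push is what the event loop would call instead of writing directly. A real
// implementation would also need to copy chunks it does not own, and decide
// what to do (fail the range, apply deliberate backpressure) if the channel fills.
func (rw *rangeWriter) push(chunk []byte) {
	rw.chunks <- chunk
}

// finish closes the channel and waits for the writer goroutine to drain it.
func (rw *rangeWriter) finish() error {
	close(rw.chunks)
	return <-rw.done
}

func main() {
	var buf bytes.Buffer
	rw := newRangeWriter(&buf)
	rw.push([]byte("hello "))
	rw.push([]byte("world"))
	fmt.Println(rw.finish(), buf.String())
}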

Contributor Author

Discussed this offline; it is the caller's responsibility to ensure a non-blocking writer. We can consider this as an option later.

Contributor

Even a non-blocking writer will take some time to write, though - if this blocks Add() calls in the meantime, it may cause noticeable performance degradation for certain workloads.

Contributor Author

Definitely worth investigating, but if we are just writing messages to in-memory buffers it will be trivial. For file I/O I agree it will perhaps be a bigger deal.
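
To illustrate the distinction being discussed: an in-memory destination returns from Write almost immediately, while something like an io.Pipe with no reader blocks until it is drained, which is exactly the situation that would stall a single-threaded event loop. A tiny self-contained demonstration (not MRD code):

package main

import (
	"bytes"
	"fmt"
	"io"
	"time"
)

// writeWithTimeout reports whether a single Write to w completes within d.
func writeWithTimeout(w io.Writer, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		w.Write(make([]byte, 1<<16))
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

func main() {
	// An in-memory buffer: Write returns immediately.
	fmt.Println("bytes.Buffer completed:", writeWithTimeout(&bytes.Buffer{}, 100*time.Millisecond))

	// An io.Pipe with no reader: Write blocks until someone reads from the
	// other end, so it times out. (The blocked goroutine is leaked here; a
	// real program would close the pipe.)
	_, pw := io.Pipe()
	fmt.Println("io.Pipe completed:", writeWithTimeout(pw, 100*time.Millisecond))
}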

Comment on lines 672 to 707
func (m *multiRangeDownloaderManager) handleStreamEnd(result mrdSessionResult) {
	m.currentSession = nil
	err := result.err

	if result.redirect != nil {
		m.readSpec.RoutingToken = result.redirect.RoutingToken
		m.readSpec.ReadHandle = result.redirect.ReadHandle
		if ensureErr := m.ensureSession(m.ctx); ensureErr != nil {
			if !m.isRetryable(ensureErr) {
				m.permanentErr = ensureErr
				m.attrsOnce.Do(func() { close(m.attrsReady) })
				m.failAllPending(m.permanentErr)
			}
		}
	} else if m.isRetryable(err) {
		if len(m.pendingRanges) > 0 {
			if ensureErr := m.ensureSession(m.ctx); ensureErr != nil {
				if !m.isRetryable(ensureErr) {
					m.permanentErr = ensureErr
					m.attrsOnce.Do(func() { close(m.attrsReady) })
					m.failAllPending(m.permanentErr)
				}
			}
		}
	} else {
		if !errors.Is(err, context.Canceled) && !errors.Is(err, errClosed) {
			if m.permanentErr == nil {
				m.permanentErr = err
			}
		} else if m.permanentErr == nil {
			m.permanentErr = errClosed
		}
		m.failAllPending(m.permanentErr)
		m.attrsOnce.Do(func() { close(m.attrsReady) })
	}
}
Contributor

medium

The logic for handling a non-retryable error from ensureSession is duplicated within both the if result.redirect != nil block and the else if m.isRetryable(err) block. This repetition makes the code more difficult to read and maintain.

This complex error-handling logic could be simplified by refactoring the duplicated code into a common path that is executed after the if/else if chain. This would improve code clarity and reduce the chance of introducing bugs in the future.

func (m *multiRangeDownloaderManager) handleStreamEnd(result mrdSessionResult) {
	m.currentSession = nil
	err := result.err

	var ensureErr error
	if result.redirect != nil {
		m.readSpec.RoutingToken = result.redirect.RoutingToken
		m.readSpec.ReadHandle = result.redirect.ReadHandle
		ensureErr = m.ensureSession(m.ctx)
	} else if m.isRetryable(err) {
		if len(m.pendingRanges) > 0 {
			ensureErr = m.ensureSession(m.ctx)
		}
	} else {
		// Non-retryable error from the stream itself.
		if !errors.Is(err, context.Canceled) && !errors.Is(err, errClosed) {
			if m.permanentErr == nil {
				m.permanentErr = err
			}
		} else if m.permanentErr == nil {
			m.permanentErr = errClosed
		}
		m.failAllPending(m.permanentErr)
		m.attrsOnce.Do(func() { close(m.attrsReady) })
		return
	}

	// Handle error from ensureSession.
	if ensureErr != nil && !m.isRetryable(ensureErr) {
		m.permanentErr = ensureErr
		m.attrsOnce.Do(func() { close(m.attrsReady) })
		m.failAllPending(m.permanentErr)
	}
}

Contributor

I agree. I think there is a bit of code duplication here. Can be simplified a bit I think.

@BrennaEpp BrennaEpp left a comment

Some initial comments/questions

}
m.readIDCounter++

// Attributes should be ready if we are processing Add commands
Contributor

We should also handle the case where the offset is greater than the object size: the range should be failed immediately and not added to the stream. Otherwise, a permanent error will be set on the MRD when the server returns an Out of Range error.
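
A minimal sketch of the guard being proposed, assuming the manager caches the object size learned from the initial stream response metadata; the type and field names here are hypothetical and not from the PR. As the replies below note, this check is debatable for appendable objects that can grow after the size was read.

package main

import (
	"errors"
	"fmt"
)

var errOutOfRange = errors.New("storage: requested offset is beyond object size")

// mrdState stands in for whatever manager state holds the object size
// learned from the initial stream metadata (hypothetical).
type mrdState struct {
	objectSize int64
}

// validateRange fails an out-of-range Add up front instead of letting the
// server return OutOfRange and setting a permanent error on the downloader.
func (m *mrdState) validateRange(offset, length int64) error {
	if offset < 0 || length < 0 {
		return errors.New("storage: offset and length must be non-negative")
	}
	if offset >= m.objectSize {
		return errOutOfRange
	}
	return nil
}

func main() {
	m := &mrdState{objectSize: 4096}
	fmt.Println(m.validateRange(0, 1024))    // <nil>
	fmt.Println(m.validateRange(8192, 1024)) // out-of-range error
}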

Contributor Author

This is a bit of a tricky case because, if there is another concurrent writer to the object, the Size will be out of date and these calls will in fact succeed. I don't think we validate this in the existing MRD code, and it's on the caller to decide how to handle this.

Contributor

Yes, but the MRD works with a single version of the object, which is what we should stick to, no? Without this validation we would set a permanent error if even one invalid range is provided (and get no data for any valid range provided after it).

Contributor Author

The MRD is supposed to support an object that grows; see the tailing reads example: https://github.com/GoogleCloudPlatform/golang-samples/blob/main/storage/rapid/read_appendable_object_tail.go

Can you check with the GCSFuse team on the expected behavior here? I know they have logic in their code to recover from these types of permanent errors.

Contributor

Sounds good, I will confirm with them; if required, this can be fixed in a subsequent PR.

cpriti-os previously approved these changes Jan 7, 2026
@tritone tritone merged commit 1cfd100 into googleapis:main Jan 7, 2026
10 checks passed