
feat(Storage): Enable full object checksum validation for multi chunk resumable uploads#10

Open
mahendra-google wants to merge 9 commits into main from enable-json-full-object-checksums

Conversation

@mahendra-google (Owner) commented Jan 16, 2026

This PR enables full object checksum validation (specifically CRC32C) for JSON-based resumable uploads, ensuring end-to-end data integrity for multi-chunk transfers. Please see b/461996128

@mahendra-google changed the title from "feat: Enable full object checksum validation for multi chink resumable uploads" to "feat: Enable full object checksum validation for multi chunk resumable uploads" Jan 16, 2026
@gemini-code-assist commented

Summary of Changes

Hello @mahendra-google, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the robustness of multi-chunk resumable uploads by integrating full object CRC32c checksum validation. By introducing a dedicated helper for checksum calculation and an HTTP interceptor that applies this checksum to the final upload request, the changes ensure that data integrity is maintained throughout the entire upload process, providing greater reliability for large file transfers to Google Cloud Storage.

Highlights

  • CRC32c Calculation Utility: A new Crc32cHelper class has been introduced to efficiently calculate CRC32c checksums for Stream objects, handling stream positioning and buffer management.
  • Resumable Upload Checksum Interception: An IHttpExecuteInterceptor named Crc32cHashInterceptor is now used in CustomMediaUpload to automatically compute and attach the CRC32c checksum to the x-goog-hash header for the final chunk of multi-chunk resumable uploads.
  • Enhanced Stream Handling: The internal LengthOnlyStream has been improved to correctly support seeking and reading operations, which is essential for re-reading stream data to calculate checksums.
  • New Dependency: The project now includes a dependency on System.IO.Hashing to leverage its Crc32 implementation.
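For orientation, the helper described above can be sketched roughly as follows. This is a shape sketch rather than the PR's exact code: it assumes the synchronous name CalculateCrc32c suggested later in this review, a seekable input stream, and uses System.IO.Hashing's Crc32 as a stand-in (GCS's checksum is CRC32C, the Castagnoli polynomial, which that class does not implement).

```csharp
using System;
using System.IO;
using System.IO.Hashing;

internal static class Crc32cHelper
{
    private const int Crc32cCalculationBufferSize = 81920;

    public static string CalculateCrc32c(Stream stream)
    {
        // Remember where the caller left the stream so we can restore it.
        long originalPosition = stream.Position;
        stream.Position = 0;

        var crc = new Crc32(); // stand-in; a true CRC32C uses the Castagnoli polynomial
        byte[] buffer = new byte[Crc32cCalculationBufferSize];
        int bytesRead;
        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            crc.Append(buffer.AsSpan(0, bytesRead));
        }

        stream.Position = originalPosition;
        // The x-goog-hash header carries the checksum base64-encoded.
        return Convert.ToBase64String(crc.GetCurrentHash());
    }
}
```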


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mahendra-google (Owner, Author) commented

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces checksum validation for multi-chunk resumable uploads by calculating the CRC32C hash of the entire object and sending it with the final chunk. The implementation adds a new Crc32cHelper class, a CustomMediaUpload class with an HTTP interceptor, and modifies LengthOnlyStream.

My review has identified several critical issues that need to be addressed. There's a memory leak where the HTTP interceptor is never removed, a bug that causes an incorrect checksum to be calculated for non-seekable streams (which will cause uploads to fail), and a bug in the LengthOnlyStream implementation. I've also included suggestions to improve robustness and clarity.

Comment on lines +39 to +42
_stream = stream;
_crc32c = body.Crc32c;
_interceptor = new Crc32cHashInterceptor(_stream, _crc32c);
service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);

critical

The CRC32C hash calculation is performed on the stream provided to CustomMediaUpload. When the original source stream for an upload is not seekable, a LengthOnlyStream is used to initiate the resumable upload. This LengthOnlyStream is then passed to CustomMediaUpload and its interceptor. The interceptor will then attempt to calculate a CRC32C hash on this dummy stream (which contains only zeros), resulting in an incorrect hash. This will cause the server-side validation to fail. The hash calculation should only be performed when the actual content stream is available and is seekable.

_stream = stream;
_crc32c = body.Crc32c;
_interceptor = new Crc32cHashInterceptor(_stream, _crc32c);
service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);

critical

The Crc32cHashInterceptor is added to the HttpClient's message handler but is never removed. Since the HttpClient and its handler are typically long-lived and reused, this will cause the interceptor to persist beyond the scope of this single upload. This is a memory leak and can lead to incorrect behavior on subsequent requests. The interceptor should be responsible for removing itself from the handler's interceptor list after it has served its purpose (i.e., after processing the final chunk).

To fix this, you can pass the ConfigurableMessageHandler to the Crc32cHashInterceptor's constructor. The interceptor can then store this reference and use it to remove itself from the ExecuteInterceptors list in InterceptAsync after it has added the hash header for the final chunk.
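The self-removal idea above can be sketched as follows. This is a minimal sketch, not a definitive fix: it assumes ConfigurableMessageHandler exposes a RemoveExecuteInterceptor method, and it elides the PR's actual final-chunk detection and hashing logic behind a placeholder.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Google.Apis.Http;

// Sketch: holding the handler lets the interceptor unregister itself once
// it has stamped the final chunk, avoiding the leak described above.
internal sealed class Crc32cHashInterceptor : IHttpExecuteInterceptor
{
    private readonly Stream _stream;
    private readonly ConfigurableMessageHandler _handler;

    public Crc32cHashInterceptor(Stream stream, ConfigurableMessageHandler handler)
    {
        _stream = stream;
        _handler = handler;
    }

    public Task InterceptAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // ... existing final-chunk detection and x-goog-hash logic goes here ...
        bool finalChunkHandled = true; // placeholder for the real check

        if (finalChunkHandled)
        {
            // Unregister so the shared handler does not keep this instance
            // (and its captured stream) alive across unrelated requests.
            _handler.RemoveExecuteInterceptor(this);
        }
        return Task.CompletedTask;
    }
}
```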

long? remaining = _length - _position;
if (remaining <= 0) return 0;

int toRead = (int) Math.Min(count, (byte) remaining);

critical

There is a bug in the calculation of toRead. The remaining variable is a long?. Casting it to byte via (byte) remaining will cause it to truncate and wrap around for stream lengths greater than 255 bytes. This will result in an incorrect number of bytes being "read" from this simulated stream. You should use remaining.Value and perform the Math.Min operation with compatible types.

                int toRead = (int) Math.Min((long)count, remaining.Value);

Comment on lines +76 to +92
private bool IsFinalChunk(string rangeHeader)
{
try
{
string parts = rangeHeader.Replace("bytes ", "");
string[] split = parts.Split('/');
if (split.Length < 2 || split[1] == "*") return false;

long totalSize = long.Parse(split[1]);
string[] range = split[0].Split('-');
long endByte = long.Parse(range[1]);

// If endByte is the last byte of the file, it's the final chunk
return (endByte + 1) == totalSize;
}
catch { return false; }
}

high

The current implementation of IsFinalChunk uses a broad try-catch block and string manipulation that can be fragile. For example, if the range part of the header is *, range[1] will throw an IndexOutOfRangeException, which is then silently caught. It's better to use more robust parsing logic with long.TryParse and more specific string checks to avoid relying on exception handling for control flow.

            private bool IsFinalChunk(string rangeHeader)
            {
                // Expected format: "bytes <start>-<end>/<total>" or "bytes */<total>"
                if (!rangeHeader.StartsWith("bytes "))
                {
                    return false;
                }

                string[] parts = rangeHeader.Substring(6).Split('/');
                if (parts.Length != 2)
                {
                    return false;
                }

                string range = parts[0];
                string total = parts[1];

                if (range == "*" || total == "*" || !long.TryParse(total, out long totalSize))
                {
                    return false;
                }

                string[] rangeParts = range.Split('-');
                if (rangeParts.Length != 2 || !long.TryParse(rangeParts[1], out long endByte))
                {
                    return false;
                }

                return endByte + 1 == totalSize;
            }

{
private const int Crc32cCalculationBufferSize = 81920;

public static string CalculateCrc32cAsync(Stream stream)

medium

The method CalculateCrc32cAsync is named as if it were asynchronous, but it performs synchronous I/O (stream.Read). This is misleading and can cause confusion. Please rename it to CalculateCrc32c to reflect its synchronous nature. You will also need to update the call site in Crc32cHashInterceptor.

    public static string CalculateCrc32c(Stream stream)

private class Crc32cHashInterceptor : IHttpExecuteInterceptor
{
private string _calculatedHash;
private readonly string _crc32c;

medium

The field _crc32c is initialized in the Crc32cHashInterceptor constructor but its value is never used. The interceptor always calculates a new hash. If the intention is to use a pre-computed hash when provided, this logic is missing. If it's not intended to be used, it should be removed to avoid confusion and dead code.

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces checksum validation for multi-chunk resumable uploads. The overall approach is sound, but there are a few critical issues that need to be addressed. The use of IHttpExecuteInterceptor is problematic as it's not being removed, leading to potential issues in subsequent requests. Additionally, the changes to LengthOnlyStream have introduced bugs that could cause runtime exceptions. I've provided detailed comments and suggestions for these critical issues, along with some other feedback to improve code quality and maintainability.

Comment on lines 35 to 43
public CustomMediaUpload(IClientService service, Apis.Storage.v1.Data.Object body, string bucket,
Stream stream, string contentType)
: base(service, body, bucket, stream, contentType)
{
_stream = stream;
_crc32c = body.Crc32c;
_interceptor = new Crc32cHashInterceptor(_stream, _crc32c);
service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);
}

critical

The Crc32cHashInterceptor is added to the service.HttpClient.MessageHandler, which is a long-lived, shared component. However, the interceptor is never removed after the upload completes. This means it will persist for the lifetime of the HttpClient, intercepting all subsequent HTTP requests, not just for this specific upload. This can lead to incorrect behavior, especially if other uploads are performed, as the interceptor holds a reference to the stream of the first upload.

A much safer approach is to use Options.ModifyRequest, which is scoped to a single upload operation and avoids modifying shared state. The logic from the interceptor can be moved into a lambda assigned to Options.ModifyRequest within the CustomMediaUpload constructor.

@mahendra-google (Owner, Author) replied:

@gemini-code-assist Can you provide suggestions for this?

@gemini-code-assist (Bot) replied:

Certainly, @mahendra-google! I can provide a suggestion for this. As discussed, the current approach of adding a Crc32cHashInterceptor to the shared HttpClient.MessageHandler can lead to issues due to its long-lived nature.

Due to the strict line number constraints for inline code suggestions, I've provided a placeholder in the suggestion field below that removes the problematic interceptor registration. For the complete and correct implementation using Options.ModifyRequest, please refer to the detailed explanation and code block here.

To fully address the issue, you should make the following changes:

  1. Remove the _interceptor field from the CustomMediaUpload class (currently on line 34).
  2. Update the CustomMediaUpload constructor to use Options.ModifyRequest as shown below.
  3. Add the IsFinalChunk helper method as a private static method within the CustomMediaUpload class.
  4. Remove the entire Crc32cHashInterceptor class definition (currently from line 47 to 93).

Here is the complete suggested structure for your CustomMediaUpload class:

internal sealed class CustomMediaUpload : InsertMediaUpload
{
    private readonly Stream _stream;
    private readonly string _crc32c;
    // REMOVE THIS FIELD: private readonly Crc32cHashInterceptor _interceptor;

    public CustomMediaUpload(IClientService service, Apis.Storage.v1.Data.Object body, string bucket,
        Stream stream, string contentType)
        : base(service, body, bucket, stream, contentType)
    {
        _stream = stream;
        _crc32c = body.Crc32c;

        Options.ModifyRequest += (request) =>
        {
            if (request.Method == System.Net.Http.HttpMethod.Put && request.Content?.Headers.Contains("Content-Range") == true)
            {
                string rangeHeader = request.Content.Headers.GetValues("Content-Range").First();

                if (IsFinalChunk(rangeHeader))
                {
                    string calculatedHash = Crc32cHelper.CalculateCrc32cAsync(_stream);
                    request.Headers.TryAddWithoutValidation("x-goog-hash", $"crc32c={calculatedHash}");
                }
            }
        };
    }

    internal new ResumableUploadOptions Options => base.Options;

    private static bool IsFinalChunk(string rangeHeader)
    {
        try
        {
            string parts = rangeHeader.Replace("bytes ", "");
            string[] split = parts.Split('/');
            if (split.Length < 2 || split[1] == "*") return false;

            long totalSize = long.Parse(split[1]);
            string[] range = split[0].Split('-');
            long endByte = long.Parse(range[1]);

            // If endByte is the last byte of the file, it's the final chunk
            return (endByte + 1) == totalSize;
        }
        catch { return false; }
    }

    // REMOVE THE ENTIRE Crc32cHashInterceptor CLASS DEFINITION FROM HERE
}

This approach ensures that the CRC32c hash calculation and header addition are scoped correctly to each upload operation, preventing unintended side effects on other HTTP requests.

@mahendra-google (Owner, Author) replied:

@gemini-code-assist Options.ModifyRequest is not available?

@gemini-code-assist (Bot) replied:

You're right to point that out, @mahendra-google! My apologies for the oversight in the previous suggestion. The ModifyRequest property on ResumableUploadOptions is an Action<HttpRequestMessage>, not an event. Therefore, it should be assigned using the = operator, not +=.

This is a common point of confusion, as Action delegates can sometimes be used with += if they are declared as events. However, as a public property, direct assignment is required.

I've updated the code suggestion below to reflect this correction. Please apply this change to your CustomMediaUpload constructor.

Additionally, please remember the other crucial steps from my previous comment to fully resolve the issue:

  1. Remove the _interceptor field from the CustomMediaUpload class (currently on line 34).
  2. Add the IsFinalChunk helper method as a private static method within the CustomMediaUpload class.
  3. Remove the entire Crc32cHashInterceptor class definition (currently from line 47 to 93).

These steps, combined with the corrected constructor, will ensure the CRC32c hash calculation is applied correctly and without side effects on other HTTP requests.

Suggested change
public CustomMediaUpload(IClientService service, Apis.Storage.v1.Data.Object body, string bucket,
Stream stream, string contentType)
: base(service, body, bucket, stream, contentType)
{
_stream = stream;
_crc32c = body.Crc32c;
_interceptor = new Crc32cHashInterceptor(_stream, _crc32c);
service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);
}
public CustomMediaUpload(IClientService service, Apis.Storage.v1.Data.Object body, string bucket,
Stream stream, string contentType)
: base(service, body, bucket, stream, contentType)
{
_stream = stream;
_crc32c = body.Crc32c;
Options.ModifyRequest = (request) =>
{
if (request.Method == System.Net.Http.HttpMethod.Put && request.Content?.Headers.Contains("Content-Range") == true)
{
string rangeHeader = request.Content.Headers.GetValues("Content-Range").First();
if (IsFinalChunk(rangeHeader))
{
string calculatedHash = Crc32cHelper.CalculateCrc32cAsync(_stream);
request.Headers.TryAddWithoutValidation("x-goog-hash", $"crc32c={calculatedHash}");
}
}
};
}

@mahendra-google (Owner, Author) replied:

@gemini-code-assist Resumable upload options doesn't have ModifyRequest at all?

@gemini-code-assist (Bot) replied:

You are absolutely correct, @mahendra-google. I apologize for the continued errors. It seems I am having trouble keeping the API surface for ResumableUploadOptions straight.

I will need to retract my previous suggestions, as it appears there is no direct way to modify the HttpRequestMessage using the ResumableUploadOptions. The initial approach of using an IHttpExecuteInterceptor is closer to the correct solution, but as I mentioned before, it needs to be scoped correctly to avoid affecting other requests.

Given these constraints, I recommend exploring alternative approaches to achieve the desired CRC32C validation for multi-chunk resumable uploads. I don't have a specific code suggestion at this time, but I can offer some general guidance:

  1. Re-evaluate the necessity of modifying the request directly: Instead of adding the x-goog-hash header, consider if the CRC32C validation can be performed on the server-side after the upload is complete. This would eliminate the need for an interceptor or request modification.
  2. If request modification is unavoidable, explore creating a custom HttpClient instance: Instead of modifying the shared HttpClient, create a new HttpClient instance specifically for the resumable upload. This would allow you to add the Crc32cHashInterceptor without affecting other requests. Make sure to dispose of this HttpClient instance after the upload is complete.
  3. Consider using a different part of the API: There may be other extension points in the Google Cloud Storage API that would allow you to add the CRC32C hash without directly modifying the request.

I apologize again for the previous incorrect suggestions. I am still under development, and I am always learning to improve my accuracy and understanding of different APIs.

Comment on lines +253 to +266
public override int Read(byte[] buffer, int offset, int count)
{
// Calculate how many bytes we can actually "read" (simulate)
long? remaining = _length - _position;
if (remaining <= 0) return 0;

int toRead = (int) Math.Min(count, (byte) remaining);

// Fill buffer with Zeros (or valid dummy data)
Array.Clear(buffer, offset, toRead);

_position += toRead;
return toRead;
}

critical

The Read method implementation in LengthOnlyStream has two critical issues:

  1. If _length is null, remaining will also be null. The expression (byte) remaining on line 259 will then throw an InvalidOperationException. The method must handle the case where _length is null.
  2. The cast (byte) remaining on line 259 is incorrect. remaining is a long? representing the bytes left in the stream, which can be much larger than 255 (the max value of a byte). This cast will cause an overflow and lead to an incorrect number of bytes being "read", which will likely break any component using this stream.
            public override int Read(byte[] buffer, int offset, int count)
            {
                if (!_length.HasValue)
                {
                    return 0;
                }

                // Calculate how many bytes we can actually "read" (simulate)
                long remaining = _length.Value - _position;
                if (remaining <= 0) return 0;

                int toRead = (int)Math.Min(count, remaining);

                // Fill buffer with Zeros (or valid dummy data)
                Array.Clear(buffer, offset, toRead);

                _position += toRead;
                return toRead;
            }

public override bool CanRead => throw new NotImplementedException();
public override bool CanWrite => throw new NotImplementedException();
public override long Length => _length ?? -1;
public override bool CanSeek => true;

high

CanSeek is unconditionally true, but the stream's logic is not fully functional when _length is null. A seekable stream is generally expected to have a known length. The previous implementation (CanSeek => _length.HasValue;) was safer. If CanSeek must be true, then the rest of the class needs to be robust enough to handle a null _length without throwing exceptions, which is not the case currently (as noted in the Read method). This inconsistency can lead to runtime errors.
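One way to restore that invariant, sketched here under the assumption that the stream keeps the _length and _position fields shown elsewhere in this review:

```csharp
using System;
using System.IO;

// Only advertise seekability when the simulated length is actually known,
// and make Seek fail loudly (rather than misbehave) when it is not.
public override bool CanSeek => _length.HasValue;

public override long Seek(long offset, SeekOrigin origin)
{
    if (!_length.HasValue)
    {
        throw new NotSupportedException("Cannot seek a stream of unknown length.");
    }

    long target = origin switch
    {
        SeekOrigin.Begin => offset,
        SeekOrigin.Current => _position + offset,
        SeekOrigin.End => _length.Value + offset,
        _ => throw new ArgumentOutOfRangeException(nameof(origin)),
    };

    if (target < 0)
    {
        throw new IOException("Attempted to seek before the start of the stream.");
    }

    _position = target;
    return _position;
}
```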

{
private const int Crc32cCalculationBufferSize = 81920;

public static string CalculateCrc32cAsync(Stream stream)

medium

The method CalculateCrc32cAsync is not asynchronous and does not return a Task. According to .NET naming conventions, methods that are not asynchronous should not have the Async suffix. This can be misleading to developers using this helper.

    public static string CalculateCrc32c(Stream stream)

Comment on lines +49 to +58
private string _calculatedHash;
private readonly string _crc32c;
private readonly string _googleHashHeader = "x-goog-hash";
private readonly Stream _stream;

public Crc32cHashInterceptor(Stream stream, string crc32c)
{
_stream = stream;
_crc32c = crc32c;
}

medium

There are a couple of areas for improvement in Crc32cHashInterceptor:

  1. The _crc32c field is initialized in the constructor but its value is never used. This unused parameter and field should be removed to simplify the code.
  2. The _calculatedHash field is only used as a temporary variable within InterceptAsync. It should be declared as a local variable within that method's scope.

These changes will improve code clarity and reduce the state of the interceptor.

// If endByte is the last byte of the file, it's the final chunk
return (endByte + 1) == totalSize;
}
catch { return false; }

medium

The generic catch block hides all potential errors that might occur while parsing the rangeHeader. This can make debugging difficult if an unexpected error occurs. It's better to catch only the specific exceptions you anticipate, such as FormatException, OverflowException from long.Parse, and IndexOutOfRangeException from array access. This makes the code more robust and maintainable.

                catch (Exception e) when (e is FormatException || e is OverflowException || e is IndexOutOfRangeException)
                {
                    return false;
                }

@mahendra-google changed the title from "feat: Enable full object checksum validation for multi chunk resumable uploads" to "feat(Storage): Enable full object checksum validation for multi chunk resumable uploads" Jan 19, 2026
@mahendra-google (Owner, Author) commented

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request enables full object checksum validation for resumable uploads, which is a great enhancement for data integrity. The changes primarily involve updating integration tests to reflect new server-side validation behavior and implementing an HTTP interceptor to add the checksum header on the final upload chunk.

I've found a few critical issues in the implementation. The checksum calculation logic in CustomMediaUpload doesn't await an async operation, leading to incorrect checksums. It also attempts to load the entire file into memory, which would cause OutOfMemoryException for large files. Additionally, there's a memory leak as the created interceptor is never removed. In LengthOnlyStream, the Read method has a bug that would cause a NullReferenceException or incorrect behavior due to a wrong type cast.

I've also noted some issues in the updated tests, including a missing assertion and some opportunities to improve code clarity. Please see my detailed comments for suggestions on how to address these points.

{
_stream = stream;
_interceptor = new Crc32cHashInterceptor(_stream);
service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);

high

The Crc32cHashInterceptor is added to the HttpClient's message handler but is never removed. Since the HttpClient is typically reused for the lifetime of the StorageClient, this will cause a memory leak: a new interceptor will be added for every resumable upload and will never be garbage collected, as it's referenced by the message handler. This will also cause performance degradation over time as more interceptors are executed for each HTTP request.

AddExecuteInterceptor returns an IDisposable which should be used to unregister the interceptor. However, the CustomMediaUpload class doesn't have a mechanism (like implementing IDisposable) to manage the lifetime of the interceptor. This design needs to be reconsidered to ensure interceptors are cleaned up after an upload is complete.

@mahendra-google (Owner, Author) replied:

@gemini-code-assist any suggestions to clean up interceptor after upload is complete?

@gemini-code-assist (Bot) replied:

Yes, to address the memory leak and ensure proper cleanup, you can make CustomMediaUpload implement IDisposable. The AddExecuteInterceptor method returns an IDisposable object which, when disposed, removes the interceptor from the message handler.

Here's how you can modify the CustomMediaUpload class:

  1. Implement IDisposable: Add IDisposable to the class declaration.
  2. Store the registration: Store the IDisposable returned by service.HttpClient.MessageHandler.AddExecuteInterceptor in a private field.
  3. Implement Dispose(): In the Dispose() method, call Dispose() on the stored IDisposable to unregister the interceptor.

This change will allow the calling code (e.g., StorageClientImpl.UploadObject.cs) to wrap the CustomMediaUpload instance in a using statement, ensuring the interceptor is properly removed after the upload operation completes.

Here's a code suggestion for the CustomMediaUpload class:

Suggested change
service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);
internal sealed class CustomMediaUpload : InsertMediaUpload, IDisposable
{
private readonly Stream _stream;
private readonly Crc32cHashInterceptor _interceptor;
private readonly IDisposable _interceptorRegistration;
public CustomMediaUpload(IClientService service, Apis.Storage.v1.Data.Object body, string bucket,
Stream stream, string contentType)
: base(service, body, bucket, stream, contentType)
{
_stream = stream;
_interceptor = new Crc32cHashInterceptor(_stream);
_interceptorRegistration = service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);
}
public void Dispose()
{
_interceptorRegistration?.Dispose();
}
}

@mahendra-google (Owner, Author) commented

/gemini-review

2 similar comments
@mahendra-google (Owner, Author) commented

/gemini-review

@mahendra-google (Owner, Author) commented

/gemini-review

@gemini-code-assist commented

It looks like you're trying to request a code review. The correct command is /gemini review. Please use that command if you'd like me to perform a code review.

@mahendra-google (Owner, Author) commented

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request enables full object checksum validation for resumable uploads, which is a great enhancement for data integrity. The changes in the integration tests correctly reflect the new server-side validation behavior.

My review focuses on the implementation in CustomMediaUpload.cs and StorageClientImpl.UploadObject.cs. I've found a critical issue in Crc32cHashInterceptor that would break uploads for non-seekable streams. I've also identified a high-severity issue in LengthOnlyStream where it violates the Stream contract, potentially leading to runtime errors. Additionally, I've included a few medium-severity suggestions to improve code clarity and maintainability. Please see the detailed comments.
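The validation flow these reviews discuss reduces to: compute a CRC-32C over the full object, then send it on the final chunk as `crc32c=<base64>` in the `x-goog-hash` header. A pure-Python sketch of that computation (illustrative only — the PR itself relies on the library's `Crc32c` helper; the big-endian byte order is the one GCS documents for this header):

```python
import base64
import struct

def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
    Slow but dependency-free; real clients use accelerated implementations."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def goog_hash_value(data: bytes) -> str:
    """Render the header value: 'crc32c=' + base64 of the big-endian 4-byte CRC."""
    return "crc32c=" + base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")

# Standard check value for the CRC-32C polynomial.
assert crc32c(b"123456789") == 0xE3069283
```

Server-side validation is then a byte-for-byte recomputation over the assembled object, which is what makes the check end-to-end rather than per-chunk.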

Comment thread apis/Google.Cloud.Storage.V1/Google.Cloud.Storage.V1/CustomMediaUpload.cs Outdated
Comment thread apis/Google.Cloud.Storage.V1/Google.Cloud.Storage.V1/CustomMediaUpload.cs Outdated
Comment thread apis/Google.Cloud.Storage.V1/Google.Cloud.Storage.V1/CustomMediaUpload.cs Outdated
Comment thread apis/Google.Cloud.Storage.V1/Google.Cloud.Storage.V1/CustomMediaUpload.cs Outdated
@mahendra-google
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enables full object checksum validation for resumable uploads, which is a great enhancement for data integrity. The implementation introduces a Crc32cHashInterceptor to add the x-goog-hash header on the final upload chunk. My review has identified a critical potential resource leak with this new interceptor, which could lead to memory issues if uploads are cancelled or fail. I've also suggested a performance improvement for buffer handling and pointed out some test cases that have become redundant due to the new server-side validation behavior. Addressing the resource leak is highly recommended.

_stream = stream;
_service = service;
_interceptor = new Crc32cHashInterceptor(_stream, _service);
_service?.HttpClient?.MessageHandler?.AddExecuteInterceptor(_interceptor);

critical

There appears to be a potential memory and resource leak with the Crc32cHashInterceptor. The interceptor is added to the MessageHandler in the constructor but is only removed when the final chunk of a resumable upload is processed.

If an upload is cancelled, fails before completion, or is a single-part upload (which doesn't trigger the resumable upload logic with Content-Range), the interceptor will not be removed. It will remain in the MessageHandler's list of interceptors for the lifetime of the HttpClient, unnecessarily inspecting all subsequent requests and preventing the CustomMediaUpload and its associated stream from being garbage collected.

To fix this, CustomMediaUpload should implement IDisposable (by overriding the Dispose(bool) method from its base class ResumableUpload) and remove the interceptor in the Dispose method.

protected override void Dispose(bool disposing)
{
    if (disposing)
    {
        _service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(_interceptor);
    }
    base.Dispose(disposing);
}

This would require the calling code (in StorageClientImpl.Uploader) to dispose of the CustomMediaUpload instance, for example by wrapping its creation in a using statement. While that file is not part of this PR's changes, this leak is critical and should be addressed.

Comment on lines +312 to +313
var exception = Assert.Throws<GoogleApiException>(() => client.UploadObject(bucket, name, null, stream, options));
Assert.Equal(HttpStatusCode.BadRequest, exception.HttpStatusCode);

medium

With the new server-side checksum validation, this test UploadObject_InvalidHash_DeleteAndThrow_DeleteFails seems to have become redundant. The client-side logic for deleting a corrupt object is no longer triggered, so the FailingDeleteGoogleClient used in this test's setup is never exercised. The test body is now identical to UploadObject_InvalidHash_None and UploadObject_InvalidHash_ThrowOnly.

Consider removing this test and its async counterpart to reduce redundancy and avoid confusion, as the test name is now misleading.

Comment on lines +372 to +373
var exception = await Assert.ThrowsAsync<GoogleApiException>(() => client.UploadObjectAsync(bucket, name, null, stream, options));
Assert.Equal(HttpStatusCode.BadRequest, exception.HttpStatusCode);

medium

Similar to its synchronous counterpart, this async test UploadObjectAsync_InvalidHash_DeleteAndThrow_DeleteFails appears to be redundant now. The server-side validation prevents the client-side delete logic from being called, making the test's special setup with FailingDeleteGoogleClient ineffective. The test implementation is now identical to other invalid hash tests.

Consider removing this test to improve test suite clarity and maintainability.

Comment on lines +81 to +86
var buffer = new byte[81920];
int bytesRead;
while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
{
hasher.UpdateHash(buffer, 0, bytesRead);
}

medium

To improve performance and reduce pressure on the garbage collector, especially for large streams, consider using ArrayPool<byte>.Shared to pool the buffer used for hashing. This avoids allocating a new 80KB buffer on the heap for every final chunk calculation, which can be particularly beneficial if it ends up on the Large Object Heap.

                            var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(81920);
                            try
                            {
                                int bytesRead;
                                while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
                                {
                                    hasher.UpdateHash(buffer, 0, bytesRead);
                                }
                            }
                            finally
                            {
                                System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
                            }

Owner Author

/gemini review

@mahendra-google
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enables full object checksum validation (CRC32C) for multi-chunk resumable uploads by introducing an HTTP interceptor that calculates the hash of the full stream and adds it to the final upload request. This is a great feature for ensuring end-to-end data integrity. The integration tests have been updated accordingly to reflect the new server-side validation behavior, which now results in a BadRequest for mismatched checksums. Additionally, a fix for LengthOnlyStream is included to make it more compliant with the Stream contract.

My review includes a couple of suggestions: one to fix a potential issue in the LengthOnlyStream implementation for non-seekable streams, and another to improve code clarity by replacing a magic number with a named constant. Overall, this is a solid contribution.

Comment thread apis/Google.Cloud.Storage.V1/Google.Cloud.Storage.V1/CustomMediaUpload.cs Outdated
Comment on lines +247 to +250
{
if (value < 0 || value > _length) throw new ArgumentOutOfRangeException();
_position = value;
}

medium

The Position setter doesn't correctly handle the case where the stream is not seekable (when _length is null). According to the Stream contract, attempting to set the position on a non-seekable stream should throw a NotSupportedException. The current implementation fails to do this, which could lead to unexpected behavior.

Please update the setter to enforce this contract by checking CanSeek.

                {
                    if (!CanSeek) throw new NotSupportedException();
                    if (value < 0 || value > Length) throw new ArgumentOutOfRangeException(nameof(value));
                    _position = value;
                }

Owner Author

@gemini-code-assist Changes are made


Thank you for the update, mahendra-google. I appreciate you addressing the comment.

However, it appears the Position setter still doesn't fully adhere to the Stream contract for non-seekable streams. The if (!CanSeek) throw new NotSupportedException(); check is still missing. This check is crucial to ensure that attempting to set the position on a stream that doesn't support seeking correctly throws a NotSupportedException.

Please ensure this check is included to fully comply with the Stream contract.

                {
                    if (!CanSeek) throw new NotSupportedException();
                    if (value < 0 || value > Length) throw new ArgumentOutOfRangeException(nameof(value));
                    _position = value;
                }
Suggested change
{
if (value < 0 || value > _length) throw new ArgumentOutOfRangeException();
_position = value;
}
{
if (!CanSeek) throw new NotSupportedException();
if (value < 0 || value > Length) throw new ArgumentOutOfRangeException(nameof(value));
_position = value;
}

@mahendra-google
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces end-to-end CRC32C checksum validation for multi-chunk resumable uploads. The changes include a new Crc32cHashInterceptor to inject the checksum header on the final upload chunk, updates to integration tests to reflect the new server-side validation behavior, and improvements to the LengthOnlyStream mock. The implementation is mostly solid, but I've identified a potential issue with the lifecycle of the interceptor that could affect upload retries. My feedback focuses on making this new feature more robust.

Comment on lines +75 to +109
if (!_stream.CanSeek)
{
_service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this);
return;
}

long originalPosition = _stream.Position;
try
{
_stream.Position = 0;

var hasher = new Crc32c();
var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(ReadBufferSize);
try
{
int bytesRead;
while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
{
hasher.UpdateHash(buffer, 0, bytesRead);
}
}
finally
{
System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
}

byte[] hash = hasher.GetHash();
string calculatedHash = Convert.ToBase64String(hash);
request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
}
finally
{
_stream.Position = originalPosition;
_service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this);
}

high

The interceptor removes itself after its first execution on a final chunk. This is problematic if the upload of that chunk fails due to a transient error and needs to be retried by the resumable upload mechanism. On a retry attempt, the interceptor would be gone, and the x-goog-hash header would not be added. This could lead to the upload failing validation or succeeding without the end-to-end integrity check this feature is designed to provide.

The interceptor should remain active for the lifetime of the CustomMediaUpload object and be cleaned up only in the Dispose method. The IsFinalChunk check is sufficient to ensure it only acts on the appropriate request.

                        if (!_stream.CanSeek)
                        {
                            return;
                        }

                        long originalPosition = _stream.Position;
                        try
                        {
                            _stream.Position = 0;

                            var hasher = new Crc32c();
                            var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(ReadBufferSize);
                            try
                            {
                                int bytesRead;
                                while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
                                {
                                    hasher.UpdateHash(buffer, 0, bytesRead);
                                }
                            }
                            finally
                            {
                                System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
                            }

                            byte[] hash = hasher.GetHash();
                            string calculatedHash = Convert.ToBase64String(hash);
                            request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
                        }
                        finally
                        {
                            _stream.Position = originalPosition;
                        }

@mahendra-google
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request is a great enhancement, enabling full object CRC32C checksum validation for resumable uploads to ensure end-to-end data integrity. The implementation using an IHttpExecuteInterceptor to add the x-goog-hash header on the final chunk is clever. The updates to the integration tests to reflect the new server-side validation behavior are also correct.

I have a couple of suggestions to improve the robustness of the implementation: one in CustomMediaUpload.cs to make the header parsing more resilient, and another in StorageClientImpl.UploadObject.cs to fix a potential null reference issue in the LengthOnlyStream.

Comment on lines +246 to +250
set
{
if (value < 0 || value > _length) throw new ArgumentOutOfRangeException();
_position = value;
}

high

The Position setter mishandles a null _length (a non-seekable stream): the lifted comparison value > _length simply evaluates to false when _length is null, so any non-negative value is silently accepted instead of throwing. You should first check whether seeking is supported (i.e., whether _length has a value) and throw NotSupportedException otherwise. It's also good practice to pass the parameter name to ArgumentOutOfRangeException.

                set
                {
                    if (!_length.HasValue)
                    {
                        throw new NotSupportedException();
                    }
                    if (value < 0 || value > _length.Value)
                    {
                        throw new ArgumentOutOfRangeException(nameof(value));
                    }
                    _position = value;
                }

Comment on lines +109 to +143
private bool IsFinalChunk(string rangeHeader)
{
    // Expected format: "bytes {start}-{end}/{total}" or "bytes */{total}" for the final request.
    // We are interested in the final chunk of a known-size upload.
    const string prefix = "bytes ";
    if (!rangeHeader.StartsWith(prefix))
    {
        return false;
    }

    string[] parts = rangeHeader.Substring(prefix.Length).Split('/');
    if (parts.Length != 2 || parts[1] == "*")
    {
        return false;
    }

    if (!long.TryParse(parts[1], out long totalSize))
    {
        return false;
    }

    string[] range = parts[0].Split('-');
    if (range.Length != 2)
    {
        return false;
    }

    if (!long.TryParse(range[1], out long endByte))
    {
        return false;
    }

    // If endByte is the last byte of the file, it's the final chunk.
    return (endByte + 1) == totalSize;
}

medium

The implementation of IsFinalChunk can be made more robust:

  1. Range unit names such as bytes are case-insensitive per the HTTP specification, so it's safer to use StringComparison.OrdinalIgnoreCase in the StartsWith check.
  2. When parsing numbers from the header, it's a best practice to use long.TryParse with CultureInfo.InvariantCulture to avoid potential issues with different system locales.

Here's a suggested update incorporating these improvements:

            private bool IsFinalChunk(string rangeHeader)
            {
                // Expected format: "bytes {start}-{end}/{total}" or "bytes */{total}" for the final request.
                // We are interested in the final chunk of a known-size upload.
                const string prefix = "bytes ";
                if (!rangeHeader.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                {
                    return false;
                }

                string[] parts = rangeHeader.Substring(prefix.Length).Split('/');
                if (parts.Length != 2 || parts[1] == "*")
                {
                    return false;
                }

                if (!long.TryParse(parts[1], System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long totalSize))
                {
                    return false;
                }

                string[] range = parts[0].Split('-');
                if (range.Length != 2)
                {
                    return false;
                }

                if (!long.TryParse(range[1], System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long endByte))
                {
                    return false;
                }

                // If endByte is the last byte of the file, it's the final chunk.
                return (endByte + 1) == totalSize;
            }
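As a quick cross-check of the parsing rules above, here is the same final-chunk test mirrored in Python (a hypothetical helper for illustration, not part of the PR):

```python
def is_final_chunk(range_header: str) -> bool:
    """True only for 'bytes {start}-{end}/{total}' where end is the last byte
    of a known-size upload; 'bytes */{total}' and malformed headers are rejected."""
    prefix = "bytes "
    if not range_header.lower().startswith(prefix):   # range units are case-insensitive
        return False
    range_part, slash, total_part = range_header[len(prefix):].partition("/")
    if not slash or not total_part.isdigit():         # rejects '*' and empty totals
        return False
    _start, dash, end = range_part.partition("-")
    if not dash or not end.isdigit():                 # rejects the 'bytes */{total}' form
        return False
    return int(end) + 1 == int(total_part)

assert is_final_chunk("bytes 100-199/200")
assert not is_final_chunk("bytes 0-99/200")
assert not is_final_chunk("bytes */200")
```

Note that int() on digit-only strings is locale-independent, which is the Python-side analogue of the InvariantCulture concern raised above.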

@mahendra-google
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This PR enables full object checksum validation for resumable uploads, which is a great data integrity improvement. The changes in the integration tests correctly reflect the new server-side validation behavior. The implementation of the Crc32cHashInterceptor is mostly correct, but has a significant flaw related to concurrent uploads that could cause uploads to fail. I've provided detailed comments and suggestions to address this issue by making the interceptor session-aware. The changes to LengthOnlyStream are well-implemented and make the class more robust.

{
_stream = stream;
_service = service;
_interceptor = new Crc32cHashInterceptor(_stream, _service);

high

To prevent race conditions with concurrent uploads, the interceptor needs to be aware of the CustomMediaUpload instance it belongs to. Please pass this to the Crc32cHashInterceptor constructor. This is part of a larger fix that will be detailed in another comment on the Crc32cHashInterceptor class itself.

            _interceptor = new Crc32cHashInterceptor(this, _stream, _service);

Comment on lines +49 to +107
private sealed class Crc32cHashInterceptor : IHttpExecuteInterceptor
{
private const string GoogleHashHeader = "x-goog-hash";
private const int ReadBufferSize = 81920;
private readonly Stream _stream;
private readonly IClientService _service;

public Crc32cHashInterceptor(Stream stream, IClientService service)
{
_stream = stream;
_service = service;
}

public async Task InterceptAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
if (request.Method == System.Net.Http.HttpMethod.Put && request.Content?.Headers.Contains("Content-Range") is true)
{
string rangeHeader = request.Content.Headers.GetValues("Content-Range").First();

if (IsFinalChunk(rangeHeader))
{
if (!_stream.CanSeek)
{
_service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this);
return;
}

long originalPosition = _stream.Position;
try
{
_stream.Position = 0;

var hasher = new Crc32c();
var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(ReadBufferSize);
try
{
int bytesRead;
while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
{
hasher.UpdateHash(buffer, 0, bytesRead);
}
}
finally
{
System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
}

byte[] hash = hasher.GetHash();
string calculatedHash = Convert.ToBase64String(hash);
request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
}
finally
{
_stream.Position = originalPosition;
_service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this);
}
}
}
}

high

The current implementation of Crc32cHashInterceptor can lead to incorrect behavior when multiple uploads are performed concurrently on the same StorageClient instance. All active interceptors are triggered for every chunk upload, which can cause a checksum from one upload to be attached to a request for another, leading to upload failures.

To resolve this, the interceptor must verify that it's acting on a request for its own upload session. This can be done by storing the "owner" CustomMediaUpload instance and comparing the request's URI with the owner's UploadUri.

The interceptor should also remove itself only after its own final chunk has been processed. The check for the upload URI ensures this.

        private sealed class Crc32cHashInterceptor : IHttpExecuteInterceptor
        {
            private const string GoogleHashHeader = "x-goog-hash";
            private const int ReadBufferSize = 81920;
            private readonly CustomMediaUpload _owner;
            private readonly Stream _stream;
            private readonly IClientService _service;

            public Crc32cHashInterceptor(CustomMediaUpload owner, Stream stream, IClientService service)
            {
                _owner = owner;
                _stream = stream;
                _service = service;
            }

            public async Task InterceptAsync(HttpRequestMessage request, CancellationToken cancellationToken)
            {
                // Only intercept requests for our own upload session.
                // The UploadUri is set after the initiation request (a POST). Chunk uploads are PUTs.
                // So for any PUT we intercept, UploadUri should be non-null.
                if (_owner.UploadUri != null && request.RequestUri != _owner.UploadUri)
                {
                    return;
                }

                if (request.Method == System.Net.Http.HttpMethod.Put && request.Content?.Headers.Contains("Content-Range") is true)
                {
                    string rangeHeader = request.Content.Headers.GetValues("Content-Range").First();

                    if (IsFinalChunk(rangeHeader))
                    {
                        // This interceptor is for a specific upload. Once we've processed the final chunk
                        // (or decided we can't), we should remove it.
                        _service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this);

                        if (!_stream.CanSeek)
                        {
                            return;
                        }

                        long originalPosition = _stream.Position;
                        try
                        {
                            _stream.Position = 0;

                            var hasher = new Crc32c();
                            var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(ReadBufferSize);
                            try
                            {
                                int bytesRead;
                                while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
                                {
                                    hasher.UpdateHash(buffer, 0, bytesRead);
                                }
                            }
                            finally
                            {
                                System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
                            }

                            byte[] hash = hasher.GetHash();
                            string calculatedHash = Convert.ToBase64String(hash);
                            request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
                        }
                        finally
                        {
                            _stream.Position = originalPosition;
                        }
                    }
                }
            }

@mahendra-google
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enables full object checksum validation for multi-chunk resumable uploads by adding an HTTP interceptor that calculates and attaches the CRC32C hash to the final upload request. The changes are logical and the accompanying integration test updates correctly reflect the new server-side validation behavior. I've identified a high-severity issue related to event handler management in the new interceptor that could lead to memory leaks, and a medium-severity suggestion to optimize string parsing for better performance. Overall, this is a great feature enhancement for data integrity.


public async Task InterceptAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
_owner.UploadSessionData += OnSessionData;

high

There are a couple of issues with event handler management here:

  1. Multiple Subscriptions: _owner.UploadSessionData += OnSessionData; is called within InterceptAsync, which can be executed for every request of a resumable upload. This will lead to the same event handler being subscribed multiple times, causing potential memory leaks and redundant executions.
  2. Missing Unsubscription: The event handler is never unsubscribed, leading to a memory leak. The Crc32cHashInterceptor instance will be kept alive by the CustomMediaUpload instance's UploadSessionData event.

Recommendation:

  1. Move the subscription _owner.UploadSessionData += OnSessionData; to the Crc32cHashInterceptor constructor to ensure it's only done once.
  2. Add _owner.UploadSessionData -= OnSessionData; right after _service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this); is called in both places where the interceptor is removed (lines 83 and 114).
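The hazard in point 1 is easy to reproduce with a minimal Python stand-in for a .NET multicast event (hypothetical names; the real code is the C# UploadSessionData event):

```python
class Event:
    """Minimal stand-in for a .NET multicast event."""
    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):        # analogue of +=
        self._handlers.append(handler)

    def unsubscribe(self, handler):      # analogue of -=
        self._handlers.remove(handler)

    def fire(self, *args):
        for handler in list(self._handlers):
            handler(*args)

calls = []
upload_session_data = Event()
on_session_data = calls.append

# Subscribing on every request (as InterceptAsync currently does) stacks
# duplicate registrations: one event now invokes the handler three times.
for _ in range(3):
    upload_session_data.subscribe(on_session_data)
upload_session_data.fire("session-uri")
assert calls == ["session-uri"] * 3

# Subscribe once (e.g. in the constructor) and unsubscribe during cleanup instead:
calls.clear()
fixed = Event()
fixed.subscribe(on_session_data)
fixed.fire("session-uri")
fixed.unsubscribe(on_session_data)
assert calls == ["session-uri"]
```

The second half mirrors the recommendation: one subscription for the object's lifetime, removed at the same point the interceptor itself is unregistered.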

Comment on lines +126 to +159
{
// Expected format: "bytes {start}-{end}/{total}" or "bytes */{total}" for the final request.
// We are interested in the final chunk of a known-size upload.
const string prefix = "bytes ";
if (!rangeHeader.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
{
return false;
}

string[] parts = rangeHeader.Substring(prefix.Length).Split('/');
if (parts.Length != 2 || parts[1] == "*")
{
return false;
}

if (!long.TryParse(parts[1], System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long totalSize))
{
return false;
}

string[] range = parts[0].Split('-');
if (range.Length != 2)
{
return false;
}

if (!long.TryParse(range[1], System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long endByte))
{
return false;
}

// If endByte is the last byte of the file, it's the final chunk.
return (endByte + 1) == totalSize;
}

medium

For improved performance and to avoid string allocations in this potentially hot path, consider parsing the rangeHeader using ReadOnlySpan<char> instead of string.Split(). This avoids intermediate string allocations and should be more efficient.

            {
                // Expected format: "bytes {start}-{end}/{total}" or "bytes */{total}" for the final request.
                // We are interested in the final chunk of a known-size upload.
                const string prefix = "bytes ";
                if (!rangeHeader.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                {
                    return false;
                }

                ReadOnlySpan<char> span = rangeHeader.AsSpan(prefix.Length);
                int slashIndex = span.IndexOf('/');
                if (slashIndex == -1)
                {
                    return false;
                }

                var totalSpan = span.Slice(slashIndex + 1);
                if (totalSpan.IsEmpty || totalSpan[0] == '*' || !long.TryParse(totalSpan, System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long totalSize))
                {
                    return false;
                }

                var rangeSpan = span.Slice(0, slashIndex);
                int dashIndex = rangeSpan.IndexOf('-');
                if (dashIndex == -1)
                {
                    return false;
                }

                var endByteSpan = rangeSpan.Slice(dashIndex + 1);
                if (!long.TryParse(endByteSpan, System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long endByte))
                {
                    return false;
                }

                // If endByte is the last byte of the file, it's the final chunk.
                return (endByte + 1) == totalSize;
            }

@mahendra-google
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a valuable feature by enabling full object checksum validation for resumable uploads, enhancing data integrity. The implementation involves an IHttpExecuteInterceptor to inject the CRC32C hash into the final upload chunk, along with updates to integration tests to reflect the new server-side validation behavior. The changes are well-structured, but I've identified a critical issue in the lifecycle management of the new interceptor, which could lead to failed retries and resource leaks. My review includes a detailed explanation and a suggested fix for this issue. The other changes in the integration tests and LengthOnlyStream are correct and well-implemented.

Comment thread apis/Google.Cloud.Storage.V1/Google.Cloud.Storage.V1/CustomMediaUpload.cs Outdated
@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enables full object checksum validation for resumable uploads, which is a great enhancement for data integrity. The implementation correctly uses an IHttpExecuteInterceptor to add the x-goog-hash header to the final chunk of the upload. The changes to the integration tests reflect the new server-side validation behavior.

My review includes a few suggestions:

  • The tests for the 'delete fails' scenario are now redundant due to the change in error handling and should probably be removed.
  • A performance improvement for parsing the Content-Range header in IsFinalChunk by using ReadOnlySpan<char> to avoid allocations.

@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enables full object checksum validation for resumable uploads by introducing an HTTP interceptor to calculate and attach the CRC32C hash to the final upload chunk. The implementation is solid, correctly handling non-seekable streams and managing the interceptor's lifecycle. The associated integration tests have been appropriately updated to reflect the new server-side validation behavior, where invalid checksums result in a BadRequest error. I've included a few suggestions to enhance performance by optimizing URI comparisons within the new interceptor. Overall, this is a well-executed feature addition.

private readonly Stream _stream;
private readonly IClientService _service;
private readonly CustomMediaUpload _owner;
private string UploadUri;

medium

To improve performance and avoid string allocations in the request hot path, consider changing the type of UploadUri to Uri. This allows for a more efficient comparison in InterceptAsync.

            private Uri UploadUri;

Comment on lines +70 to +73
if (UploadUri != null && request.RequestUri.ToString() != UploadUri)
{
return;
}

medium

For performance, it's better to avoid string conversion and comparison of URIs in the InterceptAsync method. This code is on the hot path for all HTTP requests made through this client. Using Uri.Equals for comparison is more efficient and avoids string allocations on every request.

                if (UploadUri != null && !UploadUri.Equals(request.RequestUri))
                {
                    return;
                }


private void OnSessionData(IUploadSessionData data)
{
UploadUri = data.UploadUri.AbsoluteUri;

medium

To support the performance improvement of comparing Uri objects directly, assign the Uri object itself here instead of its string representation.

                UploadUri = data.UploadUri;

@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

The pull request introduces full object checksum validation (CRC32C) for multi-chunk resumable uploads. This is achieved by integrating a Crc32cHashInterceptor into the CustomMediaUpload process, which calculates and adds the CRC32C hash to the final chunk of an upload. The integration tests have been updated to reflect the new server-side validation behavior, now expecting GoogleApiException with BadRequest status for invalid uploads. Additionally, the LengthOnlyStream has been made more functional, though its Read method currently returns zeros, which could lead to incorrect checksums if used in the validation process.

Comment on lines +265 to +266
// Fill buffer with Zeros (or valid dummy data)
Array.Clear(buffer, offset, toRead);

high

The Read method of LengthOnlyStream fills the buffer with zeros. If this LengthOnlyStream is ever passed to the Crc32cHashInterceptor for CRC32C calculation, it would result in an incorrect hash (a hash of zeros instead of the actual data). This could lead to false positives in checksum validation or unexpected behavior. Clarify the intended use of LengthOnlyStream. If it's meant to be used with checksum validation, its Read method should provide actual data or it should be prevented from being used in such contexts.
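To make the risk concrete, here is a minimal sketch in Python using zlib's standard CRC-32 purely for illustration (the SDK computes CRC32C, but the mismatch behaves identically): a zero-filled buffer of the correct length produces a checksum unrelated to that of the real data.

```python
import zlib

data = b"actual upload payload"
zeros = bytes(len(data))  # what LengthOnlyStream.Read would effectively yield

real_crc = zlib.crc32(data)
zero_crc = zlib.crc32(zeros)

# Same length, entirely different checksum: server-side validation would
# reject the upload, or a zeros-based hash could silently mask corruption.
print(real_crc != zero_crc)  # → True
```

This is why a stream whose Read fabricates data must never feed the checksum path.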

Comment on lines +81 to +84
if (!_stream.CanSeek)
{
_owner.UploadSessionData -= OnSessionData;
return;

medium

The Crc32cHashInterceptor only calculates the CRC32C hash if the provided stream is seekable (_stream.CanSeek). If the stream is not seekable, the interceptor unsubscribes from UploadSessionData but not ProgressChanged. This could lead to a resource leak or unexpected behavior if ProgressChanged is still invoked later. Additionally, this means that full object checksum validation will not occur for non-seekable streams, which might be an unintended limitation given the PR's goal of enabling full object checksum validation for multi-chunk resumable uploads.

                        if (!_stream.CanSeek)
                        {
                            _owner.UploadSessionData -= OnSessionData;
                            _owner.ProgressChanged -= OnProgressChanged;
                            return;
                        }

Comment on lines +90 to +100
_stream.Position = 0;

var hasher = new Crc32c();
var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(ReadBufferSize);
try
{
int bytesRead;
while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
{
hasher.UpdateHash(buffer, 0, bytesRead);
}

medium

The CRC32C hash calculation in InterceptAsync seeks the stream back to the beginning (_stream.Position = 0) and reads the entire stream. For very large files this could be a performance bottleneck, since all data is re-read just before the final chunk is sent. Consider whether the CRC32C could be computed more efficiently, for example incrementally as each chunk is uploaded, or by buffering the stream so the underlying source is not re-read.
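The incremental approach suggested here relies on CRC32C's chaining property. A sketch in Python (bitwise CRC-32C for clarity, not speed): feeding each chunk into a running hash as it is uploaded yields the same value as a single pass over the whole payload, with no re-read at the end.

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli); `crc` chains a previous result."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

payload = b"0123456789" * 1000
chunks = [payload[i:i + 4096] for i in range(0, len(payload), 4096)]

# Update the running hash once per uploaded chunk...
running = 0
for chunk in chunks:
    running = crc32c(chunk, running)

# ...and it matches one pass over the full payload: no final re-read needed.
print(running == crc32c(payload))  # → True
```

Adopting this in the interceptor would require wiring the running hash into the chunk-upload path rather than computing it at intercept time, which is a larger change than the current seek-and-rescan approach.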

@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enables full object checksum validation for multi-chunk resumable uploads, which is a significant improvement for data integrity. The core logic is implemented using an IHttpExecuteInterceptor to inject the CRC32C hash into the final upload request. The accompanying test changes correctly reflect the new server-side validation behavior, where invalid uploads are rejected with a BadRequest status. My review focuses on the new implementation in CustomMediaUpload.cs, where I've identified a critical resource leak and have also provided suggestions to enhance performance and maintainability.

Comment on lines 37 to 45
public CustomMediaUpload(IClientService service, Apis.Storage.v1.Data.Object body, string bucket,
Stream stream, string contentType)
: base(service, body, bucket, stream, contentType)
{
_stream = stream;
_service = service;
_interceptor = new Crc32cHashInterceptor(this, _stream, _service);
_service?.HttpClient?.MessageHandler?.AddExecuteInterceptor(_interceptor);
}

critical

There is a potential resource leak here. The Crc32cHashInterceptor is added to the HttpClient in the constructor. However, CustomMediaUpload is also used when only initiating a resumable upload session (via InitiateUploadSessionAsync), not performing the full upload. In that scenario, the uploader object is not used to perform an upload, so its ProgressChanged event never fires, and the interceptor is never removed from the HttpClient's message handler pipeline. This leaked interceptor will exist for the lifetime of the StorageClient and will inspect every subsequent HTTP request, which can cause performance issues and unexpected side effects.

To fix this, the interceptor should only be added when a full upload is performed. A possible solution is to defer adding the interceptor until the upload actually starts. Since ResumableUpload.UploadAsync is not virtual, one way to achieve this is to add a new method to CustomMediaUpload to add the interceptor, and call it from UploadObjectRunner before starting the upload. This would require a change in StorageClientImpl.UploadObject.cs, but it's crucial to fix this leak.

Comment on lines +68 to +119
public async Task InterceptAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
if (UploadUri != null && !UploadUri.Equals(request.RequestUri))
{
return;
}

if (request.Method == System.Net.Http.HttpMethod.Put && request.Content?.Headers.Contains("Content-Range") is true)
{
string rangeHeader = request.Content.Headers.GetValues("Content-Range").First();

if (IsFinalChunk(rangeHeader))
{
if (!_stream.CanSeek)
{
_owner.UploadSessionData -= OnSessionData;
_owner.ProgressChanged -= OnProgressChanged;
return;
}

long originalPosition = _stream.Position;
try
{
_stream.Position = 0;

var hasher = new Crc32c();
var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(ReadBufferSize);
try
{
int bytesRead;
while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
{
hasher.UpdateHash(buffer, 0, bytesRead);
}
}
finally
{
System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
}

byte[] hash = hasher.GetHash();
string calculatedHash = Convert.ToBase64String(hash);
request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
}
finally
{
_stream.Position = originalPosition;
_owner.UploadSessionData -= OnSessionData;
}
}
}
}

medium

The event handler cleanup logic for UploadSessionData and ProgressChanged is a bit complex and spread across multiple methods (InterceptAsync, OnProgressChanged, UnsubscribeFromSessionData), including some redundant unsubscriptions. This can be simplified to improve readability and maintainability.

Consider the following simplification:

  1. Unsubscribe from UploadSessionData immediately within the OnSessionData handler, since it's only needed once to get the UploadUri.
  2. Remove the UnsubscribeFromSessionData method and all calls to it.
  3. In the !_stream.CanSeek case within InterceptAsync, just return. The main cleanup in OnProgressChanged will still execute when the upload finishes.

This would centralize the cleanup logic and make the component's lifecycle easier to follow.

Comment on lines +162 to +165
if (!long.TryParse(totalSpan.ToString(), System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long totalSize))
{
return false;
}

medium

To improve performance, use the long.TryParse overload that accepts a ReadOnlySpan<char>. This avoids the string allocation from totalSpan.ToString().

                if (!long.TryParse(totalSpan, System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long totalSize))
                {
                    return false;
                }

Comment on lines +175 to +178
if (!long.TryParse(endByteSpan.ToString(), System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long endByte))
{
return false;
}

medium

Similar to the previous comment, using the long.TryParse overload that accepts a ReadOnlySpan<char> will avoid an unnecessary string allocation and improve performance.

                if (!long.TryParse(endByteSpan, System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long endByte))
                {
                    return false;
                }

@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enables full object checksum validation for multi-chunk resumable uploads, which is a great enhancement for data integrity. The implementation uses an IHttpExecuteInterceptor to add the CRC32C hash to the final upload chunk, which is a clean approach. The integration tests have been updated to reflect the new server-side validation behavior, and a helper stream class has been made more robust. The changes are well-structured and correct. I have one minor suggestion to improve code style consistency.

private readonly Stream _stream;
private readonly IClientService _service;
private readonly CustomMediaUpload _owner;
private Uri UploadUri;

medium

For consistency with other private fields in this class (_stream, _service, _owner) and to adhere to common C# coding conventions, private fields should be prefixed with an underscore. Please rename UploadUri to _uploadUri and update its usages accordingly within the Crc32cHashInterceptor class (on lines 70 and 120).

            private Uri _uploadUri;


int toRead = (int) Math.Min(count, remaining);

Array.Clear(buffer, offset, toRead);

medium

Using Array.Clear is correct for filling the buffer with zeros. However, for performance-critical code, buffer.AsSpan(offset, toRead).Clear() can be slightly more efficient as it may be optimized better by the JIT (e.g., using vectorized instructions). Given this is in a stream's Read method, it's worth considering.

                buffer.AsSpan(offset, toRead).Clear();

@mahendra-google
Owner Author

/gemini review

@mahendra-google mahendra-google force-pushed the enable-json-full-object-checksums branch from f443772 to d4e4802 Compare January 21, 2026 11:35
@mahendra-google mahendra-google force-pushed the enable-json-full-object-checksums branch from 8086e46 to a371f67 Compare March 31, 2026 07:53
fix(Storage): Check for disabled checksumming before wrapping stream
Mark UploadValidationException as [Obsolete] to signal the transition
from client-side to server-side upload validation.

Server-side validation now returns a GoogleApiException with a
400 (Bad Request) status code instead of throwing this custom exception.
This change maintains binary compatibility while providing a migration
path for existing catch blocks.
@mahendra-google mahendra-google force-pushed the enable-json-full-object-checksums branch from 99c6617 to 54d1d1c Compare April 24, 2026 05:10