feat: Enable full object checksum validation for multi-chunk resumable uploads#1

Open
mahendra-google wants to merge 1 commit into main from add-trailing-checksums

Conversation

@mahendra-google (Owner) commented Dec 12, 2025

This PR enables full object checksum validation (specifically CRC32C) for JSON-based resumable uploads, ensuring end-to-end data integrity for multi-chunk transfers. Please see b/461996128

@gemini-code-assist commented

Summary of Changes

Hello @mahendra-google, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances data integrity for resumable uploads within the Google API client library by introducing comprehensive client-side CRC32C validation. It ensures that the entire object's checksum is calculated and verified against any provided metadata, and also transmitted with the final upload segment, thereby guaranteeing the consistency and correctness of uploaded data.

Highlights

  • CRC32C Hashing Library: A new NuGet package, System.IO.Hashing version 10.0.1, has been added to the Google.Apis project to facilitate efficient CRC32C checksum calculations.
  • Client-Side CRC32C Calculation: The ResumableUpload class now includes an internal method CalculateCrc32c which computes the CRC32C hash of an entire stream, ensuring proper stream position restoration and Big-Endian byte order for Google APIs compatibility.
  • CRC32C Validation for Resumable Uploads: The calculated CRC32C checksum is now included in the X-Goog-Hash header for the final chunk of a resumable upload. Additionally, during upload initiation, the client-calculated CRC32C is compared against any CRC32C provided in the object metadata, throwing an error if a mismatch occurs.
  • New Property for Trailing CRC32C: A TrailingCrc32c property has been added to InitiatedResumableUpload to store the base64 encoded CRC32C checksum of the entire stream, which is sent with the final upload chunk.
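
The highlights above translate into a small, self-contained flow: hash the whole payload with CRC-32C, base64-encode the big-endian bytes, and fail fast if a checksum already present in the object metadata disagrees. A minimal sketch in Python (the PR itself is C#; the bitwise CRC-32C routine and the validate_against_metadata helper are illustrative, not part of the library):

```python
import base64
from typing import Optional

def crc32c(data: bytes) -> int:
    # Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def crc32c_b64(data: bytes) -> str:
    # Google APIs expect the checksum's big-endian bytes, base64-encoded.
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode("ascii")

def validate_against_metadata(data: bytes, metadata_crc32c: Optional[str]) -> str:
    # Mirrors the PR's rule: a metadata checksum may be present, but it is
    # an error for it to disagree with the client-calculated value.
    calculated = crc32c_b64(data)
    if metadata_crc32c and metadata_crc32c != calculated:
        raise ValueError(
            f"Calculated CRC32C ({calculated}) does not match metadata ({metadata_crc32c}).")
    return calculated
```

For the standard CRC check input b"123456789" the checksum is 0xE3069283, which base64-encodes (big-endian) to "4waSgw==".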
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
---------------------|---------------------|-----------------------------------------------------------------------------------------------
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds support for full object CRC32C validation in resumable uploads for the JSON API. The changes include adding a dependency on System.IO.Hashing, implementing CRC32C calculation for streams, and adding the hash to both the upload initiation request and the final data chunk. While the overall approach is correct, I've identified a few critical issues: the use of a preview version for the new dependency, an incorrect format for the hash header in the final chunk, and a logic error where the hash header is added unconditionally in the initiation request. My review includes suggestions to address these points.

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
Comment on lines +1211 to +1232
    if (calculatedHash != null)
    {
        if (Body != null)
        {
            var bodyType = Body.GetType();
            // Look for "Crc32c" property common in Google API data objects
            var crcProperty = bodyType.GetProperty("Crc32c");

            if (crcProperty != null)
            {
                var metadataHash = crcProperty.GetValue(Body) as string;

                // "It will be an error to set it and also set the same checksums in the object metadata unless the values are same."
                if (!string.IsNullOrEmpty(metadataHash) && metadataHash != calculatedHash)
                {
                    throw new InvalidOperationException(
                        $"The calculated CRC32C of the stream ({calculatedHash}) does not match the CRC32C provided in the object metadata ({metadataHash}).");
                }
            }
        }
    }
    request.Headers.Add(GoogleHashHeader, $"crc32c={calculatedHash}");

critical

The X-Goog-Hash header is being added to the request even when calculatedHash is null (e.g., for non-seekable streams). This will result in a header like X-Goog-Hash: crc32c=, which is likely invalid. The logic to add the header should be moved inside the if (calculatedHash != null) block. The incorrect indentation may be hiding this logic error.

            if (calculatedHash != null)
            {
                if (Body != null)
                {
                    var bodyType = Body.GetType();
                    // Look for "Crc32c" property common in Google API data objects
                    var crcProperty = bodyType.GetProperty("Crc32c");

                    if (crcProperty != null)
                    {
                        var metadataHash = crcProperty.GetValue(Body) as string;

                        // "It will be an error to set it and also set the same checksums in the object metadata unless the values are same."
                        if (!string.IsNullOrEmpty(metadataHash) && metadataHash != calculatedHash)
                        {
                            throw new InvalidOperationException(
                                $"The calculated CRC32C of the stream ({calculatedHash}) does not match the CRC32C provided in the object metadata ({metadataHash}).");
                        }
                    }
                }
                request.Headers.Add(GoogleHashHeader, $"crc32c={calculatedHash}");
            }

Comment thread Src/Support/Google.Apis/Google.Apis.csproj Outdated
@mahendra-google (Owner, Author)

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds CRC32C validation for resumable uploads. While the intention is good, the implementation introduces the same critical bug in two different places: the X-Goog-Hash header is added to HTTP requests after they have already been sent. This makes the new validation feature non-functional. My review provides details on how to fix the placement of this logic to ensure the header is correctly included in the requests.

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs
Comment on lines +1210 to +1238
    string calculatedHash = CalculateCrc32c(ContentStream);

    if (calculatedHash != null)
    {
        if (Body != null)
        {
            var bodyType = Body.GetType();
            // Look for "Crc32c" property common in Google API data objects
            var crcProperty = bodyType.GetProperty("Crc32c");

            if (crcProperty != null)
            {
                var metadataHash = crcProperty.GetValue(Body) as string;

                // "It will be an error to set it and also set the same checksums in the object metadata unless the values are same."
                if (!string.IsNullOrEmpty(metadataHash) && metadataHash != calculatedHash)
                {
                    throw new InvalidOperationException(
                        $"The calculated CRC32C of the stream ({calculatedHash}) does not match the CRC32C provided in the object metadata ({metadataHash}).");
                }
                else
                {
                    request.Headers.Add(GoogleHashHeader, $"crc32c={calculatedHash}");
                }
            }
        }
    }
    return response.Headers.Location;

critical

This block of code has a few issues:

  1. Critical Bug: The logic to calculate the CRC32C hash and add the X-Goog-Hash header is executed after the session initiation request has been sent (on line 1204). This means the header will not be included in the initiation request. This entire block (lines 1210-1237) needs to be moved to before the Service.HttpClient.SendAsync call on line 1204.

  2. Logic Improvement: The header is only added if the Body is not null and has a Crc32c property. It would be more robust to add the header whenever a hash is calculated, as long as it doesn't conflict with existing metadata. If the Crc32c property doesn't exist on the body, the header can still be sent for validation by the server.

  3. Formatting: The return statement on line 1238 has incorrect indentation, which harms readability.

@mahendra-google (Owner, Author)

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces CRC32C validation for resumable uploads, a valuable feature for data integrity. However, I've identified two critical issues in the implementation. The first is a logic error in InitiateSessionAsync where hash calculation occurs after the request is sent, which would cause a runtime exception. The second is an incorrect byte order reversal in CalculateCrc32c that would lead to hash mismatches on most common hardware. I've also suggested adding more test coverage to help catch such issues in the future. Please see the detailed comments for suggestions on how to fix these problems.

Comment on lines +725 to +728
    if (BitConverter.IsLittleEndian)
    {
        Array.Reverse(hashBytes);
    }

critical

The System.IO.Hashing.Crc32.GetCurrentHash() method already returns the hash in big-endian byte order. The conditional Array.Reverse is therefore incorrect on little-endian systems (like x86/x64), as it will reverse the already correct big-endian byte array into a little-endian one. This will cause hash validation to fail. This block should be removed.
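
Whichever order a hashing API emits internally, the value Google expects on the wire is the big-endian byte sequence of the 32-bit checksum, base64-encoded, and reversing an already big-endian buffer yields a different string. A quick byte-order illustration (Python used purely for the arithmetic) with the published CRC-32C check value 0xE3069283 for the input "123456789":

```python
import base64
import struct

# 0xE3069283 is the published CRC-32C check value for the ASCII input "123456789".
check = 0xE3069283

big = struct.pack(">I", check)     # big-endian bytes: b'\xe3\x06\x92\x83'
little = struct.pack("<I", check)  # little-endian: the same bytes reversed

assert little == big[::-1]

# Only the big-endian form base64-encodes to the value Google APIs accept.
print(base64.b64encode(big).decode())     # 4waSgw==
print(base64.b64encode(little).decode())  # g5IG4w==
```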

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
Comment on lines +499 to +503
    if (knownSize)
    {
        var request = server.Requests[server.Requests.Count - 1];
        Assert.Contains("X-Goog-Hash", request.Headers.AllKeys);
    }

medium

This test is good for verifying the X-Goog-Hash header on the final chunk of an upload. However, the feature also adds this header to the initial request in InitiateSessionAsync. It would be beneficial to add a new test case to cover that scenario. A dedicated test for the session initiation would have likely caught the critical bug in InitiateSessionAsync where the header was being added after the request was sent.

@mahendra-google (Owner, Author)

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds CRC32C validation for resumable uploads, which is a valuable feature for ensuring data integrity. The implementation is mostly solid, but I've identified a significant performance issue: the CRC32C hash is calculated synchronously, which can block the application when uploading large files. I've provided a set of comments to convert this calculation to be fully asynchronous. Additionally, I've suggested a small refactoring to improve code clarity in one of the new sections.

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
@mahendra-google (Owner, Author)

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds support for CRC32C validation for resumable uploads in the JSON API. The changes include adding a dependency on System.IO.Hashing, implementing the CRC32C calculation logic, and integrating it into the resumable upload flow for both session initiation and final chunk upload. The implementation is solid, with good use of try...finally to manage stream resources and correct handling of endianness. I have a couple of suggestions for improvement regarding the use of a magic number and the use of reflection for property access, which could be made more robust and performant.

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
    string calculatedHash = await CalculateCrc32cAsync(ContentStream, cancellationToken).ConfigureAwait(false);
    if (calculatedHash != null && Body != null)
    {
        var crcProperty = Body.GetType().GetProperty("Crc32c");

medium

Using reflection with Body.GetType().GetProperty("Crc32c") to access the CRC32C hash from the metadata object is not type-safe and can have a performance impact. Consider defining an interface that metadata objects can implement, for example:

public interface IWithCrc32c
{
    string Crc32c { get; }
}

This would allow for a type-safe check:

if (Body is IWithCrc32c crcProvider)
{
    var metadataHash = crcProvider.Crc32c;
    // ... rest of the logic
}

While this might be a breaking change or not align with the library's design for auto-generated code, it's a worthwhile consideration for future improvements to enhance robustness and performance.
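
For comparison, the same idea transposed to Python's structural typing (WithCrc32c, BlobMetadata, and metadata_hash are hypothetical names standing in for the proposed IWithCrc32c interface and the PR's Body object):

```python
from typing import Optional, Protocol, runtime_checkable

@runtime_checkable
class WithCrc32c(Protocol):
    # Structural analogue of the proposed IWithCrc32c interface.
    crc32c: Optional[str]

class BlobMetadata:
    # Hypothetical metadata object that satisfies the protocol.
    def __init__(self, crc32c: Optional[str] = None):
        self.crc32c = crc32c

def metadata_hash(body: object) -> Optional[str]:
    # Type-safe membership check instead of reflection by property name.
    if isinstance(body, WithCrc32c):
        return body.crc32c
    return None
```

As in the C# suggestion, the lookup no longer depends on a hardcoded property-name string, so a rename surfaces as a type error rather than a silently skipped check.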

@mahendra-google (Owner, Author)

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds CRC32C validation for resumable uploads, which is a great feature for ensuring data integrity. The implementation covers two scenarios: validating a user-provided hash at the start of an upload, and calculating and sending a hash with the final chunk.

My review focuses on improving the efficiency, correctness, and maintainability of the implementation. I've identified a critical performance issue in how the hash is calculated for the final chunk, which involves re-reading the entire file. I've suggested an alternative approach using the existing UploadStreamInterceptor to calculate the hash on-the-fly. I also found a couple of other issues related to redundant hash sending and code maintainability.

Please see my detailed comments for suggestions on how to address these points.

Comment on lines +665 to +673
    if (isFinalChunk)
    {
        string crc32cHash = await CalculateCrc32cAsync(stream, cancellationToken).ConfigureAwait(false);
        TrailingCrc32c = crc32cHash;
        if (!string.IsNullOrEmpty(TrailingCrc32c))
        {
            request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={TrailingCrc32c}");
        }
    }

critical

The current implementation for adding the CRC32C hash on the final chunk is inefficient and potentially incorrect. CalculateCrc32cAsync re-reads the entire stream from the beginning right before sending the final chunk. For large files, this introduces significant overhead (e.g., reading a 1GB file twice) and delays the completion of the upload.

Furthermore, if the stream content is modified by another part of the application during the upload, the calculated hash will not match the uploaded data, leading to incorrect validation.

A better approach is to calculate the hash on-the-fly as data is read for each chunk. The existing UploadStreamInterceptor mechanism is well-suited for this. This would also allow supporting CRC32C validation for non-seekable streams.

I recommend refactoring this to compute the hash incrementally. Here's a possible approach:

  1. Introduce a Crc32 hasher instance when the upload starts.
  2. Use UploadStreamInterceptor to append each chunk of data to the hasher as it's being prepared for upload.
  3. In SendNextChunkAsync, when it's the final chunk, get the final hash from the hasher instance and add it to the header.
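
The property this incremental approach relies on — feeding chunks into a running CRC-32C produces exactly the digest of the concatenated stream — can be sketched as follows (Python, with a bitwise CRC-32C standing in for the System.IO.Hashing hasher; crc32c_chunked is illustrative):

```python
def crc32c_update(crc: int, data: bytes) -> int:
    # Feed more bytes into a running CRC-32C; the caller seeds with 0xFFFFFFFF.
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc

def crc32c(data: bytes) -> int:
    # One-shot hash of the whole payload.
    return crc32c_update(0xFFFFFFFF, data) ^ 0xFFFFFFFF

def crc32c_chunked(chunks) -> int:
    # Hash on the fly, one chunk at a time, as an interceptor would see the data.
    crc = 0xFFFFFFFF
    for chunk in chunks:
        crc = crc32c_update(crc, chunk)
    return crc ^ 0xFFFFFFFF
```

Because the running state carries over between calls, the final digest is available the moment the last chunk is read — no second pass over the stream is needed.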

@mahendra-google (Owner, Author) replied:

@gemini-code-assist As per the requirement, the x-goog-hash header will be recognized in the JSON API's StartResumableWrite call. It will be an error to set it and also set the same checksums in the object metadata of that call, unless the values agree.

@gemini-code-assist (Bot) replied:

Thanks for clarifying the API requirements for the x-goog-hash header and its interaction with the StartResumableWrite call and object metadata.

My previous comment focused on the efficiency of calculating this CRC32C hash. The current implementation re-reads the entire stream in CalculateCrc32cAsync right before sending the final chunk, which can be very inefficient for large files and doesn't support non-seekable streams.

The incremental calculation approach I suggested (using UploadStreamInterceptor to update a hasher as data is read) would still ensure that the final, complete CRC32C hash is available. This hash could then be added to the x-goog-hash header on the final chunk (or during the initial InitiateSessionAsync call if the hash is known upfront and matches the object metadata), fully meeting the API requirements without the performance overhead of re-reading the entire stream.

Comment on lines +665 to +673
    if (isFinalChunk)
    {
        string crc32cHash = await CalculateCrc32cAsync(stream, cancellationToken).ConfigureAwait(false);
        TrailingCrc32c = crc32cHash;
        if (!string.IsNullOrEmpty(TrailingCrc32c))
        {
            request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={TrailingCrc32c}");
        }
    }

high

The current logic sends the CRC32C hash with the final chunk, even if it was already sent with the session initiation request. This is redundant. According to Google Cloud Storage documentation, the hash should be provided either at the start or at the end, but not both.

To fix this, you can introduce a flag (e.g., _crc32cSentOnInitiation) that is set in InitiateSessionAsync when the hash is added to the initiation request. Then, in SendNextChunkAsync, check this flag to avoid calculating and sending the hash again on the final chunk.

@mahendra-google (Owner, Author) replied:

@gemini-code-assist As per the requirement, the x-goog-hash header will be recognized in the JSON API's StartResumableWrite call. It will be an error to set it and also set the same checksums in the object metadata of that call, unless the values agree.

@gemini-code-assist (Bot) replied:

Thanks for the clarification, @mahendra-google. I understand that the x-goog-hash header must agree with the object metadata if both are set in the StartResumableWrite call. However, the Google Cloud Storage documentation generally advises providing the hash "either at the start or at the end, but not both" to avoid redundancy.

Currently, if the CRC32C hash is provided in the object metadata, it will be sent during the InitiateSessionAsync call. Then, in SendNextChunkAsync, if it's the final chunk, the hash is calculated again and also added to the request header. This results in sending the hash twice, which is redundant.

To align with the recommendation of sending the hash only once, I suggest introducing a private boolean field, for example, _crc32cHeaderSentInInitiation, in the ResumableUpload class. This flag should be set to true in InitiateSessionAsync if the X-Goog-Hash header is added there. Then, in SendNextChunkAsync, the header should only be added if this flag is false.

This ensures that the hash is sent only once, either at the initiation or with the final chunk, preventing redundant header transmission while still ensuring correctness.

Suggested change
-            if (isFinalChunk)
-            {
-                string crc32cHash = await CalculateCrc32cAsync(stream, cancellationToken).ConfigureAwait(false);
-                TrailingCrc32c = crc32cHash;
-                if (!string.IsNullOrEmpty(TrailingCrc32c))
-                {
-                    request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={TrailingCrc32c}");
-                }
-            }
+            if (!_crc32cHeaderSentInInitiation && !string.IsNullOrEmpty(TrailingCrc32c))
+            {
+                request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={TrailingCrc32c}");
+            }

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
@mahendra-google (Owner, Author)

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces CRC32C validation for resumable uploads, which is a valuable addition for data integrity. The implementation is well-structured, adding the necessary dependency and logic to calculate and attach the hash. I've identified a logical issue in the session initiation that prevents the hash from being sent in all possible cases, and a performance optimization for the hash calculation logic. Addressing these points will make the feature more robust and efficient.

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated

    // 3. Calculate CRC32C (Using System.IO.Hashing)
    var crc32c = new System.IO.Hashing.Crc32();
    byte[] buffer = new byte[Crc32cCalculationBufferSize];

medium

To improve performance and reduce memory allocations, consider using System.Buffers.ArrayPool<byte>.Shared to rent and return the buffer used for reading the stream. This avoids allocating a large byte array on the heap for each hash calculation, which can reduce garbage collector pressure, especially in applications that perform many uploads.

You would replace byte[] buffer = new byte[Crc32cCalculationBufferSize]; with byte[] buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(Crc32cCalculationBufferSize); and then wrap the while loop in a try...finally block to call System.Buffers.ArrayPool<byte>.Shared.Return(buffer);.

This would require adding a package reference to System.Buffers in the .csproj file.
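
Python has no ArrayPool, but the shape of the suggestion — acquire one buffer up front, reuse it for every read, hand it back when done — looks like this sketch (zlib's CRC-32 stands in for CRC32C purely to show the buffering pattern; hash_stream and BUFFER_SIZE are illustrative):

```python
import io
import zlib

BUFFER_SIZE = 8192  # stand-in for Crc32cCalculationBufferSize

def hash_stream(stream) -> int:
    # One buffer "rented" up front and reused for every read, so the loop
    # performs no per-iteration allocation of the read buffer itself.
    buffer = bytearray(BUFFER_SIZE)
    crc = 0
    while True:
        n = stream.readinto(buffer)
        if not n:
            break
        # In the C# suggestion this loop sits inside try/finally so the pooled
        # buffer is always returned; Python's GC makes that step unnecessary.
        crc = zlib.crc32(memoryview(buffer)[:n], crc)
    return crc
```

The result matches hashing the whole payload in one call, since the CRC state threads through each update.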

@mahendra-google (Owner, Author)

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces CRC32C validation for resumable uploads, which is an excellent addition for ensuring data integrity. The implementation is well-structured, particularly the logic for calculating the hash and handling it at both session initiation and on the final chunk. I've identified a potential bug related to non-seekable streams that could lead to a malformed header. Additionally, I've provided a few suggestions to enhance performance, clarity, and maintainability by leveraging modern .NET APIs and patterns, such as using ArrayPool for buffer management and a more robust method for handling byte order.

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
Comment on lines +735 to +741
    byte[] hashBytes = crc32c.GetCurrentHash();

    // 4. Ensure Big-Endian byte order (required by Google APIs)
    if (BitConverter.IsLittleEndian)
    {
        Array.Reverse(hashBytes);
    }

medium

For better clarity and to use more modern .NET APIs for handling endianness, I suggest using Crc32.GetCurrentHashAsUInt32() combined with System.Buffers.Binary.BinaryPrimitives.WriteUInt32BigEndian. This makes the intent of converting the hash to big-endian bytes more explicit and is generally more efficient than reversing the array.

    uint hash = crc32c.GetCurrentHashAsUInt32();
    byte[] hashBytes = new byte[4];
    System.Buffers.Binary.BinaryPrimitives.WriteUInt32BigEndian(hashBytes, hash);

    string calculatedHash = await CalculateCrc32cAsync(ContentStream, cancellationToken).ConfigureAwait(false);
    if (calculatedHash != null && Body != null)
    {
        var crcProperty = Body.GetType().GetProperty(Crc32cPropertyName);

medium

Using reflection with GetProperty and a hardcoded string for Crc32cPropertyName is not type-safe and can be brittle. A change in the property name in any request object would cause this check to fail silently.

For future improvement, consider a more robust, type-safe approach. You could define an interface, for example ICrc32cProvider, with a string Crc32c { get; } property. Then you could check if Body implements this interface (if (Body is ICrc32cProvider crcProvider)). This would be safer, more maintainable, and more performant than reflection. I understand this might be a breaking change, so it's something to consider for a future version.

@mahendra-google (Owner, Author)

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request implements CRC32C checksum validation for resumable uploads, incorporating the System.IO.Hashing package to calculate the CRC32C of the upload stream. The changes introduce new constants for the X-Goog-Hash header and buffer size, and add a CalculateCrc32cAsync method. The ResumableUpload logic is updated to include the X-Goog-Hash header with the calculated CRC32C either during session initiation (if a matching Crc32c is present in the request body metadata) or with the final upload chunk. A new test case verifies the presence of this header in the final request. A review comment notes that the newly added TrailingCrc32c property in InitiatedResumableUpload is currently unused and publicly mutable, suggesting its setter should be made private or internal, or the property removed, to prevent external misuse.

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
@mahendra-google (Owner, Author)

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds support for CRC32C validation for resumable uploads in the JSON API, which is a great feature for ensuring data integrity. The implementation correctly handles sending the hash either during session initiation (if provided in metadata and the stream is seekable) or with the final chunk. The use of System.IO.Hashing and ArrayPool is efficient.

I have two main points of feedback:

  1. There's a performance issue in InitiateSessionAsync where the CRC32C hash is calculated for every seekable stream, even when it's not needed. I've suggested a change to calculate it lazily.
  2. The new property Crc32cHeaderSentInInitiation has a public setter, which could be misused. I've recommended making its setter internal.

Additionally, the new logic in InitiateSessionAsync for handling user-provided CRC32C hashes is not covered by unit tests. It would be beneficial to add tests for scenarios with matching hashes, mismatching hashes (expecting an exception), and when no hash is provided in the metadata.

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
@mahendra-google (Owner, Author)

/gemini review

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces CRC32C checksum validation for resumable uploads, which is a valuable feature for ensuring data integrity. The implementation that handles validation during the session initiation by checking a user-provided hash is well-executed. However, I've identified a critical issue in the fallback mechanism that attempts to add the hash to the final upload chunk. This logic is based on an apparent misunderstanding of the Google Cloud Storage API, which will lead to upload failures for multi-chunk uploads. It also introduces a significant performance degradation for large files. I've provided a detailed explanation and a suggested fix for this critical issue. Additionally, I've included a minor suggestion to improve the robustness of property lookup. Addressing the critical issue is essential for the feature to work correctly and efficiently.

Comment on lines +661 to +680
    bool isFinalChunk;
    isFinalChunk = BytesClientSent == StreamLength;
    // After creating the request content, and before sending it
    if (isFinalChunk && !Crc32cHeaderSentInInitiation)
    {
        string trailingCrc32c = await CalculateCrc32cAsync(ContentStream, cancellationToken).ConfigureAwait(false);
        if (trailingCrc32c != null)
        {
            request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={trailingCrc32c}");
        }
    }
    else
    {
        // Explicitly ensure the header is NOT present for non-final chunks
        // (Just in case it was added by a default modifier elsewhere)
        if (request.Headers.Contains(GoogleHashHeader))
        {
            request.Headers.Remove(GoogleHashHeader);
        }
    }

critical

This logic for adding the X-Goog-Hash header on the final chunk is incorrect and will cause uploads to fail.

According to the Google Cloud Storage documentation on hashes, when a hash is provided in the final request of a resumable upload, it is used to validate only the data in that final chunk, not the entire object. This implementation calculates the CRC32C hash of the entire stream (CalculateCrc32cAsync) and sends it with the final chunk. For any multi-chunk upload, the full object hash will not match the hash of just the final chunk, causing the server to reject the upload with a 400 Bad Request error.

Furthermore, CalculateCrc32cAsync re-reads the entire stream from the beginning. For large files, this introduces a severe performance penalty, as the entire file content is read from disk a second time just before the upload completes.

Full object validation should be performed by providing the hash in the initial request, which is already correctly implemented in InitiateSessionAsync. This fallback logic should be removed.

Removing this block will also require removing the test assertion in ResumableUploadTest.MultiChunk.cs at lines 499-503, which was added to test this incorrect behavior.
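For reference, the full-object value being discussed is the Base64 encoding of the CRC32C in big-endian byte order. A minimal sketch of producing the header value (the `ComputeCrc32c` helper mentioned in the usage comment is a placeholder, not an API from this PR; `BinaryPrimitives` and `Convert` are real BCL types):

```csharp
using System;
using System.Buffers.Binary;

static string ToGoogleHashHeaderValue(uint crc32c)
{
    // Google APIs expect the CRC32C in big-endian byte order before Base64 encoding.
    Span<byte> bigEndian = stackalloc byte[4];
    BinaryPrimitives.WriteUInt32BigEndian(bigEndian, crc32c);
    return $"crc32c={Convert.ToBase64String(bigEndian)}";
}

// Hypothetical usage:
//   uint crc = ComputeCrc32c(streamBytes);
//   request.Headers.TryAddWithoutValidation("X-Goog-Hash", ToGoogleHashHeaderValue(crc));
```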

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds CRC32C validation for resumable uploads, which is a great feature for ensuring data integrity. The implementation is mostly solid, but I've found a critical bug in the CRC32C hash calculation related to incorrect endianness handling that would cause validation to fail on most systems. I've also suggested an improvement to the new test to make it more robust by validating the hash value, which would have caught this bug. Additionally, there's a minor code style suggestion for better readability. Please address the critical issue before merging.

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds CRC32C validation for resumable uploads, which is a great feature for ensuring data integrity. The implementation is well-done, using System.IO.Hashing for checksum calculation and correctly handling stream operations. The changes include adding logic to send the hash on the final chunk if not provided during initiation, and also validating a user-provided hash in the request body against the stream content. The tests have been updated to cover some of the new functionality. I have a couple of suggestions: one to improve performance by caching reflection data, and another to add a test case for a new validation scenario.

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs
@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds CRC32C validation for resumable uploads, which is a valuable feature for data integrity. The implementation is solid, covering scenarios where the hash is provided at the start of the upload or with the final chunk. The code is robust, using ArrayPool for efficient buffer management and correctly handling stream positions. My main feedback is to enhance test coverage by adding a test case for the scenario where the hash is provided in the request metadata during upload initiation, to ensure all aspects of this new feature are verified and to prevent future regressions.

@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces CRC32C validation for resumable uploads, enhancing data integrity. The implementation correctly handles validation at session initiation and on the final chunk. The code is well-structured, and the tests cover the new functionality effectively. I have a few suggestions to improve maintainability and clean up some test code.

Comment thread Src/Support/Google.Apis.Tests/Apis/Upload/ResumableUploadTest.MultiChunk.cs Outdated
Comment thread Src/Support/Google.Apis.Tests/Apis/Upload/ResumableUploadTest.Utilities.cs Outdated
Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds CRC32C validation for resumable uploads, which is an excellent feature for ensuring data integrity. The implementation correctly handles two main scenarios: providing the hash at session initiation and as a trailing header with the final chunk. The accompanying tests are thorough, covering both success and failure cases. My review includes a few suggestions to improve code style and readability.

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs
Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs
Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces CRC32C validation for resumable uploads, which is a great addition for ensuring data integrity. The implementation correctly handles validation at session initiation and adds a trailing hash for validation on completion. The new tests are also well-written and cover important scenarios.

My review includes a few suggestions. The most critical one is a performance concern regarding how the CRC32C hash is calculated for uploads without pre-supplied metadata, which currently involves re-reading the entire file at the end of the upload. I've also included a couple of minor suggestions to improve code maintainability and consistency.

Overall, this is a solid contribution. Addressing the performance issue will make this feature much more robust for large file uploads.

Comment on lines +664 to +671
if (isFinalChunk && !Crc32cHeaderSentInInitiation)
{
    string trailingCrc32c = await CalculateCrc32cAsync(ContentStream, cancellationToken).ConfigureAwait(false);
    if (trailingCrc32c != null)
    {
        request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={trailingCrc32c}");
    }
}

high

There's a significant performance issue here. Calling CalculateCrc32cAsync just before sending the final chunk causes the entire ContentStream to be read from the beginning again. For large files, this is very inefficient as the file content is read twice: once for uploading chunk by chunk, and a second time in full at the very end.

A more performant approach for seekable streams would be to calculate the hash once at the beginning of the upload process and store it for later use. For example, you could compute the hash in UploadCoreAsync before the while loop starts and store it in a private field. This would avoid re-reading the entire stream at the end of a potentially long upload.

Example of what this could look like:

// In ResumableUpload class
private string _fullStreamCrc32c;

private async Task<IUploadProgress> UploadCoreAsync(CancellationToken cancellationToken)
{
    // Calculate hash at the start of the upload if stream is seekable.
    _fullStreamCrc32c = await CalculateCrc32cAsync(ContentStream, cancellationToken).ConfigureAwait(false);
    try
    {
        while (!await SendNextChunkAsync(ContentStream, cancellationToken).ConfigureAwait(false))
        {
            // ...
        }
        // ...
    }
    // ...
}

// In SendNextChunkAsync
if (isFinalChunk && !Crc32cHeaderSentInInitiation)
{
    if (_fullStreamCrc32c != null)
    {
        request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={_fullStreamCrc32c}");
    }
}

This would move the performance cost of the extra read to the beginning of the upload, which is a much better user experience than a long pause before the upload completes.

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds support for CRC32C validation for resumable uploads. The changes include client-side validation of a provided CRC32C hash before starting an upload, and sending the hash to the server for validation, either at initiation or with the final chunk. My review has identified a couple of areas for improvement. A significant performance issue is that the upload stream is read twice for seekable streams: once to calculate the hash and once to upload the data, which is inefficient for large files. I've suggested an alternative approach to calculate the hash on-the-fly. Additionally, I've noted a design consideration regarding the use of reflection to retrieve the CRC32C hash from the request body, proposing an interface-based alternative for better type safety and performance. The new tests are well-written and cover the new functionality, including success and failure scenarios.

var metadataHash = GetPropertyFromGenericBody(Body, Crc32c);
if (!string.IsNullOrEmpty(metadataHash))
{
    string calculatedHash = await CalculateCrc32cAsync(ContentStream, cancellationToken).ConfigureAwait(false);

high

This implementation has a significant performance implication for large files. CalculateCrc32cAsync reads the entire stream to compute the hash. The stream is then read a second time during the actual upload process in UploadCoreAsync. For large files, this double-read can be very inefficient, consuming extra time and I/O resources.

A more efficient approach would be to calculate the CRC32C hash on-the-fly as the stream is read for uploading. This could be achieved by wrapping the ContentStream in a custom Stream implementation that computes the hash as data is read from it. This would avoid reading the stream twice.

While the current approach is simpler to implement, the performance cost is notable and this behavior should be clearly documented for users of this feature.
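The on-the-fly approach suggested above can be sketched as a thin read-through wrapper. This is a hedged illustration only: the `accumulate` delegate stands in for whatever incremental CRC32C primitive the library ends up using (it is not an API from this PR), and the wrapper deliberately reports itself as non-seekable so that seeking cannot desynchronize the running hash.

```csharp
using System;
using System.IO;

// Wraps an upload stream and feeds every byte read into a caller-supplied
// hash accumulator, so the checksum is computed during the single upload pass.
public sealed class HashingReadStream : Stream
{
    private readonly Stream _inner;
    private readonly Action<byte[], int, int> _accumulate; // e.g. crc.Append(buffer, offset, count)

    public HashingReadStream(Stream inner, Action<byte[], int, int> accumulate)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _accumulate = accumulate ?? throw new ArgumentNullException(nameof(accumulate));
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        if (read > 0)
        {
            _accumulate(buffer, offset, read); // hash exactly the bytes that were uploaded
        }
        return read;
    }

    // Pass-through / unsupported members required by Stream.
    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => false; // seeking would desync the running hash
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => throw new NotSupportedException();
    }
    public override void Flush() => _inner.Flush();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Note that this trades the double read for losing seekability of the wrapped stream, which matters for resumable uploads that rewind on retry; any real implementation would need to reconcile that.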

Comment on lines +1317 to +1318
internal string GetPropertyFromGenericBody(TRequest body, string propertyName) =>
    body?.GetType().GetProperty(propertyName)?.GetValue(body) as string;

medium

Using reflection to get the Crc32c property is flexible but has some drawbacks:

  • It's not type-safe. A typo in the property name (Crc32c) would only be discovered at runtime.
  • It has a performance overhead compared to direct property access.

Consider defining an interface for bodies that support CRC32C validation:

public interface IBodyWithCrc32c
{
    string Crc32c { get; }
}

Then, in InitiateSessionAsync, you could check if Body implements this interface:

if (Body is IBodyWithCrc32c bodyWithCrc32c)
{
    var metadataHash = bodyWithCrc32c.Crc32c;
    // ... rest of the logic
}

This approach provides compile-time safety and is more performant. While this would require users to implement the interface, it makes the contract for this feature explicit. Given that reflection is used elsewhere in this library, the current approach may be consistent with existing design patterns, but an interface-based approach is generally preferable for new features.

@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds support for full object CRC32C validation for resumable uploads in the JSON API. The implementation correctly handles two scenarios: providing the checksum in the initial request if available in the object metadata, and sending it as a trailing header in the final upload chunk for seekable streams. The code is well-structured, with efficient use of System.Buffers.ArrayPool for stream processing. The new functionality is also well-covered by unit tests. I have one suggestion regarding the use of reflection, which could be replaced by an interface-based approach for better type safety and performance.

Comment on lines +1316 to +1317
internal string GetPropertyFromGenericBody(TRequest body, string propertyName) =>
    body?.GetType().GetProperty(propertyName)?.GetValue(body) as string;

medium

Using reflection to access the Crc32c property is a pragmatic way to avoid breaking changes. However, this approach is not type-safe and is less performant than direct property access. A typo in the property name would only be caught at runtime.

A more robust alternative would be to introduce an interface, for example IUploadWithCrc32c, that exposes the Crc32c property. You could then check if the Body object implements this interface.

Example:

public interface IUploadWithCrc32c
{
    string Crc32c { get; }
}

// In InitiateSessionAsync:
if (Body is IUploadWithCrc32c uploadWithCrc && !string.IsNullOrEmpty(uploadWithCrc.Crc32c))
{
    var metadataHash = uploadWithCrc.Crc32c;
    // ...
}

This would provide compile-time safety and better performance. It's a non-breaking change, and types that need this functionality can opt-in by implementing the interface.

@mahendra-google mahendra-google changed the title feat: Add full object CRC32C validation to JSON API for Resumable uploads feat: Enable full object checksum validation for multi-chunk resumable uploads Dec 18, 2025
@mahendra-google mahendra-google force-pushed the add-trailing-checksums branch 2 times, most recently from faf65af to f65e4ea Compare December 18, 2025 14:30
@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces full object checksum validation (CRC32C) for JSON-based resumable uploads, which is a great enhancement for data integrity. The implementation is robust, handling various scenarios such as providing the checksum during session initiation or with the final chunk. It correctly distinguishes between seekable and non-seekable streams for client-side hash calculation. The new functionality is well-covered by unit tests. I've added one minor suggestion to improve the consistency and robustness of the tests.

Comment thread Src/Support/Google.Apis.Tests/Apis/Upload/ResumableUploadTest.MultiChunk.cs Outdated
@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a valuable feature for ensuring data integrity through CRC32C checksum validation in resumable uploads. The implementation is well-structured and includes thorough tests for various scenarios. However, there is a critical issue regarding target framework compatibility. The new dependency on System.IO.Hashing is only compatible with .NET 6.0 and later, which will cause build failures for the netstandard2.0 and net462 targets. My review provides specific suggestions to resolve this by using conditional compilation for the new code and the package reference. These changes will enable the feature on supported frameworks while maintaining compatibility with older ones.
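The package-reference side of that fix can be sketched in the project file roughly as follows (a sketch only; the exact target framework monikers this repository multi-targets are an assumption):

```xml
<!-- Reference System.IO.Hashing only for targets that can use it;
     netstandard2.0 / net462 builds then compile without the CRC32C code paths. -->
<ItemGroup Condition="'$(TargetFramework)' == 'net6.0'">
  <PackageReference Include="System.IO.Hashing" Version="10.0.1" />
</ItemGroup>
```

Pairing this with `#if NET6_0_OR_GREATER` around the new code keeps both the dependency and its call sites out of the older targets.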

Comment on lines +504 to +591
/// <summary>
/// Verifies that a resumable upload completes successfully and validates crc32c checksum across different chunking configurations
/// and stream types of known size.
/// </summary>
[Theory, CombinatorialData]
public void TestChunkedUpload_CompletesAndVerifiesChecksumInPartialServer(
    [CombinatorialValues(true)] bool knownSize,
    [CombinatorialValues(80, 150)] int partialSize,
    [CombinatorialValues(100, 200)] int chunkSize)
{
    var actualChunkSize = Math.Min(partialSize, chunkSize);
    var expectedCallCount = (UploadLength + actualChunkSize - 1) / actualChunkSize + 1;
    using (var server = new MultiChunkPartialServer(_server, partialSize))
    using (var service = new MockClientService(server.HttpPrefix))
    {
        var content = knownSize ? new MemoryStream(UploadTestBytes) : new UnknownSizeMemoryStream(UploadTestBytes);
        var uploader = new TestResumableUpload(service, "MultiChunk", "POST", content, "text/plain", actualChunkSize);
        var progress = uploader.Upload();
        Assert.Equal(expectedCallCount, server.Requests.Count);
        var request = server.Requests.Last();
        Assert.Contains(ResumableUpload.GoogleHashHeader, request.Headers.AllKeys);
        var headerValue = request.Headers[ResumableUpload.GoogleHashHeader];
        var expectedBase64Hash = CalculateCrc32c(UploadTestBytes);
        Assert.Equal($"crc32c={expectedBase64Hash}", headerValue);
        Assert.Equal(UploadTestBytes, server.Bytes);
        Assert.Equal(UploadStatus.Completed, progress.Status);
    }
}

/// <summary>
/// Verifies that the session initiation succeeds when a matching CRC32C checksum is provided.
/// </summary>
[Fact]
public async Task TestInitiateSessionWithMatchingCrc32c()
{
    int partialSize = 80;
    var content = new MemoryStream(UploadTestBytes);
    using (var server = new MultiChunkPartialServer(_server, partialSize))
    using (var service = new MockClientService(server.HttpPrefix))
    {
        var chunkSize = 1024;
        var expectedBase64Hash = CalculateCrc32c(UploadTestBytes);
        var body = new TestResumableUploadWithCrc32c { Crc32c = expectedBase64Hash };
        var uploader = new TestResumableUpload(service, "MultiChunk", "POST", content, "text/plain", chunkSize)
        {
            Body = body
        };

        await uploader.InitiateSessionAsync();
        // Assert that the initiation request was sent
        Assert.Single(server.Requests);
        var initiationRequest = server.Requests.First();

        // Assert that the X-Goog-Hash header was sent with the correct CRC32C
        Assert.Contains(ResumableUpload.GoogleHashHeader, initiationRequest.Headers.AllKeys);
        Assert.Equal($"crc32c={expectedBase64Hash}", initiationRequest.Headers[ResumableUpload.GoogleHashHeader]);
        Assert.True(uploader.Crc32cHeaderSentInInitiation);
    }
}

/// <summary>
/// Verifies that the session initiation fails when a mismatched CRC32C checksum is provided.
/// </summary>
[Fact]
public async Task TestInitiateSessionWithMismatchingCrc32c()
{
    int partialSize = 80;
    var content = new MemoryStream(UploadTestBytes);
    using (var server = new MultiChunkPartialServer(_server, partialSize))
    using (var service = new MockClientService(server.HttpPrefix))
    {
        var chunkSize = 1024;
        var expectedBase64Hash = CalculateCrc32c(UploadTestBytes);
        var body = new TestResumableUploadWithCrc32c { Crc32c = "incorrectHash" };

        var uploader = new TestResumableUpload(service, "MultiChunk", "POST", content, "text/plain", chunkSize)
        {
            Body = body
        };

        var exception = await Assert.ThrowsAsync<InvalidOperationException>(() => uploader.InitiateSessionAsync());
        Assert.Equal($"The calculated CRC32C of the stream ({expectedBase64Hash}) does not match the CRC32C provided in the object metadata ({body.Crc32c}).", exception.Message);

        // Assert that no request was sent to the server
        Assert.Empty(server.Requests);
        Assert.False(uploader.Crc32cHeaderSentInInitiation);
    }
}

critical

These tests for CRC32C checksum validation rely on System.IO.Hashing, which is only available on .NET 6.0 and newer. To prevent build failures on older target frameworks, this entire block of tests should be conditionally compiled.

#if NET6_0_OR_GREATER
        /// <summary>
        /// Verifies that a resumable upload completes successfully and validates crc32c checksum across different chunking configurations 
        /// and stream types of known size.
        /// </summary>
        [Theory, CombinatorialData]
        public void TestChunkedUpload_CompletesAndVerifiesChecksumInPartialServer(
            [CombinatorialValues(true)] bool knownSize,
            [CombinatorialValues(80, 150)] int partialSize,
            [CombinatorialValues(100, 200)] int chunkSize)
        {
            var actualChunkSize = Math.Min(partialSize, chunkSize);
            var expectedCallCount = (UploadLength + actualChunkSize - 1) / actualChunkSize + 1;
            using (var server = new MultiChunkPartialServer(_server, partialSize))
            using (var service = new MockClientService(server.HttpPrefix))
            {
                var content = knownSize ? new MemoryStream(UploadTestBytes) : new UnknownSizeMemoryStream(UploadTestBytes);
                var uploader = new TestResumableUpload(service, "MultiChunk", "POST", content, "text/plain", actualChunkSize);
                var progress = uploader.Upload();
                Assert.Equal(expectedCallCount, server.Requests.Count);
                var request = server.Requests.Last();
                Assert.Contains(ResumableUpload.GoogleHashHeader, request.Headers.AllKeys);
                var headerValue = request.Headers[ResumableUpload.GoogleHashHeader];
                var expectedBase64Hash = CalculateCrc32c(UploadTestBytes);
                Assert.Equal($"crc32c={expectedBase64Hash}", headerValue);
                Assert.Equal(UploadTestBytes, server.Bytes);
                Assert.Equal(UploadStatus.Completed, progress.Status);
            }
        }

        /// <summary>
        /// Verifies that the session initiation succeeds when a matching CRC32C checksum is provided.
        /// </summary>
        [Fact]
        public async Task TestInitiateSessionWithMatchingCrc32c()
        {
            int partialSize = 80;
            var content = new MemoryStream(UploadTestBytes);
            using (var server = new MultiChunkPartialServer(_server, partialSize))
            using (var service = new MockClientService(server.HttpPrefix))
            {
                var chunkSize = 1024;
                var expectedBase64Hash = CalculateCrc32c(UploadTestBytes);
                var body = new TestResumableUploadWithCrc32c { Crc32c = expectedBase64Hash };
                var uploader = new TestResumableUpload(service, "MultiChunk", "POST", content, "text/plain", chunkSize)
                {
                    Body = body
                };

                await uploader.InitiateSessionAsync();
                // Assert that the initiation request was sent
                Assert.Single(server.Requests);
                var initiationRequest = server.Requests.First();

                // Assert that the X-Goog-Hash header was sent with the correct CRC32C
                Assert.Contains(ResumableUpload.GoogleHashHeader, initiationRequest.Headers.AllKeys);
                Assert.Equal($"crc32c={expectedBase64Hash}", initiationRequest.Headers[ResumableUpload.GoogleHashHeader]);
                Assert.True(uploader.Crc32cHeaderSentInInitiation);
            }
        }

        /// <summary>
        /// Verifies that the session initiation fails when a mismatched CRC32C checksum is provided.
        /// </summary>
        [Fact]
        public async Task TestInitiateSessionWithMismatchingCrc32c()
        {
            int partialSize = 80;
            var content = new MemoryStream(UploadTestBytes);
            using (var server = new MultiChunkPartialServer(_server, partialSize))
            using (var service = new MockClientService(server.HttpPrefix))
            {
                var chunkSize = 1024;
                var expectedBase64Hash = CalculateCrc32c(UploadTestBytes);
                var body = new TestResumableUploadWithCrc32c { Crc32c = "incorrectHash" };

                var uploader = new TestResumableUpload(service, "MultiChunk", "POST", content, "text/plain", chunkSize)
                {
                    Body = body
                };

                var exception = await Assert.ThrowsAsync<InvalidOperationException>(() => uploader.InitiateSessionAsync());
                Assert.Equal($"The calculated CRC32C of the stream ({expectedBase64Hash}) does not match the CRC32C provided in the object metadata ({body.Crc32c}).", exception.Message);

                // Assert that no request was sent to the server
                Assert.Empty(server.Requests);
                Assert.False(uploader.Crc32cHeaderSentInInitiation);
            }
        }
#endif

Owner Author

/gemini review


The conditional compilation directives have been correctly applied to the CRC32C checksum validation tests and related code, addressing the compatibility issue with older target frameworks. This resolves the critical issue previously identified. Thank you for the quick resolution!

Comment on lines +96 to +102
/// <summary>
/// Resumable upload class that allows Crc32c to be set for testing.
/// </summary>
private class TestResumableUploadWithCrc32c
{
    public string Crc32c { get; set; }
}

critical

This test helper class is only used for CRC32C checksum validation tests, which are only available on .NET 6.0 and newer. To prevent build failures on older target frameworks, this class should be conditionally compiled.

#if NET6_0_OR_GREATER
        /// <summary>
        /// Resumable upload class that allows Crc32c to be set for testing.
        /// </summary>
        private class TestResumableUploadWithCrc32c
        {
            public string Crc32c { get; set; }
        }
#endif

Owner Author

/gemini review


Thanks for the /gemini review command! I've re-evaluated the changes.

It appears the TestResumableUploadWithCrc32c class in Src/Support/Google.Apis.Tests/Apis/Upload/ResumableUploadTest.Utilities.cs still needs to be conditionally compiled for .NET 6.0 or greater. This is crucial to prevent build failures on older target frameworks, as the CRC32C checksum validation features it supports are only available in .NET 6.0 and newer.

Please apply the conditional compilation as suggested previously.

Suggested change
/// <summary>
/// Resumable upload class that allows Crc32c to be set for testing.
/// </summary>
private class TestResumableUploadWithCrc32c
{
    public string Crc32c { get; set; }
}
#if NET6_0_OR_GREATER
/// <summary>
/// Resumable upload class that allows Crc32c to be set for testing.
/// </summary>
private class TestResumableUploadWithCrc32c
{
    public string Crc32c { get; set; }
}
#endif

Owner Author

/gemini review


Thanks for the /gemini review command! I've re-evaluated the changes.

The TestResumableUploadWithCrc32c class in Src/Support/Google.Apis.Tests/Apis/Upload/ResumableUploadTest.Utilities.cs still needs to be conditionally compiled for .NET 6.0 or greater. This is essential to prevent build failures on older target frameworks, as the CRC32C checksum validation features it supports are only available in .NET 6.0 and newer.

Please apply the conditional compilation as suggested previously.

Suggested change
/// <summary>
/// Resumable upload class that allows Crc32c to be set for testing.
/// </summary>
private class TestResumableUploadWithCrc32c
{
    public string Crc32c { get; set; }
}
#if NET6_0_OR_GREATER
/// <summary>
/// Resumable upload class that allows Crc32c to be set for testing.
/// </summary>
private class TestResumableUploadWithCrc32c
{
    public string Crc32c { get; set; }
}
#endif

Comment thread Src/Support/Google.Apis/Google.Apis.csproj Outdated
Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs
Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs
@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request effectively enables full object CRC32C checksum validation for JSON-based resumable uploads, which is a great enhancement for data integrity. The implementation is well-structured, conditionally compiled for .NET 6+, and includes a good set of tests to verify the new functionality. The core logic correctly handles checksum validation for seekable streams, both at session initiation and on the final upload chunk. I have one suggestion to improve the developer experience by adding a warning log when a checksum is provided for a non-seekable stream, as it's currently silently ignored.

Comment on lines +1232 to +1241
if (calculatedHash != null)
{
    if (metadataHash != calculatedHash)
    {
        throw new InvalidOperationException(
            $"The calculated CRC32C of the stream ({calculatedHash}) does not match the CRC32C provided in the object metadata ({metadataHash}).");
    }
    request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
    Crc32cHeaderSentInInitiation = true;
}

medium

When a CRC32C hash is provided for a non-seekable stream, it is silently ignored. This could be confusing for developers who expect the hash to be used. It would be better to log a warning to inform them that the hash cannot be validated on the client and will not be sent to the server because the stream is not seekable.

                    if (calculatedHash != null)
                    {
                        if (metadataHash != calculatedHash)
                        {
                            throw new InvalidOperationException(
                                $"The calculated CRC32C of the stream ({calculatedHash}) does not match the CRC32C provided in the object metadata ({metadataHash}).");
                        }
                        request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
                        Crc32cHeaderSentInInitiation = true;
                    }
                    else
                    {
                        Logger.Warning("A CRC32C hash was provided in the request body, but the stream is not seekable. The hash will not be validated on the client, and will not be sent to the server.");
                    }
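
As a cross-language illustration of the `crc32c={calculatedHash}` value attached above: per the PR description, Google APIs expect the checksum encoded as base64 of the big-endian 32-bit CRC-32C. A minimal Python sketch (the bitwise `crc32c` helper and the `goog_hash_value` name are ours for illustration, not part of the library):

```python
import base64
import struct

def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli); reflected polynomial 0x82F63B78."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def goog_hash_value(data: bytes) -> str:
    # Big-endian byte order before base64, matching the byte order the
    # PR's CalculateCrc32c uses for Google APIs compatibility.
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")

# Standard CRC-32C check value for the ASCII string "123456789":
assert crc32c(b"123456789") == 0xE3069283
print("crc32c=" + goog_hash_value(b"123456789"))  # crc32c=4waSgw==
```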

Owner Author

/gemini review


Thank you for incorporating the warning! That addresses the core concern.

For better integration with potential logging frameworks used by consumers of this library, it would be more idiomatic to use Logger.Warning instead of Console.WriteLine. This allows the application to configure how these warnings are handled (e.g., routed to a log file, a specific logging service, or suppressed).

Suggested change
                    if (calculatedHash != null)
                    {
                        if (metadataHash != calculatedHash)
                        {
                            throw new InvalidOperationException(
                                $"The calculated CRC32C of the stream ({calculatedHash}) does not match the CRC32C provided in the object metadata ({metadataHash}).");
                        }
                        request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
                        Crc32cHeaderSentInInitiation = true;
                    }
                    else
                    {
                        Logger.Warning("A CRC32C hash was provided in the request body, but the stream is not seekable. The hash will not be validated on the client, and will not be sent to the server.");
                    }

@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces full object checksum validation (CRC32C) for multi-chunk resumable uploads, which is a great enhancement for data integrity. The implementation is well-structured, using conditional compilation for .NET 6+ features and correctly handling seekable and non-seekable streams. The logic to include the checksum either in the initiation request or with the final chunk is sound. The accompanying tests are thorough and cover various scenarios. I have one minor suggestion to improve logging practices.
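
Concretely, "initiation request or final chunk" plays out in the request headers. A hypothetical sketch of the two cases (the header names follow the PR; the range and checksum values are purely illustrative):

```python
# Case 1: seekable stream with a metadata checksum -- the hash is sent
# when the resumable session is initiated.
initiation_headers = {
    "X-Goog-Hash": "crc32c=4waSgw==",  # illustrative value
}

# Case 2: the hash is sent as a trailing checksum with the final chunk.
final_chunk_headers = {
    "Content-Range": "bytes 200-299/300",  # last 100 bytes of a 300-byte object
    "X-Goog-Hash": "crc32c=4waSgw==",      # full-object checksum, illustrative
}
print(final_chunk_headers["X-Goog-Hash"])
```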

Comment thread Src/Support/Google.Apis/Upload/ResumableUpload.cs Outdated
@mahendra-google
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces full object CRC32C checksum validation for multi-chunk resumable uploads, which is a great enhancement for data integrity. The implementation is well-structured, using conditional compilation for .NET 6+ to leverage System.IO.Hashing. The logic correctly handles checksum validation during session initiation for seekable streams and as a trailing header for the final chunk. The accompanying tests are thorough, covering various scenarios including matching/mismatching checksums and different chunking configurations. I have a couple of minor suggestions to improve code clarity and correctness in comments.

Comment on lines +509 to +519
        public void TestChunkedUpload_CompletesAndVerifiesChecksumInPartialServer(
            [CombinatorialValues(true)] bool knownSize,
            [CombinatorialValues(80, 150)] int partialSize,
            [CombinatorialValues(100, 200)] int chunkSize)
        {
            var actualChunkSize = Math.Min(partialSize, chunkSize);
            var expectedCallCount = (UploadLength + actualChunkSize - 1) / actualChunkSize + 1;
            using (var server = new MultiChunkPartialServer(_server, partialSize))
            using (var service = new MockClientService(server.HttpPrefix))
            {
                var content = knownSize ? new MemoryStream(UploadTestBytes) : new UnknownSizeMemoryStream(UploadTestBytes);

medium

The knownSize parameter is only ever true in this test, making it and the conditional logic for creating the content stream redundant. To improve clarity and simplify the test, you can remove the knownSize parameter and directly initialize content as a MemoryStream.

        public void TestChunkedUpload_CompletesAndVerifiesChecksumInPartialServer(
            [CombinatorialValues(80, 150)] int partialSize,
            [CombinatorialValues(100, 200)] int chunkSize)
        {
            var actualChunkSize = Math.Min(partialSize, chunkSize);
            var expectedCallCount = (UploadLength + actualChunkSize - 1) / actualChunkSize + 1;
            using (var server = new MultiChunkPartialServer(_server, partialSize))
            using (var service = new MockClientService(server.HttpPrefix))
            {
                var content = new MemoryStream(UploadTestBytes);
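
The `expectedCallCount` expression in the test above is integer ceiling division (the number of data chunks) plus one for the session-initiation request. The same arithmetic in Python:

```python
def expected_call_count(upload_length: int, chunk_size: int) -> int:
    # Ceiling division gives the number of data chunks; +1 for initiation.
    return (upload_length + chunk_size - 1) // chunk_size + 1

assert expected_call_count(300, 100) == 4  # 3 full chunks + 1 initiation call
assert expected_call_count(301, 100) == 5  # a 1-byte final chunk adds a call
```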

Comment on lines +83 to +85
/// The size of the buffer (in bytes) used when reading the stream to calculate the CRC32C checksum.
/// Set to 80 KB (81920 bytes) to balance memory usage and I/O performance.
/// </summary>

medium

There's a minor inaccuracy in the comment. 81920 bytes is 80 KiB (kibibytes), not 80 KB (kilobytes). Using "KiB" is more precise for base-2 units.

        /// The size of the buffer (in bytes) used when reading the stream to calculate the CRC32C checksum.
        /// Set to 80 KiB (81920 bytes) to balance memory usage and I/O performance.
        /// </summary>
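
As a sketch of the buffered-read pattern this comment refers to (a Python stand-in; the bitwise `crc32c` helper replaces the library's hashing class, and `calculate_stream_crc32c` is our name, not the PR's API): read the stream in 80 KiB buffers, fold each buffer into a running CRC, then restore the stream position so the upload itself is unaffected.

```python
import io

BUFFER_SIZE = 80 * 1024  # 80 KiB, the buffer size discussed above

def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C; seeding with a previous result continues the hash."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def calculate_stream_crc32c(stream) -> int:
    original_position = stream.tell()  # remember where the caller was
    stream.seek(0)
    crc = 0
    while True:
        buffer = stream.read(BUFFER_SIZE)
        if not buffer:
            break
        crc = crc32c(buffer, crc)  # incremental update over each buffer
    stream.seek(original_position)  # restore position for the upload
    return crc

stream = io.BytesIO(b"123456789")
stream.seek(5)  # simulate a partially-consumed stream
assert calculate_stream_crc32c(stream) == 0xE3069283
assert stream.tell() == 5  # position restored
```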
