feat: Enable full object checksum validation for multi-chunk resumable uploads #1
mahendra-google wants to merge 1 commit into main from
Conversation
Summary of Changes

Hello @mahendra-google, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances data integrity for resumable uploads within the Google API client library by introducing comprehensive client-side CRC32C validation. It ensures that the entire object's checksum is calculated and verified against any provided metadata, and also transmitted with the final upload segment, thereby guaranteeing the consistency and correctness of uploaded data.

Highlights
Code Review
This pull request adds support for full object CRC32C validation in resumable uploads for the JSON API. The changes include adding a dependency on System.IO.Hashing, implementing CRC32C calculation for streams, and adding the hash to both the upload initiation request and the final data chunk. While the overall approach is correct, I've identified a few critical issues: the use of a preview version for the new dependency, an incorrect format for the hash header in the final chunk, and a logic error where the hash header is added unconditionally in the initiation request. My review includes suggestions to address these points.
if (calculatedHash != null)
{
    if (Body != null)
    {
        var bodyType = Body.GetType();
        // Look for "Crc32c" property common in Google API data objects
        var crcProperty = bodyType.GetProperty("Crc32c");

        if (crcProperty != null)
        {
            var metadataHash = crcProperty.GetValue(Body) as string;

            // "It will be an error to set it and also set the same checksums in the object metadata unless the values are same."
            if (!string.IsNullOrEmpty(metadataHash) && metadataHash != calculatedHash)
            {
                throw new InvalidOperationException(
                    $"The calculated CRC32C of the stream ({calculatedHash}) does not match the CRC32C provided in the object metadata ({metadataHash}).");
            }
        }
    }
}
        request.Headers.Add(GoogleHashHeader, $"crc32c={calculatedHash}");
The X-Goog-Hash header is being added to the request even when calculatedHash is null (e.g., for non-seekable streams). This will result in a header like X-Goog-Hash: crc32c=, which is likely invalid. The logic to add the header should be moved inside the if (calculatedHash != null) block. The incorrect indentation may be hiding this logic error.
if (calculatedHash != null)
{
if (Body != null)
{
var bodyType = Body.GetType();
// Look for "Crc32c" property common in Google API data objects
var crcProperty = bodyType.GetProperty("Crc32c");
if (crcProperty != null)
{
var metadataHash = crcProperty.GetValue(Body) as string;
// "It will be an error to set it and also set the same checksums in the object metadata unless the values are same."
if (!string.IsNullOrEmpty(metadataHash) && metadataHash != calculatedHash)
{
throw new InvalidOperationException(
$"The calculated CRC32C of the stream ({calculatedHash}) does not match the CRC32C provided in the object metadata ({metadataHash}).");
}
}
}
request.Headers.Add(GoogleHashHeader, $"crc32c={calculatedHash}");
}
/gemini review
Code Review
This pull request adds CRC32C validation for resumable uploads. While the intention is good, the implementation introduces the same critical bug in two different places: the X-Goog-Hash header is added to HTTP requests after they have already been sent. This makes the new validation feature non-functional. My review provides details on how to fix the placement of this logic to ensure the header is correctly included in the requests.
string calculatedHash = CalculateCrc32c(ContentStream);

if (calculatedHash != null)
{
    if (Body != null)
    {
        var bodyType = Body.GetType();
        // Look for "Crc32c" property common in Google API data objects
        var crcProperty = bodyType.GetProperty("Crc32c");

        if (crcProperty != null)
        {
            var metadataHash = crcProperty.GetValue(Body) as string;

            // "It will be an error to set it and also set the same checksums in the object metadata unless the values are same."
            if (!string.IsNullOrEmpty(metadataHash) && metadataHash != calculatedHash)
            {
                throw new InvalidOperationException(
                    $"The calculated CRC32C of the stream ({calculatedHash}) does not match the CRC32C provided in the object metadata ({metadataHash}).");
            }
            else
            {
                request.Headers.Add(GoogleHashHeader, $"crc32c={calculatedHash}");
            }
        }
    }
}
return response.Headers.Location;
This block of code has a few issues:

- Critical Bug: The logic to calculate the CRC32C hash and add the X-Goog-Hash header is executed after the session initiation request has been sent (on line 1204). This means the header will not be included in the initiation request. This entire block (lines 1210-1237) needs to be moved to before the Service.HttpClient.SendAsync call on line 1204.
- Logic Improvement: The header is only added if the Body is not null and has a Crc32c property. It would be more robust to add the header whenever a hash is calculated, as long as it doesn't conflict with existing metadata. If the Crc32c property doesn't exist on the body, the header can still be sent for validation by the server.
- Formatting: The return statement on line 1238 has incorrect indentation, which harms readability.
/gemini review
Code Review
This pull request introduces CRC32C validation for resumable uploads, a valuable feature for data integrity. However, I've identified two critical issues in the implementation. The first is a logic error in InitiateSessionAsync where hash calculation occurs after the request is sent, which would cause a runtime exception. The second is an incorrect byte order reversal in CalculateCrc32c that would lead to hash mismatches on most common hardware. I've also suggested adding more test coverage to help catch such issues in the future. Please see the detailed comments for suggestions on how to fix these problems.
if (BitConverter.IsLittleEndian)
{
    Array.Reverse(hashBytes);
}
The System.IO.Hashing.Crc32.GetCurrentHash() method already returns the hash in big-endian byte order. The conditional Array.Reverse is therefore incorrect on little-endian systems (like x86/x64), as it will reverse the already correct big-endian byte array into a little-endian one. This will cause hash validation to fail. This block should be removed.
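Independent of which .NET call produces which byte order, the wire format can be sanity-checked by hand: the X-Goog-Hash value is the base64 encoding of the four CRC-32C bytes in big-endian order. Below is a small, self-contained sketch of that check (Python used here purely as a neutral calculator; the bitwise CRC-32C is illustrative and not the library's implementation):

```python
import base64

def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected form, polynomial 0x1EDC6F41."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

# Well-known CRC-32C check value for the ASCII string "123456789".
assert crc32c(b"123456789") == 0xE3069283

# X-Goog-Hash carries base64 of the 4 hash bytes in big-endian order.
header = "crc32c=" + base64.b64encode(
    crc32c(b"123456789").to_bytes(4, "big")).decode("ascii")
print(header)  # crc32c=4waSgw==
```

Whatever byte order GetCurrentHash() returns, the bytes placed in the header must decode back to this big-endian value, which is what a test that validates the header's actual value (not just its presence) would catch.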
if (knownSize)
{
    var request = server.Requests[server.Requests.Count - 1];
    Assert.Contains("X-Goog-Hash", request.Headers.AllKeys);
}
This test is good for verifying the X-Goog-Hash header on the final chunk of an upload. However, the feature also adds this header to the initial request in InitiateSessionAsync. It would be beneficial to add a new test case to cover that scenario. A dedicated test for the session initiation would have likely caught the critical bug in InitiateSessionAsync where the header was being added after the request was sent.
/gemini review
Code Review
This pull request adds CRC32C validation for resumable uploads, which is a valuable feature for ensuring data integrity. The implementation is mostly solid, but I've identified a significant performance issue: the CRC32C hash is calculated synchronously, which can block the application when uploading large files. I've provided a set of comments to convert this calculation to be fully asynchronous. Additionally, I've suggested a small refactoring to improve code clarity in one of the new sections.
/gemini review
Code Review
This pull request adds support for CRC32C validation for resumable uploads in the JSON API. The changes include adding a dependency on System.IO.Hashing, implementing the CRC32C calculation logic, and integrating it into the resumable upload flow for both session initiation and final chunk upload. The implementation is solid, with good use of try...finally to manage stream resources and correct handling of endianness. I have a couple of suggestions for improvement regarding the use of a magic number and the use of reflection for property access, which could be made more robust and performant.
string calculatedHash = await CalculateCrc32cAsync(ContentStream, cancellationToken).ConfigureAwait(false);
if (calculatedHash != null && Body != null)
{
    var crcProperty = Body.GetType().GetProperty("Crc32c");
Using reflection with Body.GetType().GetProperty("Crc32c") to access the CRC32C hash from the metadata object is not type-safe and can have a performance impact. Consider defining an interface that metadata objects can implement, for example:
public interface IWithCrc32c
{
    string Crc32c { get; }
}

This would allow for a type-safe check:

if (Body is IWithCrc32c crcProvider)
{
    var metadataHash = crcProvider.Crc32c;
    // ... rest of the logic
}

While this might be a breaking change or not align with the library's design for auto-generated code, it's a worthwhile consideration for future improvements to enhance robustness and performance.
/gemini review
Code Review
This pull request adds CRC32C validation for resumable uploads, which is a great feature for ensuring data integrity. The implementation covers two scenarios: validating a user-provided hash at the start of an upload, and calculating and sending a hash with the final chunk.
My review focuses on improving the efficiency, correctness, and maintainability of the implementation. I've identified a critical performance issue in how the hash is calculated for the final chunk, which involves re-reading the entire file. I've suggested an alternative approach using the existing UploadStreamInterceptor to calculate the hash on-the-fly. I also found a couple of other issues related to redundant hash sending and code maintainability.
Please see my detailed comments for suggestions on how to address these points.
if (isFinalChunk)
{
    string crc32cHash = await CalculateCrc32cAsync(stream, cancellationToken).ConfigureAwait(false);
    TrailingCrc32c = crc32cHash;
    if (!string.IsNullOrEmpty(TrailingCrc32c))
    {
        request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={TrailingCrc32c}");
    }
}
The current implementation for adding the CRC32C hash on the final chunk is inefficient and potentially incorrect. CalculateCrc32cAsync re-reads the entire stream from the beginning right before sending the final chunk. For large files, this introduces significant overhead (e.g., reading a 1GB file twice) and delays the completion of the upload.
Furthermore, if the stream content is modified by another part of the application during the upload, the calculated hash will not match the uploaded data, leading to incorrect validation.
A better approach is to calculate the hash on-the-fly as data is read for each chunk. The existing UploadStreamInterceptor mechanism is well-suited for this. This would also allow supporting CRC32C validation for non-seekable streams.
I recommend refactoring this to compute the hash incrementally. Here's a possible approach:

- Introduce a Crc32 hasher instance when the upload starts.
- Use UploadStreamInterceptor to append each chunk of data to the hasher as it's being prepared for upload.
- In SendNextChunkAsync, when it's the final chunk, get the final hash from the hasher instance and add it to the header.
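The incremental approach works because CRC state can be carried across chunks: folding the stream into a hasher chunk by chunk yields exactly the same value as hashing the whole stream in one pass. A minimal sketch of that property (Python, with an illustrative bitwise CRC-32C; real code would feed chunks to the hasher's append method instead):

```python
_POLY = 0x82F63B78  # reflected CRC-32C polynomial

def crc32c_update(crc: int, chunk: bytes) -> int:
    """Fold one chunk into a running CRC-32C state (no final XOR here)."""
    for byte in chunk:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
    return crc

def crc32c(data: bytes) -> int:
    return crc32c_update(0xFFFFFFFF, data) ^ 0xFFFFFFFF

payload = bytes(range(256)) * 16  # stand-in for the upload stream

# One pass over the whole stream...
whole = crc32c(payload)

# ...versus chunk by chunk, as an upload interceptor would see the data.
state = 0xFFFFFFFF
for start in range(0, len(payload), 100):
    state = crc32c_update(state, payload[start:start + 100])

assert state ^ 0xFFFFFFFF == whole
```

Because the running state is updated as each chunk is prepared, the final hash is ready the moment the last chunk is, with no second read of the stream.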
@gemini-code-assist As per the requirement, the x-goog-hash header will be recognized in the JSON API's StartResumableWrite call.
It will be an error to set it and also set the same checksums in the object metadata of that call, unless the values agree.
Thanks for clarifying the API requirements for the x-goog-hash header and its interaction with the StartResumableWrite call and object metadata.
My previous comment focused on the efficiency of calculating this CRC32C hash. The current implementation re-reads the entire stream in CalculateCrc32cAsync right before sending the final chunk, which can be very inefficient for large files and doesn't support non-seekable streams.
The incremental calculation approach I suggested (using UploadStreamInterceptor to update a hasher as data is read) would still ensure that the final, complete CRC32C hash is available. This hash could then be added to the x-goog-hash header on the final chunk (or during the initial InitiateSessionAsync call if the hash is known upfront and matches the object metadata), fully meeting the API requirements without the performance overhead of re-reading the entire stream.
if (isFinalChunk)
{
    string crc32cHash = await CalculateCrc32cAsync(stream, cancellationToken).ConfigureAwait(false);
    TrailingCrc32c = crc32cHash;
    if (!string.IsNullOrEmpty(TrailingCrc32c))
    {
        request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={TrailingCrc32c}");
    }
}
The current logic sends the CRC32C hash with the final chunk, even if it was already sent with the session initiation request. This is redundant. According to Google Cloud Storage documentation, the hash should be provided either at the start or at the end, but not both.
To fix this, you can introduce a flag (e.g., _crc32cSentOnInitiation) that is set in InitiateSessionAsync when the hash is added to the initiation request. Then, in SendNextChunkAsync, check this flag to avoid calculating and sending the hash again on the final chunk.
@gemini-code-assist As per the requirement, the x-goog-hash header will be recognized in the JSON API's StartResumableWrite call.
It will be an error to set it and also set the same checksums in the object metadata of that call, unless the values agree.
Thanks for the clarification, @mahendra-google. I understand that the x-goog-hash header must agree with the object metadata if both are set in the StartResumableWrite call. However, the Google Cloud Storage documentation generally advises providing the hash "either at the start or at the end, but not both" to avoid redundancy.
Currently, if the CRC32C hash is provided in the object metadata, it will be sent during the InitiateSessionAsync call. Then, in SendNextChunkAsync, if it's the final chunk, the hash is calculated again and also added to the request header. This results in sending the hash twice, which is redundant.
To align with the recommendation of sending the hash only once, I suggest introducing a private boolean field, for example, _crc32cHeaderSentInInitiation, in the ResumableUpload class. This flag should be set to true in InitiateSessionAsync if the X-Goog-Hash header is added there. Then, in SendNextChunkAsync, the header should only be added if this flag is false.
This ensures that the hash is sent only once, either at the initiation or with the final chunk, preventing redundant header transmission while still ensuring correctness.
Suggested change:

Before:

if (isFinalChunk)
{
    string crc32cHash = await CalculateCrc32cAsync(stream, cancellationToken).ConfigureAwait(false);
    TrailingCrc32c = crc32cHash;
    if (!string.IsNullOrEmpty(TrailingCrc32c))
    {
        request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={TrailingCrc32c}");
    }
}

After:

if (!_crc32cHeaderSentInInitiation && !string.IsNullOrEmpty(TrailingCrc32c))
{
    request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={TrailingCrc32c}");
}
/gemini review
Code Review
This pull request introduces CRC32C validation for resumable uploads, which is a valuable addition for data integrity. The implementation is well-structured, adding the necessary dependency and logic to calculate and attach the hash. I've identified a logical issue in the session initiation that prevents the hash from being sent in all possible cases, and a performance optimization for the hash calculation logic. Addressing these points will make the feature more robust and efficient.
// 3. Calculate CRC32C (Using System.IO.Hashing)
var crc32c = new System.IO.Hashing.Crc32();
byte[] buffer = new byte[Crc32cCalculationBufferSize];
To improve performance and reduce memory allocations, consider using System.Buffers.ArrayPool<byte>.Shared to rent and return the buffer used for reading the stream. This avoids allocating a large byte array on the heap for each hash calculation, which can reduce garbage collector pressure, especially in applications that perform many uploads.
You would replace byte[] buffer = new byte[Crc32cCalculationBufferSize]; with byte[] buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(Crc32cCalculationBufferSize); and then wrap the while loop in a try...finally block to call System.Buffers.ArrayPool<byte>.Shared.Return(buffer);.
This would require adding a package reference to System.Buffers in the .csproj file.
/gemini review
Code Review
This pull request introduces CRC32C validation for resumable uploads, which is an excellent addition for ensuring data integrity. The implementation is well-structured, particularly the logic for calculating the hash and handling it at both session initiation and on the final chunk. I've identified a potential bug related to non-seekable streams that could lead to a malformed header. Additionally, I've provided a few suggestions to enhance performance, clarity, and maintainability by leveraging modern .NET APIs and patterns, such as using ArrayPool for buffer management and a more robust method for handling byte order.
byte[] hashBytes = crc32c.GetCurrentHash();

// 4. Ensure Big-Endian byte order (required by Google APIs)
if (BitConverter.IsLittleEndian)
{
    Array.Reverse(hashBytes);
}
For better clarity and to use more modern .NET APIs for handling endianness, I suggest using Crc32.GetCurrentHashAsUInt32() combined with System.Buffers.Binary.BinaryPrimitives.WriteUInt32BigEndian. This makes the intent of converting the hash to big-endian bytes more explicit and is generally more efficient than reversing the array.
uint hash = crc32c.GetCurrentHashAsUInt32();
byte[] hashBytes = new byte[4];
System.Buffers.Binary.BinaryPrimitives.WriteUInt32BigEndian(hashBytes, hash);

string calculatedHash = await CalculateCrc32cAsync(ContentStream, cancellationToken).ConfigureAwait(false);
if (calculatedHash != null && Body != null)
{
    var crcProperty = Body.GetType().GetProperty(Crc32cPropertyName);
Using reflection with GetProperty and a hardcoded string for Crc32cPropertyName is not type-safe and can be brittle. A change in the property name in any request object would cause this check to fail silently.
For future improvement, consider a more robust, type-safe approach. You could define an interface, for example ICrc32cProvider, with a string Crc32c { get; } property. Then you could check if Body implements this interface (if (Body is ICrc32cProvider crcProvider)). This would be safer, more maintainable, and more performant than reflection. I understand this might be a breaking change, so it's something to consider for a future version.
/gemini review
Code Review
This pull request implements CRC32C checksum validation for resumable uploads, incorporating the System.IO.Hashing package to calculate the CRC32C of the upload stream. The changes introduce new constants for the X-Goog-Hash header and buffer size, and add a CalculateCrc32cAsync method. The ResumableUpload logic is updated to include the X-Goog-Hash header with the calculated CRC32C either during session initiation (if a matching Crc32c is present in the request body metadata) or with the final upload chunk. A new test case verifies the presence of this header in the final request. A review comment notes that the newly added TrailingCrc32c property in InitiatedResumableUpload is currently unused and publicly mutable, suggesting its setter should be made private or internal, or the property removed, to prevent external misuse.
/gemini review
Code Review
This pull request adds support for CRC32C validation for resumable uploads in the JSON API, which is a great feature for ensuring data integrity. The implementation correctly handles sending the hash either during session initiation (if provided in metadata and the stream is seekable) or with the final chunk. The use of System.IO.Hashing and ArrayPool is efficient.
I have two main points of feedback:
- There's a performance issue in InitiateSessionAsync where the CRC32C hash is calculated for every seekable stream, even when it's not needed. I've suggested a change to calculate it lazily.
- The new property Crc32cHeaderSentInInitiation has a public setter, which could be misused. I've recommended making its setter internal.
Additionally, the new logic in InitiateSessionAsync for handling user-provided CRC32C hashes is not covered by unit tests. It would be beneficial to add tests for scenarios with matching hashes, mismatching hashes (expecting an exception), and when no hash is provided in the metadata.
/gemini review
Code Review
This pull request introduces CRC32C checksum validation for resumable uploads, which is a valuable feature for ensuring data integrity. The implementation that handles validation during the session initiation by checking a user-provided hash is well-executed. However, I've identified a critical issue in the fallback mechanism that attempts to add the hash to the final upload chunk. This logic is based on an apparent misunderstanding of the Google Cloud Storage API, which will lead to upload failures for multi-chunk uploads. It also introduces a significant performance degradation for large files. I've provided a detailed explanation and a suggested fix for this critical issue. Additionally, I've included a minor suggestion to improve the robustness of property lookup. Addressing the critical issue is essential for the feature to work correctly and efficiently.
bool isFinalChunk;
isFinalChunk = BytesClientSent == StreamLength;
// After creating the request content, and before sending it
if (isFinalChunk && !Crc32cHeaderSentInInitiation)
{
    string trailingCrc32c = await CalculateCrc32cAsync(ContentStream, cancellationToken).ConfigureAwait(false);
    if (trailingCrc32c != null)
    {
        request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={trailingCrc32c}");
    }
}
else
{
    // Explicitly ensure the header is NOT present for non-final chunks
    // (Just in case it was added by a default modifier elsewhere)
    if (request.Headers.Contains(GoogleHashHeader))
    {
        request.Headers.Remove(GoogleHashHeader);
    }
}
This logic for adding the X-Goog-Hash header on the final chunk is incorrect and will cause uploads to fail.
According to the Google Cloud Storage documentation on hashes, when a hash is provided in the final request of a resumable upload, it is used to validate only the data in that final chunk, not the entire object. This implementation calculates the CRC32C hash of the entire stream (CalculateCrc32cAsync) and sends it with the final chunk. For any multi-chunk upload, the full object hash will not match the hash of just the final chunk, causing the server to reject the upload with a 400 Bad Request error.
Furthermore, CalculateCrc32cAsync re-reads the entire stream from the beginning. For large files, this introduces a severe performance penalty, as the entire file content is read from disk a second time just before the upload completes.
Full object validation should be performed by providing the hash in the initial request, which is already correctly implemented in InitiateSessionAsync. This fallback logic should be removed.
Removing this block will also require removing the test assertion in ResumableUploadTest.MultiChunk.cs at lines 499-503, which was added to test this incorrect behavior.
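To make the failure mode concrete: the checksum of the whole object and the checksum of only the final chunk are different values whenever the upload spans more than one chunk, so a full-object hash sent on the final chunk cannot validate against final-chunk data. A tiny illustration (Python; zlib.crc32 is plain CRC-32 rather than CRC-32C, but the chunk-versus-whole argument is independent of the polynomial):

```python
import zlib

data = b"A" * 1024 + b"B" * 1024   # pretend the chunk size is 1 KiB
full_object_hash = zlib.crc32(data)
final_chunk_hash = zlib.crc32(data[1024:])

# A server that checks only the final chunk's bytes would compare against
# final_chunk_hash, so a full-object hash sent there cannot match.
assert full_object_hash != final_chunk_hash
```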
/gemini review
Code Review
This pull request adds CRC32C validation for resumable uploads, which is a great feature for ensuring data integrity. The implementation is mostly solid, but I've found a critical bug in the CRC32C hash calculation related to incorrect endianness handling that would cause validation to fail on most systems. I've also suggested an improvement to the new test to make it more robust by validating the hash value, which would have caught this bug. Additionally, there's a minor code style suggestion for better readability. Please address the critical issue before merging.
/gemini review
Code Review
This pull request adds CRC32C validation for resumable uploads, which is a great feature for ensuring data integrity. The implementation is well-done, using System.IO.Hashing for checksum calculation and correctly handling stream operations. The changes include adding logic to send the hash on the final chunk if not provided during initiation, and also validating a user-provided hash in the request body against the stream content. The tests have been updated to cover some of the new functionality. I have a couple of suggestions: one to improve performance by caching reflection data, and another to add a test case for a new validation scenario.
/gemini review
Code Review
This pull request adds CRC32C validation for resumable uploads, which is a valuable feature for data integrity. The implementation is solid, covering scenarios where the hash is provided at the start of the upload or with the final chunk. The code is robust, using ArrayPool for efficient buffer management and correctly handling stream positions. My main feedback is to enhance test coverage by adding a test case for the scenario where the hash is provided in the request metadata during upload initiation, to ensure all aspects of this new feature are verified and to prevent future regressions.
/gemini review
Code Review
This pull request introduces CRC32C validation for resumable uploads, enhancing data integrity. The implementation correctly handles validation at session initiation and on the final chunk. The code is well-structured, and the tests cover the new functionality effectively. I have a few suggestions to improve maintainability and clean up some test code.
/gemini review
Code Review
This pull request adds CRC32C validation for resumable uploads, which is an excellent feature for ensuring data integrity. The implementation correctly handles two main scenarios: providing the hash at session initiation and as a trailing header with the final chunk. The accompanying tests are thorough, covering both success and failure cases. My review includes a few suggestions to improve code style and readability.
/gemini review
Code Review
This pull request introduces CRC32C validation for resumable uploads, which is a great addition for ensuring data integrity. The implementation correctly handles validation at session initiation and adds a trailing hash for validation on completion. The new tests are also well-written and cover important scenarios.
My review includes a few suggestions. The most critical one is a performance concern regarding how the CRC32C hash is calculated for uploads without pre-supplied metadata, which currently involves re-reading the entire file at the end of the upload. I've also included a couple of minor suggestions to improve code maintainability and consistency.
Overall, this is a solid contribution. Addressing the performance issue will make this feature much more robust for large file uploads.
if (isFinalChunk && !Crc32cHeaderSentInInitiation)
{
    string trailingCrc32c = await CalculateCrc32cAsync(ContentStream, cancellationToken).ConfigureAwait(false);
    if (trailingCrc32c != null)
    {
        request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={trailingCrc32c}");
    }
}
There's a significant performance issue here. Calling CalculateCrc32cAsync just before sending the final chunk causes the entire ContentStream to be read from the beginning again. For large files, this is very inefficient as the file content is read twice: once for uploading chunk by chunk, and a second time in full at the very end.
A more performant approach for seekable streams would be to calculate the hash once at the beginning of the upload process and store it for later use. For example, you could compute the hash in UploadCoreAsync before the while loop starts and store it in a private field. This would avoid re-reading the entire stream at the end of a potentially long upload.
Example of what this could look like:
// In ResumableUpload class
private string _fullStreamCrc32c;
private async Task<IUploadProgress> UploadCoreAsync(CancellationToken cancellationToken)
{
// Calculate hash at the start of the upload if stream is seekable.
_fullStreamCrc32c = await CalculateCrc32cAsync(ContentStream, cancellationToken).ConfigureAwait(false);
try
{
while (!await SendNextChunkAsync(ContentStream, cancellationToken).ConfigureAwait(false))
{
// ...
}
// ...
}
// ...
}
// In SendNextChunkAsync
if (isFinalChunk && !Crc32cHeaderSentInInitiation)
{
if (_fullStreamCrc32c != null)
{
request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={_fullStreamCrc32c}");
}
}This would move the performance cost of the extra read to the beginning of the upload, which is a much better user experience than a long pause before the upload completes.
/gemini review
Code Review
This pull request adds support for CRC32C validation for resumable uploads. The changes include client-side validation of a provided CRC32C hash before starting an upload, and sending the hash to the server for validation, either at initiation or with the final chunk. My review has identified a couple of areas for improvement. A significant performance issue is that the upload stream is read twice for seekable streams: once to calculate the hash and once to upload the data, which is inefficient for large files. I've suggested an alternative approach to calculate the hash on-the-fly. Additionally, I've noted a design consideration regarding the use of reflection to retrieve the CRC32C hash from the request body, proposing an interface-based alternative for better type safety and performance. The new tests are well-written and cover the new functionality, including success and failure scenarios.
```csharp
var metadataHash = GetPropertyFromGenericBody(Body, Crc32c);
if (!string.IsNullOrEmpty(metadataHash))
{
    string calculatedHash = await CalculateCrc32cAsync(ContentStream, cancellationToken).ConfigureAwait(false);
```
This implementation has a significant performance implication for large files. CalculateCrc32cAsync reads the entire stream to compute the hash. The stream is then read a second time during the actual upload process in UploadCoreAsync. For large files, this double-read can be very inefficient, consuming extra time and I/O resources.
A more efficient approach would be to calculate the CRC32C hash on-the-fly as the stream is read for uploading. This could be achieved by wrapping the ContentStream in a custom Stream implementation that computes the hash as data is read from it. This would avoid reading the stream twice.
While the current approach is simpler to implement, the performance cost is notable and this behavior should be clearly documented for users of this feature.
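To make the on-the-fly approach concrete, here is a minimal sketch of such a wrapper. This is not code from the PR: the `HashingReadStream` name is hypothetical, and it accepts any `System.IO.Hashing.NonCryptographicHashAlgorithm` so that a CRC32C implementation can be plugged in; the library's `Crc32` is used below purely for illustration.

```csharp
using System;
using System.IO;
using System.IO.Hashing; // NuGet package: System.IO.Hashing

// Hypothetical sketch (not part of this PR): a read-only stream decorator that
// feeds every byte it hands out into a running hash, so the chunked upload loop
// and the checksum calculation share a single pass over the content.
public sealed class HashingReadStream : Stream
{
    private readonly Stream _inner;
    private readonly NonCryptographicHashAlgorithm _hash;

    public HashingReadStream(Stream inner, NonCryptographicHashAlgorithm hash)
    {
        _inner = inner;
        _hash = hash;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        if (read > 0)
        {
            _hash.Append(buffer.AsSpan(offset, read));
        }
        return read;
    }

    // Base64 of the hash bytes accumulated so far. Note: for the real
    // X-Goog-Hash header, the byte order of GetCurrentHash() would need to be
    // checked against the big-endian base64 form the server expects.
    public string HashBase64 => Convert.ToBase64String(_hash.GetCurrentHash());

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Wrapping `ContentStream` in a decorator like this before the upload loop would make the full-object hash available at the final chunk without a second read; the incremental `Append` calls yield the same digest as hashing the whole content in one shot.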
```csharp
internal string GetPropertyFromGenericBody(TRequest body, string propertyName) =>
    body?.GetType().GetProperty(propertyName)?.GetValue(body) as string;
```
Using reflection to get the Crc32c property is flexible but has some drawbacks:
- It's not type-safe. A typo in the property name (`Crc32c`) would only be discovered at runtime.
- It has a performance overhead compared to direct property access.
Consider defining an interface for bodies that support CRC32C validation:
```csharp
public interface IBodyWithCrc32c
{
    string Crc32c { get; }
}
```

Then, in InitiateSessionAsync, you could check if Body implements this interface:
```csharp
if (Body is IBodyWithCrc32c bodyWithCrc32c)
{
    var metadataHash = bodyWithCrc32c.Crc32c;
    // ... rest of the logic
}
```

This approach provides compile-time safety and is more performant. While this would require users to implement the interface, it makes the contract for this feature explicit. Given that reflection is used elsewhere in this library, the current approach may be consistent with existing design patterns, but an interface-based approach is generally preferable for new features.
/gemini review
Code Review
This pull request adds support for full object CRC32C validation for resumable uploads in the JSON API. The implementation correctly handles two scenarios: providing the checksum in the initial request if available in the object metadata, and sending it as a trailing header in the final upload chunk for seekable streams. The code is well-structured, with efficient use of System.Buffers.ArrayPool for stream processing. The new functionality is also well-covered by unit tests. I have one suggestion regarding the use of reflection, which could be replaced by an interface-based approach for better type safety and performance.
```csharp
internal string GetPropertyFromGenericBody(TRequest body, string propertyName) =>
    body?.GetType().GetProperty(propertyName)?.GetValue(body) as string;
```
Using reflection to access the Crc32c property is a pragmatic way to avoid breaking changes. However, this approach is not type-safe and is less performant than direct property access. A typo in the property name would only be caught at runtime.
A more robust alternative would be to introduce an interface, for example IUploadWithCrc32c, that exposes the Crc32c property. You could then check if the Body object implements this interface.
Example:
```csharp
public interface IUploadWithCrc32c
{
    string Crc32c { get; }
}

// In InitiateSessionAsync:
if (Body is IUploadWithCrc32c uploadWithCrc && !string.IsNullOrEmpty(uploadWithCrc.Crc32c))
{
    var metadataHash = uploadWithCrc.Crc32c;
    // ...
}
```

This would provide compile-time safety and better performance. It's a non-breaking change, and types that need this functionality can opt-in by implementing the interface.
faf65af to f65e4ea
/gemini review
Code Review
This pull request introduces full object checksum validation (CRC32C) for JSON-based resumable uploads, which is a great enhancement for data integrity. The implementation is robust, handling various scenarios such as providing the checksum during session initiation or with the final chunk. It correctly distinguishes between seekable and non-seekable streams for client-side hash calculation. The new functionality is well-covered by unit tests. I've added one minor suggestion to improve the consistency and robustness of the tests.
/gemini review
Code Review
This pull request introduces a valuable feature for ensuring data integrity through CRC32C checksum validation in resumable uploads. The implementation is well-structured and includes thorough tests for various scenarios. However, there is a critical issue regarding target framework compatibility. The new dependency on System.IO.Hashing is only compatible with .NET 6.0 and later, which will cause build failures for the netstandard2.0 and net462 targets. My review provides specific suggestions to resolve this by using conditional compilation for the new code and the package reference. These changes will enable the feature on supported frameworks while maintaining compatibility with older ones.
```csharp
/// <summary>
/// Verifies that a resumable upload completes successfully and validates crc32c checksum across different chunking configurations
/// and stream types of known size.
/// </summary>
[Theory, CombinatorialData]
public void TestChunkedUpload_CompletesAndVerifiesChecksumInPartialServer(
    [CombinatorialValues(true)] bool knownSize,
    [CombinatorialValues(80, 150)] int partialSize,
    [CombinatorialValues(100, 200)] int chunkSize)
{
    var actualChunkSize = Math.Min(partialSize, chunkSize);
    var expectedCallCount = (UploadLength + actualChunkSize - 1) / actualChunkSize + 1;
    using (var server = new MultiChunkPartialServer(_server, partialSize))
    using (var service = new MockClientService(server.HttpPrefix))
    {
        var content = knownSize ? new MemoryStream(UploadTestBytes) : new UnknownSizeMemoryStream(UploadTestBytes);
        var uploader = new TestResumableUpload(service, "MultiChunk", "POST", content, "text/plain", actualChunkSize);
        var progress = uploader.Upload();
        Assert.Equal(expectedCallCount, server.Requests.Count);
        var request = server.Requests.Last();
        Assert.Contains(ResumableUpload.GoogleHashHeader, request.Headers.AllKeys);
        var headerValue = request.Headers[ResumableUpload.GoogleHashHeader];
        var expectedBase64Hash = CalculateCrc32c(UploadTestBytes);
        Assert.Equal($"crc32c={expectedBase64Hash}", headerValue);
        Assert.Equal(UploadTestBytes, server.Bytes);
        Assert.Equal(UploadStatus.Completed, progress.Status);
    }
}
```
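As an aside, the `expectedCallCount` expression in this test is ceiling division of the upload length by the effective chunk size, plus one for the session-initiation request. A quick standalone check of that arithmetic (the values below are illustrative, not the test suite's actual `UploadLength`):

```csharp
using System;

// Ceiling division: number of data chunks needed for uploadLength bytes,
// plus 1 for the initial session-initiation request.
static int ExpectedCallCount(int uploadLength, int chunkSize) =>
    (uploadLength + chunkSize - 1) / chunkSize + 1;

// 190 bytes in 80-byte chunks -> 3 chunks + 1 initiation = 4 requests.
Console.WriteLine(ExpectedCallCount(190, 80)); // prints 4
```

Adding `chunkSize - 1` before the integer division is the standard trick for rounding up without floating point.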
```csharp
/// <summary>
/// Verifies that the session initiation succeeds when a matching CRC32C checksum is provided.
/// </summary>
[Fact]
public async Task TestInitiateSessionWithMatchingCrc32c()
{
    int partialSize = 80;
    var content = new MemoryStream(UploadTestBytes);
    using (var server = new MultiChunkPartialServer(_server, partialSize))
    using (var service = new MockClientService(server.HttpPrefix))
    {
        var chunkSize = 1024;
        var expectedBase64Hash = CalculateCrc32c(UploadTestBytes);
        var body = new TestResumableUploadWithCrc32c { Crc32c = expectedBase64Hash };
        var uploader = new TestResumableUpload(service, "MultiChunk", "POST", content, "text/plain", chunkSize)
        {
            Body = body
        };

        await uploader.InitiateSessionAsync();
        // Assert that the initiation request was sent
        Assert.Single(server.Requests);
        var initiationRequest = server.Requests.First();

        // Assert that the X-Goog-Hash header was sent with the correct CRC32C
        Assert.Contains(ResumableUpload.GoogleHashHeader, initiationRequest.Headers.AllKeys);
        Assert.Equal($"crc32c={expectedBase64Hash}", initiationRequest.Headers[ResumableUpload.GoogleHashHeader]);
        Assert.True(uploader.Crc32cHeaderSentInInitiation);
    }
}

/// <summary>
/// Verifies that the session initiation fails when a mismatched CRC32C checksum is provided.
/// </summary>
[Fact]
public async Task TestInitiateSessionWithMismatchingCrc32c()
{
    int partialSize = 80;
    var content = new MemoryStream(UploadTestBytes);
    using (var server = new MultiChunkPartialServer(_server, partialSize))
    using (var service = new MockClientService(server.HttpPrefix))
    {
        var chunkSize = 1024;
        var expectedBase64Hash = CalculateCrc32c(UploadTestBytes);
        var body = new TestResumableUploadWithCrc32c { Crc32c = "incorrectHash" };

        var uploader = new TestResumableUpload(service, "MultiChunk", "POST", content, "text/plain", chunkSize)
        {
            Body = body
        };

        var exception = await Assert.ThrowsAsync<InvalidOperationException>(() => uploader.InitiateSessionAsync());
        Assert.Equal($"The calculated CRC32C of the stream ({expectedBase64Hash}) does not match the CRC32C provided in the object metadata ({body.Crc32c}).", exception.Message);

        // Assert that no request was sent to the server
        Assert.Empty(server.Requests);
        Assert.False(uploader.Crc32cHeaderSentInInitiation);
    }
}
```
These tests for CRC32C checksum validation rely on System.IO.Hashing, which is only available on .NET 6.0 and newer. To prevent build failures on older target frameworks, this entire block of tests should be conditionally compiled.
```csharp
#if NET6_0_OR_GREATER
/// <summary>
/// Verifies that a resumable upload completes successfully and validates crc32c checksum across different chunking configurations
/// and stream types of known size.
/// </summary>
[Theory, CombinatorialData]
public void TestChunkedUpload_CompletesAndVerifiesChecksumInPartialServer(
    [CombinatorialValues(true)] bool knownSize,
    [CombinatorialValues(80, 150)] int partialSize,
    [CombinatorialValues(100, 200)] int chunkSize)
{
    var actualChunkSize = Math.Min(partialSize, chunkSize);
    var expectedCallCount = (UploadLength + actualChunkSize - 1) / actualChunkSize + 1;
    using (var server = new MultiChunkPartialServer(_server, partialSize))
    using (var service = new MockClientService(server.HttpPrefix))
    {
        var content = knownSize ? new MemoryStream(UploadTestBytes) : new UnknownSizeMemoryStream(UploadTestBytes);
        var uploader = new TestResumableUpload(service, "MultiChunk", "POST", content, "text/plain", actualChunkSize);
        var progress = uploader.Upload();
        Assert.Equal(expectedCallCount, server.Requests.Count);
        var request = server.Requests.Last();
        Assert.Contains(ResumableUpload.GoogleHashHeader, request.Headers.AllKeys);
        var headerValue = request.Headers[ResumableUpload.GoogleHashHeader];
        var expectedBase64Hash = CalculateCrc32c(UploadTestBytes);
        Assert.Equal($"crc32c={expectedBase64Hash}", headerValue);
        Assert.Equal(UploadTestBytes, server.Bytes);
        Assert.Equal(UploadStatus.Completed, progress.Status);
    }
}

/// <summary>
/// Verifies that the session initiation succeeds when a matching CRC32C checksum is provided.
/// </summary>
[Fact]
public async Task TestInitiateSessionWithMatchingCrc32c()
{
    int partialSize = 80;
    var content = new MemoryStream(UploadTestBytes);
    using (var server = new MultiChunkPartialServer(_server, partialSize))
    using (var service = new MockClientService(server.HttpPrefix))
    {
        var chunkSize = 1024;
        var expectedBase64Hash = CalculateCrc32c(UploadTestBytes);
        var body = new TestResumableUploadWithCrc32c { Crc32c = expectedBase64Hash };
        var uploader = new TestResumableUpload(service, "MultiChunk", "POST", content, "text/plain", chunkSize)
        {
            Body = body
        };
        await uploader.InitiateSessionAsync();
        // Assert that the initiation request was sent
        Assert.Single(server.Requests);
        var initiationRequest = server.Requests.First();
        // Assert that the X-Goog-Hash header was sent with the correct CRC32C
        Assert.Contains(ResumableUpload.GoogleHashHeader, initiationRequest.Headers.AllKeys);
        Assert.Equal($"crc32c={expectedBase64Hash}", initiationRequest.Headers[ResumableUpload.GoogleHashHeader]);
        Assert.True(uploader.Crc32cHeaderSentInInitiation);
    }
}

/// <summary>
/// Verifies that the session initiation fails when a mismatched CRC32C checksum is provided.
/// </summary>
[Fact]
public async Task TestInitiateSessionWithMismatchingCrc32c()
{
    int partialSize = 80;
    var content = new MemoryStream(UploadTestBytes);
    using (var server = new MultiChunkPartialServer(_server, partialSize))
    using (var service = new MockClientService(server.HttpPrefix))
    {
        var chunkSize = 1024;
        var expectedBase64Hash = CalculateCrc32c(UploadTestBytes);
        var body = new TestResumableUploadWithCrc32c { Crc32c = "incorrectHash" };
        var uploader = new TestResumableUpload(service, "MultiChunk", "POST", content, "text/plain", chunkSize)
        {
            Body = body
        };
        var exception = await Assert.ThrowsAsync<InvalidOperationException>(() => uploader.InitiateSessionAsync());
        Assert.Equal($"The calculated CRC32C of the stream ({expectedBase64Hash}) does not match the CRC32C provided in the object metadata ({body.Crc32c}).", exception.Message);
        // Assert that no request was sent to the server
        Assert.Empty(server.Requests);
        Assert.False(uploader.Crc32cHeaderSentInInitiation);
    }
}
#endif
```
The conditional compilation directives have been correctly applied to the CRC32C checksum validation tests and related code, addressing the compatibility issue with older target frameworks. This resolves the critical issue previously identified. Thank you for the quick resolution!
```csharp
/// <summary>
/// Resumable upload class that allows Crc32c to be set for testing.
/// </summary>
private class TestResumableUploadWithCrc32c
{
    public string Crc32c { get; set; }
}
```
This test helper class is only used for CRC32C checksum validation tests, which are only available on .NET 6.0 and newer. To prevent build failures on older target frameworks, this class should be conditionally compiled.
```csharp
#if NET6_0_OR_GREATER
/// <summary>
/// Resumable upload class that allows Crc32c to be set for testing.
/// </summary>
private class TestResumableUploadWithCrc32c
{
    public string Crc32c { get; set; }
}
#endif
```
Thanks for the /gemini review command! I've re-evaluated the changes.
It appears the TestResumableUploadWithCrc32c class in Src/Support/Google.Apis.Tests/Apis/Upload/ResumableUploadTest.Utilities.cs still needs to be conditionally compiled for .NET 6.0 or greater. This is crucial to prevent build failures on older target frameworks, as the CRC32C checksum validation features it supports are only available in .NET 6.0 and newer.
Please apply the conditional compilation as suggested previously.
```diff
+#if NET6_0_OR_GREATER
 /// <summary>
 /// Resumable upload class that allows Crc32c to be set for testing.
 /// </summary>
 private class TestResumableUploadWithCrc32c
 {
     public string Crc32c { get; set; }
 }
+#endif
```
/gemini review
Code Review
This pull request effectively enables full object CRC32C checksum validation for JSON-based resumable uploads, which is a great enhancement for data integrity. The implementation is well-structured, conditionally compiled for .NET 6+, and includes a good set of tests to verify the new functionality. The core logic correctly handles checksum validation for seekable streams, both at session initiation and on the final upload chunk. I have one suggestion to improve the developer experience by adding a warning log when a checksum is provided for a non-seekable stream, as it's currently silently ignored.
```csharp
if (calculatedHash != null)
{
    if (metadataHash != calculatedHash)
    {
        throw new InvalidOperationException(
            $"The calculated CRC32C of the stream ({calculatedHash}) does not match the CRC32C provided in the object metadata ({metadataHash}).");
    }
    request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
    Crc32cHeaderSentInInitiation = true;
}
```
When a CRC32C hash is provided for a non-seekable stream, it is silently ignored. This could be confusing for developers who expect the hash to be used. It would be better to log a warning to inform them that the hash cannot be validated on the client and will not be sent to the server because the stream is not seekable.
```csharp
if (calculatedHash != null)
{
    if (metadataHash != calculatedHash)
    {
        throw new InvalidOperationException(
            $"The calculated CRC32C of the stream ({calculatedHash}) does not match the CRC32C provided in the object metadata ({metadataHash}).");
    }
    request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
    Crc32cHeaderSentInInitiation = true;
}
else
{
    Logger.Warning("A CRC32C hash was provided in the request body, but the stream is not seekable. The hash will not be validated on the client, and will not be sent to the server.");
}
```
Thank you for incorporating the warning! That addresses the core concern.
For better integration with potential logging frameworks used by consumers of this library, it would be more idiomatic to use Logger.Warning instead of Console.WriteLine. This allows the application to configure how these warnings are handled (e.g., routed to a log file, a specific logging service, or suppressed).
```csharp
if (calculatedHash != null)
{
    if (metadataHash != calculatedHash)
    {
        throw new InvalidOperationException(
            $"The calculated CRC32C of the stream ({calculatedHash}) does not match the CRC32C provided in the object metadata ({metadataHash}).");
    }
    request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
    Crc32cHeaderSentInInitiation = true;
}
else
{
    Logger.Warning("A CRC32C hash was provided in the request body, but the stream is not seekable. The hash will not be validated on the client, and will not be sent to the server.");
}
```
/gemini review
Code Review
This pull request introduces full object checksum validation (CRC32C) for multi-chunk resumable uploads, which is a great enhancement for data integrity. The implementation is well-structured, using conditional compilation for .NET 6+ features and correctly handling seekable and non-seekable streams. The logic to include the checksum either in the initiation request or with the final chunk is sound. The accompanying tests are thorough and cover various scenarios. I have one minor suggestion to improve logging practices.
/gemini review
Code Review
This pull request introduces full object CRC32C checksum validation for multi-chunk resumable uploads, which is a great enhancement for data integrity. The implementation is well-structured, using conditional compilation for .NET 6+ to leverage System.IO.Hashing. The logic correctly handles checksum validation during session initiation for seekable streams and as a trailing header for the final chunk. The accompanying tests are thorough, covering various scenarios including matching/mismatching checksums and different chunking configurations. I have a couple of minor suggestions to improve code clarity and correctness in comments.
```csharp
public void TestChunkedUpload_CompletesAndVerifiesChecksumInPartialServer(
    [CombinatorialValues(true)] bool knownSize,
    [CombinatorialValues(80, 150)] int partialSize,
    [CombinatorialValues(100, 200)] int chunkSize)
{
    var actualChunkSize = Math.Min(partialSize, chunkSize);
    var expectedCallCount = (UploadLength + actualChunkSize - 1) / actualChunkSize + 1;
    using (var server = new MultiChunkPartialServer(_server, partialSize))
    using (var service = new MockClientService(server.HttpPrefix))
    {
        var content = knownSize ? new MemoryStream(UploadTestBytes) : new UnknownSizeMemoryStream(UploadTestBytes);
```
The knownSize parameter is only ever true in this test, making it and the conditional logic for creating the content stream redundant. To improve clarity and simplify the test, you can remove the knownSize parameter and directly initialize content as a MemoryStream.
```csharp
public void TestChunkedUpload_CompletesAndVerifiesChecksumInPartialServer(
    [CombinatorialValues(80, 150)] int partialSize,
    [CombinatorialValues(100, 200)] int chunkSize)
{
    var actualChunkSize = Math.Min(partialSize, chunkSize);
    var expectedCallCount = (UploadLength + actualChunkSize - 1) / actualChunkSize + 1;
    using (var server = new MultiChunkPartialServer(_server, partialSize))
    using (var service = new MockClientService(server.HttpPrefix))
    {
        var content = new MemoryStream(UploadTestBytes);
```

```csharp
/// The size of the buffer (in bytes) used when reading the stream to calculate the CRC32C checksum.
/// Set to 80 KB (81920 bytes) to balance memory usage and I/O performance.
/// </summary>
```
There's a minor inaccuracy in the comment. 81920 bytes is 80 KiB (kibibytes), not 80 KB (kilobytes). Using "KiB" is more precise for base-2 units.
```csharp
/// The size of the buffer (in bytes) used when reading the stream to calculate the CRC32C checksum.
/// Set to 80 KiB (81920 bytes) to balance memory usage and I/O performance.
/// </summary>
```

8eba83c to faf4903
This PR enables full object checksum validation (specifically CRC32C) for JSON-based resumable uploads, ensuring end-to-end data integrity for multi-chunk transfers. Please see b/461996128