feat(Storage): Enable full object checksum validation for multi chunk resumable uploads #10
mahendra-google wants to merge 9 commits into main
Conversation
Summary of Changes

Hello @mahendra-google, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the robustness of multi-chunk resumable uploads by integrating full object CRC32c checksum validation. By introducing a dedicated helper for checksum calculation and an HTTP interceptor that applies this checksum to the final upload request, the changes ensure that data integrity is maintained throughout the entire upload process, providing greater reliability for large file transfers to Google Cloud Storage.
/gemini review
Code Review
This pull request introduces checksum validation for multi-chunk resumable uploads by calculating the CRC32C hash of the entire object and sending it with the final chunk. The implementation adds a new Crc32cHelper class, a CustomMediaUpload class with an HTTP interceptor, and modifies LengthOnlyStream.
My review has identified several critical issues that need to be addressed. There's a memory leak where the HTTP interceptor is never removed, a bug that causes an incorrect checksum to be calculated for non-seekable streams (which will cause uploads to fail), and a bug in the LengthOnlyStream implementation. I've also included suggestions to improve robustness and clarity.
_stream = stream;
_crc32c = body.Crc32c;
_interceptor = new Crc32cHashInterceptor(_stream, _crc32c);
service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);
The CRC32C hash calculation is performed on the stream provided to CustomMediaUpload. When the original source stream for an upload is not seekable, a LengthOnlyStream is used to initiate the resumable upload. This LengthOnlyStream is then passed to CustomMediaUpload and its interceptor. The interceptor will then attempt to calculate a CRC32C hash on this dummy stream (which contains only zeros), resulting in an incorrect hash. This will cause the server-side validation to fail. The hash calculation should only be performed when the actual content stream is available and is seekable.
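A minimal sketch of that guard (illustrative only; Crc32cHelper.CalculateCrc32c mirrors the helper naming suggested elsewhere in this review, and the caller wiring is assumed):

```csharp
// Sketch: only hash streams whose contents can be rewound and re-read.
// For non-seekable sources the LengthOnlyStream placeholder (all zeros)
// would produce a bogus hash, so report nothing instead.
private static string TryComputeCrc32cHeader(Stream stream)
{
    if (!stream.CanSeek)
    {
        return null;
    }
    long originalPosition = stream.Position;
    try
    {
        stream.Position = 0;
        return $"crc32c={Crc32cHelper.CalculateCrc32c(stream)}";
    }
    finally
    {
        stream.Position = originalPosition;
    }
}
```

The caller would attach the returned value as the x-goog-hash header only when it is non-null.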
_stream = stream;
_crc32c = body.Crc32c;
_interceptor = new Crc32cHashInterceptor(_stream, _crc32c);
service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);
The Crc32cHashInterceptor is added to the HttpClient's message handler but is never removed. Since the HttpClient and its handler are typically long-lived and reused, this will cause the interceptor to persist beyond the scope of this single upload. This is a memory leak and can lead to incorrect behavior on subsequent requests. The interceptor should be responsible for removing itself from the handler's interceptor list after it has served its purpose (i.e., after processing the final chunk).
To fix this, you can pass the ConfigurableMessageHandler to the Crc32cHashInterceptor's constructor. The interceptor can then store this reference and use it to remove itself from the ExecuteInterceptors list in InterceptAsync after it has added the hash header for the final chunk.
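Sketched out, that self-unregistering pattern could look like the following (a sketch under assumptions: IsFinalChunk and Crc32cHelper are the PR's own members and are reachable from here, and ConfigurableMessageHandler exposes RemoveExecuteInterceptor, as later revisions in this thread assume):

```csharp
internal sealed class Crc32cHashInterceptor : IHttpExecuteInterceptor
{
    private readonly Stream _stream;
    private readonly ConfigurableMessageHandler _handler;

    public Crc32cHashInterceptor(Stream stream, ConfigurableMessageHandler handler)
    {
        _stream = stream;
        _handler = handler;
    }

    public Task InterceptAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (request.Content?.Headers.Contains("Content-Range") == true)
        {
            string rangeHeader = request.Content.Headers.GetValues("Content-Range").First();
            if (IsFinalChunk(rangeHeader))
            {
                string hash = Crc32cHelper.CalculateCrc32c(_stream);
                request.Headers.TryAddWithoutValidation("x-goog-hash", $"crc32c={hash}");
                // The final chunk has been handled; stop intercepting.
                _handler.RemoveExecuteInterceptor(this);
            }
        }
        return Task.CompletedTask;
    }
}
```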
long? remaining = _length - _position;
if (remaining <= 0) return 0;

int toRead = (int) Math.Min(count, (byte) remaining);
There is a bug in the calculation of toRead. The remaining variable is a long?. Casting it to byte via (byte) remaining will cause it to truncate and wrap around for stream lengths greater than 255 bytes. This will result in an incorrect number of bytes being "read" from this simulated stream. You should use remaining.Value and perform the Math.Min operation with compatible types.
int toRead = (int) Math.Min((long)count, remaining.Value);

private bool IsFinalChunk(string rangeHeader)
{
    try
    {
        string parts = rangeHeader.Replace("bytes ", "");
        string[] split = parts.Split('/');
        if (split.Length < 2 || split[1] == "*") return false;

        long totalSize = long.Parse(split[1]);
        string[] range = split[0].Split('-');
        long endByte = long.Parse(range[1]);

        // If endByte is the last byte of the file, it's the final chunk
        return (endByte + 1) == totalSize;
    }
    catch { return false; }
}
The current implementation of IsFinalChunk uses a broad try-catch block and string manipulation that can be fragile. For example, if the range part of the header is *, range[1] will throw an IndexOutOfRangeException, which is then silently caught. It's better to use more robust parsing logic with long.TryParse and more specific string checks to avoid relying on exception handling for control flow.
private bool IsFinalChunk(string rangeHeader)
{
// Expected format: "bytes <start>-<end>/<total>" or "bytes */<total>"
if (!rangeHeader.StartsWith("bytes "))
{
return false;
}
string[] parts = rangeHeader.Substring(6).Split('/');
if (parts.Length != 2)
{
return false;
}
string range = parts[0];
string total = parts[1];
if (range == "*" || total == "*" || !long.TryParse(total, out long totalSize))
{
return false;
}
string[] rangeParts = range.Split('-');
if (rangeParts.Length != 2 || !long.TryParse(rangeParts[1], out long endByte))
{
return false;
}
return endByte + 1 == totalSize;
}

{
    private const int Crc32cCalculationBufferSize = 81920;

    public static string CalculateCrc32cAsync(Stream stream)
The method CalculateCrc32cAsync is named as if it were asynchronous, but it performs synchronous I/O (stream.Read). This is misleading and can cause confusion. Please rename it to CalculateCrc32c to reflect its synchronous nature. You will also need to update the call site in Crc32cHashInterceptor.
public static string CalculateCrc32c(Stream stream)

private class Crc32cHashInterceptor : IHttpExecuteInterceptor
{
    private string _calculatedHash;
    private readonly string _crc32c;
The field _crc32c is initialized in the Crc32cHashInterceptor constructor but its value is never used. The interceptor always calculates a new hash. If the intention is to use a pre-computed hash when provided, this logic is missing. If it's not intended to be used, it should be removed to avoid confusion and dead code.
Code Review
This pull request introduces checksum validation for multi-chunk resumable uploads. The overall approach is sound, but there are a few critical issues that need to be addressed. The use of IHttpExecuteInterceptor is problematic as it's not being removed, leading to potential issues in subsequent requests. Additionally, the changes to LengthOnlyStream have introduced bugs that could cause runtime exceptions. I've provided detailed comments and suggestions for these critical issues, along with some other feedback to improve code quality and maintainability.
public CustomMediaUpload(IClientService service, Apis.Storage.v1.Data.Object body, string bucket,
    Stream stream, string contentType)
    : base(service, body, bucket, stream, contentType)
{
    _stream = stream;
    _crc32c = body.Crc32c;
    _interceptor = new Crc32cHashInterceptor(_stream, _crc32c);
    service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);
}
The Crc32cHashInterceptor is added to the service.HttpClient.MessageHandler, which is a long-lived, shared component. However, the interceptor is never removed after the upload completes. This means it will persist for the lifetime of the HttpClient, intercepting all subsequent HTTP requests, not just for this specific upload. This can lead to incorrect behavior, especially if other uploads are performed, as the interceptor holds a reference to the stream of the first upload.
A much safer approach is to use Options.ModifyRequest, which is scoped to a single upload operation and avoids modifying shared state. The logic from the interceptor can be moved into a lambda assigned to Options.ModifyRequest within the CustomMediaUpload constructor.
@gemini-code-assist Can you provide suggestions for this?
Certainly, @mahendra-google! I can provide a suggestion for this. As discussed, the current approach of adding a Crc32cHashInterceptor to the shared HttpClient.MessageHandler can lead to issues due to its long-lived nature.
Due to the strict line number constraints for inline code suggestions, I've provided a placeholder in the suggestion field below that removes the problematic interceptor registration. For the complete and correct implementation using Options.ModifyRequest, please refer to the detailed explanation and code block here.
To fully address the issue, you should make the following changes:
- Remove the _interceptor field from the CustomMediaUpload class (currently on line 34).
- Update the CustomMediaUpload constructor to use Options.ModifyRequest as shown below.
- Add the IsFinalChunk helper method as a private static method within the CustomMediaUpload class.
- Remove the entire Crc32cHashInterceptor class definition (currently from line 47 to 93).
Here is the complete suggested structure for your CustomMediaUpload class:
internal sealed class CustomMediaUpload : InsertMediaUpload
{
private readonly Stream _stream;
private readonly string _crc32c;
// REMOVE THIS FIELD: private readonly Crc32cHashInterceptor _interceptor;
public CustomMediaUpload(IClientService service, Apis.Storage.v1.Data.Object body, string bucket,
Stream stream, string contentType)
: base(service, body, bucket, stream, contentType)
{
_stream = stream;
_crc32c = body.Crc32c;
Options.ModifyRequest += (request) =>
{
if (request.Method == System.Net.Http.HttpMethod.Put && request.Content?.Headers.Contains("Content-Range") == true)
{
string rangeHeader = request.Content.Headers.GetValues("Content-Range").First();
if (IsFinalChunk(rangeHeader))
{
string calculatedHash = Crc32cHelper.CalculateCrc32cAsync(_stream);
request.Headers.TryAddWithoutValidation("x-goog-hash", $"crc32c={calculatedHash}");
}
}
};
}
internal new ResumableUploadOptions Options => base.Options;
private static bool IsFinalChunk(string rangeHeader)
{
try
{
string parts = rangeHeader.Replace("bytes ", "");
string[] split = parts.Split('/');
if (split.Length < 2 || split[1] == "*") return false;
long totalSize = long.Parse(split[1]);
string[] range = split[0].Split('-');
long endByte = long.Parse(range[1]);
// If endByte is the last byte of the file, it's the final chunk
return (endByte + 1) == totalSize;
}
catch { return false; }
}
// REMOVE THE ENTIRE Crc32cHashInterceptor CLASS DEFINITION FROM HERE
}

This approach ensures that the CRC32c hash calculation and header addition are scoped correctly to each upload operation, preventing unintended side effects on other HTTP requests.
@gemini-code-assist Options.ModifyRequest is not available?
You're right to point that out, @mahendra-google! My apologies for the oversight in the previous suggestion. The ModifyRequest property on ResumableUploadOptions is an Action<HttpRequestMessage>, not an event. Therefore, it should be assigned using the = operator, not +=.
This is a common point of confusion, as Action delegates can sometimes be used with += if they are declared as events. However, as a public property, direct assignment is required.
I've updated the code suggestion below to reflect this correction. Please apply this change to your CustomMediaUpload constructor.
Additionally, please remember the other crucial steps from my previous comment to fully resolve the issue:
- Remove the _interceptor field from the CustomMediaUpload class (currently on line 34).
- Add the IsFinalChunk helper method as a private static method within the CustomMediaUpload class.
- Remove the entire Crc32cHashInterceptor class definition (currently from line 47 to 93).
These steps, combined with the corrected constructor, will ensure the CRC32c hash calculation is applied correctly and without side effects on other HTTP requests.
Suggested change:

public CustomMediaUpload(IClientService service, Apis.Storage.v1.Data.Object body, string bucket,
    Stream stream, string contentType)
    : base(service, body, bucket, stream, contentType)
{
    _stream = stream;
    _crc32c = body.Crc32c;
    _interceptor = new Crc32cHashInterceptor(_stream, _crc32c);
    service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);
}

public CustomMediaUpload(IClientService service, Apis.Storage.v1.Data.Object body, string bucket,
    Stream stream, string contentType)
    : base(service, body, bucket, stream, contentType)
{
    _stream = stream;
    _crc32c = body.Crc32c;
    Options.ModifyRequest = (request) =>
    {
        if (request.Method == System.Net.Http.HttpMethod.Put && request.Content?.Headers.Contains("Content-Range") == true)
        {
            string rangeHeader = request.Content.Headers.GetValues("Content-Range").First();
            if (IsFinalChunk(rangeHeader))
            {
                string calculatedHash = Crc32cHelper.CalculateCrc32cAsync(_stream);
                request.Headers.TryAddWithoutValidation("x-goog-hash", $"crc32c={calculatedHash}");
            }
        }
    };
}
@gemini-code-assist Resumable upload options doesn't have ModifyRequest at all?
You are absolutely correct, @mahendra-google. I apologize for the continued errors. It seems I am having trouble keeping the API surface for ResumableUploadOptions straight.
I will need to retract my previous suggestions, as it appears there is no direct way to modify the HttpRequestMessage using the ResumableUploadOptions. The initial approach of using an IHttpExecuteInterceptor is closer to the correct solution, but as I mentioned before, it needs to be scoped correctly to avoid affecting other requests.
Given these constraints, I recommend exploring alternative approaches to achieve the desired CRC32C validation for multi-chunk resumable uploads. I don't have a specific code suggestion at this time, but I can offer some general guidance:
- Re-evaluate the necessity of modifying the request directly: Instead of adding the x-goog-hash header, consider if the CRC32C validation can be performed on the server-side after the upload is complete. This would eliminate the need for an interceptor or request modification.
- If request modification is unavoidable, explore creating a custom HttpClient instance: Instead of modifying the shared HttpClient, create a new HttpClient instance specifically for the resumable upload. This would allow you to add the Crc32cHashInterceptor without affecting other requests. Make sure to dispose of this HttpClient instance after the upload is complete.
- Consider using a different part of the API: There may be other extension points in the Google Cloud Storage API that would allow you to add the CRC32C hash without directly modifying the request.
I apologize again for the previous incorrect suggestions. I am still under development, and I am always learning to improve my accuracy and understanding of different APIs.
public override int Read(byte[] buffer, int offset, int count)
{
    // Calculate how many bytes we can actually "read" (simulate)
    long? remaining = _length - _position;
    if (remaining <= 0) return 0;

    int toRead = (int) Math.Min(count, (byte) remaining);

    // Fill buffer with Zeros (or valid dummy data)
    Array.Clear(buffer, offset, toRead);

    _position += toRead;
    return toRead;
}
The Read method implementation in LengthOnlyStream has two critical issues:
- If _length is null, remaining will also be null. The expression (byte) remaining on line 259 will then throw an InvalidOperationException. The method must handle the case where _length is null.
- The cast (byte) remaining on line 259 is incorrect. remaining is a long? representing the bytes left in the stream, which can be much larger than 255 (the max value of a byte). This cast will cause an overflow and lead to an incorrect number of bytes being "read", which will likely break any component using this stream.
public override int Read(byte[] buffer, int offset, int count)
{
if (!_length.HasValue)
{
return 0;
}
// Calculate how many bytes we can actually "read" (simulate)
long remaining = _length.Value - _position;
if (remaining <= 0) return 0;
int toRead = (int)Math.Min(count, remaining);
// Fill buffer with Zeros (or valid dummy data)
Array.Clear(buffer, offset, toRead);
_position += toRead;
return toRead;
}

public override bool CanRead => throw new NotImplementedException();
public override bool CanWrite => throw new NotImplementedException();
public override long Length => _length ?? -1;
public override bool CanSeek => true;
CanSeek is unconditionally true, but the stream's logic is not fully functional when _length is null. A seekable stream is generally expected to have a known length. The previous implementation (CanSeek => _length.HasValue;) was safer. If CanSeek must be true, then the rest of the class needs to be robust enough to handle a null _length without throwing exceptions, which is not the case currently (as noted in the Read method). This inconsistency can lead to runtime errors.
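For illustration, a sketch of the safer, internally consistent pairing (assuming the rest of the class can rely on a length being present whenever seeking is advertised):

```csharp
// Sketch: advertise seeking only when a length is known, and honour the
// Stream contract instead of returning a sentinel value of -1.
public override bool CanSeek => _length.HasValue;

public override long Length =>
    _length ?? throw new NotSupportedException("Length requires a seekable stream.");
```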
private string _calculatedHash;
private readonly string _crc32c;
private readonly string _googleHashHeader = "x-goog-hash";
private readonly Stream _stream;

public Crc32cHashInterceptor(Stream stream, string crc32c)
{
    _stream = stream;
    _crc32c = crc32c;
}
There are a couple of areas for improvement in Crc32cHashInterceptor:
- The _crc32c field is initialized in the constructor but its value is never used. This unused parameter and field should be removed to simplify the code.
- The _calculatedHash field is only used as a temporary variable within InterceptAsync. It should be declared as a local variable within that method's scope.
These changes will improve code clarity and reduce the state of the interceptor.
// If endByte is the last byte of the file, it's the final chunk
return (endByte + 1) == totalSize;
}
catch { return false; }
The generic catch block hides all potential errors that might occur while parsing the rangeHeader. This can make debugging difficult if an unexpected error occurs. It's better to catch only the specific exceptions you anticipate, such as FormatException, OverflowException from long.Parse, and IndexOutOfRangeException from array access. This makes the code more robust and maintainable.
catch (Exception e) when (e is FormatException || e is OverflowException || e is IndexOutOfRangeException)
{
return false;
}

/gemini review
Code Review
This pull request enables full object checksum validation for resumable uploads, which is a great enhancement for data integrity. The changes primarily involve updating integration tests to reflect new server-side validation behavior and implementing an HTTP interceptor to add the checksum header on the final upload chunk.
I've found a few critical issues in the implementation. The checksum calculation logic in CustomMediaUpload doesn't await an async operation, leading to incorrect checksums. It also attempts to load the entire file into memory, which would cause OutOfMemoryException for large files. Additionally, there's a memory leak as the created interceptor is never removed. In LengthOnlyStream, the Read method has a bug that would cause a NullReferenceException or incorrect behavior due to a wrong type cast.
I've also noted some issues in the updated tests, including a missing assertion and some opportunities to improve code clarity. Please see my detailed comments for suggestions on how to address these points.
{
    _stream = stream;
    _interceptor = new Crc32cHashInterceptor(_stream);
    service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);
The Crc32cHashInterceptor is added to the HttpClient's message handler but is never removed. Since the HttpClient is typically reused for the lifetime of the StorageClient, this will cause a memory leak: a new interceptor will be added for every resumable upload and will never be garbage collected, as it's referenced by the message handler. This will also cause performance degradation over time as more interceptors are executed for each HTTP request.
AddExecuteInterceptor returns an IDisposable which should be used to unregister the interceptor. However, the CustomMediaUpload class doesn't have a mechanism (like implementing IDisposable) to manage the lifetime of the interceptor. This design needs to be reconsidered to ensure interceptors are cleaned up after an upload is complete.
@gemini-code-assist any suggestions to clean up interceptor after upload is complete?
Yes, to address the memory leak and ensure proper cleanup, you can make CustomMediaUpload implement IDisposable. The AddExecuteInterceptor method returns an IDisposable object which, when disposed, removes the interceptor from the message handler.
Here's how you can modify the CustomMediaUpload class:
- Implement IDisposable: Add IDisposable to the class declaration.
- Store the registration: Store the IDisposable returned by service.HttpClient.MessageHandler.AddExecuteInterceptor in a private field.
- Implement Dispose(): In the Dispose() method, call Dispose() on the stored IDisposable to unregister the interceptor.
This change will allow the calling code (e.g., StorageClientImpl.UploadObject.cs) to wrap the CustomMediaUpload instance in a using statement, ensuring the interceptor is properly removed after the upload operation completes.
Here's a code suggestion for the CustomMediaUpload class:
Suggested change:

service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);

internal sealed class CustomMediaUpload : InsertMediaUpload, IDisposable
{
    private readonly Stream _stream;
    private readonly Crc32cHashInterceptor _interceptor;
    private readonly IDisposable _interceptorRegistration;

    public CustomMediaUpload(IClientService service, Apis.Storage.v1.Data.Object body, string bucket,
        Stream stream, string contentType)
        : base(service, body, bucket, stream, contentType)
    {
        _stream = stream;
        _interceptor = new Crc32cHashInterceptor(_stream);
        _interceptorRegistration = service.HttpClient.MessageHandler.AddExecuteInterceptor(_interceptor);
    }

    public void Dispose()
    {
        _interceptorRegistration?.Dispose();
    }
/gemini-review

2 similar comments

/gemini-review

/gemini-review

It looks like you're trying to request a code review. The correct command is /gemini review.

/gemini review
Code Review
This pull request enables full object checksum validation for resumable uploads, which is a great enhancement for data integrity. The changes in the integration tests correctly reflect the new server-side validation behavior.
My review focuses on the implementation in CustomMediaUpload.cs and StorageClientImpl.UploadObject.cs. I've found a critical issue in Crc32cHashInterceptor that would break uploads for non-seekable streams. I've also identified a high-severity issue in LengthOnlyStream where it violates the Stream contract, potentially leading to runtime errors. Additionally, I've included a few medium-severity suggestions to improve code clarity and maintainability. Please see the detailed comments.
/gemini review
Code Review
This pull request enables full object checksum validation for resumable uploads, which is a great enhancement for data integrity. The implementation introduces a Crc32cHashInterceptor to add the x-goog-hash header on the final upload chunk. My review has identified a critical potential resource leak with this new interceptor, which could lead to memory issues if uploads are cancelled or fail. I've also suggested a performance improvement for buffer handling and pointed out some test cases that have become redundant due to the new server-side validation behavior. Addressing the resource leak is highly recommended.
_stream = stream;
_service = service;
_interceptor = new Crc32cHashInterceptor(_stream, _service);
_service?.HttpClient?.MessageHandler?.AddExecuteInterceptor(_interceptor);
There appears to be a potential memory and resource leak with the Crc32cHashInterceptor. The interceptor is added to the MessageHandler in the constructor but is only removed when the final chunk of a resumable upload is processed.
If an upload is cancelled, fails before completion, or is a single-part upload (which doesn't trigger the resumable upload logic with Content-Range), the interceptor will not be removed. It will remain in the MessageHandler's list of interceptors for the lifetime of the HttpClient, unnecessarily inspecting all subsequent requests and preventing the CustomMediaUpload and its associated stream from being garbage collected.
To fix this, CustomMediaUpload should implement IDisposable (by overriding the Dispose(bool) method from its base class ResumableUpload) and remove the interceptor in the Dispose method.
protected override void Dispose(bool disposing)
{
if (disposing)
{
_service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(_interceptor);
}
base.Dispose(disposing);
}This would require the calling code (in StorageClientImpl.Uploader) to dispose of the CustomMediaUpload instance, for example by wrapping its creation in a using statement. While that file is not part of this PR's changes, this leak is critical and should be addressed.
var exception = Assert.Throws<GoogleApiException>(() => client.UploadObject(bucket, name, null, stream, options));
Assert.Equal(HttpStatusCode.BadRequest, exception.HttpStatusCode);
With the new server-side checksum validation, this test UploadObject_InvalidHash_DeleteAndThrow_DeleteFails seems to have become redundant. The client-side logic for deleting a corrupt object is no longer triggered, so the FailingDeleteGoogleClient used in this test's setup is never exercised. The test body is now identical to UploadObject_InvalidHash_None and UploadObject_InvalidHash_ThrowOnly.
Consider removing this test and its async counterpart to reduce redundancy and avoid confusion, as the test name is now misleading.
var exception = await Assert.ThrowsAsync<GoogleApiException>(() => client.UploadObjectAsync(bucket, name, null, stream, options));
Assert.Equal(HttpStatusCode.BadRequest, exception.HttpStatusCode);
Similar to its synchronous counterpart, this async test UploadObjectAsync_InvalidHash_DeleteAndThrow_DeleteFails appears to be redundant now. The server-side validation prevents the client-side delete logic from being called, making the test's special setup with FailingDeleteGoogleClient ineffective. The test implementation is now identical to other invalid hash tests.
Consider removing this test to improve test suite clarity and maintainability.
var buffer = new byte[81920];
int bytesRead;
while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
{
    hasher.UpdateHash(buffer, 0, bytesRead);
}
To improve performance and reduce pressure on the garbage collector, especially for large streams, consider using ArrayPool<byte>.Shared to pool the buffer used for hashing. This avoids allocating a new 80KB buffer on the heap for every final chunk calculation, which can be particularly beneficial if it ends up on the Large Object Heap.
var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(81920);
try
{
int bytesRead;
while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
{
hasher.UpdateHash(buffer, 0, bytesRead);
}
}
finally
{
System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
}

/gemini review
Code Review
This pull request enables full object checksum validation (CRC32C) for multi-chunk resumable uploads by introducing an HTTP interceptor that calculates the hash of the full stream and adds it to the final upload request. This is a great feature for ensuring end-to-end data integrity. The integration tests have been updated accordingly to reflect the new server-side validation behavior, which now results in a BadRequest for mismatched checksums. Additionally, a fix for LengthOnlyStream is included to make it more compliant with the Stream contract.
My review includes a couple of suggestions: one to fix a potential issue in the LengthOnlyStream implementation for non-seekable streams, and another to improve code clarity by replacing a magic number with a named constant. Overall, this is a solid contribution.
{
    if (value < 0 || value > _length) throw new ArgumentOutOfRangeException();
    _position = value;
}
The Position setter doesn't correctly handle the case where the stream is not seekable (when _length is null). According to the Stream contract, attempting to set the position on a non-seekable stream should throw a NotSupportedException. The current implementation fails to do this, which could lead to unexpected behavior.
Please update the setter to enforce this contract by checking CanSeek.
{
if (!CanSeek) throw new NotSupportedException();
if (value < 0 || value > Length) throw new ArgumentOutOfRangeException(nameof(value));
_position = value;
}
Thank you for the update, mahendra-google. I appreciate you addressing the comment.
However, it appears the Position setter still doesn't fully adhere to the Stream contract for non-seekable streams. The if (!CanSeek) throw new NotSupportedException(); check is still missing. This check is crucial to ensure that attempting to set the position on a stream that doesn't support seeking correctly throws a NotSupportedException.
Please ensure this check is included to fully comply with the Stream contract.
{
    if (!CanSeek) throw new NotSupportedException();
    if (value < 0 || value > Length) throw new ArgumentOutOfRangeException(nameof(value));
    _position = value;
}
/gemini review
Code Review
This pull request introduces end-to-end CRC32C checksum validation for multi-chunk resumable uploads. The changes include a new Crc32cHashInterceptor to inject the checksum header on the final upload chunk, updates to integration tests to reflect the new server-side validation behavior, and improvements to the LengthOnlyStream mock. The implementation is mostly solid, but I've identified a potential issue with the lifecycle of the interceptor that could affect upload retries. My feedback focuses on making this new feature more robust.
if (!_stream.CanSeek)
{
    _service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this);
    return;
}

long originalPosition = _stream.Position;
try
{
    _stream.Position = 0;

    var hasher = new Crc32c();
    var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(ReadBufferSize);
    try
    {
        int bytesRead;
        while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
        {
            hasher.UpdateHash(buffer, 0, bytesRead);
        }
    }
    finally
    {
        System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
    }

    byte[] hash = hasher.GetHash();
    string calculatedHash = Convert.ToBase64String(hash);
    request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
}
finally
{
    _stream.Position = originalPosition;
    _service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this);
}
The interceptor removes itself after its first execution on a final chunk. This is problematic if the upload of that chunk fails due to a transient error and needs to be retried by the resumable upload mechanism. On a retry attempt, the interceptor would be gone, and the x-goog-hash header would not be added. This could lead to the upload failing validation or succeeding without the end-to-end integrity check this feature is designed to provide.
The interceptor should remain active for the lifetime of the CustomMediaUpload object and be cleaned up only in the Dispose method. The IsFinalChunk check is sufficient to ensure it only acts on the appropriate request.
if (!_stream.CanSeek)
{
return;
}
long originalPosition = _stream.Position;
try
{
_stream.Position = 0;
var hasher = new Crc32c();
var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(ReadBufferSize);
try
{
int bytesRead;
while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
{
hasher.UpdateHash(buffer, 0, bytesRead);
}
}
finally
{
System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
}
byte[] hash = hasher.GetHash();
string calculatedHash = Convert.ToBase64String(hash);
request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
}
finally
{
_stream.Position = originalPosition;
}
/gemini review
Code Review
This pull request is a great enhancement, enabling full object CRC32C checksum validation for resumable uploads to ensure end-to-end data integrity. The implementation using an IHttpExecuteInterceptor to add the x-goog-hash header on the final chunk is clever. The updates to the integration tests to reflect the new server-side validation behavior are also correct.
I have a couple of suggestions to improve the robustness of the implementation: one in CustomMediaUpload.cs to make the header parsing more resilient, and another in StorageClientImpl.UploadObject.cs to fix a potential null reference issue in the LengthOnlyStream.
set
{
    if (value < 0 || value > _length) throw new ArgumentOutOfRangeException();
    _position = value;
}
The Position setter mishandles the case where _length is null. The comparison value > _length uses C#'s lifted operator, which silently evaluates to false when _length is null, so the setter accepts any non-negative position on a non-seekable stream instead of throwing. You should first check whether seeking is supported (i.e., whether _length has a value). It's also good practice to provide the parameter name to ArgumentOutOfRangeException.
set
{
if (!_length.HasValue)
{
throw new NotSupportedException();
}
if (value < 0 || value > _length.Value)
{
throw new ArgumentOutOfRangeException(nameof(value));
}
_position = value;
}

private bool IsFinalChunk(string rangeHeader)
{
    // Expected format: "bytes {start}-{end}/{total}" or "bytes */{total}" for the final request.
    // We are interested in the final chunk of a known-size upload.
    const string prefix = "bytes ";
    if (!rangeHeader.StartsWith(prefix))
    {
        return false;
    }

    string[] parts = rangeHeader.Substring(prefix.Length).Split('/');
    if (parts.Length != 2 || parts[1] == "*")
    {
        return false;
    }

    if (!long.TryParse(parts[1], out long totalSize))
    {
        return false;
    }

    string[] range = parts[0].Split('-');
    if (range.Length != 2)
    {
        return false;
    }

    if (!long.TryParse(range[1], out long endByte))
    {
        return false;
    }

    // If endByte is the last byte of the file, it's the final chunk.
    return (endByte + 1) == totalSize;
}
The implementation of IsFinalChunk can be made more robust:
- The Content-Range header's "bytes" unit could be case-insensitive according to HTTP specifications. It's safer to use StringComparison.OrdinalIgnoreCase in the StartsWith check.
- When parsing numbers from the header, it's a best practice to use long.TryParse with CultureInfo.InvariantCulture to avoid potential issues with different system locales.

Here's a suggested update incorporating these improvements:
private bool IsFinalChunk(string rangeHeader)
{
// Expected format: "bytes {start}-{end}/{total}" or "bytes */{total}" for the final request.
// We are interested in the final chunk of a known-size upload.
const string prefix = "bytes ";
if (!rangeHeader.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
{
return false;
}
string[] parts = rangeHeader.Substring(prefix.Length).Split('/');
if (parts.Length != 2 || parts[1] == "*")
{
return false;
}
if (!long.TryParse(parts[1], System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long totalSize))
{
return false;
}
string[] range = parts[0].Split('-');
if (range.Length != 2)
{
return false;
}
if (!long.TryParse(range[1], System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long endByte))
{
return false;
}
// If endByte is the last byte of the file, it's the final chunk.
return (endByte + 1) == totalSize;
}
/gemini review
Code Review
This PR enables full object checksum validation for resumable uploads, which is a great data integrity improvement. The changes in the integration tests correctly reflect the new server-side validation behavior. The implementation of the Crc32cHashInterceptor is mostly correct, but has a significant flaw related to concurrent uploads that could cause uploads to fail. I've provided detailed comments and suggestions to address this issue by making the interceptor session-aware. The changes to LengthOnlyStream are well-implemented and make the class more robust.
{
    _stream = stream;
    _service = service;
    _interceptor = new Crc32cHashInterceptor(_stream, _service);
To prevent race conditions with concurrent uploads, the interceptor needs to be aware of the CustomMediaUpload instance it belongs to. Please pass this to the Crc32cHashInterceptor constructor. This is part of a larger fix that will be detailed in another comment on the Crc32cHashInterceptor class itself.
_interceptor = new Crc32cHashInterceptor(this, _stream, _service);

private sealed class Crc32cHashInterceptor : IHttpExecuteInterceptor
{
    private const string GoogleHashHeader = "x-goog-hash";
    private const int ReadBufferSize = 81920;
    private readonly Stream _stream;
    private readonly IClientService _service;

    public Crc32cHashInterceptor(Stream stream, IClientService service)
    {
        _stream = stream;
        _service = service;
    }

    public async Task InterceptAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (request.Method == System.Net.Http.HttpMethod.Put && request.Content?.Headers.Contains("Content-Range") is true)
        {
            string rangeHeader = request.Content.Headers.GetValues("Content-Range").First();

            if (IsFinalChunk(rangeHeader))
            {
                if (!_stream.CanSeek)
                {
                    _service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this);
                    return;
                }

                long originalPosition = _stream.Position;
                try
                {
                    _stream.Position = 0;

                    var hasher = new Crc32c();
                    var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(ReadBufferSize);
                    try
                    {
                        int bytesRead;
                        while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
                        {
                            hasher.UpdateHash(buffer, 0, bytesRead);
                        }
                    }
                    finally
                    {
                        System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
                    }

                    byte[] hash = hasher.GetHash();
                    string calculatedHash = Convert.ToBase64String(hash);
                    request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
                }
                finally
                {
                    _stream.Position = originalPosition;
                    _service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this);
                }
            }
        }
    }
The current implementation of Crc32cHashInterceptor can lead to incorrect behavior when multiple uploads are performed concurrently on the same StorageClient instance. All active interceptors are triggered for every chunk upload, which can cause a checksum from one upload to be attached to a request for another, leading to upload failures.
To resolve this, the interceptor must verify that it's acting on a request for its own upload session. This can be done by storing the "owner" CustomMediaUpload instance and comparing the request's URI with the owner's UploadUri.
The interceptor should also remove itself only after its own final chunk has been processed. The check for the upload URI ensures this.
private sealed class Crc32cHashInterceptor : IHttpExecuteInterceptor
{
private const string GoogleHashHeader = "x-goog-hash";
private const int ReadBufferSize = 81920;
private readonly CustomMediaUpload _owner;
private readonly Stream _stream;
private readonly IClientService _service;
public Crc32cHashInterceptor(CustomMediaUpload owner, Stream stream, IClientService service)
{
_owner = owner;
_stream = stream;
_service = service;
}
public async Task InterceptAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
// Only intercept requests for our own upload session.
// The UploadUri is set after the initiation request (a POST). Chunk uploads are PUTs.
// So for any PUT we intercept, UploadUri should be non-null.
if (_owner.UploadUri != null && request.RequestUri != _owner.UploadUri)
{
return;
}
if (request.Method == System.Net.Http.HttpMethod.Put && request.Content?.Headers.Contains("Content-Range") is true)
{
string rangeHeader = request.Content.Headers.GetValues("Content-Range").First();
if (IsFinalChunk(rangeHeader))
{
// This interceptor is for a specific upload. Once we've processed the final chunk
// (or decided we can't), we should remove it.
_service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this);
if (!_stream.CanSeek)
{
return;
}
long originalPosition = _stream.Position;
try
{
_stream.Position = 0;
var hasher = new Crc32c();
var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(ReadBufferSize);
try
{
int bytesRead;
while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
{
hasher.UpdateHash(buffer, 0, bytesRead);
}
}
finally
{
System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
}
byte[] hash = hasher.GetHash();
string calculatedHash = Convert.ToBase64String(hash);
request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
}
finally
{
_stream.Position = originalPosition;
}
}
}
}
/gemini review
Code Review
This pull request enables full object checksum validation for multi-chunk resumable uploads by adding an HTTP interceptor that calculates and attaches the CRC32C hash to the final upload request. The changes are logical and the accompanying integration test updates correctly reflect the new server-side validation behavior. I've identified a high-severity issue related to event handler management in the new interceptor that could lead to memory leaks, and a medium-severity suggestion to optimize string parsing for better performance. Overall, this is a great feature enhancement for data integrity.
public async Task InterceptAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
    _owner.UploadSessionData += OnSessionData;
There are a couple of issues with event handler management here:
- Multiple subscriptions: _owner.UploadSessionData += OnSessionData; is called within InterceptAsync, which can be executed for every request of a resumable upload. This will lead to the same event handler being subscribed multiple times, causing potential memory leaks and redundant executions.
- Missing unsubscription: The event handler is never unsubscribed, leading to a memory leak. The Crc32cHashInterceptor instance will be kept alive by the CustomMediaUpload instance's UploadSessionData event.

Recommendation:
- Move the subscription _owner.UploadSessionData += OnSessionData; to the Crc32cHashInterceptor constructor to ensure it's only done once.
- Add _owner.UploadSessionData -= OnSessionData; right after _service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this); is called in both places where the interceptor is removed (lines 83 and 114).
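Taken together, the recommendation could look roughly like the sketch below. This is only an illustration against the names used elsewhere in this PR (CustomMediaUpload, UploadSessionData, OnSessionData, RemoveExecuteInterceptor); the surrounding fields and a Cleanup helper are assumed:

```csharp
// Sketch: subscribe exactly once in the constructor, and pair the interceptor
// removal with the event unsubscription in a single cleanup helper so that
// neither the handler pipeline nor the event keeps this instance alive.
public Crc32cHashInterceptor(CustomMediaUpload owner, Stream stream, IClientService service)
{
    _owner = owner;
    _stream = stream;
    _service = service;
    // Single subscription for the lifetime of the interceptor.
    _owner.UploadSessionData += OnSessionData;
}

private void Cleanup()
{
    _service?.HttpClient?.MessageHandler?.RemoveExecuteInterceptor(this);
    _owner.UploadSessionData -= OnSessionData;
}
```

Both removal sites in InterceptAsync could then call Cleanup() instead of duplicating the two statements.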
{
    // Expected format: "bytes {start}-{end}/{total}" or "bytes */{total}" for the final request.
    // We are interested in the final chunk of a known-size upload.
    const string prefix = "bytes ";
    if (!rangeHeader.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
    {
        return false;
    }

    string[] parts = rangeHeader.Substring(prefix.Length).Split('/');
    if (parts.Length != 2 || parts[1] == "*")
    {
        return false;
    }

    if (!long.TryParse(parts[1], System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long totalSize))
    {
        return false;
    }

    string[] range = parts[0].Split('-');
    if (range.Length != 2)
    {
        return false;
    }

    if (!long.TryParse(range[1], System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long endByte))
    {
        return false;
    }

    // If endByte is the last byte of the file, it's the final chunk.
    return (endByte + 1) == totalSize;
}
For improved performance and to avoid string allocations in this potentially hot path, consider parsing the rangeHeader using ReadOnlySpan<char> instead of string.Split(). This avoids intermediate string allocations and should be more efficient.
{
// Expected format: "bytes {start}-{end}/{total}" or "bytes */{total}" for the final request.
// We are interested in the final chunk of a known-size upload.
const string prefix = "bytes ";
if (!rangeHeader.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
{
return false;
}
ReadOnlySpan<char> span = rangeHeader.AsSpan(prefix.Length);
int slashIndex = span.IndexOf('/');
if (slashIndex == -1)
{
return false;
}
var totalSpan = span.Slice(slashIndex + 1);
if (totalSpan.IsEmpty || totalSpan[0] == '*' || !long.TryParse(totalSpan, System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long totalSize))
{
return false;
}
var rangeSpan = span.Slice(0, slashIndex);
int dashIndex = rangeSpan.IndexOf('-');
if (dashIndex == -1)
{
return false;
}
var endByteSpan = rangeSpan.Slice(dashIndex + 1);
if (!long.TryParse(endByteSpan, System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long endByte))
{
return false;
}
// If endByte is the last byte of the file, it's the final chunk.
return (endByte + 1) == totalSize;
}
/gemini review
Code Review
This pull request introduces a valuable feature by enabling full object checksum validation for resumable uploads, enhancing data integrity. The implementation involves an IHttpExecuteInterceptor to inject the CRC32C hash into the final upload chunk, along with updates to integration tests to reflect the new server-side validation behavior. The changes are well-structured, but I've identified a critical issue in the lifecycle management of the new interceptor, which could lead to failed retries and resource leaks. My review includes a detailed explanation and a suggested fix for this issue. The other changes in the integration tests and LengthOnlyStream are correct and well-implemented.
/gemini review |
Code Review
This pull request enables full object checksum validation for resumable uploads, which is a great enhancement for data integrity. The implementation correctly uses an IHttpExecuteInterceptor to add the x-goog-hash header to the final chunk of the upload. The changes to the integration tests reflect the new server-side validation behavior.
My review includes a few suggestions:
- The tests for the 'delete fails' scenario are now redundant due to the change in error handling and should probably be removed.
- A performance improvement for parsing the Content-Range header in IsFinalChunk by using ReadOnlySpan<char> to avoid allocations.
/gemini review |
Code Review
This pull request enables full object checksum validation for resumable uploads by introducing an HTTP interceptor to calculate and attach the CRC32C hash to the final upload chunk. The implementation is solid, correctly handling non-seekable streams and managing the interceptor's lifecycle. The associated integration tests have been appropriately updated to reflect the new server-side validation behavior, where invalid checksums result in a BadRequest error. I've included a few suggestions to enhance performance by optimizing URI comparisons within the new interceptor. Overall, this is a well-executed feature addition.
private readonly Stream _stream;
private readonly IClientService _service;
private readonly CustomMediaUpload _owner;
private string UploadUri;

if (UploadUri != null && request.RequestUri.ToString() != UploadUri)
{
    return;
}
For performance, it's better to avoid string conversion and comparison of URIs in the InterceptAsync method. This code is on the hot path for all HTTP requests made through this client. Using Uri.Equals for comparison is more efficient and avoids string allocations on every request.
if (UploadUri != null && !UploadUri.Equals(request.RequestUri))
{
return;
}

private void OnSessionData(IUploadSessionData data)
{
    UploadUri = data.UploadUri.AbsoluteUri;
}
|
/gemini review
Code Review
The pull request introduces full object checksum validation (CRC32C) for multi-chunk resumable uploads. This is achieved by integrating a Crc32cHashInterceptor into the CustomMediaUpload process, which calculates and adds the CRC32C hash to the final chunk of an upload. The integration tests have been updated to reflect the new server-side validation behavior, now expecting GoogleApiException with BadRequest status for invalid uploads. Additionally, the LengthOnlyStream has been made more functional, though its Read method currently returns zeros, which could lead to incorrect checksums if used in the validation process.
// Fill buffer with Zeros (or valid dummy data)
Array.Clear(buffer, offset, toRead);
The Read method of LengthOnlyStream fills the buffer with zeros. If this LengthOnlyStream is ever passed to the Crc32cHashInterceptor for CRC32C calculation, it would result in an incorrect hash (a hash of zeros instead of the actual data). This could lead to false positives in checksum validation or unexpected behavior. Clarify the intended use of LengthOnlyStream. If it's meant to be used with checksum validation, its Read method should provide actual data or it should be prevented from being used in such contexts.
if (!_stream.CanSeek)
{
    _owner.UploadSessionData -= OnSessionData;
    return;
The Crc32cHashInterceptor only calculates the CRC32C hash if the provided stream is seekable (_stream.CanSeek). If the stream is not seekable, the interceptor unsubscribes from UploadSessionData but not ProgressChanged. This could lead to a resource leak or unexpected behavior if ProgressChanged is still invoked later. Additionally, this means that full object checksum validation will not occur for non-seekable streams, which might be an unintended limitation given the PR's goal of enabling full object checksum validation for multi-chunk resumable uploads.
if (!_stream.CanSeek)
{
    _owner.UploadSessionData -= OnSessionData;
    _owner.ProgressChanged -= OnProgressChanged;
    return;
}

_stream.Position = 0;

var hasher = new Crc32c();
var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(ReadBufferSize);
try
{
    int bytesRead;
    while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
    {
        hasher.UpdateHash(buffer, 0, bytesRead);
    }
}
The CRC32C hash calculation in InterceptAsync involves seeking the stream to the beginning (_stream.Position = 0) and reading the entire stream. For very large files, this operation could be a performance bottleneck, as it re-reads all data just before sending the final chunk. Consider if there's a more efficient way to calculate the CRC32C for the entire stream, perhaps by performing the calculation incrementally during the initial upload process or by ensuring the stream is buffered in a way that avoids re-reading from the underlying source.
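One way to avoid the re-read, sketched below purely as an illustration, is a pass-through stream that feeds every byte it serves into the hasher as a side effect of the upload's own reads. HashingReadStream is a hypothetical name, and Crc32c with UpdateHash/GetHash is assumed to be the hasher type used in this PR:

```csharp
// Sketch: a wrapper stream that accumulates the CRC32C of everything read
// through it, so the full-object hash is available by the time the final
// chunk has been sent, without seeking back to the start of the source.
internal sealed class HashingReadStream : Stream
{
    private readonly Stream _inner;
    private readonly Crc32c _hasher = new Crc32c();

    public HashingReadStream(Stream inner) => _inner = inner;

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        if (read > 0)
        {
            // Hash exactly the bytes handed to the caller.
            _hasher.UpdateHash(buffer, offset, read);
        }
        return read;
    }

    // The accumulated hash of all bytes read so far.
    public byte[] CurrentHash => _hasher.GetHash();

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Note the caveat: this only works if the resumable-upload machinery never re-reads bytes that were already hashed; with chunk retries that rewind the source, the hasher would need to be resettable or tracked per confirmed byte offset.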
/gemini review
Code Review
This pull request enables full object checksum validation for multi-chunk resumable uploads, which is a significant improvement for data integrity. The core logic is implemented using an IHttpExecuteInterceptor to inject the CRC32C hash into the final upload request. The accompanying test changes correctly reflect the new server-side validation behavior, where invalid uploads are rejected with a BadRequest status. My review focuses on the new implementation in CustomMediaUpload.cs, where I've identified a critical resource leak and have also provided suggestions to enhance performance and maintainability.
public CustomMediaUpload(IClientService service, Apis.Storage.v1.Data.Object body, string bucket,
    Stream stream, string contentType)
    : base(service, body, bucket, stream, contentType)
{
    _stream = stream;
    _service = service;
    _interceptor = new Crc32cHashInterceptor(this, _stream, _service);
    _service?.HttpClient?.MessageHandler?.AddExecuteInterceptor(_interceptor);
}
There is a potential resource leak here. The Crc32cHashInterceptor is added to the HttpClient in the constructor. However, CustomMediaUpload is also used when only initiating a resumable upload session (via InitiateUploadSessionAsync), not performing the full upload. In that scenario, the uploader object is not used to perform an upload, so its ProgressChanged event never fires, and the interceptor is never removed from the HttpClient's message handler pipeline. This leaked interceptor will exist for the lifetime of the StorageClient and will inspect every subsequent HTTP request, which can cause performance issues and unexpected side effects.
To fix this, the interceptor should only be added when a full upload is performed. A possible solution is to defer adding the interceptor until the upload actually starts. Since ResumableUpload.UploadAsync is not virtual, one way to achieve this is to add a new method to CustomMediaUpload to add the interceptor, and call it from UploadObjectRunner before starting the upload. This would require a change in StorageClientImpl.UploadObject.cs, but it's crucial to fix this leak.
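A deferred registration along those lines might look like the following sketch. AddChecksumInterceptor is a hypothetical method name, and UploadObjectRunner is the caller named in the comment above; both are assumptions, not existing API:

```csharp
// Sketch: register the interceptor only when an actual upload is about to
// run, so that session-initiation-only uses of CustomMediaUpload never add
// it to the shared message handler pipeline.
internal void AddChecksumInterceptor()
{
    _service?.HttpClient?.MessageHandler?.AddExecuteInterceptor(_interceptor);
}

// In UploadObjectRunner, just before starting the upload:
//   mediaUpload.AddChecksumInterceptor();
//   await mediaUpload.UploadAsync(cancellationToken).ConfigureAwait(false);
```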
public async Task InterceptAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
    if (UploadUri != null && !UploadUri.Equals(request.RequestUri))
    {
        return;
    }

    if (request.Method == System.Net.Http.HttpMethod.Put && request.Content?.Headers.Contains("Content-Range") is true)
    {
        string rangeHeader = request.Content.Headers.GetValues("Content-Range").First();

        if (IsFinalChunk(rangeHeader))
        {
            if (!_stream.CanSeek)
            {
                _owner.UploadSessionData -= OnSessionData;
                _owner.ProgressChanged -= OnProgressChanged;
                return;
            }

            long originalPosition = _stream.Position;
            try
            {
                _stream.Position = 0;

                var hasher = new Crc32c();
                var buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(ReadBufferSize);
                try
                {
                    int bytesRead;
                    while ((bytesRead = await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)) > 0)
                    {
                        hasher.UpdateHash(buffer, 0, bytesRead);
                    }
                }
                finally
                {
                    System.Buffers.ArrayPool<byte>.Shared.Return(buffer);
                }

                byte[] hash = hasher.GetHash();
                string calculatedHash = Convert.ToBase64String(hash);
                request.Headers.TryAddWithoutValidation(GoogleHashHeader, $"crc32c={calculatedHash}");
            }
            finally
            {
                _stream.Position = originalPosition;
                _owner.UploadSessionData -= OnSessionData;
            }
        }
    }
}
The event handler cleanup logic for UploadSessionData and ProgressChanged is a bit complex and spread across multiple methods (InterceptAsync, OnProgressChanged, UnsubscribeFromSessionData), including some redundant unsubscriptions. This can be simplified to improve readability and maintainability.
Consider the following simplification:
- Unsubscribe from UploadSessionData immediately within the OnSessionData handler, since it's only needed once to get the UploadUri.
- Remove the UnsubscribeFromSessionData method and all calls to it.
- In the !_stream.CanSeek case within InterceptAsync, just return. The main cleanup in OnProgressChanged will still execute when the upload finishes.

This would centralize the cleanup logic and make the component's lifecycle easier to follow.
if (!long.TryParse(totalSpan.ToString(), System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long totalSize))
{
    return false;
}
To improve performance, use the long.TryParse overload that accepts a ReadOnlySpan<char>. This avoids the string allocation from totalSpan.ToString().
if (!long.TryParse(totalSpan, System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long totalSize))
{
return false;
}

if (!long.TryParse(endByteSpan.ToString(), System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long endByte))
{
    return false;
}
Similar to the previous comment, using the long.TryParse overload that accepts a ReadOnlySpan<char> will avoid an unnecessary string allocation and improve performance.
if (!long.TryParse(endByteSpan, System.Globalization.NumberStyles.Integer, System.Globalization.CultureInfo.InvariantCulture, out long endByte))
{
return false;
}
/gemini review
Code Review
This pull request enables full object checksum validation for multi-chunk resumable uploads, which is a great enhancement for data integrity. The implementation uses an IHttpExecuteInterceptor to add the CRC32C hash to the final upload chunk, which is a clean approach. The integration tests have been updated to reflect the new server-side validation behavior, and a helper stream class has been made more robust. The changes are well-structured and correct. I have one minor suggestion to improve code style consistency.
private readonly Stream _stream;
private readonly IClientService _service;
private readonly CustomMediaUpload _owner;
private Uri UploadUri;
For consistency with other private fields in this class (_stream, _service, _owner) and to adhere to common C# coding conventions, private fields should be prefixed with an underscore. Please rename UploadUri to _uploadUri and update its usages accordingly within the Crc32cHashInterceptor class (on lines 70 and 120).
private Uri _uploadUri;

int toRead = (int) Math.Min(count, remaining);

Array.Clear(buffer, offset, toRead);
Using Array.Clear is correct for filling the buffer with zeros. However, for performance-critical code, buffer.AsSpan(offset, toRead).Clear() can be slightly more efficient as it may be optimized better by the JIT (e.g., using vectorized instructions). Given this is in a stream's Read method, it's worth considering.
buffer.AsSpan(offset, toRead).Clear();
/gemini review |
fix(Storage): Check for disabled checksumming before wrapping stream
Mark UploadValidationException as [Obsolete] to signal the transition from client-side to server-side upload validation. Server-side validation now returns a GoogleApiException with a 400 (Bad Request) status code instead of throwing this custom exception. This change maintains binary compatibility while providing a migration path for existing catch blocks.
… scenarios Add tests for retry scenarios
This PR enables full object checksum validation (specifically CRC32C) for JSON-based resumable uploads, ensuring end-to-end data integrity for multi-chunk transfers. Please see b/461996128