
feat(portal): stream file uploads to reduce memory from O(file_size) to O(5 MiB)#409

Open
lilongen wants to merge 1 commit into boxlite-ai:main from lilongen:feat/streaming-file-upload

Conversation

@lilongen

Summary

  • Replace full-file Vec<UploadChunk> buffering in upload_tar with a bounded mpsc::channel(4) + spawned reader task, capping peak memory at ~5 MiB regardless of file size
  • Extract stream_file_chunks helper (accepts impl AsyncRead) for testability and single-responsibility
  • Use std::mem::take for dest_path/container_id (first chunk only, zero-copy on subsequent chunks)
  • Always await reader JoinHandle before checking gRPC result to surface root-cause errors first
  • Return error on receiver drop instead of silent Ok(()) to detect incomplete uploads
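The bullets above describe the core pattern: a spawned reader task feeding fixed-size chunks through a bounded channel so backpressure caps buffering. As a minimal illustration of that idea using only the standard library (the PR itself uses tokio's `mpsc::channel(4)` and `impl AsyncRead`; the names and chunk size below are illustrative, not the PR's code):

```rust
use std::io::Read;
use std::sync::mpsc;
use std::thread;

const CHUNK_SIZE: usize = 64; // small for illustration; the PR uses a much larger chunk

// Std-library analogue of the PR's pattern: a reader thread sends fixed-size
// chunks through a bounded channel, so at most 4 chunks are queued at once
// regardless of total input size. The sender blocks when the channel is full,
// which is exactly the backpressure that bounds peak memory.
fn stream_chunks<R: Read + Send + 'static>(mut src: R) -> mpsc::Receiver<Vec<u8>> {
    let (tx, rx) = mpsc::sync_channel(4); // bounded: capacity 4
    thread::spawn(move || {
        let mut buf = vec![0u8; CHUNK_SIZE];
        loop {
            match src.read(&mut buf) {
                Ok(0) => break, // EOF
                Ok(n) => {
                    if tx.send(buf[..n].to_vec()).is_err() {
                        break; // receiver dropped: stop reading
                    }
                }
                Err(_) => break, // a real implementation would propagate this
            }
        }
    });
    rx
}
```

A 200-byte input, for example, arrives as four chunks (64 + 64 + 64 + 8 bytes) with the original bytes intact.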

Motivation

upload_tar previously read the entire tar file into a Vec<UploadChunk> before streaming to the guest. For a 500 MB file, this consumed ~500 MB of heap. The download_tar path in the same file already used streaming correctly — this PR closes the asymmetry.

| Scenario      | Before       | After  |
| ------------- | ------------ | ------ |
| 500 MB upload | ~500 MB heap | ~5 MiB |
| 1 GB upload   | ~1 GB heap   | ~5 MiB |
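Back-of-envelope for the ~5 MiB cap, assuming a 1 MiB `CHUNK_SIZE` (the exact constant is not quoted in this PR description): with `mpsc::channel(4)`, at most 4 chunks sit in the channel plus 1 being filled by the reader.

```rust
// Assumed values: CHUNK_SIZE is not stated in the PR text, only the ~5 MiB
// result; 1 MiB chunks with a capacity-4 channel are consistent with it.
const CHUNK_SIZE: usize = 1024 * 1024;
const CHANNEL_CAPACITY: usize = 4;

fn peak_buffered_bytes() -> usize {
    // 4 chunks queued in the bounded channel + 1 chunk in flight in the reader
    (CHANNEL_CAPACITY + 1) * CHUNK_SIZE
}
```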

Test plan

  • cargo fmt -- --check: clean
  • cargo clippy -p boxlite --no-default-features --lib -- -D warnings: clean (0 warnings)
  • cargo test -p boxlite --no-default-features --lib — 608 passed, 0 failed
  • 8 new unit tests covering all branches of stream_file_chunks:
    • Multi-chunk (3 MiB), single-chunk (500 B), partial last chunk, empty file
    • Receiver dropped (cancellation) returns error
    • I/O read error propagation via FailingReader mock
    • Byte-level data integrity verification
    • upload_tar file-not-found error path

🤖 Generated with Claude Code

…to O(5 MiB)

Replace full-file Vec buffering in upload_tar with a bounded mpsc channel
(capacity=4) and a spawned reader task. This caps peak memory at ~5 MiB
regardless of file size, matching the streaming pattern already used in
download_tar and the guest-side upload handler.

Key changes:
- Extract stream_file_chunks helper accepting impl AsyncRead for testability
- Use std::mem::take for dest_path/container_id (first chunk only, zero-copy)
- Always await reader JoinHandle before checking gRPC result (root-cause priority)
- Return error on receiver drop instead of silent Ok (incomplete upload detection)
- Add 8 unit tests: multi-chunk, single-chunk, partial, empty, receiver-drop,
  read-error (FailingReader mock), data-integrity, and file-not-found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 27, 2026 05:53

Copilot AI left a comment


Pull request overview

This PR updates the portal-side tar upload path to stream file contents instead of buffering the entire archive in memory, bringing upload behavior in line with the already-streaming download path and reducing peak heap usage for large uploads.

Changes:

  • Reworked FilesInterface::upload_tar to stream UploadChunks via a bounded mpsc channel + ReceiverStream rather than prebuilding a Vec<UploadChunk>.
  • Added stream_file_chunks helper to read any AsyncRead in CHUNK_SIZE increments and emit UploadChunks.
  • Added unit tests covering chunking behavior, integrity, cancellation/receiver-drop behavior, and error propagation.


Comment on lines +57 to +73

```diff
+        let upload_result = self.client.upload(stream).await;
+
+        // Always join the reader task first — its error is the root cause when
+        // both the reader and the gRPC call fail (e.g. read error causes the
+        // stream to end, which causes the server to report failure).
+        match reader_handle.await {
+            Ok(Err(read_err)) => return Err(read_err),
+            Err(join_err) => {
+                return Err(BoxliteError::Internal(format!(
+                    "Upload reader task failed: {}",
+                    join_err
+                )));
+            }
+            Ok(Ok(())) => {}
+        }
-        // Use futures::stream::iter for the upload stream
-        let stream = futures::stream::iter(chunks);
-
-        let response = self
-            .client
-            .upload(stream)
-            .await
-            .map_err(map_tonic_err)?
-            .into_inner();
+        let response = upload_result.map_err(map_tonic_err)?.into_inner();
```

Copilot AI Mar 27, 2026


upload_tar returns the reader task error before inspecting the gRPC result. If the server rejects the upload early (e.g. permission denied) and drops the receiver, stream_file_chunks will likely return Internal("Upload stream closed..."), which then masks the real tonic Status from upload_result. Consider combining both results: if upload_result is Err, prefer that error unless the reader error is a real I/O failure (e.g. Storage("Failed to read...")), or include both errors in the returned message.
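The error-selection rule the reviewer suggests can be sketched in isolation. The `BoxliteError` variant names below mirror the ones quoted in this PR, but the selection logic itself is a hypothetical illustration of the suggestion, not the merged behavior:

```rust
// Hypothetical sketch of the reviewer's suggestion: when both the reader
// task and the gRPC call fail, prefer the most informative error.
#[derive(Debug, PartialEq)]
enum BoxliteError {
    Storage(String),  // real I/O failure while reading the file
    Internal(String), // e.g. "Upload stream closed..." after receiver drop
    Grpc(String),     // mapped tonic::Status, e.g. permission denied
}

fn combine_errors(reader_err: BoxliteError, upload_err: BoxliteError) -> BoxliteError {
    match reader_err {
        // A genuine read failure is the root cause; keep it.
        BoxliteError::Storage(_) => reader_err,
        // A "stream closed" error is usually a symptom of the server dropping
        // the receiver, so surface the gRPC status instead of masking it.
        _ => upload_err,
    }
}
```

Under this rule, a permission-denied status wins over a receiver-drop error, while a real read failure still takes priority.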

Comment on lines +57 to 71 (same hunk as above)

Copilot AI Mar 27, 2026


After the gRPC call completes, upload_tar always awaits reader_handle. If the upload fails quickly (connection error) but the file read blocks indefinitely (e.g. stalled filesystem), this can hang the whole operation and hide the gRPC failure. Consider aborting the reader task (or using a timeout) when upload_result is Err, while still joining normally on the success path to surface read errors.
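The bounded-wait idea can be sketched with std threads and a result channel (the PR's actual code uses tokio's `JoinHandle`, which also offers `abort()`; everything below, including the 5-second figure, is an illustrative assumption):

```rust
use std::sync::mpsc;
use std::time::Duration;

// Hypothetical illustration of the reviewer's suggestion: on an upload
// failure, wait only briefly for the reader's result instead of joining
// unconditionally, so a stalled filesystem read cannot hang the operation.
fn join_reader_bounded(
    done_rx: mpsc::Receiver<Result<(), String>>,
    upload_failed: bool,
) -> Result<(), String> {
    if upload_failed {
        // Don't block forever on a stalled reader; prefer reporting failure.
        match done_rx.recv_timeout(Duration::from_secs(5)) {
            Ok(Err(read_err)) => Err(read_err),
            _ => Err("upload failed (reader outcome unknown)".to_string()),
        }
    } else {
        // Success path: join normally so read errors still surface.
        done_rx.recv().map_err(|e| e.to_string())?
    }
}
```

On the success path this behaves like the current unconditional join; on the failure path a missing or slow reader result no longer blocks error reporting.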
