
Conversation

@crrow (Owner) commented Jan 27, 2026

Summary

Add a production-ready HTTP downloader library with comprehensive features for reliable file downloads.

Features

  • Chunked parallel downloads - Splits large files into chunks and downloads them concurrently using HTTP range requests
  • Automatic resume - Persists download state to disk and automatically resumes interrupted downloads from the last completed chunk
  • SHA256-based caching - Validates downloaded content against server-provided checksums and caches files to avoid re-downloading
  • File locking - Prevents concurrent downloads of the same URL using file-based locks
  • Configurable retry logic - Built-in exponential backoff retry mechanism for transient failures
  • Comprehensive error handling - Detailed error types with context for debugging
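A hypothetical usage sketch of the features above; the constructor, request fields, and result fields are assumptions inferred from the re-exported types (Downloader, DownloaderConfig) and the download(request) flow shown in the review below, not the crate's confirmed API:

```rust
// Hypothetical sketch only: names and fields are assumptions; the real API may differ.
use downloader::{DownloadRequest, Downloader, DownloaderConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Default config: chunking thresholds, retry/backoff, timeout.
    let downloader = Downloader::new(DownloaderConfig::default());

    let result = downloader
        .download(DownloadRequest {
            url: "https://example.com/model.bin".to_string(),
            output_path: "/tmp/model.bin".into(),
        })
        .await?;

    // from_cache reports whether the file was served from the SHA256-based cache.
    println!("from_cache = {}", result.from_cache);
    Ok(())
}
```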

Implementation

  • Uses reqwest for HTTP client with streaming support
  • fd-lock for cross-platform file locking
  • backon for retry strategies with exponential backoff
  • State persistence with serde_json for resume capability
  • Integration tests using axum-test and tempfile

Test Coverage

✅ Single download with cache hit validation
✅ Parallel chunked downloads with range support
✅ Download locking for concurrent requests
✅ Resume after interruption with partial chunk state

All tests pass with no warnings.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added a full HTTP downloader: parallel downloads, resume support, caching, SHA-256 integrity verification, and configurable chunking strategies for varied file sizes.
  • Tests
    • Added integration tests validating single/parallel downloads, cache hits, resume behavior, failure/concurrency handling, and end-to-end integrity checks.
  • Documentation
    • Added a comprehensive design document describing architecture, phases, operational details, and testing plan.


crrow and others added 2 commits January 26, 2026 15:32
Add comprehensive design document for high-performance HTTP downloader
with multi-threaded chunked downloads, resume capability, and caching.

Key features:
- Smart chunking based on file size (16MB/128MB thresholds)
- Multi-threaded download with standard threads (not async)
- Chunk-level resume after interruption
- URL + SHA256 based caching
- Streaming merge with SHA256 verification

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ching

Implement a production-ready HTTP downloader library with the following features:
- Chunked parallel downloads with range request support
- Automatic resume from interruption with state persistence
- SHA256-based content caching to avoid re-downloading
- File locking to prevent concurrent downloads of same URL
- Configurable retry logic with exponential backoff
- Integration tests using axum-test and tempfile

Test coverage includes:
- Single download with cache hit validation
- Parallel chunked downloads with range support
- Download locking for concurrent requests
- Resume after interruption with partial chunk state

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
coderabbitai bot commented Jan 27, 2026

📝 Walkthrough


Adds a new workspace crate crates/common/utils/downloader providing an HTTP downloader with caching, resumable range-based chunked downloads, parallel orchestration, state persistence, integrity verification, and integration tests.

Changes

  • Workspace & Manifests (Cargo.toml, crates/common/utils/downloader/Cargo.toml): Added workspace member crates/common/utils/downloader; added workspace deps backon, dirs, jiff. New crate manifest lists runtime and dev dependencies and workspace resolution.
  • Library root & exports (crates/common/utils/downloader/src/lib.rs): New crate root wiring modules and re-exporting the public API (configs, Downloader, DownloadError, FileInfo, state/types).
  • Core orchestrator (crates/common/utils/downloader/src/downloader.rs): New Downloader coordinating cache lookup, per-URL locking, resume/load, file-info fetch, strategy selection (single vs parallel), merging, caching, and cleanup.
  • Configuration (crates/common/utils/downloader/src/config.rs): ChunkingConfig and DownloaderConfig with defaults, builder, chunk calculation logic, and unit tests.
  • Types & State (crates/common/utils/downloader/src/types.rs, .../state_manager.rs): Public types for requests/results, chunk/state, and cache metadata; StateManager persists/resumes state, computes chunk boundaries, and manages cleanup/locks.
  • File info & metadata (crates/common/utils/downloader/src/file_info.rs): FileInfoFetcher performs HEAD requests and extracts content-length, range support, and optional SHA-256 from multiple headers.
  • Caching layer (crates/common/utils/downloader/src/cache_manager.rs): CacheManager manages per-URL cache dirs, metadata.json, SHA-256 validation, and hard-link (with copy fallback) store/restore.
  • Single-threaded download (crates/common/utils/downloader/src/single_downloader.rs): Single-threaded streaming downloader computing SHA-256, writing to a temp file with atomic rename and a temp-file cleanup guard.
  • Chunk download & retries (crates/common/utils/downloader/src/chunk_downloader.rs): ChunkDownloader downloads byte ranges with exponential-backoff retries (backon), streams to temp files, and handles client vs retriable errors.
  • Parallel orchestration (crates/common/utils/downloader/src/parallel_manager.rs): ParallelDownloadManager spawns per-chunk Tokio tasks, coordinates ChunkDownloader workers, updates shared DownloadState, and aggregates errors.
  • Chunk merge & verify (crates/common/utils/downloader/src/chunk_merger.rs): ChunkMerger concatenates chunk files into the final output, computes SHA-256 while merging, verifies integrity, and removes temporary chunks.
  • Errors (crates/common/utils/downloader/src/error.rs): New DownloadError enum (SNAFU) covering network, HTTP, I/O, checksum, state, chunk, and concurrency failure modes.
  • Tests (crates/common/utils/downloader/tests/integration_test.rs): Integration tests using an Axum test server covering single download and cache hit, parallel range downloads, lock behavior, and resume after interruption.
  • Design doc (docs/plans/2026-01-26-http-downloader-design.md): Design document describing architecture, chunking, state persistence, caching, retry strategy, and a phased implementation plan.

Sequence Diagrams

sequenceDiagram
    actor User
    participant Downloader as Downloader
    participant Cache as CacheManager
    participant Info as FileInfoFetcher
    participant Server as Server
    participant State as StateManager
    participant Parallel as ParallelManager
    participant Merger as ChunkMerger
    participant Single as SingleThreadDownloader

    User->>Downloader: download(request)
    Downloader->>Cache: get(url)
    alt cache hit
        Cache-->>Downloader: CachedFile
        Downloader-->>User: DownloadResult (from_cache=true)
    else cache miss
        Downloader->>Info: fetch(url)
        Info->>Server: HEAD /file
        Server-->>Info: metadata (size, checksum, ranges)
        Info-->>Downloader: FileInfo
        alt small file or no range
            Downloader->>Single: download(request)
            Single->>Server: GET /file
            Server-->>Single: full content
            Single-->>Downloader: (size, sha256)
        else parallel
            Downloader->>State: create_state(url,...)
            State-->>Downloader: DownloadState
            Downloader->>Parallel: download_all(state)
            Parallel->>Server: GET /file (Range: bytes=...)
            Server-->>Parallel: partial content
            Parallel-->>Downloader: all chunks done
            Downloader->>Merger: merge_and_verify(request, state)
            Merger-->>Downloader: (size, sha256)
            Downloader->>State: cleanup(url)
        end
        Downloader->>Cache: store(url, file_path, sha256)
        Downloader-->>User: DownloadResult (from_cache=false)
    end
sequenceDiagram
    actor User
    participant Downloader as Downloader
    participant State as StateManager
    participant Info as FileInfoFetcher
    participant Server as Server
    participant Parallel as ParallelManager
    participant Merger as ChunkMerger

    User->>Downloader: download(request) [resume attempt]
    Downloader->>State: load(url)
    State-->>Downloader: DownloadState (partial)
    Downloader->>Info: fetch(url)
    Info->>Server: HEAD /file
    Server-->>Info: metadata
    Info-->>Downloader: FileInfo
    Downloader->>State: validate(state, url, checksum, size)
    alt valid
        Downloader->>Parallel: download_all(state)
        Parallel->>Server: GET /file (Range: pending chunks)
        Server-->>Parallel: partial content
        Parallel-->>Downloader: done
        Downloader->>Merger: merge_and_verify(request, state)
        Merger-->>Downloader: (size, sha256)
        Downloader->>State: cleanup(url)
        Downloader-->>User: DownloadResult
    else invalid
        State-->>Downloader: invalid -> cleanup and restart
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 I nibble bytes and stitch each part,
Chunks in a row, a downloader's art.
I cache the crumbs and mend what broke,
Resume your fetch with one quick stroke.
Hooray — the file, safe in my tote!

🚥 Pre-merge checks: ✅ 3 passed
  • Title check: ✅ Passed. The title accurately describes the main feature being added: an HTTP downloader with resumable downloads and caching, which aligns with the substantial new downloader crate being introduced.
  • Description check: ✅ Passed. The PR description covers the main features and implementation details, though it does not follow the repository's template structure with Type of change, Testing, and Related issues sections.
  • Docstring coverage: ✅ Passed. Docstring coverage is 91.67%, which is sufficient; the required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@crates/common/utils/downloader/src/downloader.rs`:
- Around line 329-338: The SHA256 comparison in verify_sha256 currently compares
strings case-sensitively and can falsely fail when hex case differs; update
verify_sha256 to perform a case-insensitive comparison by normalizing both
actual and expected (e.g., to_lowercase or to_uppercase) before comparing,
keeping the Sha256MismatchSnafu error construction (expected/actual) but
consider logging or storing the original values if you need original-case
context; modify the function verify_sha256 accordingly to use the normalized
values for the equality check.
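A minimal standalone sketch of the normalization suggested above, with a simplified error type in place of the crate's Sha256MismatchSnafu:

```rust
/// Sketch only: the real verify_sha256 returns the crate's DownloadError.
fn verify_sha256(actual: &str, expected: &str) -> Result<(), String> {
    // eq_ignore_ascii_case avoids a false mismatch between "ABCD..." and "abcd..."
    // without allocating lowercased copies.
    if actual.eq_ignore_ascii_case(expected) {
        Ok(())
    } else {
        // Keep the original-case values in the error for debugging context.
        Err(format!("sha256 mismatch: expected {expected}, got {actual}"))
    }
}
```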

In `@crates/common/utils/downloader/src/file_info.rs`:
- Around line 74-76: The comment in file_info.rs incorrectly lists "ETag" as a
header that is tried for SHA256 extraction but the implementation doesn't parse
ETag; update the comment in the function that extracts SHA256 checksum from HTTP
headers (the extractor in file_info.rs) to remove "ETag" from the list or
otherwise clarify that ETag is not currently supported, or alternatively
implement ETag parsing (e.g., strip quotes and validate hex length) in the
extractor so the code matches the comment—pick one and make the comment and the
function (extractor in file_info.rs) consistent.
- Around line 92-106: The Digest header parsing in the block that checks
headers.get(HeaderName::from_static("digest")) only accepts 64-char hex but RFC
5843 uses Base64-encoded SHA-256; update the logic in file_info.rs (the Digest
handling branch) to also accept Base64-formatted values: when
s.strip_prefix("SHA-256=") or "sha-256=" yields a value that is not 64 hex
chars, attempt to base64-decode it (use a standard base64 decoder), verify the
decoded length is 32 bytes, then convert that byte array to a lowercase hex
string (e.g., hex::encode_lower) and return it; keep the existing hex-path
intact and return None on any decode/validation errors.
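A sketch of the hex-or-Base64 handling described above, using the base64 and hex crates (whether those are already dependencies of the downloader crate is an assumption):

```rust
use base64::Engine;

// Sketch only: accepts both 64-char hex and Base64-encoded (RFC 3230/5843) SHA-256
// values from a Digest header and returns a lowercase hex digest.
fn sha256_from_digest(value: &str) -> Option<String> {
    let v = value
        .strip_prefix("SHA-256=")
        .or_else(|| value.strip_prefix("sha-256="))?;

    // Existing hex path: exactly 64 hex characters.
    if v.len() == 64 && v.chars().all(|c| c.is_ascii_hexdigit()) {
        return Some(v.to_ascii_lowercase());
    }

    // Base64 path: a SHA-256 digest must decode to exactly 32 bytes.
    let bytes = base64::engine::general_purpose::STANDARD.decode(v).ok()?;
    (bytes.len() == 32).then(|| hex::encode(bytes))
}
```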

In `@crates/common/utils/downloader/src/parallel_manager.rs`:
- Around line 149-153: The Err(_) match arm currently swallows JoinHandle
failures; replace it to capture the JoinError (e.g., Err(join_err)) and handle
panics/cancellations explicitly: if join_err.is_panic() log the panic info
(Debug or downcast the panic payload) and return/propagate an Err from the
containing function so the caller knows the download failed, otherwise log the
cancellation and decide whether to retry or return an error; update the match in
the loop that awaits JoinHandle results (the arm currently commented "Task was
cancelled or panicked") to perform these checks and propagate a meaningful error
instead of silently skipping.
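A sketch of the explicit JoinError handling described above; ChunkOutcome and the String error are placeholders for the crate's real chunk result and DownloadError types:

```rust
use tokio::task::JoinHandle;

struct ChunkOutcome; // placeholder for the crate's per-chunk result

async fn await_chunk(
    handle: JoinHandle<Result<ChunkOutcome, String>>,
) -> Result<ChunkOutcome, String> {
    match handle.await {
        // The task ran to completion; it may still carry a chunk error to propagate.
        Ok(result) => result,
        // A panic inside a chunk task must fail the download loudly, not be skipped.
        Err(join_err) if join_err.is_panic() => {
            Err(format!("chunk task panicked: {join_err:?}"))
        }
        // Cancellation: surface it so the caller can decide to retry or abort.
        Err(join_err) => Err(format!("chunk task cancelled: {join_err}")),
    }
}
```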

In `@crates/common/utils/downloader/src/state_manager.rs`:
- Around line 172-189: calculate_chunk_boundaries currently divides by
num_chunks and will panic if num_chunks == 0; add an early guard at the top of
the function (in calculate_chunk_boundaries) that returns an empty Vec when
num_chunks == 0 (and also return empty if file_size == 0) to avoid division by
zero and downstream panics—this is a defensive, non-breaking change that
prevents the panic when ChunkingConfig can yield zero chunks.
🧹 Nitpick comments (12)
docs/plans/2026-01-26-http-downloader-design.md (2)

88-93: Minor: Add language specifier to fenced code blocks.

Several code blocks in this design document lack language specifiers (e.g., lines 88, 118, 208). Adding language hints improves syntax highlighting and tooling.

For example, for the Range request calculation block:

-```
+```text
 分片 0: bytes=0-{chunk_size-1}

13-17: Note: Implementation diverged from design (async vs blocking).

The design specifies reqwest (blocking API) and std::thread for concurrency, but the actual implementation uses async reqwest with tokio. This is a reasonable evolution for better integration with the existing async codebase. Consider updating the design document to reflect the actual implementation choices for future reference.

crates/common/utils/downloader/src/cache_manager.rs (2)

127-143: Redundant buffering: BufReader is not utilized effectively.

Using BufReader with a manual buffer and read() bypasses BufReader's internal buffering. Either remove BufReader or use its buffered read capabilities.

♻️ Proposed fix - remove redundant BufReader
     async fn compute_sha256(path: &Path) -> Result<String, DownloadError> {
-        let file = fs::File::open(path).await.context(FileReadSnafu)?;
-        let mut reader = BufReader::with_capacity(512 * 1024, file); // 512KB buffer
+        let mut file = fs::File::open(path).await.context(FileReadSnafu)?;
         let mut hasher = Sha256::new();
         let mut buffer = vec![0u8; 512 * 1024]; // 512KB buffer

         loop {
-            let n = reader.read(&mut buffer).await.context(FileReadSnafu)?;
+            let n = file.read(&mut buffer).await.context(FileReadSnafu)?;
             if n == 0 {
                 break;
             }
             hasher.update(&buffer[..n]);
         }

         Ok(format!("{:x}", hasher.finalize()))
     }

179-195: Lost error context in JSON serialization/deserialization.

Using .ok().ok_or(DownloadError::StateCorrupted) discards the original serde_json error, making debugging harder when cache metadata is malformed.

♻️ Proposed fix - preserve error context

Consider adding a dedicated snafu context for JSON errors:

// In error.rs, add:
#[snafu(display("JSON parse error"))]
JsonParse { source: serde_json::Error },

Then update the methods:

     async fn read_metadata(&self) -> Result<CacheMetadata, DownloadError> {
         let metadata_str = fs::read_to_string(self.metadata_path())
             .await
             .context(FileReadSnafu)?;
-        serde_json::from_str(&metadata_str)
-            .ok()
-            .ok_or(DownloadError::StateCorrupted)
+        serde_json::from_str(&metadata_str).context(JsonParseSnafu)
     }

     async fn write_metadata(&self, metadata: &CacheMetadata) -> Result<(), DownloadError> {
-        let metadata_str = serde_json::to_string_pretty(metadata)
-            .ok()
-            .ok_or(DownloadError::StateCorrupted)?;
+        let metadata_str = serde_json::to_string_pretty(metadata).context(JsonParseSnafu)?;
         fs::write(self.metadata_path(), metadata_str)
             .await
             .context(FileWriteSnafu)
     }
crates/common/utils/downloader/src/chunk_merger.rs (1)

64-85: Move buffer allocation outside the loop to avoid repeated allocations.

The 512KB buffer is allocated for each chunk. Moving it outside the loop avoids repeated allocations and improves performance, especially for downloads with many chunks.

Additionally, the BufReader is not being utilized effectively (same issue as in cache_manager.rs).

♻️ Proposed fix
         let mut hasher = Sha256::new();
         let mut total_size = 0u64;
+        let mut buffer = vec![0u8; 512 * 1024]; // 512KB buffer - reused across chunks

         for chunk in &chunks {
-            let chunk_file = File::open(&chunk.temp_file).await.context(FileReadSnafu)?;
-            let mut reader = BufReader::with_capacity(512 * 1024, chunk_file); // 512KB buffer
-            let mut buffer = vec![0u8; 512 * 1024]; // 512KB buffer
+            let mut chunk_file = File::open(&chunk.temp_file).await.context(FileReadSnafu)?;

             // Read and write with hashing
             loop {
-                let n = reader.read(&mut buffer).await.context(FileReadSnafu)?;
+                let n = chunk_file.read(&mut buffer).await.context(FileReadSnafu)?;
                 if n == 0 {
                     break;
                 }
crates/common/utils/downloader/src/single_downloader.rs (1)

62-82: Consider cleaning up the temp file on error.

If the download fails after the temp file is created (e.g., network error mid-stream), the .download file will remain on disk as an orphan. Consider wrapping the download logic to clean up the temp file on failure.

♻️ Suggested approach
         // Download to a temp file first, then rename
         let temp_path = request.output_path.with_extension("download");
-        let mut file = File::create(&temp_path).await.context(FileWriteSnafu)?;
+        let result = self.download_to_temp(&temp_path, &mut stream).await;
+        if result.is_err() {
+            let _ = fs::remove_file(&temp_path).await;
+        }
+        let (total_size, hasher) = result?;

Alternatively, you could use a guard struct that removes the temp file on drop unless explicitly disarmed after success.
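For reference, a sketch of that guard-struct alternative (type and method names are illustrative, not from the crate):

```rust
use std::path::PathBuf;

// Removes the temp file on drop unless the guard was disarmed after a successful rename.
struct TempFileGuard {
    path: PathBuf,
    armed: bool,
}

impl TempFileGuard {
    fn new(path: PathBuf) -> Self {
        Self { path, armed: true }
    }

    /// Call once the temp file has been atomically renamed to its final destination.
    fn disarm(&mut self) {
        self.armed = false;
    }
}

impl Drop for TempFileGuard {
    fn drop(&mut self) {
        if self.armed {
            // Best-effort cleanup of the orphaned .download file; errors are ignored.
            let _ = std::fs::remove_file(&self.path);
        }
    }
}
```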

crates/common/utils/downloader/src/config.rs (1)

104-110: Consider adding an assertion for the exact expected value.

The test asserts a range (chunks >= 2 && chunks <= 4), but with known inputs (50MB file, 5MB min chunk), the calculation is deterministic: 50/5 = 10, clamped to 4. A precise assertion would catch unexpected behavior.

💡 More precise test
     fn test_chunk_calculation_medium_file() {
         let config = ChunkingConfig::default();
         // 50MB - should be 2-4 chunks
         let chunks = config.calculate_chunks(50 * 1024 * 1024);
-        assert!(chunks >= 2 && chunks <= 4);
+        // 50MB / 5MB = 10, clamped to 4
+        assert_eq!(chunks, 4);
     }
crates/common/utils/downloader/src/chunk_downloader.rs (1)

74-82: Redundant status check: 206 Partial Content is already a success status.

StatusCode::PARTIAL_CONTENT (206) is in the 2xx range, so status.is_success() already returns true for it. The explicit || status == StatusCode::PARTIAL_CONTENT check is redundant.

♻️ Simplified check
-        // Check for successful response (200 OK or 206 Partial Content)
-        let status = response.status();
-        ensure!(
-            status.is_success() || status == StatusCode::PARTIAL_CONTENT,
-            HttpSnafu {
-                status: status.as_u16(),
-                url:    &self.url,
-            }
-        );
+        // Check for successful response (2xx including 206 Partial Content)
+        ensure!(
+            response.status().is_success(),
+            HttpSnafu {
+                status: response.status().as_u16(),
+                url:    &self.url,
+            }
+        );
crates/common/utils/downloader/tests/integration_test.rs (2)

114-124: Minor: parse_range doesn't handle all HTTP range formats.

This helper only handles explicit bytes=start-end ranges and will return None for suffix ranges (bytes=-500) or open-ended ranges (bytes=500-). This is fine for the current test scenarios but worth noting if tests expand.
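If the tests ever need the other forms, a sketch of a fuller parser (the helper name and signature are assumptions about the test code):

```rust
// Parses "bytes=start-end", "bytes=start-" and "bytes=-suffix" into an inclusive
// (start, end) pair for a resource of total_len bytes.
fn parse_range(header: &str, total_len: u64) -> Option<(u64, u64)> {
    let spec = header.strip_prefix("bytes=")?;
    let (start_s, end_s) = spec.split_once('-')?;
    match (start_s.is_empty(), end_s.is_empty()) {
        // bytes=start-end
        (false, false) => Some((start_s.parse().ok()?, end_s.parse().ok()?)),
        // bytes=start-  (open-ended: to the last byte)
        (false, true) => Some((start_s.parse().ok()?, total_len.saturating_sub(1))),
        // bytes=-suffix (the last `suffix` bytes)
        (true, false) => {
            let suffix: u64 = end_s.parse().ok()?;
            Some((total_len.saturating_sub(suffix), total_len.saturating_sub(1)))
        }
        (true, true) => None,
    }
}
```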


350-355: Test duplicates internal hash_url logic.

The URL hashing logic is duplicated here because hash_url is private in state_manager. If the hashing algorithm changes, this test will silently break. Consider either exposing a test helper or documenting this coupling.

crates/common/utils/downloader/src/state_manager.rs (1)

92-94: Consider preserving JSON parse error details.

Converting serde_json errors via .ok().ok_or(StateCorrupted) loses the original error context. For debugging corrupt state files, the specific deserialization error would be helpful.

Suggested improvement
-        let state: DownloadState = serde_json::from_str(&state_str)
-            .ok()
-            .ok_or(DownloadError::StateCorrupted)?;
+        let state: DownloadState = serde_json::from_str(&state_str)
+            .map_err(|e| DownloadError::StateCorrupted { 
+                details: e.to_string() 
+            })?;

This would require updating the StateCorrupted variant in error.rs to include a details field.

crates/common/utils/downloader/src/downloader.rs (1)

112-127: Lock files are not cleaned up after successful downloads.

The lock file at lock_path is created but never removed after download completion. While this doesn't affect correctness (empty lock files are harmless), they may accumulate over time.

Suggested cleanup

Consider adding lock file cleanup at the end of download_inner and resume_inner:

// At end of download, cleanup lock file
let _ = fs::remove_file(&lock_path).await;

Or include it in the existing cleanup method.

Address all clippy warnings in downloader crate:
- Remove redundant clone in ParallelDownloadManager initialization
- Add truncate(true) to file open options for clarity
- Collapse nested if-let statements using let-chains
- Replace manual range check with (2..=4).contains()
- Convert match with identical arms to if-let pattern
- Fix uninlined format arguments in tests
- Rename test variables to avoid similar naming conflicts
- Add allow annotation for intentional cast in test code

All changes preserve functionality and improve code quality.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/common/utils/downloader/src/config.rs`:
- Around line 43-60: The calculate_chunks method must guard against zero-valued
settings in ChunkingConfig: ensure min_chunk_size (used via min_chunk) and
max_chunks are clamped to at least 1 before any division or min/comparison;
replace direct uses of min_chunk and self.max_chunks with normalized values
(e.g., let min_chunk = self.min_chunk_size.as_bytes().max(1); let max_chunks =
self.max_chunks.max(1) as u64) and then compute chunks = file_size / min_chunk,
apply chunks.clamp(2,4) for the medium case, and finally ensure the returned
result is at least 1 (e.g., after min(...), cast and clamp to >=1) so no
divide-by-zero or zero-chunk results occur.

Comment on lines +43 to +60
pub fn calculate_chunks(&self, file_size: u64) -> usize {
    let small_threshold = self.small_file_threshold.as_bytes();
    let medium_threshold = self.medium_file_threshold.as_bytes();
    let min_chunk = self.min_chunk_size.as_bytes();

    if file_size < small_threshold {
        // Small file: no chunking
        1
    } else if file_size < medium_threshold {
        // Medium file: 2-4 chunks
        let chunks = file_size / min_chunk;
        chunks.clamp(2, 4) as usize
    } else {
        // Large file: based on size, capped at max_chunks
        let chunks = file_size / min_chunk;
        #[allow(clippy::cast_possible_truncation)]
        let result = chunks.min(self.max_chunks as u64) as usize;
        result


⚠️ Potential issue | 🟠 Major

Guard against zero-sized chunk settings to avoid divide-by-zero / zero-chunk results.

ChunkingConfig is public; if min_chunk_size or max_chunks is set to 0, calculate_chunks can divide by zero or return 0, which will later panic in chunk-boundary math. Please clamp to at least 1.

🐛 Suggested fix
     pub fn calculate_chunks(&self, file_size: u64) -> usize {
         let small_threshold = self.small_file_threshold.as_bytes();
         let medium_threshold = self.medium_file_threshold.as_bytes();
-        let min_chunk = self.min_chunk_size.as_bytes();
+        let min_chunk = self.min_chunk_size.as_bytes().max(1);
+        let max_chunks = self.max_chunks.max(1) as u64;

         if file_size < small_threshold {
             // Small file: no chunking
             1
         } else if file_size < medium_threshold {
             // Medium file: 2-4 chunks
             let chunks = file_size / min_chunk;
             chunks.clamp(2, 4) as usize
         } else {
             // Large file: based on size, capped at max_chunks
-            let chunks = file_size / min_chunk;
+            let chunks = (file_size / min_chunk).max(1);
             #[allow(clippy::cast_possible_truncation)]
-            let result = chunks.min(self.max_chunks as u64) as usize;
+            let result = chunks.min(max_chunks) as usize;
             result
         }
     }

Critical fixes:
- Add RFC 5843 Base64 digest header parsing support
- Fix silent task panic suppression in parallel downloads
- Add division by zero guards in chunk boundary calculation

Minor improvements:
- Use case-insensitive SHA256 comparison
- Remove redundant BufReader usage in cache manager
- Move buffer allocation outside loop in chunk merger
- Add temp file cleanup guard for error cases
- Replace range assertion with exact value in tests
- Remove redundant PARTIAL_CONTENT status check
- Add lock file cleanup in state manager

All tests pass and clippy warnings resolved.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@crates/common/utils/downloader/src/chunk_downloader.rs`:
- Around line 73-81: The current check uses status.is_success() which allows 200
OK and can corrupt ranged downloads; change the logic in chunk_downloader.rs
where status is read (variable status, response) to explicitly require 206
Partial Content (or validate Content-Range/Content-Length headers) and if the
server returned 200 or failed validation return the RangeNotSupported error
instead of the generic HttpSnafu (preserving status and &self.url in logs).
Update the ensure/early-return around the status check to detect 200 and map it
to RangeNotSupported, or validate the Content-Range header on the response
before proceeding with writing the chunk.

In `@crates/common/utils/downloader/src/downloader.rs`:
- Around line 248-261: The build_client function currently calls
try_into().expect("timeout must be non-negative") which panics for negative
SignedDuration; update build_client to explicitly validate config.timeout before
conversion (or convert using checked methods) and return a
Result<reqwest::Client, Error> (or propagate a typed error) instead of
panicking: inspect DownloaderConfig::timeout, if negative return an Err with a
clear message, otherwise convert to std::time::Duration and continue to build
the reqwest::Client (update callers to handle the Result), or alternatively
change DownloaderConfig.timeout to an unsigned Duration at construction time and
remove the panic.

In `@crates/common/utils/downloader/src/state_manager.rs`:
- Around line 176-193: The calculate_chunk_boundaries function can produce
zero-sized or underflowing ranges when num_chunks > file_size; clamp num_chunks
to at most file_size (e.g., let effective_chunks = num_chunks.min(file_size as
usize)) before computing chunk_size, and then use effective_chunks for capacity,
loop bounds, and start/end calculations so chunk_size is never zero and end
won't underflow in calculate_chunk_boundaries.
🧹 Nitpick comments (4)
crates/common/utils/downloader/src/cache_manager.rs (2)

50-56: SHA256 comparison should be case-insensitive for robustness.

The compute_sha256 method returns lowercase hex, but if the sha256 parameter passed to store() originates from a server-provided checksum with different casing, the comparison in get() will incorrectly report corruption.

♻️ Proposed fix
         // Verify file integrity - recompute hash and check against stored metadata
         let actual_sha256 = Self::compute_sha256(&cache_entry.content_path()).await?;
-        if actual_sha256 != metadata.sha256 {
+        if !actual_sha256.eq_ignore_ascii_case(&metadata.sha256) {
             // Cache is corrupted, remove it
             cache_entry.remove().await;
             return Ok(None);
         }

175-191: Serialization errors are swallowed, losing diagnostic information.

Using .ok().ok_or(DownloadError::StateCorrupted) discards the actual serde_json error, making debugging harder. Consider using snafu context for better error messages.

♻️ Proposed fix for read_metadata
     async fn read_metadata(&self) -> Result<CacheMetadata, DownloadError> {
         let metadata_str = fs::read_to_string(self.metadata_path())
             .await
             .context(FileReadSnafu)?;
-        serde_json::from_str(&metadata_str)
-            .ok()
-            .ok_or(DownloadError::StateCorrupted)
+        serde_json::from_str(&metadata_str).map_err(|_| DownloadError::StateCorrupted)
     }

Note: If you want to preserve the error details, consider adding a dedicated error variant like MetadataParseError { source: serde_json::Error } instead of StateCorrupted.

crates/common/utils/downloader/src/config.rs (1)

112-119: Test could use an exact-value assertion.

The large-file test uses a range assertion, but the expected value is deterministic: 500MB / 5MB = 100 chunks, capped at max_chunks (16).

♻️ Suggested improvement
     #[test]
     fn test_chunk_calculation_large_file() {
         let config = ChunkingConfig::default();
         // 500MB - should be capped at 16 chunks
         let chunks = config.calculate_chunks(500 * 1024 * 1024);
-        assert!(chunks <= 16);
-        assert!(chunks > 4);
+        assert_eq!(chunks, 16);
     }
crates/common/utils/downloader/src/downloader.rs (1)

112-128: Blocking file I/O in async context.

std::fs::OpenOptions is used instead of tokio::fs, which blocks the async runtime. For lock file operations this is typically acceptable due to their fast nature, but on slow filesystems or network-mounted directories, this could cause latency spikes.

Consider using tokio::task::spawn_blocking if lock file operations become a bottleneck in production:

let file = tokio::task::spawn_blocking(move || {
    std::fs::OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(&lock_path)
}).await.context(FileWriteSnafu)?.context(FileWriteSnafu)?;

Comment on lines +73 to +81
// Check for successful response (206 is already included in is_success())
let status = response.status();
ensure!(
    status.is_success(),
    HttpSnafu {
        status: status.as_u16(),
        url:    &self.url,
    }
);


⚠️ Potential issue | 🟠 Major

Ranged downloads should reject 200 OK to prevent chunk corruption.
If a server ignores the Range header and returns 200, this will write the full body into a single chunk and corrupt the merged result. Prefer enforcing 206 (or validating Content-Range/Content-Length) and returning RangeNotSupported otherwise.

🐛 Proposed fix
-        // Check for successful response (206 is already included in is_success())
-        let status = response.status();
-        ensure!(
-            status.is_success(),
-            HttpSnafu {
-                status: status.as_u16(),
-                url:    &self.url,
-            }
-        );
+        let status = response.status();
+        if status == reqwest::StatusCode::OK {
+            return Err(DownloadError::RangeNotSupported);
+        }
+        ensure!(
+            status == reqwest::StatusCode::PARTIAL_CONTENT,
+            HttpSnafu {
+                status: status.as_u16(),
+                url:    &self.url,
+            }
+        );

Comment on lines +248 to +261
fn build_client(config: &DownloaderConfig) -> reqwest::Client {
    let timeout: std::time::Duration = config
        .timeout
        .try_into()
        .expect("timeout must be non-negative");

    let mut builder = reqwest::Client::builder().timeout(timeout);

    if let Some(ref ua) = config.user_agent {
        builder = builder.user_agent(ua);
    }

    builder.build().expect("Failed to build HTTP client")
}


⚠️ Potential issue | 🟡 Minor

Panic on negative timeout conversion.

Line 252 uses expect("timeout must be non-negative") but a negative SignedDuration could be configured, causing a panic at runtime.

🐛 Proposed fix
     fn build_client(config: &DownloaderConfig) -> reqwest::Client {
-        let timeout: std::time::Duration = config
-            .timeout
-            .try_into()
-            .expect("timeout must be non-negative");
+        let timeout: std::time::Duration = config
+            .timeout
+            .try_into()
+            .unwrap_or(std::time::Duration::from_secs(30));

         let mut builder = reqwest::Client::builder().timeout(timeout);

Alternatively, validate timeout in DownloaderConfig construction or make timeout a Duration (unsigned) instead of SignedDuration.


Comment on lines +176 to +193
/// Calculate chunk boundaries for parallel download
#[must_use]
pub fn calculate_chunk_boundaries(file_size: u64, num_chunks: usize) -> Vec<(u64, u64)> {
    // Guard against zero chunks or zero file size
    if num_chunks == 0 || file_size == 0 {
        return Vec::new();
    }

    let chunk_size = file_size / num_chunks as u64;
    let mut boundaries = Vec::with_capacity(num_chunks);

    for i in 0..num_chunks {
        let start = i as u64 * chunk_size;
        let end = if i == num_chunks - 1 {
            file_size - 1 // Last chunk goes to end
        } else {
            (i as u64 + 1) * chunk_size - 1
        };


⚠️ Potential issue | 🟠 Major

Guard against num_chunks > file_size to avoid zero-sized chunks.
When file_size < num_chunks, chunk_size becomes 0 and end underflows, producing invalid ranges. Clamp num_chunks before computing boundaries.

🐛 Proposed fix
 pub fn calculate_chunk_boundaries(file_size: u64, num_chunks: usize) -> Vec<(u64, u64)> {
     // Guard against zero chunks or zero file size
     if num_chunks == 0 || file_size == 0 {
         return Vec::new();
     }
 
-    let chunk_size = file_size / num_chunks as u64;
-    let mut boundaries = Vec::with_capacity(num_chunks);
+    let num_chunks = if (num_chunks as u64) > file_size {
+        file_size as usize
+    } else {
+        num_chunks
+    };
+    let chunk_size = file_size / num_chunks as u64;
+    let mut boundaries = Vec::with_capacity(num_chunks);
