feat(downloader): add HTTP downloader with resumable downloads and caching #143
Conversation
Add comprehensive design document for a high-performance HTTP downloader with multi-threaded chunked downloads, resume capability, and caching.

Key features:
- Smart chunking based on file size (16MB/128MB thresholds)
- Multi-threaded download with standard threads (not async)
- Chunk-level resume after interruption
- URL + SHA256 based caching
- Streaming merge with SHA256 verification

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ching

Implement a production-ready HTTP downloader library with the following features:
- Chunked parallel downloads with range request support
- Automatic resume from interruption with state persistence
- SHA256-based content caching to avoid re-downloading
- File locking to prevent concurrent downloads of same URL
- Configurable retry logic with exponential backoff
- Integration tests using axum-test and tempfile

Test coverage includes:
- Single download with cache hit validation
- Parallel chunked downloads with range support
- Download locking for concurrent requests
- Resume after interruption with partial chunk state

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
📝 Walkthrough

Adds a new workspace crate.
Sequence Diagrams

```mermaid
sequenceDiagram
actor User
participant Downloader as Downloader
participant Cache as CacheManager
participant Info as FileInfoFetcher
participant Server as Server
participant State as StateManager
participant Parallel as ParallelManager
participant Merger as ChunkMerger
participant Single as SingleThreadDownloader
User->>Downloader: download(request)
Downloader->>Cache: get(url)
alt cache hit
Cache-->>Downloader: CachedFile
Downloader-->>User: DownloadResult (from_cache=true)
else cache miss
Downloader->>Info: fetch(url)
Info->>Server: HEAD /file
Server-->>Info: metadata (size, checksum, ranges)
Info-->>Downloader: FileInfo
alt small file or no range
Downloader->>Single: download(request)
Single->>Server: GET /file
Server-->>Single: full content
Single-->>Downloader: (size, sha256)
else parallel
Downloader->>State: create_state(url,...)
State-->>Downloader: DownloadState
Downloader->>Parallel: download_all(state)
Parallel->>Server: GET /file (Range: bytes=...)
Server-->>Parallel: partial content
Parallel-->>Downloader: all chunks done
Downloader->>Merger: merge_and_verify(request, state)
Merger-->>Downloader: (size, sha256)
Downloader->>State: cleanup(url)
end
Downloader->>Cache: store(url, file_path, sha256)
Downloader-->>User: DownloadResult (from_cache=false)
end
```
```mermaid
sequenceDiagram
actor User
participant Downloader as Downloader
participant State as StateManager
participant Info as FileInfoFetcher
participant Server as Server
participant Parallel as ParallelManager
participant Merger as ChunkMerger
User->>Downloader: download(request) [resume attempt]
Downloader->>State: load(url)
State-->>Downloader: DownloadState (partial)
Downloader->>Info: fetch(url)
Info->>Server: HEAD /file
Server-->>Info: metadata
Info-->>Downloader: FileInfo
Downloader->>State: validate(state, url, checksum, size)
alt valid
Downloader->>Parallel: download_all(state)
Parallel->>Server: GET /file (Range: pending chunks)
Server-->>Parallel: partial content
Parallel-->>Downloader: done
Downloader->>Merger: merge_and_verify(request, state)
Merger-->>Downloader: (size, sha256)
Downloader->>State: cleanup(url)
Downloader-->>User: DownloadResult
else invalid
State-->>Downloader: invalid -> cleanup and restart
end
```
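To make the flows above concrete, here is a hypothetical caller-side sketch. The type and field names (`Downloader`, `DownloaderConfig`, `DownloadRequest`, `output_path`, `expected_sha256`, `size`, `from_cache`) are inferred from the diagrams and PR description and may not match the crate's actual API:

```rust
use downloader::{Downloader, DownloadRequest, DownloaderConfig}; // hypothetical paths

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a downloader with default chunking, retry, and cache settings.
    let downloader = Downloader::new(DownloaderConfig::default());

    // Describe what to fetch and where to put it.
    let request = DownloadRequest {
        url: "https://example.com/large-file.bin".into(),
        output_path: "/tmp/large-file.bin".into(),
        expected_sha256: None, // verify against a server-provided checksum if any
    };

    // A cache hit returns immediately; otherwise the file is fetched in
    // parallel chunks (or a single stream for small files), merged, and verified.
    let result = downloader.download(request).await?;
    println!("downloaded {} bytes (from_cache = {})", result.size, result.from_cache);
    Ok(())
}
```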
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@crates/common/utils/downloader/src/downloader.rs`:
- Around line 329-338: The SHA256 comparison in verify_sha256 currently compares
strings case-sensitively and can falsely fail when hex case differs; update
verify_sha256 to perform a case-insensitive comparison by normalizing both
actual and expected (e.g., to_lowercase or to_uppercase) before comparing,
keeping the Sha256MismatchSnafu error construction (expected/actual) but
consider logging or storing the original values if you need original-case
context; modify the function verify_sha256 accordingly to use the normalized
values for the equality check.
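As a minimal illustration of the normalization this prompt asks for, here is a standalone sketch; the real `verify_sha256` in downloader.rs returns the crate's snafu-based error, so the `String` error below is only there to keep the example self-contained:

```rust
/// Sketch: compare SHA-256 hex digests case-insensitively.
/// The String error is a stand-in for the crate's Sha256Mismatch error.
fn verify_sha256(actual: &str, expected: &str) -> Result<(), String> {
    if actual.eq_ignore_ascii_case(expected) {
        Ok(())
    } else {
        Err(format!("sha256 mismatch: expected {expected}, actual {actual}"))
    }
}

fn main() {
    // Digests that differ only in hex case should verify successfully.
    assert!(verify_sha256("ABC123", "abc123").is_ok());
    assert!(verify_sha256("abc123", "def456").is_err());
}
```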
In `@crates/common/utils/downloader/src/file_info.rs`:
- Around line 74-76: The comment in file_info.rs incorrectly lists "ETag" as a
header that is tried for SHA256 extraction but the implementation doesn't parse
ETag; update the comment in the function that extracts SHA256 checksum from HTTP
headers (the extractor in file_info.rs) to remove "ETag" from the list or
otherwise clarify that ETag is not currently supported, or alternatively
implement ETag parsing (e.g., strip quotes and validate hex length) in the
extractor so the code matches the comment—pick one and make the comment and the
function (extractor in file_info.rs) consistent.
- Around line 92-106: The Digest header parsing in the block that checks
headers.get(HeaderName::from_static("digest")) only accepts 64-char hex but RFC
5843 uses Base64-encoded SHA-256; update the logic in file_info.rs (the Digest
handling branch) to also accept Base64-formatted values: when
s.strip_prefix("SHA-256=") or "sha-256=" yields a value that is not 64 hex
chars, attempt to base64-decode it (use a standard base64 decoder), verify the
decoded length is 32 bytes, then convert that byte array to a lowercase hex
string (e.g., hex::encode_lower) and return it; keep the existing hex-path
intact and return None on any decode/validation errors.
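A hedged sketch of what the extended Digest parsing could look like, written as a standalone helper rather than the actual extractor in file_info.rs; it assumes the `base64` crate's `Engine` API is available as a dependency:

```rust
use base64::{engine::general_purpose::STANDARD, Engine as _};

fn sha256_from_digest_header(value: &str) -> Option<String> {
    // Accept both "SHA-256=..." and "sha-256=..." prefixes.
    let rest = value
        .strip_prefix("SHA-256=")
        .or_else(|| value.strip_prefix("sha-256="))?;

    // Plain 64-char hex digest: normalize to lowercase and return.
    if rest.len() == 64 && rest.chars().all(|c| c.is_ascii_hexdigit()) {
        return Some(rest.to_ascii_lowercase());
    }

    // RFC 5843 style: the digest value is Base64-encoded raw bytes.
    let bytes = STANDARD.decode(rest).ok()?;
    if bytes.len() != 32 {
        return None; // not a SHA-256 digest
    }
    Some(bytes.iter().map(|b| format!("{b:02x}")).collect())
}
```

With this, both hex-encoded and Base64-encoded SHA-256 values would yield the same lowercase hex string.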
In `@crates/common/utils/downloader/src/parallel_manager.rs`:
- Around line 149-153: The Err(_) match arm currently swallows JoinHandle
failures; replace it to capture the JoinError (e.g., Err(join_err)) and handle
panics/cancellations explicitly: if join_err.is_panic() log the panic info
(Debug or downcast the panic payload) and return/propagate an Err from the
containing function so the caller knows the download failed, otherwise log the
cancellation and decide whether to retry or return an error; update the match in
the loop that awaits JoinHandle results (the arm currently commented "Task was
cancelled or panicked") to perform these checks and propagate a meaningful error
instead of silently skipping.
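A simplified sketch of the suggested JoinError handling, with stand-in error types rather than the crate's actual `DownloadError`:

```rust
async fn await_chunk_tasks(
    handles: Vec<tokio::task::JoinHandle<Result<(), String>>>,
) -> Result<(), String> {
    for handle in handles {
        match handle.await {
            Ok(Ok(())) => {}             // chunk finished
            Ok(Err(e)) => return Err(e), // chunk reported a download error
            Err(join_err) if join_err.is_panic() => {
                // Propagate panics instead of skipping the failed chunk.
                return Err(format!("chunk task panicked: {join_err:?}"));
            }
            Err(join_err) => {
                // Task was cancelled; surface it so the caller can retry or abort.
                return Err(format!("chunk task cancelled: {join_err}"));
            }
        }
    }
    Ok(())
}
```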
In `@crates/common/utils/downloader/src/state_manager.rs`:
- Around line 172-189: calculate_chunk_boundaries currently divides by
num_chunks and will panic if num_chunks == 0; add an early guard at the top of
the function (in calculate_chunk_boundaries) that returns an empty Vec when
num_chunks == 0 (and also return empty if file_size == 0) to avoid division by
zero and downstream panics—this is a defensive, non-breaking change that
prevents the panic when ChunkingConfig can yield zero chunks.
🧹 Nitpick comments (12)
docs/plans/2026-01-26-http-downloader-design.md (2)
88-93: Minor: Add language specifier to fenced code blocks. Several code blocks in this design document lack language specifiers (e.g., lines 88, 118, 208). Adding language hints improves syntax highlighting and tooling.
For example, for the Range request calculation block:
Change the opening fence from ``` to ```text for the block beginning with `分片 0: bytes=0-{chunk_size-1}` ("Chunk 0").
13-17: Note: Implementation diverged from design (async vs blocking). The design specifies `reqwest` (blocking API) and `std::thread` for concurrency, but the actual implementation uses async reqwest with tokio. This is a reasonable evolution for better integration with the existing async codebase. Consider updating the design document to reflect the actual implementation choices for future reference.

crates/common/utils/downloader/src/cache_manager.rs (2)
127-143: Redundant buffering: `BufReader` is not utilized effectively. Using `BufReader` with a manual buffer and `read()` bypasses `BufReader`'s internal buffering. Either remove `BufReader` or use its buffered read capabilities.

♻️ Proposed fix - remove redundant BufReader

```diff
 async fn compute_sha256(path: &Path) -> Result<String, DownloadError> {
-    let file = fs::File::open(path).await.context(FileReadSnafu)?;
-    let mut reader = BufReader::with_capacity(512 * 1024, file); // 512KB buffer
+    let mut file = fs::File::open(path).await.context(FileReadSnafu)?;
     let mut hasher = Sha256::new();
     let mut buffer = vec![0u8; 512 * 1024]; // 512KB buffer
     loop {
-        let n = reader.read(&mut buffer).await.context(FileReadSnafu)?;
+        let n = file.read(&mut buffer).await.context(FileReadSnafu)?;
         if n == 0 {
             break;
         }
         hasher.update(&buffer[..n]);
     }
     Ok(format!("{:x}", hasher.finalize()))
 }
```
179-195: Lost error context in JSON serialization/deserialization. Using `.ok().ok_or(DownloadError::StateCorrupted)` discards the original `serde_json` error, making debugging harder when cache metadata is malformed.

♻️ Proposed fix - preserve error context

Consider adding a dedicated snafu context for JSON errors:

```rust
// In error.rs, add:
#[snafu(display("JSON parse error"))]
JsonParse { source: serde_json::Error },
```

Then update the methods:

```diff
 async fn read_metadata(&self) -> Result<CacheMetadata, DownloadError> {
     let metadata_str = fs::read_to_string(self.metadata_path())
         .await
         .context(FileReadSnafu)?;
-    serde_json::from_str(&metadata_str)
-        .ok()
-        .ok_or(DownloadError::StateCorrupted)
+    serde_json::from_str(&metadata_str).context(JsonParseSnafu)
 }

 async fn write_metadata(&self, metadata: &CacheMetadata) -> Result<(), DownloadError> {
-    let metadata_str = serde_json::to_string_pretty(metadata)
-        .ok()
-        .ok_or(DownloadError::StateCorrupted)?;
+    let metadata_str = serde_json::to_string_pretty(metadata).context(JsonParseSnafu)?;
     fs::write(self.metadata_path(), metadata_str)
         .await
         .context(FileWriteSnafu)
 }
```

crates/common/utils/downloader/src/chunk_merger.rs (1)
64-85: Move buffer allocation outside the loop to avoid repeated allocations. The 512KB buffer is allocated for each chunk. Moving it outside the loop avoids repeated allocations and improves performance, especially for downloads with many chunks.

Additionally, the `BufReader` is not being utilized effectively (same issue as in `cache_manager.rs`).

♻️ Proposed fix

```diff
 let mut hasher = Sha256::new();
 let mut total_size = 0u64;
+let mut buffer = vec![0u8; 512 * 1024]; // 512KB buffer - reused across chunks
 for chunk in &chunks {
-    let chunk_file = File::open(&chunk.temp_file).await.context(FileReadSnafu)?;
-    let mut reader = BufReader::with_capacity(512 * 1024, chunk_file); // 512KB buffer
-    let mut buffer = vec![0u8; 512 * 1024]; // 512KB buffer
+    let mut chunk_file = File::open(&chunk.temp_file).await.context(FileReadSnafu)?;
     // Read and write with hashing
     loop {
-        let n = reader.read(&mut buffer).await.context(FileReadSnafu)?;
+        let n = chunk_file.read(&mut buffer).await.context(FileReadSnafu)?;
         if n == 0 {
             break;
         }
```

crates/common/utils/downloader/src/single_downloader.rs (1)
62-82: Consider cleaning up the temp file on error. If the download fails after the temp file is created (e.g., network error mid-stream), the `.download` file will remain on disk as an orphan. Consider wrapping the download logic to clean up the temp file on failure.

♻️ Suggested approach

```diff
 // Download to a temp file first, then rename
 let temp_path = request.output_path.with_extension("download");
-let mut file = File::create(&temp_path).await.context(FileWriteSnafu)?;
+let result = self.download_to_temp(&temp_path, &mut stream).await;
+if result.is_err() {
+    let _ = fs::remove_file(&temp_path).await;
+}
+let (total_size, hasher) = result?;
```

Alternatively, you could use a guard struct that removes the temp file on drop unless explicitly disarmed after success.
crates/common/utils/downloader/src/config.rs (1)
104-110: Consider adding an assertion for the exact expected value. The test asserts a range (`chunks >= 2 && chunks <= 4`), but with known inputs (50MB file, 5MB min chunk), the calculation is deterministic: 50/5 = 10, clamped to 4. A precise assertion would catch unexpected behavior.

💡 More precise test

```diff
 fn test_chunk_calculation_medium_file() {
     let config = ChunkingConfig::default();
     // 50MB - should be 2-4 chunks
     let chunks = config.calculate_chunks(50 * 1024 * 1024);
-    assert!(chunks >= 2 && chunks <= 4);
+    // 50MB / 5MB = 10, clamped to 4
+    assert_eq!(chunks, 4);
 }
```

crates/common/utils/downloader/src/chunk_downloader.rs (1)
74-82: Redundant status check: 206 Partial Content is already a success status. `StatusCode::PARTIAL_CONTENT` (206) is in the 2xx range, so `status.is_success()` already returns `true` for it. The explicit `|| status == StatusCode::PARTIAL_CONTENT` check is redundant.

♻️ Simplified check

```diff
-// Check for successful response (200 OK or 206 Partial Content)
-let status = response.status();
-ensure!(
-    status.is_success() || status == StatusCode::PARTIAL_CONTENT,
-    HttpSnafu {
-        status: status.as_u16(),
-        url: &self.url,
-    }
-);
+// Check for successful response (2xx including 206 Partial Content)
+ensure!(
+    response.status().is_success(),
+    HttpSnafu {
+        status: response.status().as_u16(),
+        url: &self.url,
+    }
+);
```

crates/common/utils/downloader/tests/integration_test.rs (2)
114-124: Minor: `parse_range` doesn't handle all HTTP range formats. This helper only handles explicit `bytes=start-end` ranges and will return `None` for suffix ranges (`bytes=-500`) or open-ended ranges (`bytes=500-`). This is fine for the current test scenarios but worth noting if tests expand.
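For illustration, a self-contained sketch of a `parse_range` variant that also covers the suffix and open-ended forms; the signature (including the `total_len` parameter) is assumed for the example, not taken from the test file:

```rust
fn parse_range(header: &str, total_len: u64) -> Option<(u64, u64)> {
    let spec = header.strip_prefix("bytes=")?;
    let (start, end) = spec.split_once('-')?;
    match (start.is_empty(), end.is_empty()) {
        // bytes=start-end
        (false, false) => Some((start.parse().ok()?, end.parse().ok()?)),
        // bytes=start-  (open-ended: to the last byte)
        (false, true) => Some((start.parse().ok()?, total_len.checked_sub(1)?)),
        // bytes=-suffix (last `suffix` bytes)
        (true, false) => {
            let suffix: u64 = end.parse().ok()?;
            Some((total_len.checked_sub(suffix)?, total_len.checked_sub(1)?))
        }
        (true, true) => None,
    }
}

fn main() {
    assert_eq!(parse_range("bytes=0-99", 1000), Some((0, 99)));
    assert_eq!(parse_range("bytes=500-", 1000), Some((500, 999)));
    assert_eq!(parse_range("bytes=-500", 1000), Some((500, 999)));
}
```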
350-355: Test duplicates internal `hash_url` logic. The URL hashing logic is duplicated here because `hash_url` is private in `state_manager`. If the hashing algorithm changes, this test will silently break. Consider either exposing a test helper or documenting this coupling.

crates/common/utils/downloader/src/state_manager.rs (1)
92-94: Consider preserving JSON parse error details. Converting `serde_json` errors via `.ok().ok_or(StateCorrupted)` loses the original error context. For debugging corrupt state files, the specific deserialization error would be helpful.

Suggested improvement

```diff
-let state: DownloadState = serde_json::from_str(&state_str)
-    .ok()
-    .ok_or(DownloadError::StateCorrupted)?;
+let state: DownloadState = serde_json::from_str(&state_str)
+    .map_err(|e| DownloadError::StateCorrupted {
+        details: e.to_string()
+    })?;
```

This would require updating the `StateCorrupted` variant in error.rs to include a `details` field.

crates/common/utils/downloader/src/downloader.rs (1)
112-127: Lock files are not cleaned up after successful downloads. The lock file at `lock_path` is created but never removed after download completion. While this doesn't affect correctness (empty lock files are harmless), it may accumulate over time.

Suggested cleanup

Consider adding lock file cleanup at the end of `download_inner` and `resume_inner`:

```rust
// At end of download, cleanup lock file
let _ = fs::remove_file(&lock_path).await;
```

Or include it in the existing `cleanup` method.
Address all clippy warnings in downloader crate:
- Remove redundant clone in ParallelDownloadManager initialization
- Add truncate(true) to file open options for clarity
- Collapse nested if-let statements using let-chains
- Replace manual range check with (2..=4).contains()
- Convert match with identical arms to if-let pattern
- Fix uninlined format arguments in tests
- Rename test variables to avoid similar naming conflicts
- Add allow annotation for intentional cast in test code

All changes preserve functionality and improve code quality.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/common/utils/downloader/src/config.rs`:
- Around line 43-60: The calculate_chunks method must guard against zero-valued
settings in ChunkingConfig: ensure min_chunk_size (used via min_chunk) and
max_chunks are clamped to at least 1 before any division or min/comparison;
replace direct uses of min_chunk and self.max_chunks with normalized values
(e.g., let min_chunk = self.min_chunk_size.as_bytes().max(1); let max_chunks =
self.max_chunks.max(1) as u64) and then compute chunks = file_size / min_chunk,
apply chunks.clamp(2,4) for the medium case, and finally ensure the returned
result is at least 1 (e.g., after min(...), cast and clamp to >=1) so no
divide-by-zero or zero-chunk results occur.
```rust
pub fn calculate_chunks(&self, file_size: u64) -> usize {
    let small_threshold = self.small_file_threshold.as_bytes();
    let medium_threshold = self.medium_file_threshold.as_bytes();
    let min_chunk = self.min_chunk_size.as_bytes();

    if file_size < small_threshold {
        // Small file: no chunking
        1
    } else if file_size < medium_threshold {
        // Medium file: 2-4 chunks
        let chunks = file_size / min_chunk;
        chunks.clamp(2, 4) as usize
    } else {
        // Large file: based on size, capped at max_chunks
        let chunks = file_size / min_chunk;
        #[allow(clippy::cast_possible_truncation)]
        let result = chunks.min(self.max_chunks as u64) as usize;
        result
```
Guard against zero-sized chunk settings to avoid divide-by-zero / zero-chunk results.
ChunkingConfig is public; if min_chunk_size or max_chunks is set to 0, calculate_chunks can divide by zero or return 0, which will later panic in chunk-boundary math. Please clamp to at least 1.
🐛 Suggested fix

```diff
pub fn calculate_chunks(&self, file_size: u64) -> usize {
let small_threshold = self.small_file_threshold.as_bytes();
let medium_threshold = self.medium_file_threshold.as_bytes();
- let min_chunk = self.min_chunk_size.as_bytes();
+ let min_chunk = self.min_chunk_size.as_bytes().max(1);
+ let max_chunks = self.max_chunks.max(1) as u64;
if file_size < small_threshold {
// Small file: no chunking
1
} else if file_size < medium_threshold {
// Medium file: 2-4 chunks
let chunks = file_size / min_chunk;
chunks.clamp(2, 4) as usize
} else {
// Large file: based on size, capped at max_chunks
- let chunks = file_size / min_chunk;
+ let chunks = (file_size / min_chunk).max(1);
#[allow(clippy::cast_possible_truncation)]
- let result = chunks.min(self.max_chunks as u64) as usize;
+ let result = chunks.min(max_chunks) as usize;
result
}
}
```
Critical fixes:
- Add RFC 5843 Base64 digest header parsing support
- Fix silent task panic suppression in parallel downloads
- Add division by zero guards in chunk boundary calculation

Minor improvements:
- Use case-insensitive SHA256 comparison
- Remove redundant BufReader usage in cache manager
- Move buffer allocation outside loop in chunk merger
- Add temp file cleanup guard for error cases
- Replace range assertion with exact value in tests
- Remove redundant PARTIAL_CONTENT status check
- Add lock file cleanup in state manager

All tests pass and clippy warnings resolved.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@crates/common/utils/downloader/src/chunk_downloader.rs`:
- Around line 73-81: The current check uses status.is_success() which allows 200
OK and can corrupt ranged downloads; change the logic in chunk_downloader.rs
where status is read (variable status, response) to explicitly require 206
Partial Content (or validate Content-Range/Content-Length headers) and if the
server returned 200 or failed validation return the RangeNotSupported error
instead of the generic HttpSnafu (preserving status and &self.url in logs).
Update the ensure/early-return around the status check to detect 200 and map it
to RangeNotSupported, or validate the Content-Range header on the response
before proceeding with writing the chunk.
In `@crates/common/utils/downloader/src/downloader.rs`:
- Around line 248-261: The build_client function currently calls
try_into().expect("timeout must be non-negative") which panics for negative
SignedDuration; update build_client to explicitly validate config.timeout before
conversion (or convert using checked methods) and return a
Result<reqwest::Client, Error> (or propagate a typed error) instead of
panicking: inspect DownloaderConfig::timeout, if negative return an Err with a
clear message, otherwise convert to std::time::Duration and continue to build
the reqwest::Client (update callers to handle the Result), or alternatively
change DownloaderConfig.timeout to an unsigned Duration at construction time and
remove the panic.
In `@crates/common/utils/downloader/src/state_manager.rs`:
- Around line 176-193: The calculate_chunk_boundaries function can produce
zero-sized or underflowing ranges when num_chunks > file_size; clamp num_chunks
to at most file_size (e.g., let effective_chunks = num_chunks.min(file_size as
usize)) before computing chunk_size, and then use effective_chunks for capacity,
loop bounds, and start/end calculations so chunk_size is never zero and end
won't underflow in calculate_chunk_boundaries.
🧹 Nitpick comments (4)
crates/common/utils/downloader/src/cache_manager.rs (2)
50-56: SHA256 comparison should be case-insensitive for robustness. The `compute_sha256` method returns lowercase hex, but if the `sha256` parameter passed to `store()` originates from a server-provided checksum with different casing, the comparison in `get()` will incorrectly report corruption.

♻️ Proposed fix

```diff
 // Verify file integrity - recompute hash and check against stored metadata
 let actual_sha256 = Self::compute_sha256(&cache_entry.content_path()).await?;
-if actual_sha256 != metadata.sha256 {
+if !actual_sha256.eq_ignore_ascii_case(&metadata.sha256) {
     // Cache is corrupted, remove it
     cache_entry.remove().await;
     return Ok(None);
 }
```
175-191: Serialization errors are swallowed, losing diagnostic information. Using `.ok().ok_or(DownloadError::StateCorrupted)` discards the actual `serde_json` error, making debugging harder. Consider using `snafu` context for better error messages.

♻️ Proposed fix for read_metadata

```diff
 async fn read_metadata(&self) -> Result<CacheMetadata, DownloadError> {
     let metadata_str = fs::read_to_string(self.metadata_path())
         .await
         .context(FileReadSnafu)?;
-    serde_json::from_str(&metadata_str)
-        .ok()
-        .ok_or(DownloadError::StateCorrupted)
+    serde_json::from_str(&metadata_str).map_err(|_| DownloadError::StateCorrupted)
 }
```

Note: If you want to preserve the error details, consider adding a dedicated error variant like `MetadataParseError { source: serde_json::Error }` instead of `StateCorrupted`.

crates/common/utils/downloader/src/config.rs (1)
112-119: Test could use exact value assertion. The large file test uses a range assertion, but the expected value is deterministic: 500MB / 5MB = 100 chunks, capped at `max_chunks` (16).

♻️ Suggested improvement

```diff
 #[test]
 fn test_chunk_calculation_large_file() {
     let config = ChunkingConfig::default();
     // 500MB - should be capped at 16 chunks
     let chunks = config.calculate_chunks(500 * 1024 * 1024);
-    assert!(chunks <= 16);
-    assert!(chunks > 4);
+    assert_eq!(chunks, 16);
 }
```

crates/common/utils/downloader/src/downloader.rs (1)
112-128: Blocking file I/O in async context. `std::fs::OpenOptions` is used instead of `tokio::fs`, which blocks the async runtime. For lock file operations this is typically acceptable due to their fast nature, but on slow filesystems or network-mounted directories, this could cause latency spikes.

Consider using `tokio::task::spawn_blocking` if lock file operations become a bottleneck in production:

```rust
let file = tokio::task::spawn_blocking(move || {
    std::fs::OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(&lock_path)
}).await.context(FileWriteSnafu)?.context(FileWriteSnafu)?;
```
```rust
// Check for successful response (206 is already included in is_success())
let status = response.status();
ensure!(
    status.is_success(),
    HttpSnafu {
        status: status.as_u16(),
        url: &self.url,
    }
);
```
Ranged downloads should reject 200 OK to prevent chunk corruption.
If a server ignores the Range header and returns 200, this will write the full body into a single chunk and corrupt the merged result. Prefer enforcing 206 (or validating Content-Range/Content-Length) and returning RangeNotSupported otherwise.
🐛 Proposed fix

```diff
- // Check for successful response (206 is already included in is_success())
- let status = response.status();
- ensure!(
- status.is_success(),
- HttpSnafu {
- status: status.as_u16(),
- url: &self.url,
- }
- );
+ let status = response.status();
+ if status == reqwest::StatusCode::OK {
+ return Err(DownloadError::RangeNotSupported);
+ }
+ ensure!(
+ status == reqwest::StatusCode::PARTIAL_CONTENT,
+ HttpSnafu {
+ status: status.as_u16(),
+ url: &self.url,
+ }
```rust
fn build_client(config: &DownloaderConfig) -> reqwest::Client {
    let timeout: std::time::Duration = config
        .timeout
        .try_into()
        .expect("timeout must be non-negative");

    let mut builder = reqwest::Client::builder().timeout(timeout);

    if let Some(ref ua) = config.user_agent {
        builder = builder.user_agent(ua);
    }

    builder.build().expect("Failed to build HTTP client")
}
```
Panic on negative timeout conversion.
Line 252 uses expect("timeout must be non-negative") but a negative SignedDuration could be configured, causing a panic at runtime.
🐛 Proposed fix

```diff
fn build_client(config: &DownloaderConfig) -> reqwest::Client {
- let timeout: std::time::Duration = config
- .timeout
- .try_into()
- .expect("timeout must be non-negative");
+ let timeout: std::time::Duration = config
+ .timeout
+ .try_into()
+ .unwrap_or(std::time::Duration::from_secs(30));
     let mut builder = reqwest::Client::builder().timeout(timeout);
```

Alternatively, validate timeout in DownloaderConfig construction or make timeout a Duration (unsigned) instead of SignedDuration.
```rust
/// Calculate chunk boundaries for parallel download
#[must_use]
pub fn calculate_chunk_boundaries(file_size: u64, num_chunks: usize) -> Vec<(u64, u64)> {
    // Guard against zero chunks or zero file size
    if num_chunks == 0 || file_size == 0 {
        return Vec::new();
    }

    let chunk_size = file_size / num_chunks as u64;
    let mut boundaries = Vec::with_capacity(num_chunks);

    for i in 0..num_chunks {
        let start = i as u64 * chunk_size;
        let end = if i == num_chunks - 1 {
            file_size - 1 // Last chunk goes to end
        } else {
            (i as u64 + 1) * chunk_size - 1
        };
```
Guard against num_chunks > file_size to avoid zero-sized chunks.
When file_size < num_chunks, chunk_size becomes 0 and end underflows, producing invalid ranges. Clamp num_chunks before computing boundaries.
🐛 Proposed fix

```diff
pub fn calculate_chunk_boundaries(file_size: u64, num_chunks: usize) -> Vec<(u64, u64)> {
// Guard against zero chunks or zero file size
if num_chunks == 0 || file_size == 0 {
return Vec::new();
}
- let chunk_size = file_size / num_chunks as u64;
- let mut boundaries = Vec::with_capacity(num_chunks);
+ let num_chunks = if (num_chunks as u64) > file_size {
+ file_size as usize
+ } else {
+ num_chunks
+ };
+ let chunk_size = file_size / num_chunks as u64;
+    let mut boundaries = Vec::with_capacity(num_chunks);
```
Summary
Add a production-ready HTTP downloader library with comprehensive features for reliable file downloads.
Features
Implementation
- `reqwest` for HTTP client with streaming support
- `fd-lock` for cross-platform file locking
- `backon` for retry strategies with exponential backoff
- `serde_json` for resume capability
- `axum-test` and `tempfile` for integration tests

Test Coverage
✅ Single download with cache hit validation
✅ Parallel chunked downloads with range support
✅ Download locking for concurrent requests
✅ Resume after interruption with partial chunk state
All tests pass with no warnings.
🤖 Generated with Claude Code