feat: integrate self-encryption for streaming file encrypt/decrypt#23
feat: integrate self-encryption for streaming file encrypt/decrypt#23
Conversation
Replace plaintext file chunking with convergent encryption via the self_encryption crate (quantum-safe: ChaCha20-Poly1305 + BLAKE3). - Add src/client/self_encrypt.rs with streaming encrypt/decrypt, DataMap serialization, and public/private data mode support - Update CLI with --public upload flag and --datamap download option - Set plaintext chunk size to 4MB-4KB via .cargo/config.toml to leave headroom for AEAD tag and compression overhead - Point self_encryption dep at grumbach/post_quatum branch
|
@claude @greptile-apps please review |
|
Claude finished @grumbach's task —— View job Code Review Complete
SummaryThis PR successfully integrates convergent encryption using the Critical Issues1. CLAUDE.md Violation:
|
There was a problem hiding this comment.
Pull request overview
Integrates convergent “self-encryption” for file upload/download flows, replacing plaintext chunking/manifest-based file reconstruction with DataMap-driven streaming encrypt/decrypt and adding CLI support for public vs private file modes.
Changes:
- Add
src/client/self_encrypt.rsimplementing streaming encrypt/upload, streaming download/decrypt, andDataMap(de)serialization with public/private modes. - Update
saorsa-clifile subcommands to support--publicuploads and--datamapdownloads. - Add a git-based
self_encryptiondependency and a Cargo env setting intended to constrain plaintext chunk sizing.
Reviewed changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/storage/handler.rs |
Refactors to use imported compute_address / Error in chunk PUT verification. |
src/lib.rs |
Re-exports self-encryption helpers from the library API. |
src/client/self_encrypt.rs |
New streaming self-encryption integration + tests + DataMap public/private helpers. |
src/client/mod.rs |
Exposes the new self_encrypt module. |
src/bin/saorsa-cli/main.rs |
Switches file upload/download to DataMap-based encrypt/decrypt; adds public/private handling. |
src/bin/saorsa-cli/cli.rs |
Adds --public upload and --datamap download flags to the CLI. |
Cargo.toml |
Adds self_encryption dependency via a git branch. |
.cargo/config.toml |
Sets an env var intended to cap plaintext chunk size for encryption headroom. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| hkdf = "0.12" | ||
|
|
||
| # Self-encryption (convergent encryption + streaming) | ||
| self_encryption = { git = "https://github.com/grumbach/self_encryption.git", branch = "post_quatum" } |
There was a problem hiding this comment.
The self_encryption dependency is pulled from a git branch, which makes builds non-reproducible and increases supply-chain risk (the branch can move). Prefer pinning to an immutable rev (commit hash) or a published crate version/tag, and update via explicit bumps.
| self_encryption = { git = "https://github.com/grumbach/self_encryption.git", branch = "post_quatum" } | |
| self_encryption = { git = "https://github.com/grumbach/self_encryption.git", rev = "REPLACE_WITH_ACTUAL_COMMIT_HASH" } |
src/client/self_encrypt.rs
Outdated
| let data_iter = std::iter::from_fn(move || { | ||
| let mut buf = vec![0u8; 65536]; | ||
| match reader.read(&mut buf) { | ||
| Ok(0) | Err(_) => None, | ||
| Ok(n) => { | ||
| buf.truncate(n); | ||
| Some(Bytes::from(buf)) | ||
| } | ||
| } |
There was a problem hiding this comment.
encrypt_file_to_chunks treats Err(_) from reader.read() as EOF (None), silently truncating the input and potentially producing a valid-looking DataMap for incomplete data. Please surface the underlying I/O error instead (e.g., capture the error and return it after iteration, or use an iterator that can propagate read errors if the self_encryption API supports it).
src/client/self_encrypt.rs
Outdated
| let (data_map, collected) = encrypt_file_to_chunks(file_path, file_size)?; | ||
|
|
||
| let chunk_count = collected.len(); | ||
| info!("Encryption complete: {chunk_count} chunk(s) produced"); |
There was a problem hiding this comment.
Despite the module docs claiming a “low memory footprint”, encrypt_and_upload_file fully materializes all encrypted chunks in memory (encrypt_file_to_chunks returns a Vec<(XorName, Bytes)>) before uploading. For large files this can be multiple GiB of peak RAM. Consider uploading chunks as they are produced (or spooling to disk) and only keeping the DataMap/minimal state needed to finalize.
| let stream = self_encryption::streaming_decrypt(data_map, |batch: &[(usize, XorName)]| { | ||
| let mut results = Vec::with_capacity(batch.len()); | ||
| for &(idx, ref hash) in batch { | ||
| let addr = hash.0; | ||
| let addr_hex = hex::encode(addr); | ||
| debug!("Fetching chunk idx={idx} ({addr_hex})"); | ||
| let chunk = tokio::task::block_in_place(|| handle.block_on(client.get_chunk(&addr))) | ||
| .map_err(|e| { | ||
| self_encryption::Error::Generic(format!( | ||
| "Network fetch failed for {addr_hex}: {e}" | ||
| )) | ||
| })? | ||
| .ok_or_else(|| { | ||
| self_encryption::Error::Generic(format!( | ||
| "Chunk not found on network: {addr_hex}" | ||
| )) | ||
| })?; | ||
| results.push((idx, chunk.content)); | ||
| } |
There was a problem hiding this comment.
The streaming_decrypt fetch callback executes block_in_place + handle.block_on(client.get_chunk(...)) once per chunk in the batch, making network fetches strictly sequential and adding extra blocking transitions. Since the callback already receives a batch, consider fetching the whole batch in a single block_in_place and running the requests concurrently (e.g., join_all) to reduce latency and improve throughput.
Greptile SummaryThis PR replaces the existing plaintext file-chunking pipeline with convergent encryption via the Three issues require attention before merge:
Confidence Score: 2/5
|
| Filename | Overview |
|---|---|
| src/client/self_encrypt.rs | Core encryption/decryption module; previously flagged issues (I/O error capture, temp file uniqueness via rand::random(), atomic rename, runtime flavor guard) are all addressed in this revision. The implementation is well-structured with proper streaming for uploads and bounded-concurrency batch fetching for downloads. |
| src/bin/saorsa-cli/main.rs | Two actionable issues: (1) std::fs::write silently overwrites a pre-existing .datamap file, permanently destroying access to a previous private upload with no warning; (2) when both address and --datamap are supplied together, the address is silently ignored. |
| src/bin/saorsa-cli/cli.rs | CLI argument definitions are clear and well-tested, but address and --datamap are not declared as mutually exclusive, allowing silent precedence of --datamap over address with no user-visible error. |
| Cargo.toml | Adds self_encryption from an unversioned git branch; the PR description promises a .cargo/config.toml that sets the plaintext chunk size to avoid exceeding MAX_CHUNK_SIZE, but that file is absent from this PR, creating a potential size mismatch between produced ciphertexts and the network's acceptance limit. |
| src/lib.rs | Correctly re-exports the new self_encrypt public API alongside the legacy plaintext chunking API; the backward-compatibility comment is a helpful signal for downstream consumers. |
| src/client/mod.rs | Minimal addition of pub mod self_encrypt; to expose the new module; no issues. |
| src/storage/handler.rs | Cosmetic-only cleanup: imports compute_address and Error directly instead of via full crate paths; no behavioral changes. |
Sequence Diagram
sequenceDiagram
participant CLI as saorsa-cli
participant SE as self_encrypt.rs
participant Enc as self_encryption crate
participant Net as QuantumClient / Network
note over CLI,Net: Private Upload Flow
CLI->>SE: encrypt_and_upload_file(path, client)
SE->>Enc: open_encrypt_stream(file, file_size)
Enc-->>SE: EncryptionStream + ReadErrorCapture
loop bounded concurrency (UPLOAD_CONCURRENCY=4)
SE->>Enc: stream.chunks().next()
Enc-->>SE: (XorName, Bytes) encrypted chunk
SE->>Net: put_chunk_with_payment(chunk)
Net-->>SE: (address, tx_hashes)
end
SE->>SE: check_read_error()
SE->>Enc: stream.into_datamap()
Enc-->>SE: DataMap
SE-->>CLI: (DataMap, Vec<tx_hash>)
CLI->>CLI: serialize_data_map() → write .datamap file
CLI->>CLI: println! DATAMAP_FILE=...
note over CLI,Net: Public Upload (additional step)
CLI->>SE: store_data_map_public(data_map, client)
SE->>Net: put_chunk_with_payment(serialized DataMap)
Net-->>SE: (dm_address, tx_hashes)
SE-->>CLI: (dm_address, tx_hashes)
CLI->>CLI: println! FILE_ADDRESS=...
note over CLI,Net: Download Flow (private)
CLI->>CLI: read .datamap file → deserialize_data_map()
CLI->>SE: download_and_decrypt_file(data_map, output_path, client)
SE->>Enc: streaming_decrypt(data_map, fetch_callback)
loop per batch of chunks
Enc->>SE: fetch_callback(batch)
SE->>SE: block_in_place + handle.block_on
SE->>Net: get_chunk(xorname) × batch (FuturesUnordered)
Net-->>SE: Vec<Bytes>
SE-->>Enc: Vec<(idx, Bytes)>
end
SE->>SE: write_stream_to_file(stream, tmp_path)
SE->>SE: rename tmp → output_path (atomic)
SE-->>CLI: Ok(())
Comments Outside Diff (1)
-
src/bin/saorsa-cli/cli.rs, line 82-92 (link)addresssilently ignored when--datamapis also providedBoth
address(positional) and--datamapcan be specified at the same time. Clap will accept the invocation without complaint, but inhandle_download, theif let Some(dm_path) = datamap_pathbranch takes precedence and theaddressargument is silently discarded. A user who types:saorsa-cli file download --datamap /old/file.datamap abc123networkexpecting the network address to be used will get the local DataMap instead — with no error or warning.
These two arguments are mutually exclusive and should be declared as such via
conflicts_with:/// Hex-encoded address (public data map address or manifest address). #[arg(conflicts_with = "datamap")] address: Option<String>,
This makes Clap reject the ambiguous combination at the CLI layer before any logic runs.
Prompt To Fix With AI
This is a comment left during a code review. Path: src/bin/saorsa-cli/cli.rs Line: 82-92 Comment: **`address` silently ignored when `--datamap` is also provided** Both `address` (positional) and `--datamap` can be specified at the same time. Clap will accept the invocation without complaint, but in `handle_download`, the `if let Some(dm_path) = datamap_path` branch takes precedence and the `address` argument is silently discarded. A user who types: ``` saorsa-cli file download --datamap /old/file.datamap abc123network ``` expecting the network address to be used will get the local DataMap instead — with no error or warning. These two arguments are mutually exclusive and should be declared as such via `conflicts_with`: ```rust /// Hex-encoded address (public data map address or manifest address). #[arg(conflicts_with = "datamap")] address: Option<String>, ``` This makes Clap reject the ambiguous combination at the CLI layer before any logic runs. How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/bin/saorsa-cli/main.rs
Line: 148-149
Comment:
**Silent overwrite of existing `.datamap` destroys access to previous private uploads**
`std::fs::write` silently overwrites any pre-existing `.datamap` file at that path. For a private upload, the `.datamap` is the *sole decryption capability* — losing it is permanent data loss. If a user uploads the same file a second time (e.g., to a different network epoch or after paying again), the first upload becomes irrecoverable without any warning.
This is especially dangerous because the path is derived mechanically from the input path: `path.with_extension("datamap")`. The user has no indication that a pre-existing `.datamap` file will be destroyed.
At minimum, check for a pre-existing file and fail with an actionable error:
```rust
if datamap_path.exists() {
return Err(color_eyre::eyre::eyre!(
"DataMap file already exists at {}. Move or delete it before re-uploading.",
datamap_path.display()
));
}
std::fs::write(&datamap_path, &data_map_bytes)?;
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/bin/saorsa-cli/cli.rs
Line: 82-92
Comment:
**`address` silently ignored when `--datamap` is also provided**
Both `address` (positional) and `--datamap` can be specified at the same time. Clap will accept the invocation without complaint, but in `handle_download`, the `if let Some(dm_path) = datamap_path` branch takes precedence and the `address` argument is silently discarded. A user who types:
```
saorsa-cli file download --datamap /old/file.datamap abc123network
```
expecting the network address to be used will get the local DataMap instead — with no error or warning.
These two arguments are mutually exclusive and should be declared as such via `conflicts_with`:
```rust
/// Hex-encoded address (public data map address or manifest address).
#[arg(conflicts_with = "datamap")]
address: Option<String>,
```
This makes Clap reject the ambiguous combination at the CLI layer before any logic runs.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: Cargo.toml
Line: 56-58
Comment:
**`.cargo/config.toml` chunk-size config referenced in PR description is missing from the diff**
The PR description states:
> Set plaintext chunk size to 4MB-4KB via `.cargo/config.toml` to leave headroom for AEAD tag and compression overhead
However, `.cargo/config.toml` is not present in the repository and was not added in this PR. Without the `SELF_ENCRYPTION_MAX_CHUNK_SIZE` (or equivalent) environment/config override, the `self_encryption` crate will use its own compiled-in default for `MAX_CHUNK_SIZE`.
If the default plaintext chunk size produces ciphertexts that exceed `MAX_CHUNK_SIZE` or `MAX_WIRE_MESSAGE_SIZE` as defined in `src/ant_protocol`, every upload will fail with a `ChunkTooLarge` error from the handler — a silent, hard-to-diagnose mismatch between what the encryptor produces and what the network accepts.
Please add `.cargo/config.toml` with the intended `SELF_ENCRYPTION_MAX_CHUNK_SIZE = 4190208` (4 MiB − 4 KiB) setting, or document why the crate default is already within the acceptable range.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: a032af4
src/client/self_encrypt.rs
Outdated
| match reader.read(&mut buf) { | ||
| Ok(0) | Err(_) => None, | ||
| Ok(n) => { | ||
| buf.truncate(n); | ||
| Some(Bytes::from(buf)) | ||
| } | ||
| } |
There was a problem hiding this comment.
Silent I/O error treated as EOF during file read
The Err(_) arm in the from_fn iterator silently terminates iteration instead of propagating the error. If a read fails mid-file (e.g., due to a truncated or unreadable file), the iterator stops early without any error signal. Since file_size is passed to stream_encrypt but fewer bytes are fed into it, the self-encryption crate will likely return an error, but the root cause (an I/O error) is lost and replaced by a generic "Self-encryption failed" message — making debugging very difficult.
| match reader.read(&mut buf) { | |
| Ok(0) | Err(_) => None, | |
| Ok(n) => { | |
| buf.truncate(n); | |
| Some(Bytes::from(buf)) | |
| } | |
| } | |
| let data_iter = std::iter::from_fn(move || { | |
| let mut buf = vec![0u8; 65536]; | |
| match reader.read(&mut buf) { | |
| Ok(0) => None, | |
| Err(e) => { | |
| tracing::error!("I/O error reading file for encryption: {e}"); | |
| None | |
| } | |
| Ok(n) => { | |
| buf.truncate(n); | |
| Some(Bytes::from(buf)) | |
| } | |
| } | |
| }); |
A cleaner long-term fix would be to return Option<Result<Bytes>> from the iterator and handle errors in the for loop, or to use an intermediate staging buffer that accumulates Err state. At minimum, the error should be logged before the iterator terminates so callers can diagnose the failure.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/client/self_encrypt.rs
Line: 89-95
Comment:
**Silent I/O error treated as EOF during file read**
The `Err(_)` arm in the `from_fn` iterator silently terminates iteration instead of propagating the error. If a read fails mid-file (e.g., due to a truncated or unreadable file), the iterator stops early without any error signal. Since `file_size` is passed to `stream_encrypt` but fewer bytes are fed into it, the self-encryption crate will likely return an error, but the root cause (an I/O error) is lost and replaced by a generic "Self-encryption failed" message — making debugging very difficult.
```suggestion
let data_iter = std::iter::from_fn(move || {
let mut buf = vec![0u8; 65536];
match reader.read(&mut buf) {
Ok(0) => None,
Err(e) => {
tracing::error!("I/O error reading file for encryption: {e}");
None
}
Ok(n) => {
buf.truncate(n);
Some(Bytes::from(buf))
}
}
});
```
A cleaner long-term fix would be to return `Option<Result<Bytes>>` from the iterator and handle errors in the `for` loop, or to use an intermediate staging buffer that accumulates `Err` state. At minimum, the error should be logged before the iterator terminates so callers can diagnose the failure.
How can I resolve this? If you propose a fix, please make it concise.
src/bin/saorsa-cli/main.rs
Outdated
| std::fs::write(&datamap_path, &data_map_bytes)?; | ||
|
|
||
| println!("DATAMAP_FILE={}", datamap_path.display()); | ||
| println!("DATAMAP_HEX={}", hex::encode(&data_map_bytes)); |
There was a problem hiding this comment.
Private DataMap printed to stdout — security leak
In private mode, the DataMap is the sole capability required to decrypt the uploaded file. Printing it in full as DATAMAP_HEX=<hex> to stdout defeats the purpose of "private" mode:
- Shell scripts that capture stdout (e.g.,
OUTPUT=$(saorsa-cli file upload ...)) will embed the key in a variable that may be logged. - CI/CD systems routinely record all stdout; any log with this output reveals the decryption key to anyone with log access.
- The DataMap is already persisted to the
.datamapfile on disk (line 149) — the hex output is redundant and dangerous.
Consider removing the DATAMAP_HEX= line entirely. If a machine-readable form is needed, only print the DATAMAP_FILE= path and let the caller read the file themselves.
| println!("DATAMAP_HEX={}", hex::encode(&data_map_bytes)); | |
| println!("DATAMAP_FILE={}", datamap_path.display()); | |
| println!("MODE=private"); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/bin/saorsa-cli/main.rs
Line: 152
Comment:
**Private DataMap printed to stdout — security leak**
In private mode, the `DataMap` is the sole capability required to decrypt the uploaded file. Printing it in full as `DATAMAP_HEX=<hex>` to stdout defeats the purpose of "private" mode:
- Shell scripts that capture stdout (e.g., `OUTPUT=$(saorsa-cli file upload ...)`) will embed the key in a variable that may be logged.
- CI/CD systems routinely record all stdout; any log with this output reveals the decryption key to anyone with log access.
- The DataMap is already persisted to the `.datamap` file on disk (line 149) — the hex output is redundant and dangerous.
Consider removing the `DATAMAP_HEX=` line entirely. If a machine-readable form is needed, only print the `DATAMAP_FILE=` path and let the caller read the file themselves.
```suggestion
println!("DATAMAP_FILE={}", datamap_path.display());
println!("MODE=private");
```
How can I resolve this? If you propose a fix, please make it concise.
src/client/self_encrypt.rs
Outdated
| let mut file = std::fs::File::create(output_path).map_err(Error::Io)?; | ||
| for chunk_result in stream { | ||
| let chunk_bytes = | ||
| chunk_result.map_err(|e| Error::Crypto(format!("Decryption failed: {e}")))?; | ||
| file.write_all(&chunk_bytes).map_err(Error::Io)?; | ||
| } | ||
|
|
There was a problem hiding this comment.
Partial/corrupt output file left behind on decryption failure
File::create truncates any pre-existing file and begins writing immediately. If decryption fails partway through (e.g., a chunk is missing or tampered), the output file will be left in a partially-written, corrupt state. A user who then tries to use that file will get silent data corruption instead of a clean error.
Consider writing to a temporary file in the same directory and renaming it atomically on success:
let tmp_path = output_path.with_extension("tmp");
let mut file = std::fs::File::create(&tmp_path).map_err(Error::Io)?;
for chunk_result in stream {
let chunk_bytes =
chunk_result.map_err(|e| {
let _ = std::fs::remove_file(&tmp_path);
Error::Crypto(format!("Decryption failed: {e}"))
})?;
file.write_all(&chunk_bytes).map_err(Error::Io)?;
}
drop(file);
std::fs::rename(&tmp_path, output_path).map_err(Error::Io)?;This ensures the output path either contains a complete, correct file or nothing at all.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/client/self_encrypt.rs
Line: 162-168
Comment:
**Partial/corrupt output file left behind on decryption failure**
`File::create` truncates any pre-existing file and begins writing immediately. If decryption fails partway through (e.g., a chunk is missing or tampered), the output file will be left in a partially-written, corrupt state. A user who then tries to use that file will get silent data corruption instead of a clean error.
Consider writing to a temporary file in the same directory and renaming it atomically on success:
```rust
let tmp_path = output_path.with_extension("tmp");
let mut file = std::fs::File::create(&tmp_path).map_err(Error::Io)?;
for chunk_result in stream {
let chunk_bytes =
chunk_result.map_err(|e| {
let _ = std::fs::remove_file(&tmp_path);
Error::Crypto(format!("Decryption failed: {e}"))
})?;
file.write_all(&chunk_bytes).map_err(Error::Io)?;
}
drop(file);
std::fs::rename(&tmp_path, output_path).map_err(Error::Io)?;
```
This ensures the output path either contains a complete, correct file or nothing at all.
How can I resolve this? If you propose a fix, please make it concise.| hkdf = "0.12" | ||
|
|
||
| # Self-encryption (convergent encryption + streaming) | ||
| self_encryption = { git = "https://github.com/grumbach/self_encryption.git", branch = "post_quatum" } |
There was a problem hiding this comment.
Typo in Git branch name — post_quatum vs post_quantum
The branch name "post_quatum" appears to be a misspelling of "post_quantum". If this matches the actual branch name on the fork it is intentional, but it would make the dependency very hard to locate/audit and would break if the branch is ever corrected or renamed.
| self_encryption = { git = "https://github.com/grumbach/self_encryption.git", branch = "post_quatum" } | |
| self_encryption = { git = "https://github.com/grumbach/self_encryption.git", branch = "post_quantum" } |
Please confirm the exact branch name and either correct it here or add a comment explaining why the unusual spelling was chosen.
Prompt To Fix With AI
This is a comment left during a code review.
Path: Cargo.toml
Line: 57
Comment:
**Typo in Git branch name — `post_quatum` vs `post_quantum`**
The branch name `"post_quatum"` appears to be a misspelling of `"post_quantum"`. If this matches the actual branch name on the fork it is intentional, but it would make the dependency very hard to locate/audit and would break if the branch is ever corrected or renamed.
```suggestion
self_encryption = { git = "https://github.com/grumbach/self_encryption.git", branch = "post_quantum" }
```
Please confirm the exact branch name and either correct it here or add a comment explaining why the unusual spelling was chosen.
How can I resolve this? If you propose a fix, please make it concise.
src/client/self_encrypt.rs
Outdated
| ) -> Result<(DataMap, Vec<String>)> { | ||
| let metadata = std::fs::metadata(file_path).map_err(Error::Io)?; | ||
| let file_size: usize = metadata | ||
| .len() | ||
| .try_into() | ||
| .map_err(|_| Error::Crypto("File too large for this platform".into()))?; | ||
| info!( | ||
| "Encrypting file: {} ({file_size} bytes)", | ||
| file_path.display() | ||
| ); | ||
|
|
||
| let (data_map, collected) = encrypt_file_to_chunks(file_path, file_size)?; | ||
|
|
There was a problem hiding this comment.
Misleading "low memory footprint" / "streaming" claim in public API
The module-level and function-level doc comments repeatedly advertise low memory usage ("low memory footprint", "reads the file incrementally via an iterator"). However, encrypt_file_to_chunks collects all encrypted chunks into a Vec<(XorName, Bytes)> before returning. For a large file (e.g., 4 GB split into ~1000 × 4 MiB chunks), this means roughly the full encrypted size of the file is held in memory simultaneously before any upload begins.
The docs should accurately reflect this:
| ) -> Result<(DataMap, Vec<String>)> { | |
| let metadata = std::fs::metadata(file_path).map_err(Error::Io)?; | |
| let file_size: usize = metadata | |
| .len() | |
| .try_into() | |
| .map_err(|_| Error::Crypto("File too large for this platform".into()))?; | |
| info!( | |
| "Encrypting file: {} ({file_size} bytes)", | |
| file_path.display() | |
| ); | |
| let (data_map, collected) = encrypt_file_to_chunks(file_path, file_size)?; | |
| /// Encrypt a file using streaming self-encryption and upload each chunk. | |
| /// | |
| /// Uses `stream_encrypt()` which reads the file incrementally via an iterator. | |
| /// Note: all encrypted chunks are collected in memory before uploading begins, | |
| /// so peak memory usage is proportional to the encrypted file size. | |
| /// | |
| /// Returns the `DataMap` after all chunks are uploaded, plus the list of | |
| /// transaction hash strings from payment. |
If true streaming upload (upload each chunk as soon as it is produced) is desired, the encryption and upload loops would need to be interleaved.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/client/self_encrypt.rs
Line: 43-55
Comment:
**Misleading "low memory footprint" / "streaming" claim in public API**
The module-level and function-level doc comments repeatedly advertise low memory usage ("low memory footprint", "reads the file incrementally via an iterator"). However, `encrypt_file_to_chunks` collects **all** encrypted chunks into a `Vec<(XorName, Bytes)>` before returning. For a large file (e.g., 4 GB split into ~1000 × 4 MiB chunks), this means roughly the full encrypted size of the file is held in memory simultaneously before any upload begins.
The docs should accurately reflect this:
```suggestion
/// Encrypt a file using streaming self-encryption and upload each chunk.
///
/// Uses `stream_encrypt()` which reads the file incrementally via an iterator.
/// Note: all encrypted chunks are collected in memory before uploading begins,
/// so peak memory usage is proportional to the encrypted file size.
///
/// Returns the `DataMap` after all chunks are uploaded, plus the list of
/// transaction hash strings from payment.
```
If true streaming upload (upload each chunk as soon as it is produced) is desired, the encryption and upload loops would need to be interleaved.
How can I resolve this? If you propose a fix, please make it concise.- Propagate I/O errors during file read instead of silently treating them as EOF - Remove DATAMAP_HEX from stdout in private mode (security leak) - Write decryption output to temp file, rename atomically on success - Fix misleading low memory footprint docs - Remove test-only functions from public API re-exports
|
@claude @greptile-apps please review again. Also, in private mode it's normal that the datamap is printed to the user. |
|
Claude finished @grumbach's task —— View job Code Review Complete
SummaryThis PR successfully integrates convergent encryption using the Key improvements since last review:
Status: APPROVED ✅All critical issues from the previous review have been addressed. The code is production-ready with only minor suggestions remaining. Addressed Issues from Previous Review✅ Fixed: I/O Error Handling (src/client/self_encrypt.rs:88-117)Previous issue: Silent error swallowing in file reading iterator. Resolution: Now properly implemented with Arc<Mutex<Optionio::Error>> to capture errors during iteration and check them after encryption completes. This ensures I/O errors are not silently ignored. let io_error: std::sync::Arc<std::sync::Mutex<Option<std::io::Error>>> =
std::sync::Arc::new(std::sync::Mutex::new(None));
// ... error is captured in iterator ...
// Check if an I/O error was captured during iteration
if let Ok(guard) = io_error.lock() {
if let Some(ref e) = *guard {
return Err(Error::Io(std::io::Error::new(e.kind(), e.to_string())));
}
}✅ Fixed: Atomic File Writes (src/client/self_encrypt.rs:180-200)Previous issue: Partial/corrupt output files left behind on decryption failure. Resolution: Now writes to a temp file first ( let tmp_path = parent.join(format!(".saorsa_decrypt_{}.tmp", std::process::id()));
// ... write to tmp_path ...
if let Err(e) = result {
let _ = std::fs::remove_file(&tmp_path);
return Err(e);
}
std::fs::rename(&tmp_path, output_path).map_err(Error::Io)?;This same pattern is also applied in the local decryption helper (lines 318-336). ✅ Fixed: Documentation Accuracy (src/client/self_encrypt.rs:1-36)Previous issue: Misleading claims about "low memory footprint" when chunks are collected in memory. Resolution: Documentation now accurately states:
CLAUDE.md Compliance ✅NO production code violations found:
Examples of proper error handling:
Remaining Minor Suggestions (Non-blocking)1. Git Branch Dependency (Cargo.toml:57)Issue: Branch name self_encryption = { git = "https://github.com/grumbach/self_encryption.git", branch = "post_quatum" }Suggestions:
2. Private Mode DataMap Output (main.rs:151)User clarification noted: You mentioned "in private mode it's normal that the datamap is printed to the user." Current implementation (lines 151-156): println!("DATAMAP_FILE={}", datamap_path.display());
println!("MODE=private");
println!("CHUNKS={chunk_count}");
println!("TOTAL_SIZE={file_size}");
println!("PAYMENTS={total_tx_count}");
println!("TX_HASHES={}", all_tx_hashes.join(","));Observation: The current code does NOT print the DataMap hex, only the file path. This is the correct and secure approach. The previous review's concern was about a Status: ✅ Secure as implemented - Only the file path is printed, not the actual DataMap content. 3. DataMap Serialization Versioning (self_encrypt.rs:206-215)Suggestion: Consider adding a version byte to the serialized DataMap format for future compatibility: pub fn serialize_data_map(data_map: &DataMap) -> Result<Vec<u8>> {
let mut bytes = vec![1u8]; // Version 1
bytes.extend(data_map.to_bytes()
.map_err(|e| Error::Serialization(format!("Failed to serialize DataMap: {e}")))?);
Ok(bytes)
}Priority: Low - can be added when breaking changes are needed 4. Chunk Upload Parallelization (self_encrypt.rs:61-72)Current: Sequential upload of encrypted chunks // Future enhancement - parallel upload with concurrency limit
use futures::stream::{self, StreamExt};
let results = stream::iter(collected)
.map(|(hash, content)| async move {
client.put_chunk_with_payment(content).await.map(|(addr, tx)| (hash, addr, tx))
})
.buffer_unordered(4) // 4 concurrent uploads
.collect::<Vec<_>>()
.await;Priority: Low - current implementation is correct, this is just an optimization 5. Magic Constants (self_encrypt.rs:88, 93)Suggestion: Define buffer size as a constant: const READ_BUFFER_SIZE: usize = 64 * 1024; // 64KB
// ...
let mut buf = vec![0u8; READ_BUFFER_SIZE];Priority: Low - improves readability Security Analysis ✅Strengths:
Documented trade-offs:
Test Coverage ✅Excellent test suite with 17 comprehensive tests:
Coverage highlights:
Code Quality ✅Strengths:
Examples:
Performance ConsiderationsCurrent trade-offs:
Recommendation: Current implementation prioritizes correctness and simplicity. Performance optimizations can be added in future PRs based on real-world usage patterns. Files Changed Analysis.cargo/config.toml (NEW) ✅
Cargo.toml ✅
src/client/self_encrypt.rs (NEW) ✅
src/bin/saorsa-cli/main.rs ✅
src/bin/saorsa-cli/cli.rs ✅
src/client/mod.rs, src/lib.rs, src/storage/handler.rs ✅
RecommendationsBefore merge: ✅ Ready to merge as-isAll critical issues have been addressed. The following are optional improvements:
Future enhancements (separate PRs):
Final Verdict: APPROVED ✅This is high-quality, production-ready code with:
Great work addressing the review feedback! The I/O error handling, atomic file writes, and documentation improvements show careful attention to the concerns raised. The implementation is now robust and ready for production use. Job Run | Branch: |
src/client/self_encrypt.rs
Outdated
| // Write to a temp file first, then rename atomically on success | ||
| // to prevent leaving a corrupt partial file on failure. | ||
| let parent = output_path.parent().unwrap_or_else(|| Path::new(".")); | ||
| let tmp_path = parent.join(format!(".saorsa_decrypt_{}.tmp", std::process::id())); |
There was a problem hiding this comment.
Temp file path not unique within a process
Both download_and_decrypt_file (here) and decrypt_from_store (line 319) use only std::process::id() to disambiguate the temp file. If two download operations run concurrently in the same process (which is entirely plausible since this is a pub async fn), they will race to write the same .saorsa_decrypt_<pid>.tmp path. The second File::create call will silently truncate the first download's in-progress temp file, corrupting it.
Consider adding a per-call unique component:
use std::sync::atomic::{AtomicU64, Ordering};
static DECRYPT_COUNTER: AtomicU64 = AtomicU64::new(0);
let tmp_path = parent.join(format!(
".saorsa_decrypt_{}_{}.tmp",
std::process::id(),
DECRYPT_COUNTER.fetch_add(1, Ordering::Relaxed)
));Or use tempfile::Builder::new().tempfile_in(parent) which handles uniqueness atomically.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/client/self_encrypt.rs
Line: 183
Comment:
**Temp file path not unique within a process**
Both `download_and_decrypt_file` (here) and `decrypt_from_store` (line 319) use only `std::process::id()` to disambiguate the temp file. If two download operations run concurrently in the same process (which is entirely plausible since this is a `pub async fn`), they will race to write the same `.saorsa_decrypt_<pid>.tmp` path. The second `File::create` call will silently truncate the first download's in-progress temp file, corrupting it.
Consider adding a per-call unique component:
```rust
use std::sync::atomic::{AtomicU64, Ordering};
static DECRYPT_COUNTER: AtomicU64 = AtomicU64::new(0);
let tmp_path = parent.join(format!(
".saorsa_decrypt_{}_{}.tmp",
std::process::id(),
DECRYPT_COUNTER.fetch_add(1, Ordering::Relaxed)
));
```
Or use `tempfile::Builder::new().tempfile_in(parent)` which handles uniqueness atomically.
How can I resolve this? If you propose a fix, please make it concise.
src/client/self_encrypt.rs
Outdated
| let addr = hash.0; | ||
| let addr_hex = hex::encode(addr); | ||
| debug!("Fetching chunk idx={idx} ({addr_hex})"); | ||
| let chunk = tokio::task::block_in_place(|| handle.block_on(client.get_chunk(&addr))) |
There was a problem hiding this comment.
block_in_place panics on the current_thread runtime
tokio::task::block_in_place is only valid on a multi-threaded Tokio runtime. If a caller uses #[tokio::main(flavor = "current_thread")] or tokio::runtime::Builder::new_current_thread(), this will panic at runtime with "Cannot call block_in_place outside of a multi-threaded runtime".
Since download_and_decrypt_file is part of the public library API (pub async fn), library consumers have no warning of this constraint. Consider documenting it explicitly in the doc comment, e.g.:
/// # Panics
///
/// Panics if called from a Tokio `current_thread` runtime. This function uses
/// `block_in_place` internally and requires the multi-threaded runtime scheduler.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/client/self_encrypt.rs
Line: 163
Comment:
**`block_in_place` panics on the `current_thread` runtime**
`tokio::task::block_in_place` is only valid on a multi-threaded Tokio runtime. If a caller uses `#[tokio::main(flavor = "current_thread")]` or `tokio::runtime::Builder::new_current_thread()`, this will panic at runtime with "Cannot call `block_in_place` outside of a multi-threaded runtime".
Since `download_and_decrypt_file` is part of the public library API (`pub async fn`), library consumers have no warning of this constraint. Consider documenting it explicitly in the doc comment, e.g.:
```rust
/// # Panics
///
/// Panics if called from a Tokio `current_thread` runtime. This function uses
/// `block_in_place` internally and requires the multi-threaded runtime scheduler.
```
How can I resolve this? If you propose a fix, please make it concise.| hkdf = "0.12" | ||
|
|
||
| # Self-encryption (convergent encryption + streaming) | ||
| self_encryption = { git = "https://github.com/grumbach/self_encryption.git", branch = "post_quatum" } |
There was a problem hiding this comment.
Unpinned git dependency — not reproducible and auditable
Using a branch reference without a pinned commit rev means every cargo update (or a fresh cargo fetch on a CI machine without a lock file) will silently fetch whatever the branch head points to at that moment. If the post_quatum branch is force-pushed, the build silently changes without any change to Cargo.toml.
For a security-critical dependency (convergent encryption), this is a supply-chain risk. Pin it to a specific commit:
self_encryption = { git = "https://github.com/grumbach/self_encryption.git", rev = "<commit-sha>" }Also confirm that the Cargo.lock is committed so that CI/CD reproduces the same build deterministically.
Prompt To Fix With AI
This is a comment left during a code review.
Path: Cargo.toml
Line: 57
Comment:
**Unpinned git dependency — not reproducible and auditable**
Using a branch reference without a pinned commit `rev` means every `cargo update` (or a fresh `cargo fetch` on a CI machine without a lock file) will silently fetch whatever the branch head points to at that moment. If the `post_quatum` branch is force-pushed, the build silently changes without any change to `Cargo.toml`.
For a security-critical dependency (convergent encryption), this is a supply-chain risk. Pin it to a specific commit:
```toml
self_encryption = { git = "https://github.com/grumbach/self_encryption.git", rev = "<commit-sha>" }
```
Also confirm that the `Cargo.lock` is committed so that CI/CD reproduces the same build deterministically.
How can I resolve this? If you propose a fix, please make it concise.…drain uploads on failure - Move read_error check to after chunk iteration (not just after stream_encrypt) so mid-stream I/O failures are caught instead of silently truncating data - Add randomness to temp file names to prevent collisions in concurrent calls - Reject CurrentThread runtime explicitly instead of deadlocking on block_on - Drain in-flight upload futures on failure, collecting tx_hashes from succeeded uploads and logging them before returning the error - Collect tx_hashes on hash-mismatch path too (was previously discarded) - Log temp file cleanup errors instead of silently swallowing them
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 8 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Ok((address, tx_hashes)) => { | ||
| if address != hash.0 { | ||
| // Drain remaining in-flight futures before returning | ||
| let mut succeeded = all_tx_hashes.len(); | ||
| while let Some((_, _, res)) = in_flight.next().await { | ||
| if let Ok((_, txs)) = res { | ||
| all_tx_hashes.extend(txs.iter().map(|tx| format!("{tx:?}"))); | ||
| succeeded += 1; | ||
| } | ||
| } | ||
| if succeeded > 0 { | ||
| warn!( | ||
| "{succeeded} chunk(s) already uploaded before hash mismatch on chunk {num}; \ | ||
| tx_hashes so far: {all_tx_hashes:?}" | ||
| ); | ||
| } | ||
| return Err(Error::Crypto(format!( | ||
| "Hash mismatch for chunk {num}: self_encryption={} network={}", | ||
| hex::encode(hash.0), | ||
| hex::encode(address) | ||
| ))); | ||
| } |
There was a problem hiding this comment.
When an upload returns Ok((address, tx_hashes)) but the address doesn't match hash.0, the function returns early without adding that chunk's tx_hashes to all_tx_hashes. Since this is an error path where operators may need to recover/inspect payments, those tx hashes should still be captured (and ideally included in the warning/output) before returning the mismatch error.
src/client/self_encrypt.rs
Outdated
| } | ||
| return Err(e); | ||
| } | ||
|
|
There was a problem hiding this comment.
Both the mismatch and error paths use std::fs::rename(tmp, output) for the final move. On Windows, rename fails if output_path already exists (unlike std::fs::write, which overwrites). Consider handling the existing-destination case explicitly (remove first, or use a cross-platform atomic persist/replace strategy such as tempfile::NamedTempFile::persist / platform-specific replace).
| // On Windows, std::fs::rename fails if the destination already exists. | |
| // To provide consistent behavior across platforms, explicitly remove any | |
| // existing destination file before renaming the temporary file into place. | |
| if output_path.exists() { | |
| if let Err(e) = std::fs::remove_file(output_path) { | |
| warn!( | |
| "Failed to remove existing output file {}: {e}", | |
| output_path.display() | |
| ); | |
| return Err(Error::Io(e)); | |
| } | |
| } |
src/client/self_encrypt.rs
Outdated
| let data_iter = std::iter::from_fn(move || { | ||
| let mut buf = vec![0u8; 65536]; | ||
| match reader.read(&mut buf) { |
There was a problem hiding this comment.
The read iterator allocates a new 64KiB Vec on every next() call. For large files this causes a lot of heap churn and can become a measurable bottleneck. Consider reusing a buffer (e.g., keep a Vec<u8> outside the closure and return Bytes::copy_from_slice(&buf[..n]), or switch to a reader->chunk iterator that avoids per-iteration allocations).
| if address != hash.0 { | ||
| // Drain remaining in-flight futures before returning | ||
| let mut succeeded = all_tx_hashes.len(); | ||
| while let Some((_, _, res)) = in_flight.next().await { | ||
| if let Ok((_, txs)) = res { | ||
| all_tx_hashes.extend(txs.iter().map(|tx| format!("{tx:?}"))); | ||
| succeeded += 1; | ||
| } |
There was a problem hiding this comment.
In the hash-mismatch path, succeeded is initialized with all_tx_hashes.len(), but that vector counts tx hashes (potentially multiple per chunk), not successfully uploaded chunks. This makes the warning misleading. Track successful chunk uploads separately (e.g., a successful_chunks counter) and keep all_tx_hashes purely for tx hash collection.
mickvandijke
left a comment
There was a problem hiding this comment.
Really nice work on this — the streaming self-encryption integration is well-structured, the bounded-concurrency upload loop is a solid design, and the public/private data mode split is clean. The error-handling discipline (draining in-flight futures to preserve tx hashes on failure, atomic temp-file writes) shows good attention to production edge cases. The test suite is thorough too.
A few things I noticed below, mostly around deduplication and minor polish:
.cargo/config.toml
Outdated
| # | ||
| # Value: 4 MiB - 4 KiB = 4,190,208 bytes (4 KiB headroom). | ||
| [env] | ||
| MAX_CHUNK_SIZE = "4190208" |
There was a problem hiding this comment.
Does this env var actually do anything?
This sets MAX_CHUNK_SIZE as a Cargo environment variable, but saorsa_node::ant_protocol::MAX_CHUNK_SIZE is a Rust const — it's not read via env!() or std::env::var(). This config only has an effect if the self_encryption crate itself reads MAX_CHUNK_SIZE from the environment at build time or runtime.
Could you verify that the self_encryption crate actually consumes this? If not, this file is dead config giving a false sense of safety — the plaintext chunk size wouldn't actually be constrained.
src/client/self_encrypt.rs
Outdated
| Ok((address, tx_hashes)) => { | ||
| if address != hash.0 { | ||
| // Drain remaining in-flight futures before returning | ||
| let mut succeeded = all_tx_hashes.len(); |
There was a problem hiding this comment.
succeeded counts tx hashes, not chunks
all_tx_hashes.len() gives the number of transaction hashes accumulated so far, which can be multiple per chunk. But the warning message says "{succeeded} chunk(s) already uploaded". This will overcount and produce misleading diagnostics.
Same issue on line 150.
Consider tracking a separate uploaded_chunks counter that increments once per successful upload result, independent of how many tx hashes each upload returns.
src/client/self_encrypt.rs
Outdated
| let read_error: Arc<Mutex<Option<std::io::Error>>> = Arc::new(Mutex::new(None)); | ||
| let read_error_writer = Arc::clone(&read_error); | ||
| let data_iter = std::iter::from_fn(move || { | ||
| let mut buf = vec![0u8; 65536]; |
There was a problem hiding this comment.
Magic number + per-call allocation
Two small things here:
65536is a magic number — consider extracting it to a named constant likeconst READ_BUFFER_SIZE: usize = 64 * 1024;to make the intent clear.- This allocates a fresh 64 KiB
Vecon everynext()call. For a large file that's thousands of allocations and a lot of heap churn. You could reuse a buffer across calls (e.g., keep aVec<u8>outside the closure andBytes::copy_from_slice(&buf[..n])each iteration), or useBytesMutwith a fixed capacity.
|
|
||
| /// Encrypt a file from disk using `stream_encrypt`, returning the `DataMap` | ||
| /// and a list of `(XorName, Bytes)` encrypted chunks. | ||
| fn encrypt_file_to_chunks( |
There was a problem hiding this comment.
Near-identical code to encrypt_and_upload_file (lines 60–82)
The file-open, BufReader, Arc<Mutex<Option<io::Error>>>, data_iter construction, stream_encrypt call, and post-iteration error check here are ~30 lines copy-pasted from encrypt_and_upload_file. If the error-handling pattern changes (e.g., the I/O error propagation gets refined), it would need updating in both places.
Consider extracting a shared helper like:
fn open_encrypt_stream(file_path: &Path, file_size: usize)
-> Result<(EncryptStream, Arc<Mutex<Option<std::io::Error>>>)>that both callers use.
| } | ||
|
|
||
| /// Shared helper: decrypt a `DataMap` using a chunk store `HashMap`. | ||
| fn decrypt_from_store<S: BuildHasher>( |
There was a problem hiding this comment.
Temp-file-write-and-rename logic duplicated from download_and_decrypt_file (lines 306–338)
This entire block — create tmp path, write chunks in a closure, cleanup on error, rename on success — is a verbatim copy of the same pattern ~130 lines above. Both even use the same .saorsa_decrypt_ prefix.
A shared helper like:
fn write_stream_to_file(
stream: impl Iterator<Item = Result<Bytes, self_encryption::Error>>,
output_path: &Path,
) -> Result<()>would eliminate the duplication and ensure any future fixes (e.g., Windows rename handling) only need to be made once.
src/client/self_encrypt.rs
Outdated
| return Err(e); | ||
| } | ||
|
|
||
| std::fs::rename(&tmp_path, output_path).map_err(Error::Io)?; |
There was a problem hiding this comment.
std::fs::rename fails on Windows if destination exists
On Windows, std::fs::rename returns an error if output_path already exists (unlike Unix where it atomically replaces). Since this is a cross-platform library (CI builds and tests on Windows), consider either removing the destination first or using tempfile::NamedTempFile::persist() which handles this portably. Same applies to the identical rename on line 480.
src/client/self_encrypt.rs
Outdated
| // Write to a temp file first, then rename atomically on success | ||
| // to prevent leaving a corrupt partial file on failure. | ||
| let parent = output_path.parent().unwrap_or_else(|| Path::new(".")); | ||
| let unique: u64 = rand::random(); |
There was a problem hiding this comment.
Consider using tempfile instead of rand::random for tmp files
tempfile is already a dev-dependency (used in tests). Promoting it to a regular dependency and using tempfile::Builder::new().prefix(".saorsa_decrypt_").tempfile_in(parent)? would give you guaranteed-unique temp paths with atomic creation, and NamedTempFile::persist() handles the cross-platform rename issue too. This would replace the manual rand::random() uniqueness, the manual cleanup on error, and the manual rename — all in one.
Same on line 454.
src/lib.rs
Outdated
| fetch_data_map_public, serialize_data_map, store_data_map_public, | ||
| }; | ||
| pub use client::{ | ||
| compute_address, create_manifest, deserialize_manifest, peer_id_to_xor_name, reassemble_file, |
There was a problem hiding this comment.
Are the old file_ops exports still needed?
create_manifest, deserialize_manifest, reassemble_file, serialize_manifest, split_file — these are the plaintext chunking functions that self-encryption replaces. If they're still needed for backward compatibility with data already stored on the network, a comment noting that would be helpful. If not, consider deprecating or removing them to avoid confusion about which path to use.
| ChunkGetRequest, ChunkGetResponse, ChunkMessage, ChunkMessageBody, ChunkPutRequest, | ||
| ChunkPutResponse, ChunkQuoteRequest, ChunkQuoteResponse, CHUNK_PROTOCOL_ID, MAX_CHUNK_SIZE, | ||
| }; | ||
| pub use client::self_encrypt::{ |
There was a problem hiding this comment.
High: Encrypted-chunk headroom fix doesn't apply to downstream library consumers
The reduced plaintext chunk size (4 MiB - 4 KiB) lives only in .cargo/config.toml, which sets a Cargo env var scoped to this workspace. But this PR publicly re-exports the self-encryption API right here.
I verified by building saorsa-node as a dependency from a clean temp crate outside the repo root — self_encryption::MAX_CHUNK_SIZE resolved to 4194304 (full 4 MiB), not 4190208. So downstream library consumers calling encrypt_and_upload_file can still produce encrypted chunks at the full 4 MiB plaintext size and hit oversize uploads against Saorsa's 4 MiB chunk limit.
The headroom constraint needs to be enforced at the library level (e.g., a runtime assertion, a wrapper constant, or configuring self_encryption programmatically) rather than relying on a workspace-local env var that disappears when the crate is consumed externally.
There was a problem hiding this comment.
100% let's move this to self_encryption as a default
- Remove dead .cargo/config.toml (MAX_CHUNK_SIZE env var had no effect) - Extract open_encrypt_stream() to deduplicate file-read + encrypt setup - Extract write_stream_to_file() to deduplicate atomic-write-and-rename - Fix succeeded counter: track uploaded chunks, not tx hashes - Capture tx_hashes before hash-mismatch early return - Add READ_BUFFER_SIZE constant, reuse buffer across iterator calls - Handle Windows rename-over-existing-file edge case - Pin self_encryption dep to immutable commit rev - Add backward-compat comment on legacy file_ops re-exports
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 7 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let handle = Handle::current(); | ||
|
|
||
| let stream = self_encryption::streaming_decrypt(data_map, |batch: &[(usize, XorName)]| { | ||
| let batch_owned: Vec<(usize, XorName)> = batch.to_vec(); | ||
|
|
||
| // block_in_place panics on current_thread runtime, and handle.block_on | ||
| // deadlocks there. Reject unsupported runtime flavors explicitly. | ||
| if handle.runtime_flavor() == tokio::runtime::RuntimeFlavor::CurrentThread { | ||
| return Err(self_encryption::Error::Generic( | ||
| "download_and_decrypt_file requires a multi_thread tokio runtime".into(), | ||
| )); | ||
| } | ||
| tokio::task::block_in_place(|| { | ||
| handle.block_on(async { | ||
| let mut futs = FuturesUnordered::new(); | ||
| for (idx, hash) in batch_owned { | ||
| let addr = hash.0; | ||
| futs.push(async move { | ||
| let result = client.get_chunk(&addr).await; | ||
| (idx, hash, result) | ||
| }); | ||
| } | ||
|
|
||
| let mut results = Vec::with_capacity(futs.len()); | ||
| while let Some((idx, hash, result)) = futs.next().await { | ||
| let addr_hex = hex::encode(hash.0); | ||
| let chunk = result | ||
| .map_err(|e| { | ||
| self_encryption::Error::Generic(format!( | ||
| "Network fetch failed for {addr_hex}: {e}" | ||
| )) | ||
| })? | ||
| .ok_or_else(|| { | ||
| self_encryption::Error::Generic(format!( | ||
| "Chunk not found on network: {addr_hex}" | ||
| )) | ||
| })?; | ||
| results.push((idx, chunk.content)); | ||
| } | ||
| Ok(results) | ||
| }) | ||
| }) | ||
| }) | ||
| .map_err(|e| Error::Crypto(format!("Decryption failed: {e}")))?; | ||
|
|
||
| write_stream_to_file(stream, output_path)?; | ||
|
|
There was a problem hiding this comment.
download_and_decrypt_file calls write_stream_to_file (which uses std::fs + blocking write_all) from within an async fn. This will block a Tokio worker thread while writing/decrypting, and the decrypt iterator itself may also trigger blocking block_in_place fetches. Consider wrapping the whole write/decrypt loop in tokio::task::spawn_blocking (or switching to tokio::fs/AsyncWrite) so async tasks aren’t starved during large downloads.
| let handle = Handle::current(); | |
| let stream = self_encryption::streaming_decrypt(data_map, |batch: &[(usize, XorName)]| { | |
| let batch_owned: Vec<(usize, XorName)> = batch.to_vec(); | |
| // block_in_place panics on current_thread runtime, and handle.block_on | |
| // deadlocks there. Reject unsupported runtime flavors explicitly. | |
| if handle.runtime_flavor() == tokio::runtime::RuntimeFlavor::CurrentThread { | |
| return Err(self_encryption::Error::Generic( | |
| "download_and_decrypt_file requires a multi_thread tokio runtime".into(), | |
| )); | |
| } | |
| tokio::task::block_in_place(|| { | |
| handle.block_on(async { | |
| let mut futs = FuturesUnordered::new(); | |
| for (idx, hash) in batch_owned { | |
| let addr = hash.0; | |
| futs.push(async move { | |
| let result = client.get_chunk(&addr).await; | |
| (idx, hash, result) | |
| }); | |
| } | |
| let mut results = Vec::with_capacity(futs.len()); | |
| while let Some((idx, hash, result)) = futs.next().await { | |
| let addr_hex = hex::encode(hash.0); | |
| let chunk = result | |
| .map_err(|e| { | |
| self_encryption::Error::Generic(format!( | |
| "Network fetch failed for {addr_hex}: {e}" | |
| )) | |
| })? | |
| .ok_or_else(|| { | |
| self_encryption::Error::Generic(format!( | |
| "Chunk not found on network: {addr_hex}" | |
| )) | |
| })?; | |
| results.push((idx, chunk.content)); | |
| } | |
| Ok(results) | |
| }) | |
| }) | |
| }) | |
| .map_err(|e| Error::Crypto(format!("Decryption failed: {e}")))?; | |
| write_stream_to_file(stream, output_path)?; | |
| // Clone/copy inputs so we can move them into a blocking task safely. | |
| let data_map_cloned = data_map.clone(); | |
| let output_path_buf = output_path.to_path_buf(); | |
| let client_cloned = client.clone(); | |
| let handle = Handle::current(); | |
| tokio::task::spawn_blocking(move || -> Result<()> { | |
| let stream = | |
| self_encryption::streaming_decrypt(&data_map_cloned, |batch: &[(usize, XorName)]| { | |
| let batch_owned: Vec<(usize, XorName)> = batch.to_vec(); | |
| // Reject unsupported runtime flavors explicitly. | |
| if handle.runtime_flavor() == tokio::runtime::RuntimeFlavor::CurrentThread { | |
| return Err(self_encryption::Error::Generic( | |
| "download_and_decrypt_file requires a multi_thread tokio runtime".into(), | |
| )); | |
| } | |
| handle.block_on(async { | |
| let mut futs = FuturesUnordered::new(); | |
| for (idx, hash) in batch_owned { | |
| let addr = hash.0; | |
| let client = client_cloned.clone(); | |
| futs.push(async move { | |
| let result = client.get_chunk(&addr).await; | |
| (idx, hash, result) | |
| }); | |
| } | |
| let mut results = Vec::with_capacity(futs.len()); | |
| while let Some((idx, hash, result)) = futs.next().await { | |
| let addr_hex = hex::encode(hash.0); | |
| let chunk = result | |
| .map_err(|e| { | |
| self_encryption::Error::Generic(format!( | |
| "Network fetch failed for {addr_hex}: {e}" | |
| )) | |
| })? | |
| .ok_or_else(|| { | |
| self_encryption::Error::Generic(format!( | |
| "Chunk not found on network: {addr_hex}" | |
| )) | |
| })?; | |
| results.push((idx, chunk.content)); | |
| } | |
| Ok(results) | |
| }) | |
| }) | |
| .map_err(|e| Error::Crypto(format!("Decryption failed: {e}")))?; | |
| write_stream_to_file(stream, &output_path_buf)?; | |
| Ok(()) | |
| }) | |
| .await | |
| .map_err(|e| Error::Crypto(format!("Blocking task join error: {e}")))??; |
| # Branch name "post_quatum" is intentional (matches the upstream branch). | ||
| self_encryption = { git = "https://github.com/grumbach/self_encryption.git", branch = "post_quatum" } |
There was a problem hiding this comment.
self_encryption is pulled from a moving git branch. That makes builds non-reproducible and can break CI unexpectedly when the branch advances. Prefer pinning to an exact rev (and optionally documenting/updating it intentionally) so dependency updates are explicit and auditable.
| # Branch name "post_quatum" is intentional (matches the upstream branch). | |
| self_encryption = { git = "https://github.com/grumbach/self_encryption.git", branch = "post_quatum" } | |
| # Pinned commit from upstream "post_quatum" branch; update intentionally as needed. | |
| self_encryption = { git = "https://github.com/grumbach/self_encryption.git", rev = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" } |
| if let Ok(guard) = read_error.lock() { | ||
| if let Some(ref e) = *guard { | ||
| return Err(Error::Io(std::io::Error::new(e.kind(), format!("{e}")))); | ||
| } | ||
| } |
There was a problem hiding this comment.
check_read_error silently ignores a poisoned Mutex (or any lock failure) and returns Ok(()), which can mask an underlying I/O failure during streaming reads. Consider treating PoisonError as an error (or using lock().map(|g| ...).unwrap_or_else(|e| e.into_inner())), so read failures can’t be dropped on the floor.
| if let Ok(guard) = read_error.lock() { | |
| if let Some(ref e) = *guard { | |
| return Err(Error::Io(std::io::Error::new(e.kind(), format!("{e}")))); | |
| } | |
| } | |
| // Use the inner guard even if the mutex was poisoned, so any recorded | |
| // read error is still observed and not silently dropped. | |
| let guard = read_error.lock().unwrap_or_else(|e| e.into_inner()); | |
| if let Some(ref e) = *guard { | |
| return Err(Error::Io(std::io::Error::new(e.kind(), format!("{e}")))); | |
| } |
| let metadata = std::fs::metadata(file_path).map_err(Error::Io)?; | ||
| let file_size: usize = metadata | ||
| .len() | ||
| .try_into() | ||
| .map_err(|_| Error::Crypto("File too large for this platform".into()))?; | ||
| info!( | ||
| "Encrypting file: {} ({file_size} bytes)", | ||
| file_path.display() | ||
| ); | ||
|
|
||
| let (mut stream, read_error) = open_encrypt_stream(file_path, file_size)?; | ||
|
|
||
| let mut all_tx_hashes: Vec<String> = Vec::new(); | ||
| let mut chunk_num: usize = 0; | ||
| let mut uploaded_chunks: usize = 0; | ||
|
|
||
| { | ||
| let mut in_flight = FuturesUnordered::new(); | ||
| let mut chunks_iter = stream.chunks(); | ||
| let mut iter_exhausted = false; | ||
|
|
||
| loop { | ||
| // Fill up to UPLOAD_CONCURRENCY uploads | ||
| while !iter_exhausted && in_flight.len() < UPLOAD_CONCURRENCY { | ||
| match chunks_iter.next() { | ||
| Some(chunk_result) => { |
There was a problem hiding this comment.
encrypt_and_upload_file is async but performs synchronous filesystem reads and runs encryption work directly on the async executor thread via stream.chunks() / chunks_iter.next(). For large files this can block Tokio worker threads and reduce network throughput. Consider moving the file read + stream_encrypt iteration onto spawn_blocking (or a dedicated blocking thread) and feeding encrypted chunks to the async uploader via a channel.
|
@claude @greptile last review |
|
Claude finished @grumbach's task —— View job Final Code Review Complete
SummaryThis PR successfully integrates convergent encryption using the Status: APPROVED ✅ with one minor clarification note. What Changed Since Previous ReviewsThe team has done outstanding work addressing all previous concerns: ✅ Fixed: I/O Error Propagation (src/client/self_encrypt.rs:53-84)
✅ Fixed: Atomic File Writes (src/client/self_encrypt.rs:86-128)
✅ Fixed: Windows Compatibility (lines 121-124)
✅ Fixed: Concurrent Temp File Collisions (lines 95-99)
✅ Fixed: Documentation Accuracy (lines 1-13, 130-141)
✅ Fixed: Git Dependency Clarification (Cargo.toml:57-58)
✅ Added: Bounded Concurrency Upload (lines 163-243)
✅ Added: Transaction Hash Preservation (lines 201-240)
✅ Added: Runtime Flavor Check (lines 310-316)
CLAUDE.md Compliance ✅Perfect compliance—no violations found:
Code Quality AssessmentArchitecture ✅
Error Handling ✅
Performance ✅
Security ✅
Test Coverage ✅Exceptional test suite with 17 comprehensive tests:
Coverage highlights:
Review Comments AnalysisAddressed from claude's review:
Addressed from greptile-apps:
Addressed from copilot-pull-request-reviewer:
Addressed from mickvandijke:
Outstanding review comment: Git Dependency Pin (Cargo.toml:57-58)The most recent commit (a032af4) reverted the pinned Current state: # Branch name "post_quatum" is intentional (matches the upstream branch).
self_encryption = { git = "https://github.com/grumbach/self_encryption.git", branch = "post_quatum" }Trade-off:
Recommendation: This is acceptable for current development but should be pinned to a specific For production (future PR): # Pinned to specific commit for reproducibility
self_encryption = { git = "https://github.com/grumbach/self_encryption.git", rev = "abc123..." }Minor Observations (Non-blocking)1. DataMap Serialization Versioning (lines 356-375)Current: Raw pub fn serialize_data_map(data_map: &DataMap) -> Result<Vec<u8>> {
let mut bytes = vec![1u8]; // Version 1
bytes.extend(data_map.to_bytes()?);
Ok(bytes)
}Priority: Low—can be added when needed 2. Chunk Upload Parallelization (lines 163-243)Current: Already implemented with 3. Network Failure Testing (test module)Current: Comprehensive local testing Files Changed Review✅ src/client/self_encrypt.rs (NEW, 819 lines)
✅ Cargo.toml (+4 lines)
✅ src/bin/saorsa-cli/main.rs (+112/-115 lines)
✅ src/bin/saorsa-cli/cli.rs (+70/-2 lines)
✅ src/client/mod.rs (+1 line)
✅ src/lib.rs (+6 lines)
✅ src/storage/handler.rs (+5/-4 lines)
Final Verdict: APPROVED ✅This PR is production-ready with:
Recommendations:
This is exemplary work. The team systematically addressed every review concern with thoughtful solutions. The I/O error handling, atomic file writes, bounded concurrency, and transaction hash preservation show excellent attention to production edge cases. The test suite is comprehensive and security-focused. Ready to merge! 🚀 Job Run | Branch: |
| let datamap_path = path.with_extension("datamap"); | ||
| std::fs::write(&datamap_path, &data_map_bytes)?; |
There was a problem hiding this comment.
Silent overwrite of existing .datamap destroys access to previous private uploads
std::fs::write silently overwrites any pre-existing .datamap file at that path. For a private upload, the .datamap is the sole decryption capability — losing it is permanent data loss. If a user uploads the same file a second time (e.g., to a different network epoch or after paying again), the first upload becomes irrecoverable without any warning.
This is especially dangerous because the path is derived mechanically from the input path: path.with_extension("datamap"). The user has no indication that a pre-existing .datamap file will be destroyed.
At minimum, check for a pre-existing file and fail with an actionable error:
if datamap_path.exists() {
return Err(color_eyre::eyre::eyre!(
"DataMap file already exists at {}. Move or delete it before re-uploading.",
datamap_path.display()
));
}
std::fs::write(&datamap_path, &data_map_bytes)?;Prompt To Fix With AI
This is a comment left during a code review.
Path: src/bin/saorsa-cli/main.rs
Line: 148-149
Comment:
**Silent overwrite of existing `.datamap` destroys access to previous private uploads**
`std::fs::write` silently overwrites any pre-existing `.datamap` file at that path. For a private upload, the `.datamap` is the *sole decryption capability* — losing it is permanent data loss. If a user uploads the same file a second time (e.g., to a different network epoch or after paying again), the first upload becomes irrecoverable without any warning.
This is especially dangerous because the path is derived mechanically from the input path: `path.with_extension("datamap")`. The user has no indication that a pre-existing `.datamap` file will be destroyed.
At minimum, check for a pre-existing file and fail with an actionable error:
```rust
if datamap_path.exists() {
return Err(color_eyre::eyre::eyre!(
"DataMap file already exists at {}. Move or delete it before re-uploading.",
datamap_path.display()
));
}
std::fs::write(&datamap_path, &data_map_bytes)?;
```
How can I resolve this? If you propose a fix, please make it concise.Self-encryption fully replaces the legacy FileManifest/split_file API. Remove file_ops module, its exports from mod.rs and lib.rs, and update CLI help text to no longer reference manifest addresses.
mickvandijke
left a comment
There was a problem hiding this comment.
Review of final state (commit e916936). Five items — two correctness bugs, three worth discussing.
| if let Some(ref e) = *guard { | ||
| return Err(Error::Io(std::io::Error::new(e.kind(), format!("{e}")))); | ||
| } | ||
| } |
There was a problem hiding this comment.
Poisoned mutex silently drops I/O error
If the read iterator panics (poisoning the mutex), lock() returns Err and this function returns Ok(()) — silently swallowing whatever I/O error was captured. This means a panicking read path produces a successful result with corrupt/truncated data.
| } | |
| let guard = read_error.lock().unwrap_or_else(|e| e.into_inner()); | |
| if let Some(ref e) = *guard { | |
| return Err(Error::Io(std::io::Error::new(e.kind(), format!("{e}")))); | |
| } | |
| Ok(()) |
unwrap_or_else(|e| e.into_inner()) recovers the guard even when poisoned, so the captured error is always checked.
| if output_path.exists() { | ||
| std::fs::remove_file(output_path).map_err(Error::Io)?; | ||
| } | ||
| std::fs::rename(&tmp_path, output_path).map_err(Error::Io)?; |
There was a problem hiding this comment.
TOCTOU race in Windows rename workaround
exists() + remove_file() is not atomic — another process (or concurrent decrypt) could create a file at output_path between the two calls, causing the subsequent rename to fail on Windows anyway.
tempfile is already a regular dependency (used for rand in the tmp name). Consider using tempfile::NamedTempFile with .persist() which handles the cross-platform atomic-replace correctly, and would also eliminate the manual cleanup logic above.
| fn encrypt_file_to_chunks( | ||
| file_path: &Path, | ||
| file_size: usize, | ||
| ) -> Result<(DataMap, Vec<(XorName, Bytes)>)> { |
There was a problem hiding this comment.
All encrypted chunks collected in memory
encrypt_and_upload_file now streams with bounded concurrency, but this function (used by the pub encrypt_file_local) still collects every chunk into a Vec. For a large file this means the full encrypted size in RAM.
Since encrypt_file_local is public API, library consumers could hit this unknowingly. At minimum, add a doc note on encrypt_file_local warning about peak memory being proportional to file size. Longer term, consider a callback/visitor pattern so callers can process chunks incrementally.
| self_encryption::Error::Generic(format!( | ||
| "Network fetch failed for {addr_hex}: {e}" | ||
| )) | ||
| })? |
There was a problem hiding this comment.
Early ? abandons remaining in-flight fetch futures
If one chunk fetch fails in the while let Some(...) loop, the ? on the result drops the FuturesUnordered with remaining futures still running. Those futures continue executing in the background, consuming network resources and Tokio worker time for chunks that will never be used.
Not a correctness issue, but for large batches it wastes resources. Consider collecting all results first and then checking for errors, similar to how encrypt_and_upload_file drains in-flight uploads on failure.
| } else { | ||
| // Private mode: save DataMap locally, never upload it | ||
| let data_map_bytes = serialize_data_map(&data_map)?; | ||
| let datamap_path = path.with_extension("datamap"); |
There was a problem hiding this comment.
DataMap (decryption key) written to predictable path without safeguards
The datamap is the sole capability needed to decrypt the file. Two concerns:
-
Silent overwrite: If
secret.datamapalready exists (e.g., from a previous upload of the same filename), it is silently overwritten. The user loses their previous decryption key with no warning. -
Predictable location: Anyone who knows the original filename knows where the datamap is. For a feature explicitly called "private mode", the key sitting next to the original file in a guessable path weakens the security posture.
Suggestion: warn (or error) if the datamap file already exists, and consider printing a reminder that this file must be kept secret.
- Remove dead .cargo/config.toml (MAX_CHUNK_SIZE env var had no effect) - Extract open_encrypt_stream() to deduplicate file-read + encrypt setup - Extract write_stream_to_file() to deduplicate atomic-write-and-rename - Fix succeeded counter: track uploaded chunks, not tx hashes - Capture tx_hashes before hash-mismatch early return - Add READ_BUFFER_SIZE constant, reuse buffer across iterator calls - Handle Windows rename-over-existing-file edge case - Pin self_encryption dep to immutable commit rev - Add backward-compat comment on legacy file_ops re-exports
- Remove dead .cargo/config.toml (MAX_CHUNK_SIZE env var had no effect) - Extract open_encrypt_stream() to deduplicate file-read + encrypt setup - Extract write_stream_to_file() to deduplicate atomic-write-and-rename - Fix succeeded counter: track uploaded chunks, not tx hashes - Capture tx_hashes before hash-mismatch early return - Add READ_BUFFER_SIZE constant, reuse buffer across iterator calls - Handle Windows rename-over-existing-file edge case - Pin self_encryption dep to immutable commit rev - Add backward-compat comment on legacy file_ops re-exports
Add unit and e2e tests covering the remaining Section 18 scenarios: Unit tests (32 new): - Quorum: #4 fail→abandoned, #16 timeout→inconclusive, #27 single-round dual-evidence, #28 dynamic threshold undersized, #33 batched per-key, #34 partial response unresolved, #42 quorum-derived paid-list auth - Admission: #5 unauthorized peer, #7 out-of-range rejected - Config: #18 invalid config rejected, #26 dynamic paid threshold - Scheduling: #8 dedup safety, #8 replica/paid collapse - Neighbor sync: #35 round-robin cooldown skip, #36 cycle completion, #38 snapshot stability mid-join, #39 unreachable removal + slot fill, #40 cooldown peer removed, #41 cycle termination guarantee, consecutive rounds, cycle preserves sync times - Pruning: #50 hysteresis prevents premature delete, #51 timestamp reset on heal, #52 paid/record timestamps independent, #23 entry removal - Audit: #19/#53 partial failure mixed responsibility, #54 all pass, #55 empty failure discard, #56 repair opportunity filter, response count validation, digest uses full record bytes - Types: #13 bootstrap drain, repair opportunity edge cases, terminal state variants - Bootstrap claims: #46 first-seen recorded, #49 cleared on normal E2e tests (4 new): - #2 fresh offer with empty PoP rejected - #5/#37 neighbor sync request returns response - #11 audit challenge multi-key (present + absent) - Fetch not-found for non-existent key Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add unit and e2e tests covering the remaining Section 18 scenarios: Unit tests (32 new): - Quorum: #4 fail→abandoned, #16 timeout→inconclusive, #27 single-round dual-evidence, #28 dynamic threshold undersized, #33 batched per-key, #34 partial response unresolved, #42 quorum-derived paid-list auth - Admission: #5 unauthorized peer, #7 out-of-range rejected - Config: #18 invalid config rejected, #26 dynamic paid threshold - Scheduling: #8 dedup safety, #8 replica/paid collapse - Neighbor sync: #35 round-robin cooldown skip, #36 cycle completion, #38 snapshot stability mid-join, #39 unreachable removal + slot fill, #40 cooldown peer removed, #41 cycle termination guarantee, consecutive rounds, cycle preserves sync times - Pruning: #50 hysteresis prevents premature delete, #51 timestamp reset on heal, #52 paid/record timestamps independent, #23 entry removal - Audit: #19/#53 partial failure mixed responsibility, #54 all pass, #55 empty failure discard, #56 repair opportunity filter, response count validation, digest uses full record bytes - Types: #13 bootstrap drain, repair opportunity edge cases, terminal state variants - Bootstrap claims: #46 first-seen recorded, #49 cleared on normal E2e tests (4 new): - #2 fresh offer with empty PoP rejected - #5/#37 neighbor sync request returns response - #11 audit challenge multi-key (present + absent) - Fetch not-found for non-existent key Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace plaintext file chunking with convergent encryption via the self_encryption crate (quantum-safe: ChaCha20-Poly1305 + BLAKE3).