Conversation
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>
|
Cursor Agent can help with this pull request. Just |
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 93% | Review time: 464.3s
🟡 3 warnings, 💡 4 suggestions, 📝 3 nitpicks. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-9406375e
| fn prune_expired_challenges(store: &mut HashMap<String, PendingChallenge>, now: u64) { | ||
| store.retain(|_, challenge| challenge.expires_at > now); | ||
| } | ||
|
|
There was a problem hiding this comment.
💡 Challenge consumed before complete request validation
The challenge is atomically removed in consume_challenge before signature verification occurs; if TLS is compromised, an attacker who intercepts a challenge_id could race to consume it with an invalid signature, invalidating the legitimate user's challenge.
Suggested fix:
Consider validating the signature before consuming the challenge, or document that TLS is a security requirement. Alternatively, allow legitimate users to re-request challenges seamlessly.
| } | ||
|
|
||
| fn build_signature_payload( | ||
| challenge_id: &str, |
There was a problem hiding this comment.
🟡 Signature payload determinism relies on JSON key ordering
Both merod and mero-kms-phala independently serialize serde_json::json! to bytes for signing; if field ordering differs across versions or features, signature verification will silently fail.
Suggested fix:
Define a dedicated struct with `#[derive(Serialize)]` and explicit field order, or use a canonical serialization format.
| /// Create the router with all endpoints. | ||
| pub fn create_router(config: Config) -> Router { | ||
| let state = AppState { config }; | ||
| let state = AppState { |
There was a problem hiding this comment.
💡 Challenge store has no size limit
The pending challenges HashMap has no maximum size. An attacker flooding the /challenge endpoint could cause memory growth, though pruning on each access provides partial mitigation.
Suggested fix:
Consider adding a max_pending_challenges limit and rejecting new challenges when the limit is reached, or implement per-peer-id rate limiting.
| "challengeNonceHex": hex::encode(challenge_nonce), | ||
| "quoteHashHex": hex::encode(quote_hash), | ||
| "peerId": peer_id, | ||
| })) |
There was a problem hiding this comment.
🟡 DRY violation: build_signature_payload duplicated in handlers.rs and kms.rs
The build_signature_payload function is implemented identically in both the KMS server (handlers.rs) and client (kms.rs), creating a maintenance risk if the payload format ever changes in one place but not the other.
Suggested fix:
Extract `build_signature_payload` to a shared crate (e.g., a new `calimero-kms-protocol` or existing `calimero-server-primitives`) so both sides use the same implementation.
| ) -> Result<[u8; 32], String> { | ||
| let now = unix_now_secs().map_err(|e| format!("{:?}", e))?; | ||
| let mut guard = state | ||
| .challenges |
There was a problem hiding this comment.
🟡 Challenge consumed before full validation completes
The challenge is removed from the store (line 420) before peer_id validation (line 423), so if peer_id mismatches, the challenge is already consumed and the legitimate client cannot retry.
Suggested fix:
Validate peer_id before calling `guard.remove(challenge_id)`, or re-insert the challenge on validation failure.
| } | ||
|
|
||
| let normalized_actual = normalize_measurement(actual_measurement); | ||
| if allowed_measurements |
There was a problem hiding this comment.
💡 Empty RTMR allowlists permit any runtime measurement
When RTMR0-3 allowlists are empty (the default), any value is accepted. While MRTD is required, this could allow attestations from TEEs with unexpected runtime measurements if operators don't understand they should configure all relevant measurements.
Suggested fix:
Consider logging a warning at startup when measurement policy is enforced but RTMR allowlists are empty, or add documentation emphasizing this behavior.
| })) | ||
| .map_err(|e| ServiceError::InvalidSignature(format!("failed to serialize payload: {}", e))) | ||
| } | ||
|
|
There was a problem hiding this comment.
📝 Nit: Signature payload test only verifies single-function determinism
The test test_signature_payload_is_deterministic only verifies that the same function produces consistent output, but doesn't verify that handlers.rs and kms.rs implementations produce identical output for the same inputs.
Suggested fix:
Consider adding a shared constant or integration test that verifies both client and server produce identical signature payloads.
| } | ||
|
|
||
| fn unix_now_secs() -> Result<u64, ServiceError> { | ||
| SystemTime::now() |
There was a problem hiding this comment.
📝 Nit: Semantic mismatch in error mapping for clock errors
The unix_now_secs() function maps system clock errors to ServiceError::InvalidChallenge, which is misleading since the root cause is a clock issue, not an invalid challenge.
Suggested fix:
Consider adding a dedicated `ServiceError::InternalError(String)` variant for unexpected system errors like clock failures.
| @@ -18,16 +27,50 @@ use crate::Config; | |||
| #[derive(Clone)] | |||
| pub struct AppState { | |||
There was a problem hiding this comment.
💡 Consider using tokio::sync::Mutex for async-compatible locking
Using std::sync::Mutex in async handlers can block the executor thread while waiting for the lock; while the critical section here is short, tokio::sync::Mutex is the idiomatic choice in async contexts.
Suggested fix:
Replace `Arc<Mutex<HashMap<...>>>` with `Arc<tokio::sync::Mutex<HashMap<...>>>` and use `.lock().await`.
| eyre::eyre!( | ||
| "{} value '{}' from {} is not valid hex: {}", | ||
| label, | ||
| value, |
There was a problem hiding this comment.
📝 Nit: normalize_hex duplicates normalize_measurement logic
Both normalize_hex in main.rs and normalize_measurement in handlers.rs perform the same transformation (trim, strip 0x, lowercase).
Suggested fix:
Extract to a single utility function in one module and reuse it.
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>
|
Your PR title does not adhere to the Conventional Commits convention: Common errors to avoid:
|
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 363.9s
🟡 4 warnings, 💡 5 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-2249ed17
| fn prune_expired_challenges(store: &mut HashMap<String, PendingChallenge>, now: u64) { | ||
| store.retain(|_, challenge| challenge.expires_at > now); | ||
| } | ||
|
|
There was a problem hiding this comment.
🟡 Challenge removed before peer_id validation allows denial-of-service
The challenge is removed from the store before checking if peer_id matches; if an attacker knows a challenge_id, they can invalidate it by calling with a wrong peer_id, preventing the legitimate owner from using it.
Suggested fix:
Use `.get()` to validate peer_id first, then `.remove()` only after validation passes. While challenge_id is 128-bit random making this low practical risk, the correct pattern is validate-then-consume.
| State(state): State<AppState>, | ||
| Json(request): Json<ChallengeRequest>, | ||
| ) -> Result<Json<ChallengeResponse>, ServiceError> { | ||
| let mut nonce = [0u8; 32]; |
There was a problem hiding this comment.
🟡 Challenge store unbounded growth enables memory exhaustion DoS
The challenge HashMap grows without bounds except for passive expiry-based pruning; an attacker can flood /challenge to exhaust server memory.
Suggested fix:
Add a maximum pending challenges limit per peer_id and/or global, rejecting new challenges when limit is reached. Consider adding rate limiting middleware.
| @@ -265,4 +584,101 @@ mod tests { | |||
| let json = serde_json::to_string(&error_no_details).unwrap(); | |||
There was a problem hiding this comment.
🟡 Missing tests for challenge-response flow
PR description claims tests for challenge expiry, single-use consumption, and peer ID mismatch, but these are not present in the diff; the challenge flow is security-critical.
Suggested fix:
Add unit tests for: expired challenge rejection, already-consumed challenge rejection, and peer_id mismatch between challenge and get-key request.
| ) -> Result<Vec<u8>, ServiceError> { | ||
| let quote_hash = Sha256::digest(quote_bytes); | ||
| serde_json::to_vec(&serde_json::json!({ | ||
| "challengeId": challenge_id, |
There was a problem hiding this comment.
🟡 DRY violation: build_signature_payload duplicated across crates
This function is duplicated in crates/merod/src/kms.rs; if the signature protocol changes, both implementations must stay in sync or key requests will fail.
Suggested fix:
Extract to a shared crate (e.g., calimero-server-primitives or a new calimero-kms-protocol crate) and import from both locations.
| hasher.update(peer_id.as_bytes()); | ||
| hasher.finalize().into() | ||
| } | ||
|
|
There was a problem hiding this comment.
💡 DRY violation: hash_peer_id duplicated in merod/src/kms.rs
The same hashing function exists in both crates; a mismatch would cause peer ID verification to fail silently.
Suggested fix:
Move to a shared module so both KMS client and server use the same implementation.
| /// Create the router with all endpoints. | ||
| pub fn create_router(config: Config) -> Router { | ||
| let state = AppState { config }; | ||
| let state = AppState { |
There was a problem hiding this comment.
💡 Challenge store is ephemeral and single-node only
The in-memory HashMap challenge store will lose state on restart and cannot be shared across replicas; this limitation should be documented for operators.
Suggested fix:
Add a note in README.md under Production guidance that the service must run as a singleton or use sticky sessions.
| if !policy.enforce_measurement_policy { | ||
| return Ok(()); | ||
| } | ||
|
|
There was a problem hiding this comment.
💡 Signature payload JSON ordering relies on serde_json implementation detail
Both client and server construct identical JSON objects for signature, but JSON key ordering depends on serde_json's Map implementation; if one side changes or uses different features, signature verification will fail silently.
Suggested fix:
Consider using a canonical serialization format (e.g., sorted keys explicitly, or concatenating raw bytes in defined order) to make the signature scheme more robust.
| fn unix_now_secs() -> Result<u64, ServiceError> { | ||
| SystemTime::now() | ||
| .duration_since(UNIX_EPOCH) | ||
| .map(|d| d.as_secs()) |
There was a problem hiding this comment.
💡 O(n) challenge pruning on every request
The prune_expired_challenges iterates all pending challenges on both /challenge and /get-key requests; under high load with many pending challenges this could cause lock contention.
Suggested fix:
For expected TEE attestation rates this is likely acceptable. If needed, consider a time-based throttle (prune at most once per second) or a background cleanup task.
| .lock() | ||
| .map_err(|_| "challenge store lock poisoned".to_owned())?; | ||
| prune_expired_challenges(&mut guard, now); | ||
|
|
There was a problem hiding this comment.
💡 Redundant expiry check after prune with same timestamp
The prune_expired_challenges already removes challenges where expires_at <= now, so a challenge that survived cannot fail the subsequent expires_at <= now check using the same now value.
Suggested fix:
This check is defensive but unreachable given the prune logic uses the same timestamp. Consider removing or documenting as defense-in-depth.
feat(kms): Implement proper TEE attestation and key release policy
Description
This PR addresses the current mock/weak TEE attestation in
mero-kms-phalaby implementing a robust key release policy. The primary goal is to ensure that keys are released only to genuinely attested and authorized TEEs, preventing impersonation and replay attacks.The changes introduce:
mero-kms-phalanow enforces configurable policies for TCB status, MRTD, and RTMR0-3 measurements, rejecting key requests from non-compliant TEEs./challengeendpoint and an updated/get-keyflow inmero-kms-phalaandmerodensure freshness and bind the key request to the requesting node's identity key via a signature. This prevents quote replay and impersonation.tee-attestationcrate now exposes TCB status and advisory IDs, enabling granular policy decisions in the KMS.merodClient:merodnow follows the new challenge-response protocol, embedding the server-provided nonce and signing the request payload.This ensures key release is gated on:
Test plan
The following tests were run to verify the changes:
mero-kms-phala: Policy matcher (MRTD/RTMR/TCB allow/deny), challenge expiry, single-use challenge, and peer ID signature binding.merod: Deterministic signature payload generation and updated report data construction.To reproduce, run:
(Note:
merodtests may require specific C++ linker configuration depending on the environment, as shown above.)Documentation update
The following documentation needs to be updated:
mero-kms-phalaconfiguration: Document new environment variables for policy (ENFORCE_MEASUREMENT_POLICY,ALLOWED_TCB_STATUSES,ALLOWED_MRTD,ALLOWED_RTMR*,CHALLENGE_TTL_SECS).merodclient behavior: Update documentation for howmerodrequests keys from KMS, detailing the new challenge-response and signature process.mero-kms-phalawith trusted MRTD/RTMR values for production environments.