This document describes the currently implemented v1 security posture.
It covers the security properties the repository provides today, the trust boundaries those properties apply to, and the explicit gaps that remain out of scope.
It is evidence-backed and limited to behavior implemented in src/bin/kvd.cpp, src/service/kv_raft_service.cpp, src/engine/wal.cpp, src/engine/sstable.cpp, the v1 ADR set, and Task 7 validation artifacts.
The current implementation protects a narrow set of boundaries.
- Client to server transport boundary: one client-facing gRPC listener per kvd process, implemented in src/bin/kvd.cpp
- Service validation boundary: request sizes and request identity checks enforced in src/service/kv_raft_service.cpp before a mutation is proposed into Raft
- Storage integrity boundary: persisted WAL and SST bytes are verified before replay or read in src/engine/wal.cpp and src/engine/sstable.cpp
The current implementation also has important trust assumptions.
- The five logical Raft nodes are embedded inside one process through kvstore::raft::TestCluster and TestTransport, not through a separate networked peer transport (src/service/kv_raft_service.cpp, src/raft/test_transport.cpp)
- Local process memory, the host filesystem, and the PEM files passed to kvd are trusted once the process starts
- This repository does not implement tenant isolation, user authentication, authorization, or automated secret handling
For v1, the main security goals are:
- protect client-facing gRPC traffic when secure mode is selected
- reject malformed or out-of-contract client requests before they enter the replicated command path
- detect persisted data corruption deterministically instead of silently accepting corrupted bytes
The runtime listener supports exactly two profiles, dev and secure, in src/bin/kvd.cpp.
| Profile | Current behavior | Security meaning | Limits |
|---|---|---|---|
| dev | kvd uses grpc::InsecureServerCredentials() | Plaintext gRPC over TCP for local development and test flows | No transport encryption, no peer authentication beyond normal network reachability |
| secure | kvd uses grpc::SslServerCredentials(...) and requires --tls_cert=PATH plus --tls_key=PATH | Encrypts the client-facing gRPC listener with PEM certificate and private key material | No mTLS, no automated certificate rotation, no cluster-wide or inter-node TLS |
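
The profile rules in the table can be sketched as a small startup check. This is an illustration only: `TlsProfile`, `ListenerConfig`, `ParseTlsProfile`, and `ValidateListenerConfig` are hypothetical names, and the actual flag handling in src/bin/kvd.cpp may be structured differently. The sketch deliberately avoids gRPC calls and only models the rule that secure requires both a certificate path and a key path.

```cpp
#include <string>

// Hypothetical names for illustration; not taken from src/bin/kvd.cpp.
enum class TlsProfile { kDev, kSecure };

struct ListenerConfig {
  TlsProfile profile = TlsProfile::kDev;
  std::string tls_cert_path;  // value passed as --tls_cert=PATH
  std::string tls_key_path;   // value passed as --tls_key=PATH
};

// Accepts exactly the two documented profile names. Returns false for anything else.
bool ParseTlsProfile(const std::string& name, TlsProfile* out) {
  if (name == "dev") {
    *out = TlsProfile::kDev;
    return true;
  }
  if (name == "secure") {
    *out = TlsProfile::kSecure;
    return true;
  }
  return false;
}

// Returns an empty string when the configuration is usable, otherwise a reason.
// The dev profile needs no PEM material; secure needs both paths.
std::string ValidateListenerConfig(const ListenerConfig& cfg) {
  if (cfg.profile == TlsProfile::kSecure) {
    if (cfg.tls_cert_path.empty()) return "secure profile requires --tls_cert=PATH";
    if (cfg.tls_key_path.empty()) return "secure profile requires --tls_key=PATH";
  }
  return "";
}
```

A caller would presumably refuse to start the listener when `ValidateListenerConfig` returns a non-empty reason, so that a secure request never silently falls back to plaintext.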
Task 7 runtime evidence confirms that both profiles start successfully and preserve the same Put and Get semantics through an external client smoke path (see tests/grpc/tls_profile_toggle_test.cpp and .sisyphus/evidence/task7-final/chaos_partition_heal_and_tls.json).
The secure-profile boundary is intentionally narrow.
- It covers the client-facing gRPC listener only
- It uses PEM cert and key files supplied at process start
- It does not imply full end-to-end cluster encryption
- It does not add mTLS, user authn, authz, or secret lifecycle automation
This narrow scope is consistent with the implemented topology in docs/architecture.md: peer replication is in-process in the current tree, so there is no separate inter-node TLS layer to enable today.
ADR 0003 establishes the v1 integrity contract: checksum verification is mandatory for WAL records and SST blocks on read and replay paths (docs/adr/0003-integrity-checksum-strategy-v1.md).
The implementation enforces that contract in two places.
src/engine/wal.cpp verifies each replayed record before the mutation is accepted.
Replay rejects at least these cases:
- invalid WAL magic
- unsupported WAL version
- unknown operation tag
- record sizes that exceed the v1 contract limits
- truncated headers or payloads
- checksum mismatch
If checksum verification fails, replay returns an explicit integrity error instead of applying the record.
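
The replay checks listed above can be illustrated with a minimal sketch. Everything about the record layout here is an assumption made for the example: the magic value, the 18-byte little-endian header, the operation tags, and the choice of CRC-32 over the payload are hypothetical and are not taken from src/engine/wal.cpp or ADR 0003. The point is the order of the checks and the fact that a mismatch yields an explicit status instead of an applied record.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout: magic(4) version(1) op(1) key_len(4) value_len(4) crc(4) payload.
enum class ReplayStatus {
  kOk, kTruncated, kBadMagic, kBadVersion, kBadOp, kTooLarge, kChecksumMismatch
};

constexpr std::uint32_t kMagic = 0x4B56574Cu;    // assumed value
constexpr std::uint8_t kVersion = 1;             // assumed value
constexpr std::uint8_t kOpPut = 1;               // assumed tag
constexpr std::uint8_t kOpDelete = 2;            // assumed tag
constexpr std::uint32_t kMaxKeyBytes = 1024;     // v1 key limit
constexpr std::uint32_t kMaxValueBytes = 1048576;  // v1 value limit
constexpr std::size_t kHeaderSize = 18;

// Bitwise CRC-32 (reflected polynomial 0xEDB88320); stands in for the real checksum.
std::uint32_t Crc32(const std::uint8_t* data, std::size_t n) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < n; ++i) {
    crc ^= data[i];
    for (int b = 0; b < 8; ++b) crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}

std::uint32_t ReadU32(const std::uint8_t* p) {
  return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

void AppendU32(std::vector<std::uint8_t>& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Encodes one record in the assumed layout (used here only to produce test input).
std::vector<std::uint8_t> BuildRecord(std::uint8_t op, const std::string& key,
                                      const std::string& value) {
  std::vector<std::uint8_t> payload(key.begin(), key.end());
  payload.insert(payload.end(), value.begin(), value.end());
  std::vector<std::uint8_t> rec;
  AppendU32(rec, kMagic);
  rec.push_back(kVersion);
  rec.push_back(op);
  AppendU32(rec, static_cast<std::uint32_t>(key.size()));
  AppendU32(rec, static_cast<std::uint32_t>(value.size()));
  AppendU32(rec, Crc32(payload.data(), payload.size()));
  rec.insert(rec.end(), payload.begin(), payload.end());
  return rec;
}

// Verifies the record at the start of `rec`; nothing is applied unless kOk is returned.
ReplayStatus VerifyRecord(const std::vector<std::uint8_t>& rec) {
  if (rec.size() < kHeaderSize) return ReplayStatus::kTruncated;
  const std::uint8_t* p = rec.data();
  if (ReadU32(p) != kMagic) return ReplayStatus::kBadMagic;
  if (p[4] != kVersion) return ReplayStatus::kBadVersion;
  if (p[5] != kOpPut && p[5] != kOpDelete) return ReplayStatus::kBadOp;
  const std::uint32_t key_len = ReadU32(p + 6);
  const std::uint32_t value_len = ReadU32(p + 10);
  if (key_len > kMaxKeyBytes || value_len > kMaxValueBytes) return ReplayStatus::kTooLarge;
  const std::size_t payload = static_cast<std::size_t>(key_len) + value_len;
  if (rec.size() - kHeaderSize < payload) return ReplayStatus::kTruncated;
  if (Crc32(p + kHeaderSize, payload) != ReadU32(p + 14)) {
    return ReplayStatus::kChecksumMismatch;
  }
  return ReplayStatus::kOk;
}
```

Checking the size limits before reading the payload keeps a corrupted length field from driving an oversized read, which is one reason the structural checks come before the checksum in this sketch.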
src/engine/sstable.cpp verifies SST structure before data is trusted.
The reader checks:
- SST header magic and version
- footer magic and version
- footer checksum
- index-frame structure and checksum
- block-frame structure and checksum
- monotonic index ordering and payload bounds
If any of those checks fail, the read path surfaces an integrity error and does not return corrupted data as valid state.
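
As an illustration of the last item in that list, the following sketch checks monotonic index ordering and payload bounds. `IndexEntry` and `IndexIsWellFormed` are hypothetical names, and the assumption that each index entry carries a first key, an offset, and a length is made for the example; the real index frame in src/engine/sstable.cpp may hold different fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical index entry: one per data block.
struct IndexEntry {
  std::string first_key;  // smallest key stored in the block
  std::uint64_t offset;   // byte offset of the block frame in the file
  std::uint64_t length;   // byte length of the block frame
};

// Returns true only when keys are strictly increasing, blocks appear in
// non-overlapping increasing offset order, and every block lies inside
// [data_begin, data_end). Assumes data_begin <= data_end.
bool IndexIsWellFormed(const std::vector<IndexEntry>& index,
                       std::uint64_t data_begin, std::uint64_t data_end) {
  std::uint64_t cursor = data_begin;  // first byte not yet claimed by a block
  for (std::size_t i = 0; i < index.size(); ++i) {
    const IndexEntry& e = index[i];
    if (i > 0 && !(index[i - 1].first_key < e.first_key)) return false;
    if (e.length == 0) return false;
    if (e.offset < cursor) return false;  // overlaps the previous block or the header
    // Written as a subtraction so that offset + length cannot overflow.
    if (e.offset > data_end || e.length > data_end - e.offset) return false;
    cursor = e.offset + e.length;
  }
  return true;
}
```

Bounds are validated before any block is read, so a corrupted index cannot direct the reader outside the data region even if its own checksum happened to pass.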
Task 7 preserved corruption-injection evidence in .sisyphus/evidence/task7-checks/integrity/integrity_corruption_suite.json, produced by scripts/integrity/run_corruption_suite.py.
That artifact records pass=true and shows both of the repository's corruption gates failing closed:
- WAL replay returns "integrity_code":"CHECKSUM_MISMATCH"
- SST read returns "integrity_code":"CHECKSUM_MISMATCH"
This is the current proof that v1 detects on-disk corruption in the implemented durability path rather than silently accepting corrupted persisted bytes.
The v1 input contract is enforced in the service layer, not by protobuf alone (see proto/kvstore/v1/kv.proto, src/service/kv_raft_service.cpp, and docs/architecture/scope.md).
Current limits and checks are:
- key size: <= 1024 bytes for Put, Get, and Delete
- value size: <= 1,048,576 bytes for Put
- Put.request_id: must be non-empty and <= 4096 bytes
- conflicting reuse of a Put.request_id with different key or value bytes is rejected
When those checks fail, the gRPC adapter maps the service error to INVALID_ARGUMENT (src/api/grpc_kv_service.cpp).
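
The size limits above can be sketched as plain validation functions. The function names and the `ValidationResult` type are hypothetical and do not come from src/service/kv_raft_service.cpp; only the numeric limits are taken from this document. Because the document states a size limit for Put.request_id but not for the Delete request id, the Delete check below only requires that one is present.

```cpp
#include <cstddef>
#include <string>

constexpr std::size_t kMaxKeyBytes = 1024;        // Put, Get, Delete
constexpr std::size_t kMaxValueBytes = 1048576;   // Put
constexpr std::size_t kMaxRequestIdBytes = 4096;  // Put.request_id

// Hypothetical result type; kInvalidArgument stands for the error that the
// gRPC adapter maps to INVALID_ARGUMENT.
enum class ValidationResult { kOk, kInvalidArgument };

ValidationResult ValidateKey(const std::string& key) {
  return key.size() <= kMaxKeyBytes ? ValidationResult::kOk
                                    : ValidationResult::kInvalidArgument;
}

ValidationResult ValidatePut(const std::string& key, const std::string& value,
                             const std::string& request_id) {
  if (ValidateKey(key) != ValidationResult::kOk) return ValidationResult::kInvalidArgument;
  if (value.size() > kMaxValueBytes) return ValidationResult::kInvalidArgument;
  if (request_id.empty() || request_id.size() > kMaxRequestIdBytes) {
    return ValidationResult::kInvalidArgument;
  }
  return ValidationResult::kOk;
}

// Assumption: only presence of the request id is checked for Delete.
ValidationResult ValidateDelete(const std::string& key, const std::string& request_id) {
  if (ValidateKey(key) != ValidationResult::kOk) return ValidationResult::kInvalidArgument;
  if (request_id.empty()) return ValidationResult::kInvalidArgument;
  return ValidationResult::kOk;
}
```

All limits are inclusive, so a key of exactly 1024 bytes or a value of exactly 1,048,576 bytes passes.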
Validation boundaries worth calling out explicitly:
- Put validates key, value, and request_id before proposal
- Delete validates key and requires a client-supplied request_id before proposal, matching the write-idempotency boundary in the service layer
- Get validates key size and rejects linearizable reads when the leader has no quorum contact, which prevents stale reads from being presented as current state
- WAL replay also rejects record sizes above the same v1 limits before reconstructing persisted commands (src/engine/wal.cpp)
These checks create a clear boundary between client input and replicated state: malformed, oversized, or conflicting requests are rejected before they are treated as valid commands.
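
The conflicting-reuse rule for Put.request_id can be sketched with a small in-memory tracker. `PutRequestTracker` and `DedupResult` are hypothetical names, and treating an exact repeat as a harmless retry is an assumption: the document only states that reuse with different key or value bytes is rejected. The sketch stores the full key and value for clarity; an implementation could equally compare a digest.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

enum class DedupResult { kNew, kDuplicate, kConflict };

// Remembers the key and value first seen for each request id.
class PutRequestTracker {
 public:
  DedupResult Check(const std::string& request_id, const std::string& key,
                    const std::string& value) {
    auto it = seen_.find(request_id);
    if (it == seen_.end()) {
      seen_.emplace(request_id, std::make_pair(key, value));
      return DedupResult::kNew;
    }
    // Same id with identical bytes is assumed to be a retry; anything else conflicts.
    const bool same = it->second.first == key && it->second.second == value;
    return same ? DedupResult::kDuplicate : DedupResult::kConflict;
  }

 private:
  std::unordered_map<std::string, std::pair<std::string, std::string>> seen_;
};
```

In this sketch a `kConflict` result corresponds to the rejected case described above, and the stored entry is left unchanged so that later retries of the original request still match.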
For the current repository, secure operation means enabling the client-facing TLS listener and managing PEM files carefully by hand.
Use these rules:
- Start kvd with --tls_profile=secure --tls_cert=PATH --tls_key=PATH, as required by src/bin/kvd.cpp.
- Distribute the corresponding certificate material to clients that need to validate the server. The Task 7 smoke client does this through --tls_cert=PATH in tests/grpc/tls_profile_toggle_test.cpp.
- Treat the certificate and private key files as operator-managed secrets. The repository does not rotate them, store them in a secret manager, or provision them automatically.
- Use secure for any environment where plaintext client traffic is not acceptable. Keep dev for local development and controlled test scenarios only.
- Do not describe secure mode as cluster encryption. It protects the client-facing listener only.
Operationally, secure mode improves confidentiality and integrity for client-facing RPC transport. It does not change the trust assumptions around embedded Raft peers, local disk contents, or process-local secret handling.
This document is based on the following implemented sources.
- docs/adr/0003-integrity-checksum-strategy-v1.md
- src/bin/kvd.cpp
- src/service/kv_raft_service.cpp
- src/api/grpc_kv_service.cpp
- src/engine/wal.cpp
- src/engine/sstable.cpp
- proto/kvstore/v1/kv.proto
- docs/architecture.md
- docs/wire-protocol.md
- docs/testing.md
- docs/architecture/scope.md
- tests/grpc/tls_profile_toggle_test.cpp
- scripts/integrity/run_corruption_suite.py
- .sisyphus/evidence/task7-final/chaos_partition_heal_and_tls.json
- .sisyphus/evidence/task7-checks/integrity/integrity_corruption_suite.json
The key evidence points are:
- Task 7 runtime TLS smoke coverage recorded pass=true for both dev and secure listener profiles in .sisyphus/evidence/task7-final/chaos_partition_heal_and_tls.json
- tests/grpc/tls_profile_toggle_test.cpp verifies semantic parity across the two transport profiles and reports "semantic_drift":false in the in-process coverage path
- scripts/integrity/run_corruption_suite.py and .sisyphus/evidence/task7-checks/integrity/integrity_corruption_suite.json show WAL and SST corruption surfacing as CHECKSUM_MISMATCH
The current repository does not implement the following security features, and this document does not imply otherwise.
- inter-node TLS or a separate TLS-protected Raft peer transport
- mutual TLS
- authentication or authorization
- multi-tenant isolation
- secret rotation
- key management automation
- encryption at rest
- production PKI lifecycle management
Secure mode in v1 should therefore be read precisely:
- yes, it encrypts the client-facing gRPC transport using PEM cert and key files
- yes, it has runtime evidence in Task 7
- no, it does not provide full end-to-end cluster encryption
- no, it does not provide identity, access-control, or tenancy isolation guarantees
- no, it does not replace future hardening work for deployment, secret storage, or certificate operations