Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@ Auto-generated from all feature plans. Last updated: 2025-10-28

## Active Technologies
- Rust 1.75+ (edition 2024) + estcontainers 0.27 (new), tokio 1.x, tonic 0.14, existing mock-grpc-backend Docker image (002-testcontainers-integration)
- Rust 1.75+ (edition 2024) + okio 1.x, tonic 0.14.2, dashmap 6.1.0, clap 4.5, serde 1.0, toml 0.9.8, prometheus 0.14 (003-channel-cache-config)
- N/A (in-memory DashMap cache, no persistence) (003-channel-cache-config)

- Rust 1.75+ + okio (async runtime), tonic (gRPC client for backend checks), prometheus (metrics), serde (config/logging) (001-core-agent)

Expand All @@ -23,6 +25,7 @@ cargo test [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECH
Rust 1.75+: Follow standard conventions

## Recent Changes
- 003-channel-cache-config: Added Rust 1.75+ (edition 2024) + okio 1.x, tonic 0.14.2, dashmap 6.1.0, clap 4.5, serde 1.0, toml 0.9.8, prometheus 0.14
- 002-testcontainers-integration: Added Rust 1.75+ (edition 2024) + estcontainers 0.27 (new), tokio 1.x, tonic 0.14, existing mock-grpc-backend Docker image

- 001-core-agent: Added Rust 1.75+ + okio (async runtime), tonic (gRPC client for backend checks), prometheus (metrics), serde (config/logging)
Expand Down
35 changes: 35 additions & 0 deletions specs/003-channel-cache-config/checklists/requirements.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
# Specification Quality Checklist: Configurable Channel Cache

**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-02-23
**Feature**: [spec.md](../spec.md)

## Content Quality

- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed

## Requirement Completeness

- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified

## Feature Readiness

- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification

## Notes

- All items pass validation. Spec is ready for `/speckit.clarify` or `/speckit.plan`.
- The feature is straightforward (on/off toggle for existing behavior) with clear boundaries and no ambiguity.
43 changes: 43 additions & 0 deletions specs/003-channel-cache-config/data-model.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
# Data Model: Configurable Channel Cache

**Feature**: 003-channel-cache-config
**Date**: 2026-02-23

## Modified Entities

### AgentConfig

Existing configuration struct extended with one new field.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| server_port | u16 | 5555 | TCP port for Agent Text Protocol server |
| server_bind_address | String | "0.0.0.0" | Bind address for agent server |
| grpc_connect_timeout_ms | u64 | 1000 | gRPC connection timeout in milliseconds |
| grpc_rpc_timeout_ms | u64 | 1500 | gRPC RPC timeout in milliseconds |
| **grpc_channel_cache_enabled** | **bool** | **true** | **Whether to cache and reuse gRPC channels across health checks** |
| metrics_port | u16 | 9090 | HTTP port for Prometheus metrics |
| metrics_bind_address | String | "0.0.0.0" | Bind address for metrics server |
| log_level | LogLevel | Info | Logging verbosity level |
| log_format | LogFormat | Json | Logging output format |

**New field highlighted in bold.**

### Configuration Sources

| Source | Key/Flag | Example |
|--------|----------|---------|
| Environment variable | `HAPROXY_AGENT_GRPC_CHANNEL_CACHE` | `HAPROXY_AGENT_GRPC_CHANNEL_CACHE=false` |
| TOML config file | `grpc_channel_cache_enabled` | `grpc_channel_cache_enabled = false` |
| CLI argument | `--grpc-channel-cache` / `--no-grpc-channel-cache` | `--no-grpc-channel-cache` |

### Precedence

Environment variable < TOML config file < CLI argument (unchanged from existing behavior).

## Unchanged Entities

- **BackendChannelKey**: No changes (server, port, ssl_flag)
- **GrpcHealthChecker**: Internal struct unchanged; behavior modified by reading `config.grpc_channel_cache_enabled`
- **HealthCheckRequest / HealthCheckResponse**: No changes
- **Metrics**: No new metrics; `GRPC_CHANNELS_ACTIVE` reports 0 when caching disabled
99 changes: 99 additions & 0 deletions specs/003-channel-cache-config/plan.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# Implementation Plan: Configurable Channel Cache

**Branch**: `003-channel-cache-config` | **Date**: 2026-02-23 | **Spec**: [spec.md](spec.md)
**Input**: Feature specification from `/specs/003-channel-cache-config/spec.md`

## Summary

Add a boolean configuration option `grpc_channel_cache_enabled` (default: `true`) to `AgentConfig` that controls whether `GrpcHealthChecker` caches and reuses gRPC channels across health checks. When disabled, each health check creates a fresh channel and the `GRPC_CHANNELS_ACTIVE` metric remains at 0. The option follows existing configuration patterns: environment variable, TOML file, and CLI argument with standard precedence.

## Technical Context

**Language/Version**: Rust 1.75+ (edition 2024)
**Primary Dependencies**: tokio 1.x, tonic 0.14.2, dashmap 6.1.0, clap 4.5, serde 1.0, toml 0.9.8, prometheus 0.14
**Storage**: N/A (in-memory DashMap cache, no persistence)
**Testing**: cargo test + testcontainers 0.27 (integration tests with mock gRPC backend)
**Target Platform**: Linux server / Docker container
**Project Type**: Single Rust binary
**Performance Goals**: No degradation when caching enabled (existing behavior). Acceptable increased latency per check when caching disabled (operator's explicit choice).
**Constraints**: Total gRPC timeout < 2000ms (HAProxy default). No new dependencies required.
**Scale/Scope**: Single config field addition, 3 files modified (config.rs, checker.rs, tests/common/mod.rs), 1 new test file.

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

| Principle | Status | Notes |
|-----------|--------|-------|
| I. Agent Pattern | PASS | Single config option, no new external dependencies, follows existing config patterns (env/file/CLI) |
| II. Integration-Heavy Testing | PASS | New integration test will verify cache-disabled behavior with real mock backend via testcontainers |
| III. Observability | PASS | `GRPC_CHANNELS_ACTIVE` metric correctly reports 0 when caching disabled; startup log includes new config value |
| IV. HAProxy Protocol Compliance | PASS | No protocol changes; agent still returns up/down via Agent Check Protocol |
| V. Simplicity & Reliability | PASS | Minimal change: one boolean field, conditional logic in one method. No abstractions, no new patterns. YAGNI-compliant. |

No violations. No entries needed in Complexity Tracking table.

## Project Structure

### Documentation (this feature)

```text
specs/003-channel-cache-config/
├── plan.md # This file
├── research.md # Phase 0 output
├── data-model.md # Phase 1 output
├── quickstart.md # Phase 1 output
└── tasks.md # Phase 2 output (created by /speckit.tasks)
```

### Source Code (repository root)

```text
src/
├── checker.rs # MODIFY: conditional cache logic in get_or_create_channel
├── config.rs # MODIFY: add grpc_channel_cache_enabled field + env/CLI/validation
├── lib.rs # NO CHANGE
├── logger.rs # NO CHANGE
├── main.rs # MODIFY: log new config value at startup
├── metrics.rs # NO CHANGE
├── protocol.rs # NO CHANGE
└── server.rs # NO CHANGE

tests/
├── common/mod.rs # MODIFY: add start_agent_with_config helper accepting AgentConfig
├── integration_test.rs # NO CHANGE
├── resilience_test.rs # NO CHANGE
└── cache_config_test.rs # NEW: tests for cache-enabled/disabled behavior
```

**Structure Decision**: Existing single-project layout preserved. Changes touch 3 source files (config.rs, checker.rs, main.rs), 1 test utility (common/mod.rs), and add 1 test file (cache_config_test.rs).

## Complexity Tracking

No violations to justify.

## Design Decisions

### D1: Config field name

**Decision**: `grpc_channel_cache_enabled` (bool, default `true`)
**Rationale**: Follows existing naming convention (`grpc_connect_timeout_ms`, `grpc_rpc_timeout_ms`). Positive boolean (`enabled`) avoids double-negative confusion (`no_cache = false`).
**Env var**: `HAPROXY_AGENT_GRPC_CHANNEL_CACHE` (values: `true`/`false`)
**CLI flag**: `--grpc-channel-cache` / `--no-grpc-channel-cache` (clap boolean flag)
**TOML key**: `grpc_channel_cache_enabled = true`

### D2: Implementation approach in checker.rs

**Decision**: Conditional bypass in `get_or_create_channel` — when caching is disabled, skip cache lookup/insert and always create a fresh channel.
**Rationale**: Minimal code change. The channel creation logic is identical regardless of caching. Only the DashMap interactions are skipped.
**Alternative rejected**: Separate `CachedChecker` / `UncachedChecker` types — over-engineered for a boolean toggle.

### D3: DashMap instantiation when caching disabled

**Decision**: Still create the DashMap but never use it. The `Arc<DashMap>` is a zero-cost allocation when empty.
**Rationale**: Avoids `Option<Arc<DashMap>>` complexity. An empty DashMap has negligible memory footprint.

### D4: Test strategy

**Decision**: New `tests/cache_config_test.rs` with integration tests using testcontainers. Add `start_agent_with_config` to `tests/common/mod.rs` to allow passing custom `AgentConfig`.
**Rationale**: Constitution requires integration tests. Existing tests use default config and must remain unchanged. New helper enables config customization without breaking existing test patterns.
66 changes: 66 additions & 0 deletions specs/003-channel-cache-config/quickstart.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# Quickstart: Configurable Channel Cache

**Feature**: 003-channel-cache-config

## What Changed

A new configuration option `grpc_channel_cache_enabled` controls whether the agent caches gRPC channels between health checks. By default, caching is enabled (existing behavior). Setting it to `false` forces the agent to create a fresh gRPC connection for every health check.

## How to Use

### Disable channel caching via environment variable

```bash
HAPROXY_AGENT_GRPC_CHANNEL_CACHE=false ./haproxy-grpc-agent
```

### Disable channel caching via TOML config file

```toml
# agent.toml
grpc_channel_cache_enabled = false
```

```bash
./haproxy-grpc-agent -c agent.toml
```

### Disable channel caching via CLI argument

```bash
./haproxy-grpc-agent --no-grpc-channel-cache
```

### Keep default behavior (caching enabled)

No change needed. The default is `true`. All existing configurations continue to work without modification.

## Files Modified

| File | Change |
|------|--------|
| `src/config.rs` | New `grpc_channel_cache_enabled` field in `AgentConfig`, env var parsing, CLI arg, default function |
| `src/checker.rs` | Conditional cache bypass in `get_or_create_channel` |
| `src/main.rs` | Log new config value at startup |
| `tests/common/mod.rs` | New `start_agent_with_config` helper |
| `tests/cache_config_test.rs` | New integration tests for cache-disabled behavior |

## Verification

Run all tests:

```bash
cargo test
```

Run only the new cache config tests:

```bash
cargo test --test cache_config_test
```

Verify with clippy:

```bash
cargo clippy
```
44 changes: 44 additions & 0 deletions specs/003-channel-cache-config/research.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# Research: Configurable Channel Cache

**Feature**: 003-channel-cache-config
**Date**: 2026-02-23

## R1: Boolean config field patterns in clap + serde

**Decision**: Use `#[serde(default = "default_true")]` for TOML/serde and `#[arg(long, default_value_t = true)]` with `--no-grpc-channel-cache` negation for clap CLI.

**Rationale**: clap 4.x supports boolean flags with `--flag` / `--no-flag` syntax via `#[arg(long, default_value_t = true, action = clap::ArgAction::Set)]`. For the environment variable, parse `"true"` / `"false"` strings matching the existing pattern used for other env vars in `load_from_env`.

**Alternatives considered**:
- `clap::ArgAction::SetTrue` / `SetFalse` — doesn't support default `true` cleanly
- Separate `--disable-channel-cache` flag — inconsistent with positive-bool convention

## R2: DashMap behavior when unused

**Decision**: Instantiate `DashMap::new()` even when caching is disabled; never insert into it.

**Rationale**: An empty `DashMap` allocates minimal memory (just the internal shard array headers). Using `Option<DashMap>` would add `.as_ref().unwrap()` noise throughout `get_or_create_channel` for no meaningful benefit. The empty map is effectively free.

**Alternatives considered**:
- `Option<Arc<DashMap>>` — adds unwrap complexity, `None` path, conditional metric reporting
- Trait-based strategy pattern — over-engineered for a boolean toggle

## R3: Channel lifecycle when caching is disabled

**Decision**: Create a fresh `Channel` for each `check_backend_internal` call. The channel is used for the single RPC and then dropped (Rust ownership ensures cleanup).

**Rationale**: tonic `Channel` implements `Drop` which closes the underlying HTTP/2 connection. When the channel goes out of scope after the health check completes, the connection is cleaned up. No explicit teardown needed.

**Alternatives considered**:
- Channel pool with size=0 — unnecessary abstraction
- Explicit `channel.close()` — not available in tonic's API; Drop handles it

## R4: Test approach for verifying cache bypass

**Decision**: Use two consecutive health checks to the same backend with caching disabled. Verify both succeed (functional correctness). Verify `GRPC_CHANNELS_ACTIVE` metric stays at 0 (behavioral verification that caching is bypassed).

**Rationale**: Directly inspecting the DashMap internals would require exposing implementation details. The metric-based approach tests the observable behavior without coupling tests to internal data structures.

**Alternatives considered**:
- Expose `channel_cache.len()` as public API — leaks implementation details
- Mock the DashMap — impractical with concrete type, and constitution favors integration tests
Loading