
feat: Gordon v3 - Sub-Container Architecture with gRPC #45

Open
bnema wants to merge 18 commits into main from v3-dev

Conversation

@bnema
Owner

@bnema bnema commented Feb 1, 2026

Summary

Decomposes Gordon from a single monolithic process into four isolated containers (core, proxy, registry, secrets), with the core component orchestrating the other three. All inter-container communication uses gRPC over an internal Docker network.

Key Changes

  • 4 Component Architecture: Single Docker image with --component flag (core|proxy|registry|secrets)
  • gRPC Communication: Protobuf-based services for all inter-component calls
  • Docker Orchestration: Core component auto-deploys and manages 3 sub-containers
  • Integration Tests: Full test suite validating startup, gRPC, and deployment flows

Components

  • gordon-core: Orchestrator on ports 5000 (admin API) + 9090 (gRPC)
  • gordon-proxy: HTTP reverse proxy on port 80
  • gordon-registry: Image registry on ports 5000 (HTTP) + 9092 (gRPC)
  • gordon-secrets: Password-store backend on port 9091 (gRPC)
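
For a quick smoke test of any of the gRPC components, a standard gRPC health check against the ports above is enough. A minimal sketch in Go, assuming the in-network DNS name gordon-secrets, plaintext in-network transport, and that the component registers the standard health service (the integration tests in this PR use grpc_health_v1):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Assumption: gordon-secrets is resolvable on the internal Docker network
	// and exposes the standard gRPC health service on 9091.
	conn, err := grpc.NewClient("gordon-secrets:9091",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	if err != nil {
		panic(err)
	}
	fmt.Println("health status:", resp.GetStatus()) // expect SERVING
}
```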

Architecture

┌──────────────────────────────────────────────────────────┐
│                      Docker Network                      │
├──────────────────────────────────────────────────────────┤
│  gordon-core orchestrates 3 sub-containers via gRPC      │
│  All communication over internal network (no fallback)   │
└──────────────────────────────────────────────────────────┘
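
To make the single-image, --component model concrete, here is a rough sketch of the dispatch. RunCore, RunProxy, RunRegistry, and RunSecrets are the entry points this PR introduces, but the flag handling and signatures below are illustrative stand-ins, not the actual serve command:

```go
package main

import (
	"context"
	"flag"
	"fmt"
	"os"
)

// Stand-ins for the real entry points in internal/app (RunCore, RunProxy,
// RunRegistry, RunSecrets), which likely also take config and logger args.
var components = map[string]func(context.Context) error{
	"core":     func(ctx context.Context) error { fmt.Println("would call app.RunCore"); return nil },
	"proxy":    func(ctx context.Context) error { fmt.Println("would call app.RunProxy"); return nil },
	"registry": func(ctx context.Context) error { fmt.Println("would call app.RunRegistry"); return nil },
	"secrets":  func(ctx context.Context) error { fmt.Println("would call app.RunSecrets"); return nil },
}

func main() {
	component := flag.String("component", "", "core|proxy|registry|secrets")
	flag.Parse()

	run, ok := components[*component]
	if !ok {
		fmt.Fprintf(os.Stderr, "unknown component %q (want core|proxy|registry|secrets)\n", *component)
		os.Exit(1)
	}
	if err := run(context.Background()); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```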

Testing

  • Added integration test suite in tests/integration/
  • New Makefile targets: test-integration, test-integration-quick
  • Helper script: scripts/test-v3.sh
  • See docs/v3-testing.md for testing guide

Dependencies

  • Added google.golang.org/grpc v1.78.0
  • Added google.golang.org/protobuf v1.36.11
  • Added testcontainers-go v0.40.0 for integration tests
  • Added Buf for protobuf code generation

Breaking Changes

  • Admin API now uses gRPC instead of HTTP/JSON
  • CLI remote client updated to use gRPC
  • No monolith mode - requires 4 containers

Migration

See docs/v3-testing.md for local testing and migration guide.

Summary by CodeRabbit

  • New Features

    • Multi-component architecture with core, proxy, registry, and secrets services.
    • gRPC-based inter-component communication replacing HTTP APIs.
    • Automatic sub-container lifecycle management with health monitoring and restart capabilities.
    • Environment-driven configuration system.
    • Integration test suite validating component deployment and communication.
  • Removed Features

    • HTTP admin API endpoint (migrated to gRPC).
  • Documentation

    • Added comprehensive v3 testing guide and example configuration files.


bnema added 18 commits January 30, 2026 15:32
- Add core, proxy, registry, secrets components
- Implement gRPC servers and clients for inter-container communication
- Add lifecycle manager for sub-container orchestration
- Update Dockerfile and Makefile for v3 builds
- Add --component flag to serve command
- Implement TargetResolver and RouteWatcher boundaries
Update testcontainers-go to v0.40.0 for integration tests.
Includes docker/docker v27.3.1 compatibility fix.
Add test-integration, test-integration-build, and test-integration-quick
Makefile targets for local Docker-based integration testing.
Add full service initialization for gordon-core:
- Config, container, health, secrets, log services
- Registry service with gRPC integration
- Auth service with token management
- Admin API HTTP server on :5000
- gRPC CoreService on :9090
- Lifecycle manager for sub-container orchestration
- Proper event handlers for image push deploy
- Container sync and auto-start on startup
Ensure lifecycle manager passes --component flag when creating
sub-containers (secrets, registry, proxy). Fixes issue where
sub-containers defaulted to monolithic mode requiring Docker.
Add comprehensive integration tests using Testcontainers-Go v0.40.0:
- Test suite with real scenario: core deploys sub-containers
- Four-container startup verification
- gRPC communication tests
- Docker rootless support
- Sequential test execution for reliability
- 10-minute max duration budget
- Helper utilities for container management
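
For reference, a stripped-down sketch of how a Testcontainers-Go test can start the core image and wait for its gRPC port. The image name, command, and port are assumptions taken from this PR's description, not the suite's actual code:

```go
package integration

import (
	"context"
	"testing"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

func TestCoreStartsSketch(t *testing.T) {
	ctx := context.Background()

	// Image name, command, and exposed port are illustrative assumptions.
	req := testcontainers.ContainerRequest{
		Image:        "gordon:v3-test",
		Cmd:          []string{"serve", "--component=core"},
		ExposedPorts: []string{"9090/tcp"},
		WaitingFor:   wait.ForListeningPort("9090/tcp"),
	}
	core, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: req,
		Started:          true,
	})
	if err != nil {
		t.Fatalf("starting core container: %v", err)
	}
	defer core.Terminate(ctx)

	host, _ := core.Host(ctx)
	port, _ := core.MappedPort(ctx, "9090")
	t.Logf("core gRPC reachable at %s:%s", host, port.Port())
}
```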
Document integration testing approach for Gordon v3:
- Testcontainers-Go setup and usage
- 4-container architecture validation
- Docker rootless configuration
- Test scenarios and time budgets
Add test-v3.sh script for manual integration testing:
- Builds test images
- Starts containers in sequence
- Verifies health checks
- Logs container status
- Add grpcregistry client for core to connect to registry sub-container
- Add HTTP health endpoint to proxy component at /health
- Enhance integration tests with proper gRPC health checks using grpc_health_v1
- Create Test02_gRPCCommunication for comprehensive inter-component testing
- Fix test setup: put gordon-core on gordon-internal network so proxy can reach it
- Verify all 4 containers start and communicate correctly via gRPC
Add comprehensive integration tests for Gordon v3 architecture:

- Test03: Image push notification and routing flow
  - Tests NotifyImagePushed gRPC endpoint
  - Verifies Core → Registry communication
  - Validates Core routing capability

- Test04: Auto-restart of failed sub-containers
  - Kills secrets, registry, and proxy containers
  - Verifies lifecycle manager restarts them
  - Confirms gRPC connectivity restored

- Test05: Security verification
  - Docker socket access control (only core)
  - Network isolation on gordon-internal
  - Proxy privilege verification

- suite_test.go: Add refreshAllContainerRefs helper
  - Handles container reference updates after restarts
This commit addresses all 73 review comments from the CoderabbitAI analysis
of the v3 gRPC-only admin and sub-container architecture.

Build & Lint Fixes:
- Fix test-v3.sh build path to run from project root
- Move proto files to api/proto/gordon/ for Buf PACKAGE_DIRECTORY_MATCH compliance
- Add unique response wrapper types for all gRPC methods (GetRouteResponse, etc.)
- Fix enum naming: CHANGE_TYPE_INVALIDATE, CHANGE_TYPE_INVALIDATE_ALL
- Update buf.gen.yaml: paths=import for proper Go package structure
- Regenerate all gRPC code with new proto structure

Client/Server Runtime Fixes:
- Remove sync.Once dial pattern in CLI remote client (allows retries)
- Add timeout to Authenticate method
- Fix response unwrapping in all gRPC clients
- Update server call sites to return wrapped response types

Container Configuration:
- Add PortBindings, Privileged, ReadOnlyRoot to domain.ContainerConfig
- Implement explicit port mappings in Docker runtime (no random ports)
- Apply volumes, privileged mode, and read-only rootfs to containers
- Fix Stop() method to be idempotent using sync.Once (sketched after this list)
- Fix waitForHealth() to respect context cancellation
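
The idempotent Stop() above is the classic sync.Once pattern; a minimal, generic sketch (the field and type names are illustrative, not Gordon's actual runtime types):

```go
package main

import "sync"

// server is a stand-in type; only the shutdown pattern matters here.
type server struct {
	stopOnce sync.Once
	quit     chan struct{}
}

// Stop can be called any number of times; the channel is closed at most once,
// so repeated calls cannot panic with a double close.
func (s *server) Stop() {
	s.stopOnce.Do(func() {
		close(s.quit)
	})
}

func main() {
	s := &server{quit: make(chan struct{})}
	s.Stop()
	s.Stop() // no-op on the second call
}
```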

Race Condition Fixes:
- Add sync.Once to WatchRouteChanges channel cleanup (prevents send to closed channel)
- Close watcher channels only after removal from map

Error Mapping Improvements:
- Registry server: proper NotFound vs InvalidArgument vs Internal errors
- Secrets server: distinguish NotFound, InvalidArgument, and Internal errors
- Add input validation for empty parameters

Healthcheck & Shutdown:
- Add grpc-health-probe to Dockerfile
- Implement component-aware healthcheck (gRPC for secrets, HTTP for others)
- Add 10s shutdown timeout to registry HTTP server
- Proxy and Core already had proper shutdown timeouts

Security & Auth:
- Add Bearer token prefix validation in auth interceptor (see the sketch after this section)
- Update gordon.toml.example with strong password warnings
- Fix weak credentials in example config
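
The Bearer-prefix check in the auth interceptor usually amounts to something like the following simplified unary interceptor; the real JWT verification and scope enforcement in Gordon are more involved, so treat this as a sketch:

```go
package authsketch

import (
	"context"
	"strings"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/metadata"
	"google.golang.org/grpc/status"
)

// bearerToken extracts the token from an "authorization: Bearer <token>" header.
func bearerToken(ctx context.Context) (string, error) {
	md, ok := metadata.FromIncomingContext(ctx)
	if !ok {
		return "", status.Error(codes.Unauthenticated, "missing metadata")
	}
	vals := md.Get("authorization")
	if len(vals) == 0 {
		return "", status.Error(codes.Unauthenticated, "missing authorization header")
	}
	const prefix = "Bearer "
	if !strings.HasPrefix(vals[0], prefix) {
		return "", status.Error(codes.Unauthenticated, "authorization header must use the Bearer scheme")
	}
	return strings.TrimPrefix(vals[0], prefix), nil
}

// UnaryAuth rejects requests without a well-formed Bearer header before the
// handler runs; real code would also validate the JWT and its scopes.
func UnaryAuth(ctx context.Context, req any, info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler) (any, error) {
	if _, err := bearerToken(ctx); err != nil {
		return nil, err
	}
	return handler(ctx, req)
}

var _ grpc.UnaryServerInterceptor = UnaryAuth
```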

Documentation:
- Update docs/v3-testing.md: replace REST examples with gRPC equivalents
- Add grpcurl examples for authentication and route management
- Update all config files to match v3 schema

Config Updates:
- Rewrite gordon.toml.example for v3 architecture
- Update basic.toml test fixture to v3 structure
- Remove deprecated keys (registry_auth, env.providers, runtime)
- Add new v3 sections: [api.rate_limit], [external_routes], [network_groups],
  [attachments], [auto_route], [network_isolation]

All changes verified:
- Build successful
- Linter clean (0 issues)
- All tests passing
Copilot AI review requested due to automatic review settings February 1, 2026 06:44
@coderabbitai

coderabbitai bot commented Feb 1, 2026

📝 Walkthrough

Walkthrough

Migrates Gordon from v2 (monolithic) to v3 (component-based architecture) by introducing gRPC inter-component communication, splitting the single Run function into component-specific entry points (Core, Proxy, Registry, Secrets), removing HTTP admin API in favor of gRPC, adding comprehensive lifecycle management and health checks for sub-containers, and introducing extensive integration tests. Includes Dockerfile updates, dependency additions (testcontainers-go, gRPC), protocol buffer configuration, and new CLI remote client backed by gRPC.

Changes

Cohort / File(s) Summary
Dockerfile and Container Configuration
Dockerfile, Makefile, buf.yaml, buf.gen.yaml, gordon.toml.example, scripts/test-v3.sh
Updated Docker image for v3 multi-component architecture with new runtime dependencies, component-aware health checks, and gRPC probes. Makefile bumped versions to v3, made ENGINE configurable, and added integration test targets and proto generation. Protocol buffer configuration added for gRPC code generation. Example config file and test script provided for v3 deployment and testing.
gRPC Server Infrastructure
internal/adapters/in/grpc/admin/server.go, internal/adapters/in/grpc/core/server.go, internal/adapters/in/grpc/registry/server.go, internal/adapters/in/grpc/secrets/server.go, internal/adapters/in/grpc/auth/interceptor.go
New gRPC server implementations for Admin, Core, Registry, and Secrets components with full API coverage including route/secret/token management, streaming logs, health checks, and authentication. Auth interceptor provides token validation, scope enforcement, and context enrichment for both unary and streaming RPCs.
HTTP Admin API Removal
internal/adapters/in/http/admin/handler.go, internal/adapters/in/http/admin/handler_test.go, internal/adapters/in/http/admin/middleware.go, internal/adapters/in/http/admin/middleware_test.go
Complete removal of HTTP admin handler, middleware, and associated tests (1112 + 862 + 165 + 362 lines). All admin functionality migrated to gRPC servers.
CLI Remote Client Migration
internal/adapters/in/cli/remote/client.go, internal/adapters/in/cli/remote/auth.go
Refactored remote CLI client from HTTP to gRPC using AdminServiceClient. Adds Close() and VerifyAuth() methods. Replaces all HTTP endpoints with gRPC calls and includes log stream helpers for processing streaming responses.
gRPC Client Adapters
internal/adapters/out/grpccore/client.go, internal/adapters/out/grpcregistry/client.go, internal/adapters/out/grpcsecrets/client.go, internal/adapters/out/grpccore/publisher.go
New gRPC clients implementing TargetResolver, RouteChangeWatcher, RegistryService, and TokenStore/SecretProvider interfaces. Enables inter-component communication and event publishing via gRPC.
Component Entry Points
internal/app/core.go, internal/app/proxy.go, internal/app/registry_entry.go, internal/app/secrets_entry.go, internal/app/lifecycle.go
New component-specific entry points (RunCore, RunProxy, RunRegistry, RunSecrets) replacing monolithic Run function. Lifecycle manager handles sub-container deployment, health monitoring, and auto-restart logic. Core component orchestrates gRPC servers, sub-container management, and event handling.
Application Restructuring
internal/app/run.go, internal/adapters/in/cli/serve.go
Removed monolithic Run function and service orchestration logic. Updated serve command with component-aware routing via --component flag delegating to component entry points.
Proxy Service Refactoring
internal/usecase/proxy/service.go
Added remote resolver support via NewRemoteService to delegate target resolution to gRPC core client. Refactored GetTarget with modular helpers for external routes, container targets, and resolver-based resolution with caching.
Container and Domain Updates
internal/domain/container.go, internal/adapters/out/docker/runtime.go
Extended ContainerConfig with PortBindings, Privileged, and ReadOnlyRoot fields. Docker runtime updated to support explicit port bindings and new container options.
New Boundaries
internal/boundaries/out/target_resolver.go, internal/boundaries/out/route_watcher.go
Added new outbound port interfaces for TargetResolver (domain-to-target resolution) and RouteChangeWatcher (route invalidation streaming).
Integration Tests and Helpers
tests/integration/suite_test.go, tests/integration/01_startup_test.go, tests/integration/02_grpc_communication_test.go, tests/integration/03_image_push_deploy_test.go, tests/integration/04_auto_restart_test.go, tests/integration/05_security_test.go, tests/integration/helpers/..., tests/integration/fixtures/..., tests/integration/README.md
Comprehensive integration test suite covering four-container startup, gRPC communication validation, image push/deploy flow, auto-restart resilience, and security isolation. Includes test helpers for Docker operations, test configuration, and documentation.
Documentation and Configuration
docs/v3-testing.md, internal/testutils/fixtures/configs/basic.toml, tests/integration/fixtures/configs/test-gordon.toml
Added v3 testing guide with architecture overview, quick start, manual steps, debugging, and troubleshooting. Updated test configurations for v3 structure with new server and logging settings.
Dependency Management
go.mod
Added testcontainers-go v0.40.0 direct dependency and updated docker, go-connections, and multiple indirect dependencies for gRPC and container support.

Sequence Diagram(s)

sequenceDiagram
    participant Client as CLI Remote Client
    participant Core as Core Component<br/>(gRPC)
    participant Proxy as Proxy Component<br/>(HTTP)
    participant Secrets as Secrets Component<br/>(gRPC)
    participant Registry as Registry Component<br/>(gRPC)

    Client->>Core: AuthenticatePassword(ctx, Request)
    Core->>Secrets: ValidateToken(ctx)
    Secrets-->>Core: Token claims
    Core-->>Client: AuthenticatePasswordResponse

    Client->>Core: ListRoutes(ctx, Request)
    Core-->>Client: []RouteInfo

    Client->>Core: Deploy(ctx, Request)
    Core->>Proxy: (invalidate routes)
    Core->>Registry: GetManifest(imageName)
    Registry-->>Core: Manifest
    Core-->>Client: DeployResponse
sequenceDiagram
    participant LM as Lifecycle Manager<br/>(Core)
    participant RT as Container Runtime
    participant Secrets as Secrets Sub-<br/>Container
    participant Registry as Registry Sub-<br/>Container
    participant Proxy as Proxy Sub-<br/>Container

    LM->>LM: DeployAll(ctx)
    LM->>RT: ensureNetwork(gordon-internal)
    RT-->>LM: Network created/exists

    LM->>RT: EnsureRunning(secrets spec)
    RT->>Secrets: Create & Start
    Secrets-->>RT: Container started
    LM->>Secrets: HealthCheck(gRPC port 9091)
    Secrets-->>LM: Healthy

    LM->>RT: EnsureRunning(registry spec)
    RT->>Registry: Create & Start
    LM->>Registry: HealthCheck(HTTP port 5000)
    Registry-->>LM: Healthy

    LM->>RT: EnsureRunning(proxy spec)
    RT->>Proxy: Create & Start
    LM->>Proxy: HealthCheck(HTTP port 8080)
    Proxy-->>LM: Healthy

    Note over LM,Proxy: MonitorLoop runs every 15s<br/>checking health & restarting if needed

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐰 From monolith's maze, a rabbit did leap,
To components four, where secrets run deep,
gRPC threads weave through core's grand design,
While proxy and registry in networks align,
V3 hops forward, with health checks that gleam! 🌿✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 73.33%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title 'feat: Gordon v3 - Sub-Container Architecture with gRPC' directly and comprehensively summarizes the main change: a complete architectural migration from a single-process monolith to a four-component sub-container architecture using gRPC.



Copilot AI left a comment

Pull request overview

This PR introduces Gordon v3’s multi-container architecture: a core orchestrator plus proxy, registry, and secrets sub-containers communicating over gRPC, with corresponding integration tests, CLI, Dockerfile, and runtime wiring updates.

Changes:

  • Replaces the monolithic HTTP-based admin/API with a gRPC-based core (CoreService, AdminService, RegistryInspectService, SecretsService) and splits runtime into core, proxy, registry, and secrets components.
  • Adds a lifecycle manager in core to deploy and monitor sub-containers, extends Docker runtime and domain models (container config, networks, ports, security flags), and wires new gRPC in/out adapters.
  • Introduces a comprehensive integration test suite using Testcontainers, new Make targets and scripts, and updates the CLI remote client and Dockerfile to align with the v3 architecture.

Reviewed changes

Copilot reviewed 47 out of 56 changed files in this pull request and generated 1 comment.

File Description
tests/integration/suite_test.go New shared test suite bootstrapping only core via Testcontainers, discovering sub-containers, wiring gRPC clients, and providing common setup/teardown.
tests/integration/helpers/suite.go Placeholder helpers package doc/comment maintained for compatibility after moving suite logic.
tests/integration/helpers/grpc.go Placeholder for future shared gRPC test helpers, documenting move of existing logic into the suite.
tests/integration/helpers/docker.go Docker-related test helpers: build test image if missing, detect rootless DOCKER_HOST, derive socket path for volume mount.
tests/integration/fixtures/configs/test-gordon.toml Minimal TOML configuration used by integration tests with auth/rate limiting disabled and debug logging.
tests/integration/README.md Documentation for the v3 Testcontainers-based integration suite, test matrix, and troubleshooting.
tests/integration/05_security_test.go Integration test verifying security properties: Docker socket exposure, secrets mounts, network isolation, and proxy privileges.
tests/integration/04_auto_restart_test.go Integration test that kills each sub-container and verifies lifecycle auto-restart and core gRPC health afterwards.
tests/integration/03_image_push_deploy_test.go Integration test covering registry→core gRPC image push notifications, registry inspection, and core routing queries.
tests/integration/02_grpc_communication_test.go Integration test validating gRPC health and basic request flows between core, secrets, registry, and proxy.
tests/integration/01_startup_test.go Integration test ensuring core auto-deploys all three sub-containers and they join the expected network.
scripts/test-v3.sh Local helper script to build the v3 test image, bring up/down a core-based test env, and run basic checks via container engine.
internal/usecase/proxy/service.go Extends proxy service to support a remote (gRPC) target resolver mode, refactors GetTarget into smaller helpers, and keeps external-route SSRF protections.
internal/testutils/fixtures/configs/basic.toml Updates a basic test config to the new v3-style server/logging/auth layout and fields.
internal/grpc/secrets_grpc.pb.go Generated gRPC client/server definitions for SecretsService, including token operations and GetSecret.
internal/grpc/registry_grpc.pb.go Generated gRPC client/server definitions for RegistryInspectService (GetManifest, ListTags, ListRepositories).
internal/grpc/core_grpc.pb.go Generated gRPC client/server definitions for CoreService, including target resolution, routes, image push notification, and route-change streaming.
internal/domain/container.go Expands ContainerConfig with explicit port bindings, privilege flags, and read-only rootfs flag to support more precise container setup.
internal/boundaries/out/target_resolver.go New outbound port describing a gRPC-oriented target resolver and route listing interface for proxy.
internal/boundaries/out/route_watcher.go New outbound port for watching route invalidation events (e.g., for proxy cache eviction).
internal/app/secrets_entry.go New entrypoint wiring the standalone secrets component: config, token store, secret providers, gRPC server, and health service.
internal/app/run.go Removes the old monolithic Run path and related HTTP server wiring, leaving shared config/logger and signal utilities used by new components.
internal/app/registry_entry.go New entrypoint wiring the standalone registry component (filesystem storage, gRPC inspect server, HTTP registry API, and core event publisher).
internal/app/proxy.go New entrypoint wiring the standalone proxy component as a pure gRPC client of core, exposing HTTP on a configurable port with a health endpoint.
internal/app/lifecycle.go New lifecycle manager that ensures creation, health checking, and auto-restart of proxy/registry/secrets sub-containers on a managed Docker network.
internal/app/core.go New core entrypoint wiring Docker runtime, config, containers, health/log/secrets services, lifecycle manager, gRPC Core/Admin servers, and HTTP admin/registry surface.
internal/adapters/out/grpcsecrets/client.go New gRPC client for the secrets service implementing both TokenStore and SecretProvider outbound ports.
internal/adapters/out/grpcregistry/client.go New gRPC client enabling core to inspect manifests/tags/repositories via gordon-registry.
internal/adapters/out/grpccore/publisher.go New gRPC-based EventPublisher that forwards registry image-pushed events to core’s NotifyImagePushed.
internal/adapters/out/grpccore/client.go New gRPC client for core implementing TargetResolver and RouteChangeWatcher for the proxy component.
internal/adapters/out/docker/runtime.go Extends Docker runtime CreateContainer to honor explicit port bindings, host/container bind semantics, and privileged/readonly options.
internal/adapters/in/http/admin/middleware_test.go Removes HTTP admin middleware tests now that admin is exposed via gRPC rather than HTTP.
internal/adapters/in/http/admin/middleware.go Removes HTTP admin auth middleware; auth is now enforced by the gRPC interceptor.
internal/adapters/in/grpc/secrets/server.go New SecretsService gRPC server wrapping a TokenStore and multiple SecretProviders.
internal/adapters/in/grpc/registry/server.go New RegistryInspectService gRPC server wrapping the existing registry use case.
internal/adapters/in/grpc/core/server.go New CoreService gRPC server implementing GetTarget, routes, image-push notification handling, and route-change streaming hooked to the event bus.
internal/adapters/in/grpc/auth/interceptor.go New gRPC auth interceptor enforcing admin JWT validation and scope checks for AdminService methods (excluding the password auth endpoint).
internal/adapters/in/cli/serve.go Refactors gordon serve to require a --component flag (core, proxy, registry, or secrets), delegating to the component-specific entry points.
internal/adapters/in/cli/remote/auth.go Switches remote CLI authentication from HTTP /auth/password to the gRPC AdminService AuthenticatePassword endpoint.
gordon.toml.example Adds a v3-oriented example configuration with server, auth, route, external route, network, attachment, and volume settings.
go.mod Updates dependencies to include gRPC, protobuf, testcontainers-go, and associated indirects, and adjusts some Docker/moby versions.
docs/v3-testing.md New documentation for running and validating the v3 multi-container architecture locally and in CI.
buf.yaml Buf module configuration pointing at api/proto and enabling default lint/breaking rules.
buf.gen.yaml Buf generate configuration for protoc-go and protoc-go-grpc to output into internal/grpc.
Makefile Bumps image tags to v3, adds proto generation and several integration test targets (build, run, quick, clean).
Dockerfile Reworks Docker build to a v3 multi-component image with ENTRYPOINT-based component selection, added tools, and component-aware health checks.


Comment on lines +270 to +274
**Solution**: Check Docker socket permissions
```bash
ls -la /var/run/docker.sock
# If using Podman on Linux, you may need:
sudo chmod 666 /var/run/docker.sock
```

Copilot AI Feb 1, 2026


The docs recommend sudo chmod 666 /var/run/docker.sock, which makes the Docker/Podman socket world-writable and effectively grants root-equivalent access to any local user. On multi-user systems this is a serious privilege-escalation risk, since anyone can start containers, read mounted secrets, or control the daemon. Instead, keep the socket permissions restricted and fix access issues by configuring the appropriate user/group or using the correct rootless socket path rather than relaxing filesystem permissions.

Suggested change

Before:

**Solution**: Check Docker socket permissions
```bash
ls -la /var/run/docker.sock
# If using Podman on Linux, you may need:
sudo chmod 666 /var/run/docker.sock
```

After:

**Solution**: Check Docker/Podman socket permissions
```bash
# Inspect the Docker/Podman socket
ls -la /var/run/docker.sock
# If using Docker and you see "permission denied", prefer adding your user to the
# docker group instead of changing socket permissions:
# sudo usermod -aG docker "$USER"
# newgrp docker # or log out and log back in
#
# Alternatively, run Docker commands with sudo:
# sudo docker ps
#
# For rootless Podman, use the user socket instead of the system socket, e.g.:
# export DOCKER_HOST="unix://$XDG_RUNTIME_DIR/podman/podman.sock"
# and ensure your Podman service is running:
# systemctl --user status podman.socket
```

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 38

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/adapters/in/cli/serve.go (1)

55-73: ⚠️ Potential issue | 🟡 Minor

Remove unused configPath variable in deprecated start command.

The configPath variable on Line 57 is declared and bound to a flag but never used since the RunE function only returns an error. This is dead code that should be removed.

🧹 Proposed fix
 // newStartCmd creates a deprecated alias for serve.
 func newStartCmd() *cobra.Command {
-	var configPath string
-
 	cmd := &cobra.Command{
 		Use:        "start",
 		Short:      "Start the Gordon server (deprecated: use 'serve')",
 		Long:       `Start the Gordon server. This command is deprecated, please use 'gordon serve' instead.`,
 		Deprecated: "use 'gordon serve --component=<core|proxy|registry|secrets>' instead",
 		Hidden:     true, // Hide from help but still works
 		RunE: func(cmd *cobra.Command, args []string) error {
 			return fmt.Errorf("'gordon start' is no longer supported. Use 'gordon serve --component=<core|proxy|registry|secrets>' instead")
 		},
 	}
 
-	// Add flags (same as serve)
-	cmd.Flags().StringVarP(&configPath, "config", "c", "", "Path to config file")
-
 	return cmd
 }
🤖 Fix all issues with AI agents
In `@buf.gen.yaml`:
- Around line 9-19: Update the two plugin remote version strings under the
plugins block: change "buf.build/protocolbuffers/go:v1.36.4" to
"buf.build/protocolbuffers/go:v1.36.11" and change "buf.build/grpc/go:v1.5.1" to
"buf.build/grpc/go:v1.6.0", keeping the existing out and opt entries (e.g., the
paths=import and require_unimplemented_servers=false options) unchanged so only
the remote versions are bumped.

In `@Dockerfile`:
- Around line 39-48: The Dockerfile RUN line currently installs
"grpc-health-probe" via apk which doesn't exist; change the Dockerfile so the
RUN step removes "grpc-health-probe" from the apk add list and instead downloads
the multi-arch grpc-health-probe binary (detecting architecture via uname -m or
TARGETARCH) from the GitHub releases, verifies it (e.g., make it executable and
optionally checksum), places it into /usr/local/bin, and cleans up; update the
RUN sequence that installs ca-certificates, docker-cli, curl, wget, tzdata,
pass, gnupg to no longer include grpc-health-probe and add the download+install
commands for grpc-health-probe in the same Dockerfile RUN layer to keep image
size minimal.

In `@docs/v3-testing.md`:
- Around line 108-114: The sample output shows both gordon-core-test and
gordon-registry-test mapping host port 5000 which looks like a host port
conflict; update the docs to either (a) change the displayed host port mapping
for one container so they are distinct in the example output (e.g.,
gordon-core-test -> 9090, gordon-registry-test -> 5001) or (b) add a clarifying
sentence next to the example explaining that one of the 5000 mappings is to a
different host port or namespace and not a real conflict; reference the entries
named gordon-core-test and gordon-registry-test in the docs/v3-testing.md
example when making the change.
- Around line 270-276: Replace the insecure "sudo chmod 666
/var/run/docker.sock" recommendation with instructions to add the user to the
docker group and re-evaluate the session (e.g., use "sudo usermod -aG docker
$USER" and then log out and back in or use "newgrp docker"); update the text in
the docs/v3-testing.md snippet that currently suggests chmod 666 so it instead
shows the safe alternative and removes the chmod line, and ensure the
surrounding explanation mentions the need to re-login or newgrp to apply group
changes.

In `@gordon.toml.example`:
- Around line 31-36: The password field is ambiguous—clarify whether "password"
expects a plaintext secret or a precomputed hash: update the comment near the
password line (and the openssl suggestion) to explicitly state the expected
format; if plaintext is expected, change the openssl note to recommend
generating a random password (e.g., "openssl rand -base64 32") and state that
the value should be the raw password, not a hash; if a hashed password is
required, state the hash algorithm and verification method and show an example
hash format instead. Ensure you reference the "password" key and the existing
openssl suggestion so the example and comment are consistent.

In `@internal/adapters/in/cli/remote/client.go`:
- Around line 79-85: Close() currently reads and closes c.conn without
synchronizing with c.mu, causing races with ensureConn(); fix by acquiring c.mu
at start, check if c.conn is nil, if not move the pointer to a local variable
and set c.conn and c.admin to nil while still holding the lock, then release the
lock and call localConn.Close() (returning its error). This ensures ensureConn()
sees nil and will reconnect, avoids holding the mutex while doing the blocking
Close(), and references the existing Close(), ensureConn(), c.conn, c.admin, and
c.mu symbols.
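
A minimal sketch of the mutex-guarded Close() described in this item, with stand-in fields mirroring the symbols it names (c.mu, c.conn, c.admin); the rest of the client is elided:

```go
package remotesketch

import (
	"sync"

	"google.golang.org/grpc"
)

// client mirrors only the fields the review item refers to.
type client struct {
	mu    sync.Mutex
	conn  *grpc.ClientConn
	admin any // AdminServiceClient in the real code
}

// Close detaches the connection under the lock and closes it outside the
// lock, so ensureConn() can safely observe nil and reconnect, and other
// callers are never blocked while the underlying Close() runs.
func (c *client) Close() error {
	c.mu.Lock()
	conn := c.conn
	c.conn = nil
	c.admin = nil
	c.mu.Unlock()

	if conn == nil {
		return nil
	}
	return conn.Close()
}
```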

In `@internal/adapters/in/grpc/admin/server.go`:
- Around line 228-252: SetSecrets and DeleteSecret currently map every secretSvc
error to "invalid domain", which hides the real failure; update both methods
(SetSecrets and DeleteSecret) to propagate the underlying error details in the
gRPC status response instead of a fixed message—e.g., inspect the error returned
by secretSvc.Set and secretSvc.Delete and convert it to an appropriate gRPC
status (using status.Errorf/status.FromError or including err.Error() in the
message) so callers see the actual failure reason while preserving the proper
gRPC code mapping.
- Around line 208-226: The handler currently maps any error from
s.secretSvc.ListKeysWithAttachments(ctx, req.Domain) to a misleading "invalid
domain" message; update the ListSecrets handler to propagate a proper gRPC error
by inspecting or wrapping the returned err from
s.secretSvc.ListKeysWithAttachments (e.g., map known sentinel errors to
appropriate codes or return
status.Errorf(codes.Internal/PermissionDenied/Unavailable, "%v", err)) instead
of the hardcoded "invalid domain" string so callers see the real failure reason;
modify the error return at the call site where keys, attachments, err :=
s.secretSvc.ListKeysWithAttachments(...) is checked to use
status.Errorf/status.FromContext/appropriate mapping and include err.Error() in
the message.

In `@internal/adapters/in/grpc/core/server.go`:
- Around line 307-320: The two isRunningInContainer implementations diverge;
make them consistent or avoid sharing; update Server.isRunningInContainer to
match the fuller logic used in the service implementation by adding the hostname
pattern check and environment-variable checks and include podman detection and
the same string search approach (use strings.Contains on /proc/1/cgroup) so both
implementations (Server.isRunningInContainer and the service's
isRunningInContainer) perform the same four checks (/.dockerenv, /proc/1/cgroup
with docker/containerd/podman, hostname pattern, container-related env vars) or
alternatively centralize the logic into a single shared helper used by both
functions.
- Around line 322-328: getProxyPort currently ignores the image parameter and
returns hardcoded 8080; update Server.getProxyPort to inspect the image's labels
(look for "gordon.proxy.port") via the Server's runtime interface (e.g.,
s.runtime.ImageInspect / equivalent), parse and return that port if present,
otherwise fall back to the image's exposed ports (return the first valid exposed
port), and only use 8080 as the final default; ensure you handle parsing errors
and missing data and reference getProxyPort and the Server.runtime inspection
methods when implementing the label-and-exposed-port lookup similar to the logic
used in proxy/service.go:467.

In `@internal/adapters/in/grpc/registry/server.go`:
- Around line 38-64: The code in GetManifest and ListTags currently wraps every
error as codes.NotFound; update both handlers (Server.GetManifest and
Server.ListTags) to inspect the returned error from s.registrySvc (e.g., using
errors.Is(err, registry.ErrNotFound) or the service's sentinel error) and only
return status.Error(codes.NotFound, ...) when the error indicates a missing
resource; for any other unexpected error return status.Errorf(codes.Internal,
"internal error fetching manifest for %s:%s: %v", req.Name, req.Reference, err)
(and similarly for ListTags use an Internal code with context), preserving the
original error details in the log/message.
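
Spelled out, the suggested mapping looks roughly like this; ErrNotFound and fetchManifest are stand-ins for the registry use case's sentinel error and call, and the handler shape is simplified:

```go
package registrysketch

import (
	"context"
	"errors"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// ErrNotFound stands in for the registry service's real sentinel error.
var ErrNotFound = errors.New("manifest not found")

// fetchManifest is a stub for the underlying registry use case call.
func fetchManifest(ctx context.Context, name, reference string) ([]byte, error) {
	return nil, ErrNotFound
}

// GetManifest returns NotFound only when the manifest is genuinely missing
// and Internal for every other failure, instead of collapsing all errors
// into NotFound.
func GetManifest(ctx context.Context, name, reference string) ([]byte, error) {
	manifest, err := fetchManifest(ctx, name, reference)
	if err != nil {
		if errors.Is(err, ErrNotFound) {
			return nil, status.Errorf(codes.NotFound, "manifest %s:%s not found", name, reference)
		}
		return nil, status.Errorf(codes.Internal, "fetching manifest %s:%s: %v", name, reference, err)
	}
	return manifest, nil
}
```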

In `@internal/adapters/in/grpc/secrets/server.go`:
- Around line 129-168: The converters domainToProtoToken and protoToDomainToken
currently drop Token.Metadata and duplicate conversion logic; update both to
copy the metadata map (preserve nil vs empty) — in domainToProtoToken set
protoToken.Metadata = make(map[string]string,len(t.Metadata)) and copy entries
from t.Metadata, and in protoToDomainToken set token.Metadata =
make(map[string]string,len(t.Metadata)) and copy entries from t.Metadata; then
refactor these two functions into a single shared converter (or a small shared
helper like tokenConverters.ConvertDomainToProto / ConvertProtoToDomain) to
avoid duplication and ensure consistent behavior and nil/empty handling across
client/server code.
- Around line 51-54: The current error handling in the provider.GetSecret and
provider.GetToken call sites returns codes.NotFound for any backend error;
change both call sites to inspect the error using errors.Is(err,
domain.ErrSecretNotFound) and errors.Is(err, domain.ErrTokenNotFound)
respectively and only return status.Errorf(codes.NotFound, ...) when the
sentinel error matches; for any other error return status.Errorf(codes.Internal,
"failed to get secret/token at path %s: %v", req.Path, err) so
I/O/permission/backend failures are surfaced as Internal. Ensure you import and
use errors.Is and reference domain.ErrSecretNotFound and domain.ErrTokenNotFound
in the checks for the GetSecret and GetToken call sites.
- Around line 62-127: SaveToken, RevokeToken, IsRevoked and DeleteToken
currently accept missing inputs and can panic or behave incorrectly; add early
validation and return status.Error(codes.InvalidArgument, ...) when required
fields are missing. Specifically: in SaveToken verify req.Token != nil and
req.Jwt != "" (and optionally validate req.Token.Id/Subject if your store
requires them) before calling s.tokenStore.SaveToken; in RevokeToken and
IsRevoked verify req.TokenId != "" before calling s.tokenStore.Revoke /
s.tokenStore.IsRevoked; in DeleteToken verify req.Subject != "" before calling
s.tokenStore.DeleteToken. Use the same error pattern as GetToken for
InvalidArgument to fail fast.

In `@internal/adapters/out/grpccore/client.go`:
- Around line 191-193: The HealthCheck method can panic if c.conn is nil; update
Client.HealthCheck to defensively check for a nil connection before calling
c.conn.GetState() (similar to the nil guard used in Close()), returning false
when c.conn is nil and otherwise returning the readiness check via
c.conn.GetState() == connectivity.Ready; reference the Client.HealthCheck method
and the Close method's nil-check pattern when making the change.
- Around line 41-58: The readiness loop around
conn.GetState()/conn.WaitForStateChange can hang if the connection enters a
terminal state; update the loop in client.go to explicitly check for terminal
connectivity states (e.g., connectivity.Shutdown and optionally
connectivity.TransientFailure) after calling conn.GetState(), and on those
states call conn.Close() and return a descriptive error (including coreAddr and
context) instead of waiting; reference the existing conn variable and the
GetState/WaitForStateChange calls to locate where to insert the checks so the
client fails fast on terminal connection states (see the sketch after this file's items).
- Line 140: The Watch method assigns the onInvalidate callback to the struct
field c.onInvalidate which is unnecessary and not thread-safe; remove the
assignment (delete or stop referencing c.onInvalidate inside Watch) and use the
local onInvalidate parameter directly within Watch, then remove the onInvalidate
field from the client struct (and any other references) if it becomes unused so
the code no longer relies on a shared mutable field; ensure Watch and any helper
goroutines capture the local onInvalidate parameter where needed.
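
The fail-fast readiness wait suggested in the GetState/WaitForStateChange item above could look like this sketch (the real client would also close the connection and include its core address in the error, as the item describes):

```go
package grpccoresketch

import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
)

// waitUntilReady blocks until the connection is Ready, fails fast when it
// reaches a terminal state, and respects context cancellation instead of
// waiting indefinitely on a dead connection.
func waitUntilReady(ctx context.Context, conn *grpc.ClientConn, addr string) error {
	conn.Connect() // move an idle connection into the connecting state
	for {
		state := conn.GetState()
		switch state {
		case connectivity.Ready:
			return nil
		case connectivity.Shutdown:
			return fmt.Errorf("connection to %s is shut down", addr)
		}
		if !conn.WaitForStateChange(ctx, state) {
			return fmt.Errorf("waiting for %s to become ready: %w", addr, ctx.Err())
		}
	}
}
```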

In `@internal/adapters/out/grpcregistry/client.go`:
- Around line 38-45: The current logic checks conn.GetState() once and calls
conn.WaitForStateChange(ctx, state) only once, which can return for any state
transition (not necessarily connectivity.Ready); update the code around
conn.GetState(), conn.WaitForStateChange(ctx, state), and connectivity.Ready to
loop until the state is connectivity.Ready or a timeout/context cancellation
occurs: repeatedly call state = conn.GetState(), if state == connectivity.Ready
break, otherwise call conn.WaitForStateChange(ctx, state) and if it returns
false (timeout/cancel) close conn and return the timeout error, continuing the
loop on true so only proceeding when Ready.

In `@internal/adapters/out/grpcsecrets/client.go`:
- Around line 46-50: The WithProvider method currently mutates c.providerName on
the receiver which causes data races when the same Client instance is reused;
change WithProvider to create a shallow copy of the receiver (e.g., copy := *c),
set copy.providerName = name, and return &copy so the original Client is not
modified; update the signature use of Client.WithProvider and any call sites if
needed to handle the returned pointer.
- Around line 146-185: domainToProtoToken and protoToDomainToken are dropping
token Metadata; update domainToProtoToken to copy t.Metadata into
protoToken.Metadata instead of creating an empty map, and update
protoToDomainToken to populate domain.Token.Metadata from t.Metadata when
present; modify functions domainToProtoToken and protoToDomainToken to preserve
the Metadata map between domain.Token and gordon.Token (ensuring nil vs empty
map semantics are handled consistently).
- Around line 26-35: The NewClient function currently constructs a gRPC client
with insecure.NewCredentials() and returns an error that implies a connection
attempt; update NewClient to use TLS or mTLS credentials instead of
insecure.NewCredentials() (e.g., build credentials via
credentials.NewClientTLSFromFile(...) or
credentials.NewTLS(&tls.Config{Certificates:..., RootCAs:..., ServerName:...})
depending on whether you want server-only TLS or mutual TLS), replace the
misleading error text around grpc.NewClient(...) to indicate client
creation/credential construction failed (not "failed to connect"), and keep the
actual RPC-level timeouts/connection reliability to callers (use
context.WithTimeout in RPCs) rather than trying to verify connectivity at
NewClient; reference symbols: NewClient, grpc.NewClient,
insecure.NewCredentials, Client, and gordon.NewSecretsServiceClient.
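
If the TLS suggestion is adopted, client construction could look roughly like this; the CA file path and server name are placeholders, and mTLS would swap in credentials.NewTLS with a client certificate:

```go
package grpcsecretssketch

import (
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// newSecretsConn builds a gRPC client with server-side TLS. Verifying actual
// reachability is left to per-RPC contexts, as the review item recommends.
func newSecretsConn(addr, caFile, serverName string) (*grpc.ClientConn, error) {
	creds, err := credentials.NewClientTLSFromFile(caFile, serverName)
	if err != nil {
		return nil, fmt.Errorf("building TLS credentials: %w", err)
	}
	conn, err := grpc.NewClient(addr, grpc.WithTransportCredentials(creds))
	if err != nil {
		return nil, fmt.Errorf("creating gRPC client for %s: %w", addr, err)
	}
	return conn, nil
}
```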

In `@internal/app/lifecycle.go`:
- Around line 329-343: The checkHealth method on LifecycleManager currently
ignores the component parameter; either implement component-specific checks or
remove the unused parameter. Update the function signature of checkHealth (and
any callers) to remove component if per-component checks are not needed, or
implement branching inside checkHealth: after locating the container (using
lm.runtime.ListContainers and container name match), if component == "grpc" (or
other known components) perform the appropriate probe (e.g., call a gRPC health
endpoint or run lm.runtime.IsContainerRunning(ctx, c.ID) combined with an
in-container health check), otherwise fall back to
lm.runtime.IsContainerRunning(ctx, c.ID); ensure callers reflect the chosen
signature and that the unused parameter is eliminated if removed.
- Around line 354-366: The loop in waitForHealth incorrectly passes the original
ctx to lm.checkHealth, so health probes don't honor the 30s timeout; change the
call in the ticker branch to use timeoutCtx instead of ctx (i.e., call
lm.checkHealth(timeoutCtx, name, component)) and ensure any related
cancellations/returns still use ctx for upstream cancellation handling.
- Around line 176-199: EnsureRunning currently calls handleExistingContainer but
always proceeds to create a new container; change handleExistingContainer to
return (bool, error) indicating "already running" and update EnsureRunning to
check that boolean and return early when true. Specifically, modify
handleExistingContainer to return true,nil when the existing container is
already correct, and in EnsureRunning (which calls buildContainerConfig,
runtime.CreateContainer and runtime.StartContainer) use that boolean to skip
creating/starting a new container and return nil immediately if true; otherwise
proceed to create and start as before. Ensure all callers and error handling are
adjusted to the new (bool,error) signature.
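
The (bool, error) refactor for EnsureRunning could be shaped like this sketch; the types and helpers are paraphrased stand-ins rather than the actual lifecycle.go code:

```go
package lifecyclesketch

import "context"

// containerSpec and manager are illustrative stand-ins for the lifecycle
// manager's real types.
type containerSpec struct{ Name string }

type manager struct{}

// handleExistingContainer reports whether a matching container already runs;
// the real implementation would inspect the container runtime.
func (m *manager) handleExistingContainer(ctx context.Context, spec containerSpec) (bool, error) {
	return false, nil
}

func (m *manager) createAndStart(ctx context.Context, spec containerSpec) error { return nil }

// EnsureRunning returns early when the container already exists and is
// running, instead of unconditionally creating a new one.
func (m *manager) EnsureRunning(ctx context.Context, spec containerSpec) error {
	alreadyRunning, err := m.handleExistingContainer(ctx, spec)
	if err != nil {
		return err
	}
	if alreadyRunning {
		return nil // nothing to do
	}
	return m.createAndStart(ctx, spec)
}
```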

In `@internal/app/proxy.go`:
- Around line 68-71: The RegistryPort in the proxy.NewRemoteService call is
hardcoded to 5000; update proxy.Config so RegistryPort is read from an
environment variable (use getEnvOrDefault or similar) and parsed into an int
before passing it in, falling back to 5000 on parse error or empty value. Locate
the call to proxy.NewRemoteService and the proxy.Config struct usage and replace
the literal 5000 with the parsed value (use
getEnvOrDefault("GORDON_REGISTRY_PORT", "5000") and strconv.Atoi or
strconv.ParseInt with error handling), and add the "strconv" import to the file.

In `@internal/app/secrets_entry.go`:
- Around line 56-71: The providers slice is hardcoded to only include
secretsAdapter.NewPassProvider while the configured backend (variable backend
from domain.SecretsBackend) is ignored, causing GetSecret to fail for
provider="sops"; update the wiring so providers are registered according to the
configured backend (same value passed to tokenstore.NewStore). Concretely, use
the backend variable to conditionally append providers to the providers slice
(e.g., include secretsAdapter.NewSopsProvider(log) when backend equals the SOPS
backend constant from domain, otherwise include PassProvider), or alternatively
always register both secretsAdapter.NewPassProvider and
secretsAdapter.NewSopsProvider; ensure the providers variable is built before
use so GetSecret can find the requested provider.

In `@internal/usecase/proxy/service.go`:
- Around line 75-81: The NewRemoteService constructor currently accepts a
watcher out.RouteChangeWatcher but never stores or uses it; either remove the
watcher parameter from NewRemoteService signature or save it on the Service
struct and wire it up for route invalidation. To keep watcher functionality: add
a watcher field to the Service struct, assign watcher in NewRemoteService, and
register the watcher's callback (or start a goroutine) to call the Service's
cache invalidation method (e.g., the method handling target/route refreshes)
when routes change. If watcher is unnecessary, remove the watcher parameter from
NewRemoteService and any references in callers. Ensure references use the
Service type, NewRemoteService, and out.RouteChangeWatcher symbols to locate
changes.

In `@Makefile`:
- Around line 86-110: The integration Makefile targets use the literal "docker"
binary which breaks non-docker engines; update all container engine invocations
in the targets test-integration-clean, test-integration-build, test-integration,
and test-integration-quick to use the existing variable $(ENGINE) instead of
hardcoded "docker" (e.g., replace docker stop/rm/network/prune/ps/build/pull
with $(ENGINE) stop/rm/network/prune/ps/build/pull) so the targets respect the
configurable ENGINE (defined ?= podman).

In `@scripts/test-v3.sh`:
- Around line 36-47: The build() function writes to dist/gordon but never
ensures the dist directory exists; before running go build -o dist/gordon in
build(), create the output directory (e.g., run a mkdir -p dist equivalent) so
dist exists and the binary can be written to dist/gordon, leaving the rest of
the steps (cp/rm and image build) unchanged.
- Around line 62-66: The script uses the Podman-only command "image exists" with
the ENGINE/TEST_IMAGE check; replace that check with a portable "image inspect"
invocation that works for both Docker and Podman. Concretely, change the
conditional that calls "$ENGINE image exists $TEST_IMAGE" to run "$ENGINE image
inspect \"$TEST_IMAGE\" >/dev/null 2>&1" and negate that (if ! $ENGINE image
inspect ...); keep the existing log_warn "Test image not found, building..." and
call to build so the build path remains unchanged.

In `@tests/integration/01_startup_test.go`:
- Around line 42-49: Add an assertion after retrieving coreNetworks to ensure
"gordon-internal" is not present: after calling s.CoreC.NetworkAliases(s.ctx)
and logging coreNetworks, add s.Require().NotContains(coreNetworks,
"gordon-internal", "gordon-core should not be on gordon-internal network") (or
s.Assert().NotContains if you prefer a non-fatal check) to enforce the comment's
security expectation.

In `@tests/integration/02_grpc_communication_test.go`:
- Around line 68-106: For testCoreToSecrets and testCoreToRegistry, make the
missing-item checks resilient to gRPC NotFound by accepting either a successful
RPC (err == nil) or a NotFound status; update the GetToken and GetManifest call
handling in testCoreToSecrets and testCoreToRegistry so you check if err == nil
then assert the response shape (e.g., tokenResp.Found == false or
manifestResp.Manifest == nil), otherwise assert status.Code(err) ==
codes.NotFound and treat that as a passing outcome; import and use grpc/status
and codes where needed and guard against nil response objects when err is
NotFound (SecretsClient.GetToken, RegistryClient.GetManifest).
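
A small helper in the spirit of that suggestion, using the standard status and codes packages (a sketch; the suite's own assertions go through the testify suite instead):

```go
package integrationsketch

import (
	"testing"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// okOrNotFound returns true when the RPC succeeded (the caller should then
// inspect the response), returns false when the server replied NotFound, and
// fails the test for any other error.
func okOrNotFound(t *testing.T, err error) bool {
	t.Helper()
	if err == nil {
		return true
	}
	if status.Code(err) == codes.NotFound {
		return false // acceptable: the item simply does not exist yet
	}
	t.Fatalf("unexpected gRPC error: %v", err)
	return false
}
```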

In `@tests/integration/04_auto_restart_test.go`:
- Around line 139-167: The test only checks core health but doesn't re-establish
connections to services whose host ports may have changed; after calling
refreshAllContainerRefs() in verifyGRPCConnectivityRestored, look up the new
mapped host:port from s.SecretsC and s.RegistryC, close any existing
s.secretsConn and s.registryConn, dial fresh gRPC connections for secrets and
registry (similar to how s.coreConn is created), create health clients via
grpc_health_v1.NewHealthClient(...) for each, run Health.Check with a timeout
and assert SERVING, and set the renewed connections back on the suite
(s.secretsConn, s.registryConn) so subsequent tests use the updated connections.
- Around line 87-92: The ContainerList call uses the long-lived s.ctx and can
hang; replace s.dockerClient.ContainerList(s.ctx, ...) with the scoped timeout
context created earlier (e.g., ctx or timeoutCtx) so the 45-second test timeout
is honored; ensure you pass that timeout context into ContainerList and any
other Docker calls in this polling loop (reference: s.dockerClient.ContainerList
and the scoped timeout context variable created near the test setup).

In `@tests/integration/helpers/docker.go`:
- Around line 35-43: The tests use hardcoded relative dirs ("../..") for
buildCmd.Dir and dockerCmd.Dir which breaks when tests run from other working
directories; update the setup to compute the repository root dynamically (e.g.,
via runtime.Caller to locate the file path or an overriding env var) and set
buildCmd.Dir and dockerCmd.Dir to that absolute repo root so buildCmd, dockerCmd
and the TestImageName docker build always run from the correct project root (see the sketch after this file's items).
- Around line 28-32: The current ImageInspect call treats any error as "image
not found" and always rebuilds; change the error handling around the
cli.ImageInspect(ctx, TestImageName) call to distinguish not-found from other
failures by using errdefs.IsNotFound(err) (from Docker's errdefs package) — if
IsNotFound(err) is true, proceed with the rebuild path, otherwise return or
propagate the actual error (or log/wrap it) so network/permission errors are not
misinterpreted as missing image; update the block that currently checks "if err
== nil { ... }" to explicitly handle err == nil, errdefs.IsNotFound(err), and
the fallback error case.
- Around line 63-70: GetDockerSocketPath currently treats any DOCKER_HOST as a
filesystem socket by only stripping a "unix://" prefix; change it to detect
scheme from DetectDockerHost(): if the host begins with "unix://" return the
path portion (host[7:]), but if it begins with any other scheme (e.g., "tcp://")
or contains "://" return an empty string to indicate there is no Unix socket to
mount; ensure callers of GetDockerSocketPath handle the empty string case rather
than trying to mount a non-unix URL.
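
For the repo-root item above, deriving the root from the helper file's own location with runtime.Caller is a common approach; a sketch, where the number of ".." hops must match the file's real depth (tests/integration/helpers/ would be three levels below the root):

```go
package helperssketch

import (
	"path/filepath"
	"runtime"
)

// repoRoot resolves the repository root relative to this source file, so
// commands run from it regardless of the test's working directory. Adjust
// the number of ".." hops to this file's actual depth in the repository.
func repoRoot() string {
	_, thisFile, _, ok := runtime.Caller(0)
	if !ok {
		return "." // fall back to the current working directory
	}
	return filepath.Clean(filepath.Join(filepath.Dir(thisFile), "..", "..", ".."))
}
```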

In `@tests/integration/README.md`:
- Around line 43-49: Fix the markdownlint issues in the README by: adding a
single space on both sides of each pipe within the table rows/headers (the table
block shown in the diff) to satisfy MD060; specifying a language identifier
(e.g., ```bash or ```sh) for the code fence referenced around line 55 to satisfy
MD040; replacing the bare URL with a proper Markdown link [text](https://...)
where the URL appears to satisfy MD034; and ensuring there are blank lines
before and after all headings and fenced code blocks (the sections noted around
lines 95–111 and 68–73) to address MD022/MD031. Ensure these edits preserve
content but fix spacing and fence syntax so markdownlint errors are resolved.

Comment on lines +9 to +19
plugins:
  - remote: buf.build/protocolbuffers/go:v1.36.4
    out: internal/grpc
    opt:
      - paths=import

  - remote: buf.build/grpc/go:v1.5.1
    out: internal/grpc
    opt:
      - paths=import
      - require_unimplemented_servers=false

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

buf.build protocolbuffers go latest version 2025 2026

💡 Result:

As of February 1, 2026, the latest upstream Go Protobuf release (the protoc-gen-go / google.golang.org/protobuf tool+runtime that Buf’s buf.build/protocolbuffers/go tracks) is v1.36.11 (released December 12, 2025). [1]

So, on Buf Schema Registry:

  • Latest remote plugin version to pin: buf.build/protocolbuffers/go:v1.36.11 (use this in buf.gen.yaml). [2][1]
  • If you omit :version, Buf uses the latest: remote: buf.build/protocolbuffers/go. [2]

Sources: [1] [2]


🌐 Web query:

buf.build grpc go latest version 2025 2026

💡 Result:

As of February 1, 2026, the latest Buf Schema Registry (BSR) remote plugin for generating gRPC-Go stubs is referenced as:

  • buf.build/grpc/go (latest, unpinned)

Buf’s docs note that if you omit the version, Buf will use the latest plugin version automatically. [1]

If you want to pin the plugin version, the most recent upstream protoc-gen-go-grpc release tag shown in the gRPC-Go repo is:

  • cmd/protoc-gen-go-grpc/v1.6.0 (dated Nov 27, year shown on the releases page listing) [2]

The gRPC-Go library itself shows a newer overall release tag:

  • v1.78.0 (dated Dec 23, year shown on the releases page listing) [2]

Typical buf.gen.yaml (use latest):

version: v2
plugins:
  - remote: buf.build/protocolbuffers/go
    out: gen/go
    opt: paths=source_relative
  - remote: buf.build/grpc/go
    out: gen/go
    opt: paths=source_relative

[1]

Pinned form (example):

- remote: buf.build/grpc/go:v1.6.0
  out: gen/go
  opt: paths=source_relative

(version number based on the latest protoc-gen-go-grpc tag listed above) [2]

Sources: [1] [2]


Update plugin versions to latest stable releases.

The configuration uses buf.build/protocolbuffers/go:v1.36.4 and buf.build/grpc/go:v1.5.1, which are outdated. Update to buf.build/protocolbuffers/go:v1.36.11 (released December 2025) and buf.build/grpc/go:v1.6.0 (released November 2025) to include the latest bug fixes and improvements.

🤖 Prompt for AI Agents
In `@buf.gen.yaml` around lines 9 - 19, Update the two plugin remote version
strings under the plugins block: change "buf.build/protocolbuffers/go:v1.36.4"
to "buf.build/protocolbuffers/go:v1.36.11" and change "buf.build/grpc/go:v1.5.1"
to "buf.build/grpc/go:v1.6.0", keeping the existing out and opt entries (e.g.,
the paths=import and require_unimplemented_servers=false options) unchanged so
only the remote versions are bumped.

Comment on lines 39 to 48
RUN apk add --no-cache \
ca-certificates \
docker-cli \
curl \
wget \
tzdata \
pass \
gnupg \
grpc-health-probe \
&& rm -rf /var/cache/apk/*

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

grpc-health-probe alpine linux apk package

💡 Result:

grpc-health-probe (the grpc_health_probe binary from grpc-ecosystem/grpc-health-probe) is not packaged as an official Alpine apk (there’s no grpc-health-probe entry in Alpine’s package index; the closest similarly-named package is grpc-health-check, which is a different project). [1]

Recommended options on Alpine:

  1. Download the prebuilt static binary from GitHub Releases (common in Alpine-based images): the project publishes grpc_health_probe-linux-amd64, ...-arm64, etc. [2]
    Example (amd64):
apk add --no-cache wget ca-certificates
VER=v0.4.24
wget -qO /usr/local/bin/grpc_health_probe \
  https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/$VER/grpc_health_probe-linux-amd64
chmod +x /usr/local/bin/grpc_health_probe
  2. Build/install with Go:
apk add --no-cache go git
go install github.com/grpc-ecosystem/grpc-health-probe@latest

(This is the project’s documented “install from source” path.) [2]

Note: If you specifically meant an Alpine apk for something that does gRPC health checks, Alpine does have grpc-health-check in edge/testing, but it is not grpc-health-probe. [1]

Sources: Alpine package index (grpc-health-check) [1]; grpc-health-probe upstream install/release info [2].

[1] (pkgs.alpinelinux.org)
[2] (github.com)


🏁 Script executed:

# Check if there are any references to grpc-health-probe in the codebase
rg "grpc-health-probe" --type-list | head -20
rg "grpc-health-probe"

Repository: bnema/gordon

Length of output: 616


🌐 Web query:

grpc-health-probe github releases download binary latest version

💡 Result:

Latest grpc-health-probe release on GitHub is v0.4.42 (published 2025-11-12). [1]

Download the latest binary (direct URL pattern)

Binaries are published as release assets named like:

grpc_health_probe-<os>-<arch>

Direct download URL pattern:

https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.42/grpc_health_probe-<os>-<arch>

Examples:

  • Linux x86_64:
curl -L -o grpc_health_probe \
  https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.42/grpc_health_probe-linux-amd64
chmod +x grpc_health_probe
  • macOS Apple Silicon:
curl -L -o grpc_health_probe \
  https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.42/grpc_health_probe-darwin-arm64
chmod +x grpc_health_probe

The release also includes a checksums.txt asset for verifying downloads. [1]

If you just want the “latest” release page (to pick your OS/arch asset), it’s here: Releases → v0.4.42. [1]

Sources: GitHub Releases page for grpc-ecosystem/grpc-health-probe. [1]


🏁 Script executed:

cat -n Dockerfile | head -60

Repository: bnema/gordon

Length of output: 1888


Fix build break: grpc-health-probe isn't in Alpine repos.
CI reports apk add failing because the package doesn't exist. Download the binary instead (and respect multi-arch).

🛠️ Suggested fix (download grpc-health-probe binary)
 RUN apk add --no-cache \
     ca-certificates \
     docker-cli \
     curl \
     wget \
     tzdata \
     pass \
-    gnupg \
-    grpc-health-probe \
+    gnupg \
     && rm -rf /var/cache/apk/*
+
+ARG GRPC_HEALTH_PROBE_VERSION=v0.4.42
+ARG TARGETARCH
+RUN wget -qO /usr/local/bin/grpc_health_probe \
+    "https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-${TARGETARCH}" \
+ && chmod +x /usr/local/bin/grpc_health_probe
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

RUN apk add --no-cache \
    ca-certificates \
    docker-cli \
    curl \
    wget \
    tzdata \
    pass \
    gnupg \
    grpc-health-probe \
    && rm -rf /var/cache/apk/*

After:

RUN apk add --no-cache \
    ca-certificates \
    docker-cli \
    curl \
    wget \
    tzdata \
    pass \
    gnupg \
    && rm -rf /var/cache/apk/*

ARG GRPC_HEALTH_PROBE_VERSION=v0.4.42
ARG TARGETARCH
RUN wget -qO /usr/local/bin/grpc_health_probe \
    "https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-${TARGETARCH}" \
    && chmod +x /usr/local/bin/grpc_health_probe
🧰 Tools
🪛 GitHub Actions: CI

[error] 39-39: Docker build failed: APK package 'grpc-health-probe' not found. Command in Dockerfile RUN apk add --no-cache ca-certificates docker-cli curl wget tzdata pass gnupg grpc-health-probe failed with exit code 1.

🪛 Hadolint (2.14.0)

[warning] 39-39: Pin versions in apk add. Instead of apk add <package> use apk add <package>=<version>

(DL3018)

🤖 Prompt for AI Agents
In `@Dockerfile` around lines 39 - 48, The Dockerfile RUN line currently installs
"grpc-health-probe" via apk which doesn't exist; change the Dockerfile so the
RUN step removes "grpc-health-probe" from the apk add list and instead downloads
the multi-arch grpc-health-probe binary (detecting architecture via uname -m or
TARGETARCH) from the GitHub releases, verifies it (e.g., make it executable and
optionally checksum), places it into /usr/local/bin, and cleans up; update the
RUN sequence that installs ca-certificates, docker-cli, curl, wget, tzdata,
pass, gnupg to no longer include grpc-health-probe and add the download+install
commands for grpc-health-probe in the same Dockerfile RUN layer to keep image
size minimal.

Comment on lines +108 to +114
```
NAMES STATUS PORTS
gordon-core-test Up 30 seconds 0.0.0.0:9090->9090, 0.0.0.0:5000->5000
gordon-proxy-test Up 25 seconds 0.0.0.0:80->80
gordon-registry-test Up 25 seconds 0.0.0.0:5000->5000
gordon-secrets-test Up 25 seconds
```

🧹 Nitpick | 🔵 Trivial

Potential port conflict in expected output documentation.

Both gordon-core-test and gordon-registry-test show port 5000 mapped. While this may be intentional (core admin API vs registry HTTP), it could confuse users since they appear to conflict on the same host. Consider clarifying that these containers use different host port mappings or update the expected output to show actual mapped ports.

🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 108-108: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 108-108: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@docs/v3-testing.md` around lines 108 - 114, The sample output shows both
gordon-core-test and gordon-registry-test mapping host port 5000 which looks
like a host port conflict; update the docs to either (a) change the displayed
host port mapping for one container so they are distinct in the example output
(e.g., gordon-core-test -> 9090, gordon-registry-test -> 5001) or (b) add a
clarifying sentence next to the example explaining that one of the 5000 mappings
is to a different host port or namespace and not a real conflict; reference the
entries named gordon-core-test and gordon-registry-test in the
docs/v3-testing.md example when making the change.

Comment on lines +270 to +276
**Solution**: Check Docker socket permissions
```bash
ls -la /var/run/docker.sock
# If using Podman on Linux, you may need:
sudo chmod 666 /var/run/docker.sock
```


⚠️ Potential issue | 🟠 Major

Security concern: Avoid recommending chmod 666 on Docker socket.

Setting chmod 666 on the Docker socket makes it world-readable/writable, which is a significant security risk as any user or process on the system can control Docker.

📝 Recommended alternative
 ### Issue: Containers fail to start
 
 **Solution**: Check Docker socket permissions
 ```bash
 ls -la /var/run/docker.sock
-# If using Podman on Linux, you may need:
-sudo chmod 666 /var/run/docker.sock
+# Ensure your user is in the docker group:
+sudo usermod -aG docker $USER
+# Then log out and back in, or run:
+newgrp docker

🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 271-271: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🤖 Prompt for AI Agents
In @docs/v3-testing.md around lines 270 - 276, Replace the insecure "sudo chmod
666 /var/run/docker.sock" recommendation with instructions to add the user to
the docker group and re-evaluate the session (e.g., use "sudo usermod -aG docker
$USER" and then log out and back in or use "newgrp docker"); update the text in
the docs/v3-testing.md snippet that currently suggests chmod 666 so it instead
shows the safe alternative and removes the chmod line, and ensure the
surrounding explanation mentions the need to re-login or newgrp to apply group
changes.



Comment on lines +31 to +36
# WARNING: Use strong passwords in production
# Generate secure password hash with: openssl rand -base64 32
username = "admin"
password = "CHANGE_THIS_STRONG_PASSWORD_NOW"
secrets_backend = "unsafe" # Backend for storing secrets: "pass", "sops", or "unsafe" (dev only!)
token_secret = "CHANGE_THIS_TOKEN_SECRET_NOW" # JWT signing secret (path or value)

🧹 Nitpick | 🔵 Trivial

Clarify whether password expects a hash or plaintext value.

Line 32 suggests generating a secure password hash with openssl rand -base64 32, but Line 34 shows what appears to be a plaintext password value. If the config expects a hashed password, the example should show a hash format. If it expects plaintext, the comment should recommend generating a random password rather than a hash.

🤖 Prompt for AI Agents
In `@gordon.toml.example` around lines 31 - 36, The password field is
ambiguous—clarify whether "password" expects a plaintext secret or a precomputed
hash: update the comment near the password line (and the openssl suggestion) to
explicitly state the expected format; if plaintext is expected, change the
openssl note to recommend generating a random password (e.g., "openssl rand
-base64 32") and state that the value should be the raw password, not a hash; if
a hashed password is required, state the hash algorithm and verification method
and show an example hash format instead. Ensure you reference the "password" key
and the existing openssl suggestion so the example and comment are consistent.

Comment on lines +139 to +167
// verifyGRPCConnectivityRestored verifies gRPC works after container restarts.
// Note: We check connectivity by verifying containers are running and ports are mapped.
// Full gRPC reconnection would require re-initializing clients with new host ports.
func (s *GordonTestSuite) verifyGRPCConnectivityRestored() {
	// Refresh container references to get new container IDs
	s.refreshAllContainerRefs()

	// Verify all sub-containers are running
	s.Require().NotNil(s.SecretsC, "secrets container should exist after restart")
	s.Require().NotNil(s.RegistryC, "registry container should exist after restart")
	s.Require().NotNil(s.ProxyC, "proxy container should exist after restart")

	assert.Equal(s.T(), "running", s.SecretsC.State, "secrets should be running")
	assert.Equal(s.T(), "running", s.RegistryC.State, "registry should be running")
	assert.Equal(s.T(), "running", s.ProxyC.State, "proxy should be running")

	// Verify gRPC health endpoints are accessible (core doesn't restart)
	ctx, cancel := context.WithTimeout(s.ctx, 15*time.Second)
	defer cancel()

	coreHealth := grpc_health_v1.NewHealthClient(s.coreConn)
	resp, err := coreHealth.Check(ctx, &grpc_health_v1.HealthCheckRequest{})
	require.NoError(s.T(), err, "core health check failed after restarts")
	assert.Equal(s.T(), grpc_health_v1.HealthCheckResponse_SERVING, resp.Status, "core should be serving")

	s.T().Log("✓ Core service accessible after sub-container restarts")
	s.T().Log("✓ All sub-containers are running after auto-restart")
	s.T().Log("✓ gRPC connectivity verified via health checks")
}

🧹 Nitpick | 🔵 Trivial

Re-validate gRPC connectivity after restart.

The method currently checks only core health; secrets/registry ports can change after restart, so connectivity may be broken without being detected. Consider re-fetching mapped ports, re-dialing secrets/registry, and running health checks on those connections.

🤖 Prompt for AI Agents
In `@tests/integration/04_auto_restart_test.go` around lines 139 - 167, The test
only checks core health but doesn't re-establish connections to services whose
host ports may have changed; after calling refreshAllContainerRefs() in
verifyGRPCConnectivityRestored, look up the new mapped host:port from s.SecretsC
and s.RegistryC, close any existing s.secretsConn and s.registryConn, dial fresh
gRPC connections for secrets and registry (similar to how s.coreConn is
created), create health clients via grpc_health_v1.NewHealthClient(...) for
each, run Health.Check with a timeout and assert SERVING, and set the renewed
connections back on the suite (s.secretsConn, s.registryConn) so subsequent
tests use the updated connections.
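
A minimal sketch of that re-dial step, assuming plaintext local connections and the standard grpc-go health-check API; the function name, package, and how the suite stores the refreshed connections are assumptions, not existing code:

```go
package integration // hypothetical package; place alongside the test suite

import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	grpc_health_v1 "google.golang.org/grpc/health/grpc_health_v1"
)

// redialAndCheck closes the stale pre-restart connection, dials the freshly
// mapped host:port, and verifies the service reports SERVING.
func redialAndCheck(ctx context.Context, addr string, stale *grpc.ClientConn) (*grpc.ClientConn, error) {
	if stale != nil {
		_ = stale.Close()
	}
	conn, err := grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return nil, fmt.Errorf("dial %s: %w", addr, err)
	}
	resp, err := grpc_health_v1.NewHealthClient(conn).Check(ctx, &grpc_health_v1.HealthCheckRequest{})
	if err != nil {
		_ = conn.Close()
		return nil, fmt.Errorf("health check %s: %w", addr, err)
	}
	if resp.GetStatus() != grpc_health_v1.HealthCheckResponse_SERVING {
		_ = conn.Close()
		return nil, fmt.Errorf("%s is not serving (status %v)", addr, resp.GetStatus())
	}
	return conn, nil
}
```

The suite could call this for the secrets and registry addresses after refreshAllContainerRefs() and assign the returned connections back onto the suite fields so later tests use them.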

Comment on lines +28 to +32
	_, err = cli.ImageInspect(ctx, TestImageName)
	if err == nil {
		t.Logf("Using existing Gordon image: %s", TestImageName)
		return nil
	}

⚠️ Potential issue | 🟡 Minor

Type-check the error from ImageInspect.

Any error from ImageInspect is treated as "image not found", but other errors (network issues, permission problems) would also trigger a rebuild attempt. Consider checking specifically for errdefs.IsNotFound(err) to distinguish "not found" from other failures.

Proposed fix
+	"github.com/containerd/errdefs"
 )
 	_, err = cli.ImageInspect(ctx, TestImageName)
-	if err == nil {
+	if err == nil {
 		t.Logf("Using existing Gordon image: %s", TestImageName)
 		return nil
 	}
+	if !errdefs.IsNotFound(err) {
+		return fmt.Errorf("failed to inspect image: %w", err)
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

	_, err = cli.ImageInspect(ctx, TestImageName)
	if err == nil {
		t.Logf("Using existing Gordon image: %s", TestImageName)
		return nil
	}

After:

	_, err = cli.ImageInspect(ctx, TestImageName)
	if err == nil {
		t.Logf("Using existing Gordon image: %s", TestImageName)
		return nil
	}
	if !errdefs.IsNotFound(err) {
		return fmt.Errorf("failed to inspect image: %w", err)
	}
🤖 Prompt for AI Agents
In `@tests/integration/helpers/docker.go` around lines 28 - 32, The current
ImageInspect call treats any error as "image not found" and always rebuilds;
change the error handling around the cli.ImageInspect(ctx, TestImageName) call
to distinguish not-found from other failures by using errdefs.IsNotFound(err)
(from Docker's errdefs package) — if IsNotFound(err) is true, proceed with the
rebuild path, otherwise return or propagate the actual error (or log/wrap it) so
network/permission errors are not misinterpreted as missing image; update the
block that currently checks "if err == nil { ... }" to explicitly handle err ==
nil, errdefs.IsNotFound(err), and the fallback error case.

Comment on lines +35 to +43
	buildCmd := exec.Command("go", "build", "-o", "/tmp/gordon-test", "./main.go")
	buildCmd.Dir = "../.."
	if output, err := buildCmd.CombinedOutput(); err != nil {
		return fmt.Errorf("failed to build binary: %w\n%s", err, output)
	}

	// Build Docker image
	dockerCmd := exec.Command("docker", "build", "-t", TestImageName, ".")
	dockerCmd.Dir = "../.."

🧹 Nitpick | 🔵 Trivial

Relative paths are fragile for test execution.

Using "../.." for buildCmd.Dir assumes tests run from a specific directory. Consider using runtime.Caller or an environment variable to determine the repository root reliably, ensuring tests work regardless of working directory.

🤖 Prompt for AI Agents
In `@tests/integration/helpers/docker.go` around lines 35 - 43, The tests use
hardcoded relative dirs ("../..") for buildCmd.Dir and dockerCmd.Dir which
breaks when tests run from other working directories; update the setup to
compute the repository root dynamically (e.g., via runtime.Caller to locate the
file path or an overriding env var) and set buildCmd.Dir and dockerCmd.Dir to
that absolute repo root so buildCmd, dockerCmd and the TestImageName docker
build always run from the correct project root.
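
One possible shape for that, sketched under the assumption that the helper file lives in tests/integration/helpers/ and that a GORDON_REPO_ROOT override is acceptable (both the env var and the directory depth are assumptions):

```go
package helpers

import (
	"os"
	"path/filepath"
	"runtime"
)

// repoRoot resolves the repository root from this source file's on-disk
// location, so exec.Command working directories no longer depend on where
// `go test` is invoked. GORDON_REPO_ROOT is a hypothetical override.
func repoRoot() string {
	if root := os.Getenv("GORDON_REPO_ROOT"); root != "" {
		return root
	}
	_, file, _, ok := runtime.Caller(0)
	if !ok {
		return "../.." // fall back to the existing relative convention
	}
	// Assuming this file sits in tests/integration/helpers, the repo root is
	// three directories up.
	return filepath.Clean(filepath.Join(filepath.Dir(file), "..", "..", ".."))
}
```

buildCmd.Dir and dockerCmd.Dir would then both be set to repoRoot().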

Comment on lines +63 to +70
func GetDockerSocketPath() string {
	host := DetectDockerHost()
	// Remove unix:// prefix to get the host path
	if len(host) > 7 && host[:7] == "unix://" {
		return host[7:]
	}
	return host
}

⚠️ Potential issue | 🟡 Minor

Handle non-Unix Docker hosts gracefully.

GetDockerSocketPath assumes a unix:// scheme. If DOCKER_HOST is set to a TCP endpoint (e.g., tcp://localhost:2375), the function returns the full URL as a "path", which would fail when used for socket mounting.

Proposed fix
+import "strings"

 func GetDockerSocketPath() string {
 	host := DetectDockerHost()
-	// Remove unix:// prefix to get the host path
-	if len(host) > 7 && host[:7] == "unix://" {
-		return host[7:]
+	if strings.HasPrefix(host, "unix://") {
+		return strings.TrimPrefix(host, "unix://")
 	}
-	return host
+	// Non-Unix schemes (tcp://, etc.) don't have a mountable socket path
+	return ""
 }
🤖 Prompt for AI Agents
In `@tests/integration/helpers/docker.go` around lines 63 - 70,
GetDockerSocketPath currently treats any DOCKER_HOST as a filesystem socket by
only stripping a "unix://" prefix; change it to detect scheme from
DetectDockerHost(): if the host begins with "unix://" return the path portion
(host[7:]), but if it begins with any other scheme (e.g., "tcp://") or contains
"://" return an empty string to indicate there is no Unix socket to mount;
ensure callers of GetDockerSocketPath handle the empty string case rather than
trying to mount a non-unix URL.

Comment on lines +43 to +49
| Test | File | Duration | Description | Status |
|------|------|----------|-------------|--------|
| Test01 | `01_startup_test.go` | ~4s | Four-container startup and health checks | ✅ Implemented |
| Test02 | `02_grpc_test.go` | ~30s | gRPC communication between components | 📝 Planned |
| Test03 | `03_registry_test.go` | ~3min | Image push triggers auto-deploy | 📝 Planned |
| Test04 | `04_restart_test.go` | ~2min | Auto-restart of failed sub-containers | 📝 Planned |
| Test05 | `05_security_test.go` | ~45s | Security isolation verification | 📝 Planned |

⚠️ Potential issue | 🟡 Minor

Resolve markdownlint warnings (tables, fences, bare URL, blank lines).

  • Line 43 and Line 68 tables need spacing around pipes (MD060).
  • Line 55 fence should specify a language (MD040).
  • Line 84 uses a bare URL; wrap it as a markdown link (MD034).
  • Lines 95–111 need blank lines around headings/fences (MD022/MD031).

Also applies to: 55-85, 68-73, 95-115

🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 44-44: Table column style
Table pipe is missing space to the right for style "compact"

(MD060, table-column-style)


[warning] 44-44: Table column style
Table pipe is missing space to the left for style "compact"

(MD060, table-column-style)

(The two warnings above are each reported five times, once per pipe in the row on line 44.)

🤖 Prompt for AI Agents
In `@tests/integration/README.md` around lines 43 - 49, Fix the markdownlint
issues in the README by: adding a single space on both sides of each pipe within
the table rows/headers (the table block shown in the diff) to satisfy MD060;
specifying a language identifier (e.g., ```bash or ```sh) for the code fence
referenced around line 55 to satisfy MD040; replacing the bare URL with a proper
Markdown link [text](https://...) where the URL appears to satisfy MD034; and
ensuring there are blank lines before and after all headings and fenced code
blocks (the sections noted around lines 95–111 and 68–73) to address
MD022/MD031. Ensure these edits preserve content but fix spacing and fence
syntax so markdownlint errors are resolved.
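
For instance, the header and delimiter rows flagged on line 44 would become (illustrative spacing only; apply the same padding to every pipe in the table):

| Test | File | Duration | Description | Status |
| ------ | ------ | ---------- | ------------- | -------- |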
