Skip to content

feat(gateway): circuit breaker and retry support per route#395

Open
poyrazK wants to merge 17 commits intomainfrom
feat/gateway-circuit-breaker-and-retry
Open

feat(gateway): circuit breaker and retry support per route#395
poyrazK wants to merge 17 commits intomainfrom
feat/gateway-circuit-breaker-and-retry

Conversation

@poyrazK
Copy link
Copy Markdown
Owner

@poyrazK poyrazK commented May 5, 2026

Summary

Phase 2 of the API Gateway improvement roadmap. Adds resilience features per route:

  • Circuit Breaker: trips open after consecutive failures, probes half-open after configurable timeout
  • Retry with backoff: exponential backoff with full jitter for retryable status codes (502/503/504/429) and connection errors

Changes

Domain + Ports (7dd5cca8):

  • circuit_breaker_threshold: consecutive failures before opening (0=disabled)
  • circuit_breaker_timeout: ms in open before half-open probe
  • max_retries: retry attempts after first failure (0=disabled)
  • retry_timeout: total retry window in ms

Services (ae3ea5b0):

  • retryTransport struct implementing http.RoundTripper — wraps the proxy transport with CB + retry logic
  • CB uses existing platform.CircuitBreaker, no new dependencies
  • No handler changes required — retry logic lives below the Gin layer in the transport

Files changed

  • internal/core/domain/gateway.go
  • internal/core/ports/gateway.go
  • internal/core/services/gateway.go

Test plan

  • go build ./...
  • go test ./internal/handlers/ -run Gateway
  • go test ./internal/core/services/ -run Gateway

Summary by CodeRabbit

  • New Features

    • Gateway routes now support circuit breaker configuration options (threshold and timeout)
    • Gateway routes now support retry configuration options (max retries and retry timeout)
  • Documentation

    • Updated API schema documentation to reflect new gateway route configuration fields

poyrazK added 2 commits May 5, 2026 21:41
New fields on GatewayRoute and CreateRouteParams:
- circuit_breaker_threshold: consecutive failures before opening (0=disabled)
- circuit_breaker_timeout: ms in open state before half-open probe
- max_retries: retry attempts after first failure (0=disabled)
- retry_timeout: total retry window in ms
Wraps each route's http.Transport with a retryTransport that applies:
- Circuit breaker via existing platform.CircuitBreaker: trips open after
  threshold consecutive failures, probes half-open after timeout
- Exponential backoff with jitter for retryable errors: 502/503/504/429
  and connection errors (refused, reset, broken pipe, timeout)

retryTransport implements http.RoundTripper and is set as the proxy
transport in createReverseProxy. No handler changes required.
Copilot AI review requested due to automatic review settings May 5, 2026 18:44
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

Review Change Stack

Warning

Rate limit exceeded

@poyrazK has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 38 seconds before requesting another review.

To continue reviewing without waiting, purchase usage credits in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bf8b427f-584c-4d46-8b6c-87c2d1ee3dbb

📥 Commits

Reviewing files that changed from the base of the PR and between 8a36d44 and 2d1074c.

📒 Files selected for processing (7)
  • docs/swagger/docs.go
  • docs/swagger/swagger.json
  • docs/swagger/swagger.yaml
  • internal/core/services/gateway.go
  • internal/core/services/gateway_retry_test.go
  • internal/core/services/gateway_unit_test.go
  • internal/handlers/gateway_handler.go
📝 Walkthrough

Walkthrough

This PR adds per-route retry and circuit-breaker resilience to the gateway service. It extends the GatewayRoute domain type and CreateRouteParams contract with four new resilience-related fields, applies sensible defaults when creating routes, implements a retry transport wrapper for idempotent HTTP requests with exponential backoff, and provides comprehensive test coverage including circuit breaker state transitions.

Changes

Gateway Resilience: Retry & Circuit Breaker

Layer / File(s) Summary
Data Contracts & API Schema
internal/core/domain/gateway.go, internal/core/ports/gateway.go, docs/swagger/*
GatewayRoute gains CircuitBreakerThreshold, CircuitBreakerTimeout, MaxRetries, RetryTimeout; CreateRouteParams gains these plus MaxBodySize and Priority; all OpenAPI docs updated with new field definitions.
Resilience Configuration & Implementation
internal/core/services/gateway.go
CreateRoute applies default resilience values when parameters are zero; new retryTransport type implements http.RoundTripper with retry logic for idempotent methods, specific HTTP status codes (429, 502–504), error patterns, response body draining, and exponential backoff with cryptographic jitter.
Proxy Transport Integration
internal/core/services/gateway.go
createReverseProxy wraps the base http.Transport with newRetryTransport to enable per-route retry and circuit breaker behavior on outbound proxy requests.
Testing & Verification
internal/core/services/gateway_retry_test.go, internal/core/services/gateway_unit_test.go
Tests validate retry policy (method idempotency, retryable conditions, backoff bounds), retry loop scenarios (max retries, non-idempotent method rejection), circuit breaker initialization and state transitions, and default value population in CreateRoute.
Linting Configuration
.golangci.yml
golangci-lint bodyclose rule excluded for gateway.go and gateway_retry_test.go to suppress false positives where response bodies are intentionally drained for retry enablement.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • poyrazK/thecloud#378: Both PRs extend the gateway route configuration contract (internal/core/domain/gateway.go and internal/core/ports/gateway.go) with new fields; this PR adds circuit-breaker and retry controls while the related PR adds timeouts, TLS, CIDR, and body-size constraints.
  • poyrazK/thecloud#154: This PR builds on platform-level circuit-breaker primitives (platform.NewCircuitBreakerWithOpts) to add per-route resilience to the gateway, complementing the foundation established in the related HA/resilience infrastructure PR.

Poem

🐰 A rabbit's verse on retry flows:

When circuits trip and backends break,
Our retry dance does gently make
Idempotent paths try, try again—
With jittered backoff, patience wins.
The gateway stands both strong and wise,
With circuit breakers for the skies! 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.52% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main changes: adding circuit breaker and retry support on a per-route basis for the gateway, which aligns with the core implementation across domain, ports, and service files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/gateway-circuit-breaker-and-retry

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds per-route resilience controls to the API Gateway reverse proxy transport, introducing circuit breaker gating and retry-with-backoff behavior based on new route configuration fields.

Changes:

  • Extend GatewayRoute and CreateRouteParams with circuit breaker and retry configuration fields.
  • Wrap the gateway’s base http.Transport with a new retryTransport implementing circuit breaker + retry/backoff logic.
  • Wire the new config fields into GatewayService.CreateRoute so they are stored on newly created routes (in-memory).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

File Description
internal/core/services/gateway.go Builds a per-route transport wrapper adding circuit breaker + retry/backoff behavior to reverse proxy traffic.
internal/core/ports/gateway.go Adds new circuit breaker and retry configuration fields to route creation parameters.
internal/core/domain/gateway.go Persists the new resilience configuration on the GatewayRoute domain model (JSON-exposed).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 100 to 106
MaxBodySize: params.MaxBodySize,
CircuitBreakerThreshold: params.CircuitBreakerThreshold,
CircuitBreakerTimeout: params.CircuitBreakerTimeout,
MaxRetries: params.MaxRetries,
RetryTimeout: params.RetryTimeout,
Priority: params.Priority,
CreatedAt: time.Now(),
Comment on lines +460 to +477
resp, err := rt.base.RoundTrip(req)
if err == nil {
if !rt.isRetryableStatus(resp.StatusCode) {
return resp, nil
}
// drain and close body so connection can be reused
io.Copy(io.Discard, resp.Body)
resp.Body.Close()
lastResp = resp
} else {
if !rt.isRetryableError(err) {
return nil, err
}
lastErr = err
lastResp = resp
}
}
return lastResp, lastErr
Comment on lines +446 to +463
var lastResp *http.Response
var lastErr error
maxAttempts := rt.maxRetries + 1 // first attempt + retries

for attempt := 0; attempt < maxAttempts; attempt++ {
if attempt > 0 {
delay := rt.backoffWithJitter(attempt)
select {
case <-req.Context().Done():
return nil, req.Context().Err()
case <-time.After(delay):
}
}

resp, err := rt.base.RoundTrip(req)
if err == nil {
if !rt.isRetryableStatus(resp.StatusCode) {
return resp, nil
Comment on lines +424 to +439
// RoundTrip implements http.RoundTripper.
func (rt *retryTransport) RoundTrip(req *http.Request) (*http.Response, error) {
if rt.cb != nil {
var resp *http.Response
var err error
execErr := rt.cb.Execute(func() error {
resp, err = rt.doRoundTrip(req)
return err
})
if execErr != nil {
return nil, execErr
}
return resp, err
}
return rt.doRoundTrip(req)
}
Comment on lines +448 to +458
maxAttempts := rt.maxRetries + 1 // first attempt + retries

for attempt := 0; attempt < maxAttempts; attempt++ {
if attempt > 0 {
delay := rt.backoffWithJitter(attempt)
select {
case <-req.Context().Done():
return nil, req.Context().Err()
case <-time.After(delay):
}
}
Comment on lines +484 to +494
func (rt *retryTransport) isRetryableError(err error) bool {
if err == nil {
return false
}
msg := err.Error()
return strings.Contains(msg, "connection refused") ||
strings.Contains(msg, "timeout") ||
strings.Contains(msg, "reset by peer") ||
strings.Contains(msg, "broken pipe") ||
strings.Contains(msg, "connection reset")
}
Comment on lines +389 to +443
// retryTransport wraps an http.Transport with circuit breaker and retry logic.
type retryTransport struct {
base *http.Transport
cb *platform.CircuitBreaker // nil if circuit breaker is disabled
maxRetries int
retryTimeout time.Duration
logger *slog.Logger
}

// newRetryTransport wraps a base http.Transport with per-route retry and circuit breaker behavior.
func newRetryTransport(base *http.Transport, route *domain.GatewayRoute, logger *slog.Logger) *retryTransport {
rt := &retryTransport{
base: base,
maxRetries: route.MaxRetries,
retryTimeout: time.Duration(route.RetryTimeout) * time.Millisecond,
logger: logger,
}
if route.CircuitBreakerThreshold > 0 {
rt.cb = platform.NewCircuitBreakerWithOpts(platform.CircuitBreakerOpts{
Name: route.ID.String(),
Threshold: route.CircuitBreakerThreshold,
ResetTimeout: time.Duration(route.CircuitBreakerTimeout) * time.Millisecond,
OnStateChange: func(name string, from, to platform.State) {
if logger != nil {
logger.Warn("circuit breaker state change",
"route_id", name,
"from", from.String(),
"to", to.String())
}
},
})
}
return rt
}

// RoundTrip implements http.RoundTripper.
func (rt *retryTransport) RoundTrip(req *http.Request) (*http.Response, error) {
if rt.cb != nil {
var resp *http.Response
var err error
execErr := rt.cb.Execute(func() error {
resp, err = rt.doRoundTrip(req)
return err
})
if execErr != nil {
return nil, execErr
}
return resp, err
}
return rt.doRoundTrip(req)
}

func (rt *retryTransport) doRoundTrip(req *http.Request) (*http.Response, error) {
if rt.maxRetries <= 0 {
return rt.base.RoundTrip(req)
Comment on lines +44 to +49
MaxBodySize int64
CircuitBreakerThreshold int
CircuitBreakerTimeout int64
MaxRetries int
RetryTimeout int64
Priority int
Comment on lines +35 to 39
CircuitBreakerThreshold int `json:"circuit_breaker_threshold,omitempty"` // consecutive failures to trip open (0=disabled)
CircuitBreakerTimeout int64 `json:"circuit_breaker_timeout,omitempty"` // ms in open before half-open
MaxRetries int `json:"max_retries,omitempty"` // max retry attempts (0=disabled)
RetryTimeout int64 `json:"retry_timeout,omitempty"` // total retry window in ms
Priority int `json:"priority"` // Manual priority for tie-breaking
poyrazK added 3 commits May 6, 2026 19:04
Critical: retry loop now uses continue after draining retryable status
bodies (502/503/504/429) so retry actually happens instead of returning
the error response immediately.

Also:
- Add isIdempotent method guard: POST/PATCH/CONNECT are not retried
  on connection errors to prevent duplicate operations
- Add default values when zero: CB threshold=5, timeout=30s,
  max_retries=2, retry_timeout=5s
- Change retryTransport.base to http.RoundTripper interface
Retry tests:
- isIdempotent: POST/PATCH blocked, GET/HEAD/PUT/DELETE/OPTIONS allowed
- isRetryableStatus: 502/503/504/429 retryable, others not
- isRetryableError: connection errors retryable, others not
- backoffWithJitter: bounded between 0 and max
- DoesNotRetryWhenMaxRetriesZero
- DoesNotRetryNonIdempotentPOST/PATCH
- RetriesOnConnectionRefused/502/503/429/timeoutError
- GivesUpAfterMaxRetries
- SucceedsOnFirstAttempt

Circuit breaker tests:
- DisabledWhenThresholdZero
- EnabledWhenThresholdPositive
- TripsOpenAfterThreshold
- GoesHalfOpenAfterTimeout
- ClosesAfterSuccessfulProbe
- Replace math/rand with crypto/rand in backoff jitter
- Add result struct to avoid bodyclose false positive in RoundTrip
- Inline errcheck: io.Copy and Body.Close return values captured
- Rename max variable to cap to avoid redefining builtin
- Add golangci exclude-rules for bodyclose on gateway files
- Fix test mock parameter name and bodyclose compliance
Copilot AI review requested due to automatic review settings May 7, 2026 13:40
- Add test for CreateRoute applying default resilience values
  (threshold=5, timeout=30000ms, retries=2, retryTimeout=5000ms)
- Add comment to cryptoJitter explaining frac range and result bounds
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Explains why the StateOpen||StateHalfOpen assertion is deterministic
and not a flaky test — the CB transitions to half-open automatically
after ResetTimeout, and the probe may arrive during or after that
transition window.
Copilot AI review requested due to automatic review settings May 7, 2026 14:38
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

poyrazK added 2 commits May 7, 2026 18:30
Adds circuit_breaker_threshold, circuit_breaker_timeout, max_retries,
and retry_timeout to the GatewayRoute swagger definition.
v1.64.8 doesn't support the version: 2 config format required
by golangci-lint v2. Update CI to use v2.12.2.
Copilot AI review requested due to automatic review settings May 7, 2026 15:42
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/swagger/swagger.json (1)

8689-8767: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

GatewayRoute exposes resilience fields, but route-creation schema still can’t accept them

The new per-route resilience knobs are documented on the response model, but httphandlers.CreateRouteRequest does not define them. That makes the feature non-configurable via documented POST /gateway/routes payloads.

📌 Suggested Swagger update for request contract
 "httphandlers.CreateRouteRequest": {
   "type": "object",
   "required": [
     "name",
     "path_prefix",
     "target_url"
   ],
   "properties": {
+    "circuit_breaker_threshold": {
+      "type": "integer",
+      "minimum": 0
+    },
+    "circuit_breaker_timeout": {
+      "description": "milliseconds",
+      "type": "integer",
+      "minimum": 0
+    },
     "dial_timeout": {
       "type": "integer",
       "minimum": 0
     },
     "idle_conn_timeout": {
       "type": "integer",
       "minimum": 0
     },
+    "max_retries": {
+      "type": "integer",
+      "minimum": 0
+    },
     "max_body_size": {
       "type": "integer",
       "minimum": 0
     },
     "methods": {
       "type": "array",
       "items": {
         "type": "string"
       }
     },
@@
+    "retry_timeout": {
+      "description": "milliseconds",
+      "type": "integer",
+      "minimum": 0
+    },
     "strip_prefix": {
       "type": "boolean"
     },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/swagger/swagger.json` around lines 8689 - 8767, The CreateRouteRequest
schema used by httphandlers.CreateRouteRequest is missing the new per-route
resilience fields that are present on GatewayRoute; update the swagger request
contract so POST /gateway/routes accepts the same properties (e.g.,
circuit_breaker_threshold, circuit_breaker_timeout, dial_timeout,
idle_conn_timeout, max_body_size, max_retries, response_header_timeout,
retry_timeout, rate_limit, require_tls and any related matching fields like
methods/path_pattern/pattern_type) with the same types and descriptions as
defined on GatewayRoute, and ensure the httphandlers.CreateRouteRequest model
(and any server-side request validation) mirrors those properties so the
documented payloads can configure the resilience knobs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.golangci.yml:
- Around line 71-78: The circuit breaker path in doRoundTrip can return an error
while leaking the HTTP response body; update the cbErr handling in doRoundTrip
(the block that checks cbErr around the circuit breaker return) to close
r.resp.Body if r.resp is non-nil before returning the error, and then remove the
bodyclose suppression entries for internal/core/services/gateway.go and
gateway_retry_test.go from .golangci.yml so the linter will catch any other
missing Body.Close calls.

In `@docs/swagger/docs.go`:
- Around line 8697-8704: The swagger docs were updated to add resilience fields
on domain.GatewayRoute but the request model httphandlers.CreateRouteRequest in
docs.go still omits them; update the CreateRouteRequest schema (and any other
request models referenced at the same sections: the blocks around lines for
8723-8726 and 8772-8775) to include the new fields (e.g.,
circuit_breaker_threshold and circuit_breaker_timeout and any other resilience
fields added to domain.GatewayRoute) with matching descriptions and types so
generated clients can set them and docs remain consistent with the domain model.

In `@docs/swagger/swagger.yaml`:
- Around line 414-419: The CreateRouteRequest schema in swagger.yaml must
include the new resilience knobs so clients can set them when creating routes:
add the fields circuit_breaker_threshold (type: integer, description:
"consecutive failures to trip open (0=disabled)") and circuit_breaker_timeout
(type: integer, description: "ms in open before half-open") to the
httphandlers.CreateRouteRequest definition (matching the types and descriptions
used in the route response schema); also mirror any other related resilience
fields added at lines referenced (e.g., the blocks around 433-435 and 469-471)
into CreateRouteRequest so POST /gateway/routes accepts the same knobs as
responses.

In `@internal/core/ports/gateway.go`:
- Around line 44-49: The config fields MaxBodySize, CircuitBreakerThreshold,
CircuitBreakerTimeout, MaxRetries, RetryTimeout and Priority must allow an
"unset" state so explicit zero differs from omitted; change their types on the
API struct to pointer types (e.g. *int64 / *int) or add explicit enabled flags,
and update the CreateRoute logic to apply defaults only when the pointer is nil
(preserve an explicit 0 when pointer is non-nil). Locate and update the struct
with those fields and the CreateRoute function to read/assign defaults from nil
checks rather than testing for value == 0.

In `@internal/core/services/gateway_unit_test.go`:
- Around line 77-80: Replace the magic-number assertions in the gateway unit
test by introducing named test constants for the expected defaults and using
them in the assertions: define constants like defaultCircuitBreakerThreshold,
defaultCircuitBreakerTimeout (int64), defaultMaxRetries, and defaultRetryTimeout
(int64) near the top of the test file and replace the literals used with
res.CircuitBreakerThreshold, res.CircuitBreakerTimeout, res.MaxRetries, and
res.RetryTimeout assertions to reference those constants so the test documents
the contract and avoids hard-coded numbers.

In `@internal/core/services/gateway.go`:
- Around line 457-459: The code currently closes r.resp.Body (calls
resp.Body.Close()) before returning the *http.Response (r.resp), which prevents
callers from reading/proxying the response; remove the premature Body.Close()
calls in the affected return paths (where r.resp is returned) so the response
body remains open and let the caller of the function that returns *http.Response
be responsible for closing the body; update any error/cleanup paths that still
need to discard bodies to explicitly drain-and-close only when not returning the
response (or create a copy of the body if you must return a consumable buffer
instead of the original stream).
- Around line 450-453: The circuit breaker callback (rt.cb.Execute) only
observes the callback error, but rt.doRoundTrip currently returns nil after
exhausting retries on retryable HTTP statuses, so those failures never count;
update the callback used in rt.cb.Execute (and/or rt.doRoundTrip) so that when
doRoundTrip returns a response with a final StatusCode in the retryable set
(429, 502, 503, 504) and no more retries remain, the callback returns a non-nil
error (e.g., wrap r.resp.Status or a specific errRetryableHTTPStatus) instead of
nil; ensure the symbols referenced are rt.cb.Execute, rt.doRoundTrip, r.resp and
r.err so the circuit breaker receives an error and increments its failure count.
- Around line 541-546: The cryptoJitter function currently ignores errors from
rand.Read causing a zero jitter on failure; update retryTransport.cryptoJitter
to capture the error returned by rand.Read and handle it deterministically
(e.g., if err != nil return max/2 or use a fallback PRNG seeded
deterministically) instead of silently using a zeroed buffer—ensure you log or
propagate the error as appropriate and avoid assigning the error to the blank
identifier.
- Around line 463-500: The retry loop in retryTransport.doRoundTrip reuses the
original *http.Request so request bodies can be consumed on the first attempt
and not replayed; change the loop to create a fresh request for each attempt by
calling req.GetBody (or failing fast if nil) and using that to set a new Body on
a shallow copy of the original request (preserve Method, URL, Header, and
Context via http.NewRequestWithContext or a shallow clone), then pass the fresh
request to rt.base.RoundTrip; ensure you check and handle errors from GetBody,
and keep existing retry/backoff logic (functions referenced:
retryTransport.doRoundTrip, rt.backoffWithJitter, rt.isIdempotent,
rt.isRetryableStatus, rt.isRetryableError, rt.base.RoundTrip).

---

Outside diff comments:
In `@docs/swagger/swagger.json`:
- Around line 8689-8767: The CreateRouteRequest schema used by
httphandlers.CreateRouteRequest is missing the new per-route resilience fields
that are present on GatewayRoute; update the swagger request contract so POST
/gateway/routes accepts the same properties (e.g., circuit_breaker_threshold,
circuit_breaker_timeout, dial_timeout, idle_conn_timeout, max_body_size,
max_retries, response_header_timeout, retry_timeout, rate_limit, require_tls and
any related matching fields like methods/path_pattern/pattern_type) with the
same types and descriptions as defined on GatewayRoute, and ensure the
httphandlers.CreateRouteRequest model (and any server-side request validation)
mirrors those properties so the documented payloads can configure the resilience
knobs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bf047063-b9f0-423f-a8b7-e47c42c250bc

📥 Commits

Reviewing files that changed from the base of the PR and between 54851aa and 8a36d44.

📒 Files selected for processing (9)
  • .golangci.yml
  • docs/swagger/docs.go
  • docs/swagger/swagger.json
  • docs/swagger/swagger.yaml
  • internal/core/domain/gateway.go
  • internal/core/ports/gateway.go
  • internal/core/services/gateway.go
  • internal/core/services/gateway_retry_test.go
  • internal/core/services/gateway_unit_test.go

Comment thread .golangci.yml Outdated
Comment thread docs/swagger/docs.go
Comment thread docs/swagger/swagger.yaml
Comment on lines +44 to +49
MaxBodySize int64
CircuitBreakerThreshold int
CircuitBreakerTimeout int64
MaxRetries int
RetryTimeout int64
Priority int
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

These options need an “unset” state, not plain ints.

0 is the documented disable value for several of these settings, but value types collapse “omitted” and “explicit 0” into the same state. Because CreateRoute later defaults zero values, callers cannot actually disable retries/circuit breaking through this contract. Use pointer fields here or separate enabled flags before this API shape ships.

🧩 Contract sketch
-	MaxBodySize               int64
-	CircuitBreakerThreshold   int
-	CircuitBreakerTimeout     int64
-	MaxRetries                int
-	RetryTimeout              int64
-	Priority                  int
+	MaxBodySize               *int64
+	CircuitBreakerThreshold   *int
+	CircuitBreakerTimeout     *int64
+	MaxRetries                *int
+	RetryTimeout              *int64
+	Priority                  *int

You'd then default only nil values and preserve explicit 0.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/core/ports/gateway.go` around lines 44 - 49, The config fields
MaxBodySize, CircuitBreakerThreshold, CircuitBreakerTimeout, MaxRetries,
RetryTimeout and Priority must allow an "unset" state so explicit zero differs
from omitted; change their types on the API struct to pointer types (e.g. *int64
/ *int) or add explicit enabled flags, and update the CreateRoute logic to apply
defaults only when the pointer is nil (preserve an explicit 0 when pointer is
non-nil). Locate and update the struct with those fields and the CreateRoute
function to read/assign defaults from nil checks rather than testing for value
== 0.

Comment thread internal/core/services/gateway_unit_test.go Outdated
Comment on lines +450 to +453
cbErr := rt.cb.Execute(func() error {
r.resp, r.err = rt.doRoundTrip(req)
return r.err
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Retryable HTTP failures never trip the circuit breaker.

cb.Execute only sees r.err, but doRoundTrip returns nil error after exhausting retries on 429/502/503/504. A backend that keeps returning retryable failure statuses will never increment the breaker failure count, so the breaker stays closed for the exact upstream failures this feature is meant to absorb.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/core/services/gateway.go` around lines 450 - 453, The circuit
breaker callback (rt.cb.Execute) only observes the callback error, but
rt.doRoundTrip currently returns nil after exhausting retries on retryable HTTP
statuses, so those failures never count; update the callback used in
rt.cb.Execute (and/or rt.doRoundTrip) so that when doRoundTrip returns a
response with a final StatusCode in the retryable set (429, 502, 503, 504) and
no more retries remain, the callback returns a non-nil error (e.g., wrap
r.resp.Status or a specific errRetryableHTTPStatus) instead of nil; ensure the
symbols referenced are rt.cb.Execute, rt.doRoundTrip, r.resp and r.err so the
circuit breaker receives an error and increments its failure count.

Comment thread internal/core/services/gateway.go Outdated
Comment on lines +463 to +500
func (rt *retryTransport) doRoundTrip(req *http.Request) (*http.Response, error) {
if rt.maxRetries <= 0 || !rt.isIdempotent(req.Method) {
return rt.base.RoundTrip(req)
}

var lastResp *http.Response
var lastErr error
maxAttempts := rt.maxRetries + 1 // first attempt + retries

for attempt := 0; attempt < maxAttempts; attempt++ {
if attempt > 0 {
delay := rt.backoffWithJitter(attempt)
select {
case <-req.Context().Done():
return nil, req.Context().Err()
case <-time.After(delay):
}
}

resp, err := rt.base.RoundTrip(req)
if err == nil {
if !rt.isRetryableStatus(resp.StatusCode) {
return resp, nil
}
// drain and close body so connection can be reused, then retry
_, _ = io.Copy(io.Discard, resp.Body)
_ = resp.Body.Close()
lastResp = resp
continue
}

if !rt.isRetryableError(err) {
return nil, err
}
lastErr = err
lastResp = resp
}
return lastResp, lastErr
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clone replayable request bodies before retrying.

PUT is treated as idempotent here, but each attempt reuses the same *http.Request. After the first RoundTrip, req.Body may already be consumed, so later attempts can send an empty/truncated body or fail outright. Only retry when the body can be recreated via GetBody, and build a fresh request per attempt.

♻️ Safer retry shape
 func (rt *retryTransport) doRoundTrip(req *http.Request) (*http.Response, error) {
-	if rt.maxRetries <= 0 || !rt.isIdempotent(req.Method) {
+	canReplayBody := req.Body == nil || req.GetBody != nil
+	if rt.maxRetries <= 0 || !rt.isIdempotent(req.Method) || !canReplayBody {
 		return rt.base.RoundTrip(req)
 	}
@@
-		resp, err := rt.base.RoundTrip(req)
+		attemptReq := req.Clone(req.Context())
+		if req.GetBody != nil {
+			body, err := req.GetBody()
+			if err != nil {
+				return nil, err
+			}
+			attemptReq.Body = body
+		}
+
+		resp, err := rt.base.RoundTrip(attemptReq)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/core/services/gateway.go` around lines 463 - 500, The retry loop in
retryTransport.doRoundTrip reuses the original *http.Request so request bodies
can be consumed on the first attempt and not replayed; change the loop to create
a fresh request for each attempt by calling req.GetBody (or failing fast if nil)
and using that to set a new Body on a shallow copy of the original request
(preserve Method, URL, Header, and Context via http.NewRequestWithContext or a
shallow clone), then pass the fresh request to rt.base.RoundTrip; ensure you
check and handle errors from GetBody, and keep existing retry/backoff logic
(functions referenced: retryTransport.doRoundTrip, rt.backoffWithJitter,
rt.isIdempotent, rt.isRetryableStatus, rt.isRetryableError, rt.base.RoundTrip).

Comment thread internal/core/services/gateway.go
poyrazK added 2 commits May 7, 2026 18:50
The golangci-lint v2 module moved to golangci-lint/v2.
Was accidentally removed during v1/v2 compatibility troubleshooting.
Required for golangci-lint v2.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

poyrazK added 2 commits May 7, 2026 19:06
v2 finds 144 pre-existing lint issues across the codebase that v1
did not flag. Since this PR only adds gateway retry/CB code and
doesn't introduce new lint issues, revert to v1.64.8 which was
passing CI.

The version: 2 config field is also removed since v1 doesn't
support it.
Suppress bodyclose false positives in retryTransport.RoundTrip
where the response is passed to the proxy caller who closes it.
Copilot AI review requested due to automatic review settings May 7, 2026 16:14
The testifylint require-error rule flags assert.NoError when checking
that err is nil on success paths. Use require.NoError instead.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

poyrazK added 2 commits May 7, 2026 20:26
- gateway.go: fix RoundTrip bodyclose on CB error path; success path
  no longer prematurely closes body (caller owns it)
- gateway.go: cryptoJitter returns max/2 deterministic fallback when
  crypto/rand fails instead of silently producing zero jitter
- .golangci.yml: remove bodyclose suppression rules now that code is fixed
- gateway_unit_test.go: replace magic numbers with named constants for
  default resilience values
- gateway_handler.go: add circuit_breaker_threshold,
  circuit_breaker_timeout, max_retries, retry_timeout to CreateRouteRequest
  so they're included in the API contract
- swagger docs: regenerate after adding resilience fields to CreateRouteRequest
Add binding:"gte=0" to MaxRetries, RetryTimeout, CircuitBreakerThreshold,
and CircuitBreakerTimeout in CreateRouteRequest — consistent with other
numeric fields in the struct. Prevents negative values that would cause
undefined behavior in the retry loop and backoff calculation.
Copilot AI review requested due to automatic review settings May 7, 2026 18:02
…fields

swag picks up binding:"gte=0" from CreateRouteRequest and emits
minimum: 0 in the OpenAPI schema for circuit_breaker_threshold,
circuit_breaker_timeout, max_retries, and retry_timeout.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants