feat(gateway): circuit breaker and retry support per route#395
feat(gateway): circuit breaker and retry support per route#395
Conversation
New fields on GatewayRoute and CreateRouteParams: - circuit_breaker_threshold: consecutive failures before opening (0=disabled) - circuit_breaker_timeout: ms in open state before half-open probe - max_retries: retry attempts after first failure (0=disabled) - retry_timeout: total retry window in ms
Wraps each route's http.Transport with a retryTransport that applies: - Circuit breaker via existing platform.CircuitBreaker: trips open after threshold consecutive failures, probes half-open after timeout - Exponential backoff with jitter for retryable errors: 502/503/504/429 and connection errors (refused, reset, broken pipe, timeout) retryTransport implements http.RoundTripper and is set as the proxy transport in createReverseProxy. No handler changes required.
|
Warning Rate limit exceeded
To continue reviewing without waiting, purchase usage credits in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughThis PR adds per-route retry and circuit-breaker resilience to the gateway service. It extends the ChangesGateway Resilience: Retry & Circuit Breaker
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Adds per-route resilience controls to the API Gateway reverse proxy transport, introducing circuit breaker gating and retry-with-backoff behavior based on new route configuration fields.
Changes:
- Extend
GatewayRouteandCreateRouteParamswith circuit breaker and retry configuration fields. - Wrap the gateway’s base
http.Transportwith a newretryTransportimplementing circuit breaker + retry/backoff logic. - Wire the new config fields into
GatewayService.CreateRouteso they are stored on newly created routes (in-memory).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| internal/core/services/gateway.go | Builds a per-route transport wrapper adding circuit breaker + retry/backoff behavior to reverse proxy traffic. |
| internal/core/ports/gateway.go | Adds new circuit breaker and retry configuration fields to route creation parameters. |
| internal/core/domain/gateway.go | Persists the new resilience configuration on the GatewayRoute domain model (JSON-exposed). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| MaxBodySize: params.MaxBodySize, | ||
| CircuitBreakerThreshold: params.CircuitBreakerThreshold, | ||
| CircuitBreakerTimeout: params.CircuitBreakerTimeout, | ||
| MaxRetries: params.MaxRetries, | ||
| RetryTimeout: params.RetryTimeout, | ||
| Priority: params.Priority, | ||
| CreatedAt: time.Now(), |
| resp, err := rt.base.RoundTrip(req) | ||
| if err == nil { | ||
| if !rt.isRetryableStatus(resp.StatusCode) { | ||
| return resp, nil | ||
| } | ||
| // drain and close body so connection can be reused | ||
| io.Copy(io.Discard, resp.Body) | ||
| resp.Body.Close() | ||
| lastResp = resp | ||
| } else { | ||
| if !rt.isRetryableError(err) { | ||
| return nil, err | ||
| } | ||
| lastErr = err | ||
| lastResp = resp | ||
| } | ||
| } | ||
| return lastResp, lastErr |
| var lastResp *http.Response | ||
| var lastErr error | ||
| maxAttempts := rt.maxRetries + 1 // first attempt + retries | ||
|
|
||
| for attempt := 0; attempt < maxAttempts; attempt++ { | ||
| if attempt > 0 { | ||
| delay := rt.backoffWithJitter(attempt) | ||
| select { | ||
| case <-req.Context().Done(): | ||
| return nil, req.Context().Err() | ||
| case <-time.After(delay): | ||
| } | ||
| } | ||
|
|
||
| resp, err := rt.base.RoundTrip(req) | ||
| if err == nil { | ||
| if !rt.isRetryableStatus(resp.StatusCode) { | ||
| return resp, nil |
| // RoundTrip implements http.RoundTripper. | ||
| func (rt *retryTransport) RoundTrip(req *http.Request) (*http.Response, error) { | ||
| if rt.cb != nil { | ||
| var resp *http.Response | ||
| var err error | ||
| execErr := rt.cb.Execute(func() error { | ||
| resp, err = rt.doRoundTrip(req) | ||
| return err | ||
| }) | ||
| if execErr != nil { | ||
| return nil, execErr | ||
| } | ||
| return resp, err | ||
| } | ||
| return rt.doRoundTrip(req) | ||
| } |
| maxAttempts := rt.maxRetries + 1 // first attempt + retries | ||
|
|
||
| for attempt := 0; attempt < maxAttempts; attempt++ { | ||
| if attempt > 0 { | ||
| delay := rt.backoffWithJitter(attempt) | ||
| select { | ||
| case <-req.Context().Done(): | ||
| return nil, req.Context().Err() | ||
| case <-time.After(delay): | ||
| } | ||
| } |
| func (rt *retryTransport) isRetryableError(err error) bool { | ||
| if err == nil { | ||
| return false | ||
| } | ||
| msg := err.Error() | ||
| return strings.Contains(msg, "connection refused") || | ||
| strings.Contains(msg, "timeout") || | ||
| strings.Contains(msg, "reset by peer") || | ||
| strings.Contains(msg, "broken pipe") || | ||
| strings.Contains(msg, "connection reset") | ||
| } |
| // retryTransport wraps an http.Transport with circuit breaker and retry logic. | ||
| type retryTransport struct { | ||
| base *http.Transport | ||
| cb *platform.CircuitBreaker // nil if circuit breaker is disabled | ||
| maxRetries int | ||
| retryTimeout time.Duration | ||
| logger *slog.Logger | ||
| } | ||
|
|
||
| // newRetryTransport wraps a base http.Transport with per-route retry and circuit breaker behavior. | ||
| func newRetryTransport(base *http.Transport, route *domain.GatewayRoute, logger *slog.Logger) *retryTransport { | ||
| rt := &retryTransport{ | ||
| base: base, | ||
| maxRetries: route.MaxRetries, | ||
| retryTimeout: time.Duration(route.RetryTimeout) * time.Millisecond, | ||
| logger: logger, | ||
| } | ||
| if route.CircuitBreakerThreshold > 0 { | ||
| rt.cb = platform.NewCircuitBreakerWithOpts(platform.CircuitBreakerOpts{ | ||
| Name: route.ID.String(), | ||
| Threshold: route.CircuitBreakerThreshold, | ||
| ResetTimeout: time.Duration(route.CircuitBreakerTimeout) * time.Millisecond, | ||
| OnStateChange: func(name string, from, to platform.State) { | ||
| if logger != nil { | ||
| logger.Warn("circuit breaker state change", | ||
| "route_id", name, | ||
| "from", from.String(), | ||
| "to", to.String()) | ||
| } | ||
| }, | ||
| }) | ||
| } | ||
| return rt | ||
| } | ||
|
|
||
| // RoundTrip implements http.RoundTripper. | ||
| func (rt *retryTransport) RoundTrip(req *http.Request) (*http.Response, error) { | ||
| if rt.cb != nil { | ||
| var resp *http.Response | ||
| var err error | ||
| execErr := rt.cb.Execute(func() error { | ||
| resp, err = rt.doRoundTrip(req) | ||
| return err | ||
| }) | ||
| if execErr != nil { | ||
| return nil, execErr | ||
| } | ||
| return resp, err | ||
| } | ||
| return rt.doRoundTrip(req) | ||
| } | ||
|
|
||
| func (rt *retryTransport) doRoundTrip(req *http.Request) (*http.Response, error) { | ||
| if rt.maxRetries <= 0 { | ||
| return rt.base.RoundTrip(req) |
| MaxBodySize int64 | ||
| CircuitBreakerThreshold int | ||
| CircuitBreakerTimeout int64 | ||
| MaxRetries int | ||
| RetryTimeout int64 | ||
| Priority int |
| CircuitBreakerThreshold int `json:"circuit_breaker_threshold,omitempty"` // consecutive failures to trip open (0=disabled) | ||
| CircuitBreakerTimeout int64 `json:"circuit_breaker_timeout,omitempty"` // ms in open before half-open | ||
| MaxRetries int `json:"max_retries,omitempty"` // max retry attempts (0=disabled) | ||
| RetryTimeout int64 `json:"retry_timeout,omitempty"` // total retry window in ms | ||
| Priority int `json:"priority"` // Manual priority for tie-breaking |
Critical: retry loop now uses continue after draining retryable status bodies (502/503/504/429) so retry actually happens instead of returning the error response immediately. Also: - Add isIdempotent method guard: POST/PATCH/CONNECT are not retried on connection errors to prevent duplicate operations - Add default values when zero: CB threshold=5, timeout=30s, max_retries=2, retry_timeout=5s - Change retryTransport.base to http.RoundTripper interface
Retry tests: - isIdempotent: POST/PATCH blocked, GET/HEAD/PUT/DELETE/OPTIONS allowed - isRetryableStatus: 502/503/504/429 retryable, others not - isRetryableError: connection errors retryable, others not - backoffWithJitter: bounded between 0 and max - DoesNotRetryWhenMaxRetriesZero - DoesNotRetryNonIdempotentPOST/PATCH - RetriesOnConnectionRefused/502/503/429/timeoutError - GivesUpAfterMaxRetries - SucceedsOnFirstAttempt Circuit breaker tests: - DisabledWhenThresholdZero - EnabledWhenThresholdPositive - TripsOpenAfterThreshold - GoesHalfOpenAfterTimeout - ClosesAfterSuccessfulProbe
- Replace math/rand with crypto/rand in backoff jitter - Add result struct to avoid bodyclose false positive in RoundTrip - Inline errcheck: io.Copy and Body.Close return values captured - Rename max variable to cap to avoid redefining builtin - Add golangci exclude-rules for bodyclose on gateway files - Fix test mock parameter name and bodyclose compliance
- Add test for CreateRoute applying default resilience values (threshold=5, timeout=30000ms, retries=2, retryTimeout=5000ms) - Add comment to cryptoJitter explaining frac range and result bounds
Explains why the StateOpen||StateHalfOpen assertion is deterministic and not a flaky test — the CB transitions to half-open automatically after ResetTimeout, and the probe may arrive during or after that transition window.
Adds circuit_breaker_threshold, circuit_breaker_timeout, max_retries, and retry_timeout to the GatewayRoute swagger definition.
v1.64.8 doesn't support the version: 2 config format required by golangci-lint v2. Update CI to use v2.12.2.
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/swagger/swagger.json (1)
8689-8767:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
GatewayRouteexposes resilience fields, but route-creation schema still can’t accept themThe new per-route resilience knobs are documented on the response model, but
httphandlers.CreateRouteRequestdoes not define them. That makes the feature non-configurable via documentedPOST /gateway/routespayloads.📌 Suggested Swagger update for request contract
"httphandlers.CreateRouteRequest": { "type": "object", "required": [ "name", "path_prefix", "target_url" ], "properties": { + "circuit_breaker_threshold": { + "type": "integer", + "minimum": 0 + }, + "circuit_breaker_timeout": { + "description": "milliseconds", + "type": "integer", + "minimum": 0 + }, "dial_timeout": { "type": "integer", "minimum": 0 }, "idle_conn_timeout": { "type": "integer", "minimum": 0 }, + "max_retries": { + "type": "integer", + "minimum": 0 + }, "max_body_size": { "type": "integer", "minimum": 0 }, "methods": { "type": "array", "items": { "type": "string" } }, @@ + "retry_timeout": { + "description": "milliseconds", + "type": "integer", + "minimum": 0 + }, "strip_prefix": { "type": "boolean" },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/swagger/swagger.json` around lines 8689 - 8767, The CreateRouteRequest schema used by httphandlers.CreateRouteRequest is missing the new per-route resilience fields that are present on GatewayRoute; update the swagger request contract so POST /gateway/routes accepts the same properties (e.g., circuit_breaker_threshold, circuit_breaker_timeout, dial_timeout, idle_conn_timeout, max_body_size, max_retries, response_header_timeout, retry_timeout, rate_limit, require_tls and any related matching fields like methods/path_pattern/pattern_type) with the same types and descriptions as defined on GatewayRoute, and ensure the httphandlers.CreateRouteRequest model (and any server-side request validation) mirrors those properties so the documented payloads can configure the resilience knobs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.golangci.yml:
- Around line 71-78: The circuit breaker path in doRoundTrip can return an error
while leaking the HTTP response body; update the cbErr handling in doRoundTrip
(the block that checks cbErr around the circuit breaker return) to close
r.resp.Body if r.resp is non-nil before returning the error, and then remove the
bodyclose suppression entries for internal/core/services/gateway.go and
gateway_retry_test.go from .golangci.yml so the linter will catch any other
missing Body.Close calls.
In `@docs/swagger/docs.go`:
- Around line 8697-8704: The swagger docs were updated to add resilience fields
on domain.GatewayRoute but the request model httphandlers.CreateRouteRequest in
docs.go still omits them; update the CreateRouteRequest schema (and any other
request models referenced at the same sections: the blocks around lines for
8723-8726 and 8772-8775) to include the new fields (e.g.,
circuit_breaker_threshold and circuit_breaker_timeout and any other resilience
fields added to domain.GatewayRoute) with matching descriptions and types so
generated clients can set them and docs remain consistent with the domain model.
In `@docs/swagger/swagger.yaml`:
- Around line 414-419: The CreateRouteRequest schema in swagger.yaml must
include the new resilience knobs so clients can set them when creating routes:
add the fields circuit_breaker_threshold (type: integer, description:
"consecutive failures to trip open (0=disabled)") and circuit_breaker_timeout
(type: integer, description: "ms in open before half-open") to the
httphandlers.CreateRouteRequest definition (matching the types and descriptions
used in the route response schema); also mirror any other related resilience
fields added at lines referenced (e.g., the blocks around 433-435 and 469-471)
into CreateRouteRequest so POST /gateway/routes accepts the same knobs as
responses.
In `@internal/core/ports/gateway.go`:
- Around line 44-49: The config fields MaxBodySize, CircuitBreakerThreshold,
CircuitBreakerTimeout, MaxRetries, RetryTimeout and Priority must allow an
"unset" state so explicit zero differs from omitted; change their types on the
API struct to pointer types (e.g. *int64 / *int) or add explicit enabled flags,
and update the CreateRoute logic to apply defaults only when the pointer is nil
(preserve an explicit 0 when pointer is non-nil). Locate and update the struct
with those fields and the CreateRoute function to read/assign defaults from nil
checks rather than testing for value == 0.
In `@internal/core/services/gateway_unit_test.go`:
- Around line 77-80: Replace the magic-number assertions in the gateway unit
test by introducing named test constants for the expected defaults and using
them in the assertions: define constants like defaultCircuitBreakerThreshold,
defaultCircuitBreakerTimeout (int64), defaultMaxRetries, and defaultRetryTimeout
(int64) near the top of the test file and replace the literals used with
res.CircuitBreakerThreshold, res.CircuitBreakerTimeout, res.MaxRetries, and
res.RetryTimeout assertions to reference those constants so the test documents
the contract and avoids hard-coded numbers.
In `@internal/core/services/gateway.go`:
- Around line 457-459: The code currently closes r.resp.Body (calls
resp.Body.Close()) before returning the *http.Response (r.resp), which prevents
callers from reading/proxying the response; remove the premature Body.Close()
calls in the affected return paths (where r.resp is returned) so the response
body remains open and let the caller of the function that returns *http.Response
be responsible for closing the body; update any error/cleanup paths that still
need to discard bodies to explicitly drain-and-close only when not returning the
response (or create a copy of the body if you must return a consumable buffer
instead of the original stream).
- Around line 450-453: The circuit breaker callback (rt.cb.Execute) only
observes the callback error, but rt.doRoundTrip currently returns nil after
exhausting retries on retryable HTTP statuses, so those failures never count;
update the callback used in rt.cb.Execute (and/or rt.doRoundTrip) so that when
doRoundTrip returns a response with a final StatusCode in the retryable set
(429, 502, 503, 504) and no more retries remain, the callback returns a non-nil
error (e.g., wrap r.resp.Status or a specific errRetryableHTTPStatus) instead of
nil; ensure the symbols referenced are rt.cb.Execute, rt.doRoundTrip, r.resp and
r.err so the circuit breaker receives an error and increments its failure count.
- Around line 541-546: The cryptoJitter function currently ignores errors from
rand.Read causing a zero jitter on failure; update retryTransport.cryptoJitter
to capture the error returned by rand.Read and handle it deterministically
(e.g., if err != nil return max/2 or use a fallback PRNG seeded
deterministically) instead of silently using a zeroed buffer—ensure you log or
propagate the error as appropriate and avoid assigning the error to the blank
identifier.
- Around line 463-500: The retry loop in retryTransport.doRoundTrip reuses the
original *http.Request so request bodies can be consumed on the first attempt
and not replayed; change the loop to create a fresh request for each attempt by
calling req.GetBody (or failing fast if nil) and using that to set a new Body on
a shallow copy of the original request (preserve Method, URL, Header, and
Context via http.NewRequestWithContext or a shallow clone), then pass the fresh
request to rt.base.RoundTrip; ensure you check and handle errors from GetBody,
and keep existing retry/backoff logic (functions referenced:
retryTransport.doRoundTrip, rt.backoffWithJitter, rt.isIdempotent,
rt.isRetryableStatus, rt.isRetryableError, rt.base.RoundTrip).
---
Outside diff comments:
In `@docs/swagger/swagger.json`:
- Around line 8689-8767: The CreateRouteRequest schema used by
httphandlers.CreateRouteRequest is missing the new per-route resilience fields
that are present on GatewayRoute; update the swagger request contract so POST
/gateway/routes accepts the same properties (e.g., circuit_breaker_threshold,
circuit_breaker_timeout, dial_timeout, idle_conn_timeout, max_body_size,
max_retries, response_header_timeout, retry_timeout, rate_limit, require_tls and
any related matching fields like methods/path_pattern/pattern_type) with the
same types and descriptions as defined on GatewayRoute, and ensure the
httphandlers.CreateRouteRequest model (and any server-side request validation)
mirrors those properties so the documented payloads can configure the resilience
knobs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bf047063-b9f0-423f-a8b7-e47c42c250bc
📒 Files selected for processing (9)
.golangci.ymldocs/swagger/docs.godocs/swagger/swagger.jsondocs/swagger/swagger.yamlinternal/core/domain/gateway.gointernal/core/ports/gateway.gointernal/core/services/gateway.gointernal/core/services/gateway_retry_test.gointernal/core/services/gateway_unit_test.go
| MaxBodySize int64 | ||
| CircuitBreakerThreshold int | ||
| CircuitBreakerTimeout int64 | ||
| MaxRetries int | ||
| RetryTimeout int64 | ||
| Priority int |
There was a problem hiding this comment.
These options need an “unset” state, not plain ints.
0 is the documented disable value for several of these settings, but value types collapse “omitted” and “explicit 0” into the same state. Because CreateRoute later defaults zero values, callers cannot actually disable retries/circuit breaking through this contract. Use pointer fields here or separate enabled flags before this API shape ships.
🧩 Contract sketch
- MaxBodySize int64
- CircuitBreakerThreshold int
- CircuitBreakerTimeout int64
- MaxRetries int
- RetryTimeout int64
- Priority int
+ MaxBodySize *int64
+ CircuitBreakerThreshold *int
+ CircuitBreakerTimeout *int64
+ MaxRetries *int
+ RetryTimeout *int64
+ Priority *intYou'd then default only nil values and preserve explicit 0.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/core/ports/gateway.go` around lines 44 - 49, The config fields
MaxBodySize, CircuitBreakerThreshold, CircuitBreakerTimeout, MaxRetries,
RetryTimeout and Priority must allow an "unset" state so explicit zero differs
from omitted; change their types on the API struct to pointer types (e.g. *int64
/ *int) or add explicit enabled flags, and update the CreateRoute logic to apply
defaults only when the pointer is nil (preserve an explicit 0 when pointer is
non-nil). Locate and update the struct with those fields and the CreateRoute
function to read/assign defaults from nil checks rather than testing for value
== 0.
| cbErr := rt.cb.Execute(func() error { | ||
| r.resp, r.err = rt.doRoundTrip(req) | ||
| return r.err | ||
| }) |
There was a problem hiding this comment.
Retryable HTTP failures never trip the circuit breaker.
cb.Execute only sees r.err, but doRoundTrip returns nil error after exhausting retries on 429/502/503/504. A backend that keeps returning retryable failure statuses will never increment the breaker failure count, so the breaker stays closed for the exact upstream failures this feature is meant to absorb.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/core/services/gateway.go` around lines 450 - 453, The circuit
breaker callback (rt.cb.Execute) only observes the callback error, but
rt.doRoundTrip currently returns nil after exhausting retries on retryable HTTP
statuses, so those failures never count; update the callback used in
rt.cb.Execute (and/or rt.doRoundTrip) so that when doRoundTrip returns a
response with a final StatusCode in the retryable set (429, 502, 503, 504) and
no more retries remain, the callback returns a non-nil error (e.g., wrap
r.resp.Status or a specific errRetryableHTTPStatus) instead of nil; ensure the
symbols referenced are rt.cb.Execute, rt.doRoundTrip, r.resp and r.err so the
circuit breaker receives an error and increments its failure count.
| func (rt *retryTransport) doRoundTrip(req *http.Request) (*http.Response, error) { | ||
| if rt.maxRetries <= 0 || !rt.isIdempotent(req.Method) { | ||
| return rt.base.RoundTrip(req) | ||
| } | ||
|
|
||
| var lastResp *http.Response | ||
| var lastErr error | ||
| maxAttempts := rt.maxRetries + 1 // first attempt + retries | ||
|
|
||
| for attempt := 0; attempt < maxAttempts; attempt++ { | ||
| if attempt > 0 { | ||
| delay := rt.backoffWithJitter(attempt) | ||
| select { | ||
| case <-req.Context().Done(): | ||
| return nil, req.Context().Err() | ||
| case <-time.After(delay): | ||
| } | ||
| } | ||
|
|
||
| resp, err := rt.base.RoundTrip(req) | ||
| if err == nil { | ||
| if !rt.isRetryableStatus(resp.StatusCode) { | ||
| return resp, nil | ||
| } | ||
| // drain and close body so connection can be reused, then retry | ||
| _, _ = io.Copy(io.Discard, resp.Body) | ||
| _ = resp.Body.Close() | ||
| lastResp = resp | ||
| continue | ||
| } | ||
|
|
||
| if !rt.isRetryableError(err) { | ||
| return nil, err | ||
| } | ||
| lastErr = err | ||
| lastResp = resp | ||
| } | ||
| return lastResp, lastErr |
There was a problem hiding this comment.
Clone replayable request bodies before retrying.
PUT is treated as idempotent here, but each attempt reuses the same *http.Request. After the first RoundTrip, req.Body may already be consumed, so later attempts can send an empty/truncated body or fail outright. Only retry when the body can be recreated via GetBody, and build a fresh request per attempt.
♻️ Safer retry shape
func (rt *retryTransport) doRoundTrip(req *http.Request) (*http.Response, error) {
- if rt.maxRetries <= 0 || !rt.isIdempotent(req.Method) {
+ canReplayBody := req.Body == nil || req.GetBody != nil
+ if rt.maxRetries <= 0 || !rt.isIdempotent(req.Method) || !canReplayBody {
return rt.base.RoundTrip(req)
}
@@
- resp, err := rt.base.RoundTrip(req)
+ attemptReq := req.Clone(req.Context())
+ if req.GetBody != nil {
+ body, err := req.GetBody()
+ if err != nil {
+ return nil, err
+ }
+ attemptReq.Body = body
+ }
+
+ resp, err := rt.base.RoundTrip(attemptReq)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/core/services/gateway.go` around lines 463 - 500, The retry loop in
retryTransport.doRoundTrip reuses the original *http.Request so request bodies
can be consumed on the first attempt and not replayed; change the loop to create
a fresh request for each attempt by calling req.GetBody (or failing fast if nil)
and using that to set a new Body on a shallow copy of the original request
(preserve Method, URL, Header, and Context via http.NewRequestWithContext or a
shallow clone), then pass the fresh request to rt.base.RoundTrip; ensure you
check and handle errors from GetBody, and keep existing retry/backoff logic
(functions referenced: retryTransport.doRoundTrip, rt.backoffWithJitter,
rt.isIdempotent, rt.isRetryableStatus, rt.isRetryableError, rt.base.RoundTrip).
The golangci-lint v2 module moved to golangci-lint/v2.
Was accidentally removed during v1/v2 compatibility troubleshooting. Required for golangci-lint v2.
v2 finds 144 pre-existing lint issues across the codebase that v1 did not flag. Since this PR only adds gateway retry/CB code and doesn't introduce new lint issues, revert to v1.64.8 which was passing CI. The version: 2 config field is also removed since v1 doesn't support it.
Suppress bodyclose false positives in retryTransport.RoundTrip where the response is passed to the proxy caller who closes it.
The testifylint require-error rule flags assert.NoError when checking that err is nil on success paths. Use require.NoError instead.
- gateway.go: fix RoundTrip bodyclose on CB error path; success path no longer prematurely closes body (caller owns it) - gateway.go: cryptoJitter returns max/2 deterministic fallback when crypto/rand fails instead of silently producing zero jitter - .golangci.yml: remove bodyclose suppression rules now that code is fixed - gateway_unit_test.go: replace magic numbers with named constants for default resilience values - gateway_handler.go: add circuit_breaker_threshold, circuit_breaker_timeout, max_retries, retry_timeout to CreateRouteRequest so they're included in the API contract - swagger docs: regenerate after adding resilience fields to CreateRouteRequest
Add binding:"gte=0" to MaxRetries, RetryTimeout, CircuitBreakerThreshold, and CircuitBreakerTimeout in CreateRouteRequest — consistent with other numeric fields in the struct. Prevents negative values that would cause undefined behavior in the retry loop and backoff calculation.
…fields swag picks up binding:"gte=0" from CreateRouteRequest and emits minimum: 0 in the OpenAPI schema for circuit_breaker_threshold, circuit_breaker_timeout, max_retries, and retry_timeout.
Summary
Phase 2 of the API Gateway improvement roadmap. Adds resilience features per route:
Changes
Domain + Ports (
7dd5cca8):circuit_breaker_threshold: consecutive failures before opening (0=disabled)circuit_breaker_timeout: ms in open before half-open probemax_retries: retry attempts after first failure (0=disabled)retry_timeout: total retry window in msServices (
ae3ea5b0):retryTransportstruct implementinghttp.RoundTripper— wraps the proxy transport with CB + retry logicplatform.CircuitBreaker, no new dependenciesFiles changed
internal/core/domain/gateway.gointernal/core/ports/gateway.gointernal/core/services/gateway.goTest plan
go build ./...go test ./internal/handlers/ -run Gatewaygo test ./internal/core/services/ -run GatewaySummary by CodeRabbit
New Features
Documentation