feat(gateway): add graceful shutdown with signal handling #41
base: main
Conversation
📝 Walkthrough

Adds graceful shutdown to the gateway: installs signal handling (SIGINT/SIGTERM), replaces Gin's …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant OS
    participant Main
    participant SignalHandler as Signal Handler
    participant HTTPServer as HTTP Server
    participant ReqTracker as Request Tracker
    participant Client
    Client->>HTTPServer: Start requests
    HTTPServer->>ReqTracker: TrackInFlightRequests (increment)
    Note over Client,HTTPServer: Handler executes (may be slow)
    OS->>SignalHandler: SIGINT/SIGTERM
    SignalHandler->>Main: deliver signal
    Main->>Main: log shutdown initiated
    Main->>ReqTracker: GetActiveRequestCount()
    alt active > 0
        Main->>ReqTracker: WaitForInFlightRequests()
        ReqTracker-->>Main: blocks until WaitGroup done
        Client->>HTTPServer: completes request
        HTTPServer->>ReqTracker: TrackInFlightRequests (decrement, Done)
    end
    Main->>HTTPServer: srv.Shutdown(ctx 30s)
    HTTPServer-->>Main: Shutdown complete
    Main->>Main: log shutdown finished
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Additional Comments (2)

- gateway/main.go, lines 147-151 (logic): rate limiter cleanup goroutines not stopped on shutdown, causing a goroutine leak. When rate limiting is enabled, NewTokenBucket starts a background cleanup goroutine for each limiter. These goroutines are never stopped during graceful shutdown. Store the limiters in a variable accessible in main, then call limiter.Stop() for each before srv.Shutdown() completes.
- gateway/main.go, line 172 (syntax): unnecessary leading whitespace before the addr variable declaration.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
1 file reviewed, 2 comments
Actionable comments posted: 4
Fix all issues with AI Agents 🤖
In @gateway/main.go:
- Line 172: Fix the inconsistent leading whitespace on the line defining addr
(addr := ":" + port) in main (gateway main.go); reformat the file with gofmt (or
run goimports/golangci-lint autofix) so the indentation matches the rest of the
function and Go formatting rules.
- Around line 20-21: The import block containing "os/signal" and "syscall" in
gateway/main.go uses space indentation instead of tabs; run gofmt (or go fmt) to
reformat the file so the import statements and surrounding code use Go's
standard tab indentation, or manually replace the leading spaces with tabs in
the import block so the "os/signal" and "syscall" lines align with the rest of
the imports.
- Around line 179-184: The goroutine currently calls log.Fatalf on
srv.ListenAndServe errors which exits immediately and skips the graceful
shutdown logic around the main shutdown handling; instead remove log.Fatalf and
propagate the error to the main goroutine via an error channel (e.g., create
errCh before launching the goroutine), have the goroutine send any non-nil,
non-http.ErrServerClosed error into errCh after logging, and let the main
select/receive on errCh alongside the existing shutdown signals so the main
goroutine performs the cleanup and graceful shutdown when ListenAndServe fails
on srv (address `addr`) rather than calling os.Exit inside the goroutine.
- Around line 186-201: The shutdown grace period is hardcoded to 30s and can
race with AI request deadlines; change the context timeout used in the graceful
shutdown (the context.WithTimeout call that wraps srv.Shutdown) to derive its
duration from the configured AI_REQUEST_TIMEOUT_SECONDS plus a small buffer
(e.g., +5–10s) instead of the fixed 30s. Read the existing AI timeout
configuration (AI_REQUEST_TIMEOUT_SECONDS or the function/const that provides
it), compute shutdownTimeout :=
time.Duration(aiTimeoutSeconds+buffer)*time.Second, and pass that into
context.WithTimeout so srv.Shutdown uses the extended timeout.
🧹 Nitpick comments (1)
gateway/main.go (1)

174-177: Consider adding server timeouts for production hardening.

While not required for this PR, consider configuring ReadTimeout, WriteTimeout, and IdleTimeout on the http.Server to prevent resource exhaustion from slow or malicious clients in production environments.

Example server timeout configuration:

```diff
 srv := &http.Server{
 	Addr:    addr,
 	Handler: r,
+	ReadTimeout:  60 * time.Second,
+	WriteTimeout: 90 * time.Second,
+	IdleTimeout:  120 * time.Second,
 }
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
gateway/main.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
```go
	"os/signal"
	"syscall"
```
Fix indentation to use tabs instead of spaces.
The import statements appear to use spaces for indentation instead of tabs. Run gofmt or go fmt to ensure consistent formatting with Go standards.
🔎 Fix formatting

```diff
-  "os/signal"
-  "syscall"
+	"os/signal"
+	"syscall"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
go func() {
	log.Printf("[INFO] Gateway listening on %s", addr)
	if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
		log.Fatalf("[FATAL] listen error: %v", err)
	}
}()
```
Critical: log.Fatalf in goroutine prevents graceful shutdown.
Using log.Fatalf (which calls os.Exit()) inside the server goroutine will immediately terminate the program if ListenAndServe returns an error, bypassing all graceful shutdown logic on lines 186-201. For example, if the port is already in use, the program exits without cleanup.
🔎 Proposed fix using error channel

```diff
+	errChan := make(chan error, 1)
+
 	go func() {
 		log.Printf("[INFO] Gateway listening on %s", addr)
-		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
-			log.Fatalf("[FATAL] listen error: %v", err)
-		}
+		errChan <- srv.ListenAndServe()
 	}()

 	// ---- Graceful shutdown ----
 	quit := make(chan os.Signal, 1)
 	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

-	<-quit
-	log.Println("[INFO] Shutdown signal received, draining connections...")
+	select {
+	case err := <-errChan:
+		if err != nil && err != http.ErrServerClosed {
+			log.Fatalf("[FATAL] Server failed to start: %v", err)
+		}
+		return
+	case <-quit:
+		log.Println("[INFO] Shutdown signal received, draining connections...")
+	}

 	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
 	defer cancel()

 	if err := srv.Shutdown(ctx); err != nil {
 		log.Printf("[ERROR] Server forced to shutdown: %v", err)
 	} else {
 		log.Println("[OK] Server shutdown completed")
 	}
```

📝 Committable suggestion
```go
errChan := make(chan error, 1)
go func() {
	log.Printf("[INFO] Gateway listening on %s", addr)
	errChan <- srv.ListenAndServe()
}()

// ---- Graceful shutdown ----
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

select {
case err := <-errChan:
	if err != nil && err != http.ErrServerClosed {
		log.Fatalf("[FATAL] Server failed to start: %v", err)
	}
	return
case <-quit:
	log.Println("[INFO] Shutdown signal received, draining connections...")
}

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

if err := srv.Shutdown(ctx); err != nil {
	log.Printf("[ERROR] Server forced to shutdown: %v", err)
} else {
	log.Println("[OK] Server shutdown completed")
}
```
```go
// ---- Graceful shutdown ----
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

<-quit
log.Println("[INFO] Shutdown signal received, draining connections...")

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

if err := srv.Shutdown(ctx); err != nil {
	log.Printf("[ERROR] Server forced to shutdown: %v", err)
} else {
	log.Println("[OK] Server shutdown completed")
}
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check timeout configurations in gateway directory
rg -n "time\.(Second|Minute|Duration)" gateway/ -A 2 -B 2
```

Repository: AnkanMisra/MicroAI-Paygate
Length of output: 11091

🏁 Script executed:

```bash
#!/bin/bash
# Also check around line 164 specifically for AI timeout
sed -n '150,180p' gateway/main.go
```

Repository: AnkanMisra/MicroAI-Paygate
Length of output: 958

🏁 Script executed:

```bash
#!/bin/bash
# Look for all timeout-related constants or configurations
rg -n "(timeout|TIMEOUT|Timeout)" gateway/ -i
```

Repository: AnkanMisra/MicroAI-Paygate
Length of output: 7862
Consider extending shutdown timeout to avoid race with AI request deadlines.
The graceful shutdown correctly uses srv.Shutdown() with a 30-second timeout to drain connections. However, both the AI endpoint timeout and shutdown timeout are set to 30 seconds, which creates a tight race condition: a request starting just before shutdown receives a fresh 30-second deadline that effectively expires at the same moment as the shutdown grace period, potentially aborting long-running requests before they complete naturally.
The timeouts are configurable (AI_REQUEST_TIMEOUT_SECONDS defaults to 30, shutdown hardcoded to 30). Consider setting the shutdown timeout to at least AI_TIMEOUT + buffer (e.g., 35–40 seconds) to ensure in-flight AI requests can either complete or timeout gracefully before forced shutdown kicks in.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @gateway/shutdown_test.go:
- Around line 46-47: The test uses time.Sleep(50 * time.Millisecond) to wait for
the background request which creates a race; replace this ad-hoc sleep with
proper synchronization by modifying the slow handler in shutdown_test.go to
signal when it actually begins (e.g., send on a started chan in the handler) and
have the test wait on that channel instead of sleeping, then let the handler
continue (or block on another channel) so the test can deterministically trigger
shutdown and observe behavior; ensure channels are closed or signaled to avoid
goroutine leaks.
- Around line 24-32: The test incorrectly mixes an httptest.Server and a
manually created http.Server: stop using srv as the running server because
httptest.NewUnstartedServer(r) creates and runs its own server (ln) so calling
srv.Shutdown(ctx) is a no-op; either use the httptest server API (remove srv,
start ln with ln.Start() and call ln.Close() / ln.CloseClientConnections() in
the teardown) or replace the httptest helper with a real http.Server started via
srv.ListenAndServe() (or srv.Serve(listener) on a net.Listener started in a
goroutine) and then call srv.Shutdown(ctx) to test graceful shutdown; update
references to srv and ln accordingly (symbols: srv, ln,
httptest.NewUnstartedServer, srv.Shutdown, ln.Close).
- Around line 36-44: The goroutine is calling t.Errorf directly (unsafe) when
http.Get fails; instead capture the error and report it from the main test
goroutine: have the goroutine send the error (or nil) over the done channel or a
new err channel (created alongside done) after calling http.Get and closing
resp.Body, and in the main test routine receive that error and call t.Errorf or
t.Fatal there; update the anonymous goroutine that uses ln.URL+"/slow" and the
test's receive logic to propagate and assert the error in the main goroutine
rather than calling t.Errorf inside the goroutine.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
gateway/request_tracker.go
gateway/shutdown_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
gateway/shutdown_test.go (1)
gateway/request_tracker.go (2)
TrackInFlightRequests (lines 16-28), WaitForInFlightRequests (lines 31-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🔇 Additional comments (3)
gateway/request_tracker.go (2)

15-28: Middleware logic is correct.

The concurrent tracking implementation is sound:
- WaitGroup and counter are incremented before the defer
- Defer ensures cleanup even on panic
- c.Next() executes within the tracked scope

However, the package-level state design (lines 10-13) limits testability.

30-38: Helper functions are correctly implemented.

Both WaitForInFlightRequests() and GetActiveRequestCount() properly use their respective synchronization primitives. The atomic load in GetActiveRequestCount() ensures thread-safe reads. However, the package-level state design (lines 10-13) limits testability.
gateway/shutdown_test.go (1)

12-67: Test inherits package-level state issues from request_tracker.go.

Since this test uses the package-level activeRequestsWG and activeRequestCnt from request_tracker.go, it suffers from the same test isolation issues:
- Cannot run in parallel with other tests
- State is not reset between test runs
- May produce incorrect results if run multiple times

This will be resolved once the package-level state issue in gateway/request_tracker.go (lines 10-13) is addressed by refactoring to a struct-based approach.
```go
var (
	activeRequestsWG sync.WaitGroup
	activeRequestCnt int64
)
```
Package-level state prevents test isolation and concurrent execution.

The package-level activeRequestsWG and activeRequestCnt variables create shared mutable state that:
- Cannot be reset between test runs, causing test interference
- Prevents parallel test execution (go test -parallel)
- Makes the module non-reusable if multiple instances are needed

Consider refactoring to use a struct-based approach:
```go
type RequestTracker struct {
	wg  sync.WaitGroup
	cnt int64
}

func NewRequestTracker() *RequestTracker {
	return &RequestTracker{}
}

func (rt *RequestTracker) Middleware() gin.HandlerFunc {
	return func(c *gin.Context) {
		rt.wg.Add(1)
		atomic.AddInt64(&rt.cnt, 1)
		defer func() {
			atomic.AddInt64(&rt.cnt, -1)
			rt.wg.Done()
		}()
		c.Next()
	}
}

func (rt *RequestTracker) Wait() {
	rt.wg.Wait()
}

func (rt *RequestTracker) Count() int64 {
	return atomic.LoadInt64(&rt.cnt)
}
```

This allows each test to create its own isolated RequestTracker instance.
```go
srv := &http.Server{
	Handler: r,
}

// Start test server
ln := httptest.NewUnstartedServer(r)
ln.Config = srv
ln.Start()
defer ln.Close()
```
Test server setup is incorrect.

The test mixes httptest.NewUnstartedServer with a manually created http.Server, but this doesn't work as intended:
- httptest.NewUnstartedServer(r) creates its own http.Server internally
- Assigning ln.Config = srv replaces the httptest server's config, but srv is still not the actively listening server
- Later on line 54, srv.Shutdown(ctx) attempts to shut down srv, but srv was never started via srv.ListenAndServe(); the httptest server ln is what's actually running
- This means srv.Shutdown() likely returns immediately without actually shutting down the test server
🛠️ Proposed fix

Use the httptest server directly without creating a separate http.Server:

```diff
-	srv := &http.Server{
-		Handler: r,
-	}
-
 	// Start test server
-	ln := httptest.NewUnstartedServer(r)
-	ln.Config = srv
-	ln.Start()
-	defer ln.Close()
+	ts := httptest.NewServer(r)
+	defer ts.Close()
```

Then update the shutdown logic:

```diff
-	// Shutdown server
-	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
-	defer cancel()
-
-	start := time.Now()
-	if err := srv.Shutdown(ctx); err != nil {
-		t.Fatalf("shutdown failed: %v", err)
-	}
+	// Close the test server (httptest.Server doesn't support graceful shutdown)
+	start := time.Now()
+	ts.Close()
```

Note: httptest.Server doesn't expose graceful shutdown. For testing graceful shutdown behavior, you need to start a real http.Server with srv.ListenAndServe() on a chosen port or use net.Listen to get a listener, then pass it to srv.Serve(listener).
Committable suggestion skipped: line range outside the PR's diff.
```go
go func() {
	resp, err := http.Get(ln.URL + "/slow")
	if err != nil {
		t.Errorf("request failed: %v", err)
		return
	}
	resp.Body.Close()
	close(done)
}()
```
Unsafe use of t.Errorf in goroutine.

Line 39 calls t.Errorf from within a goroutine. While t.Errorf itself is documented as safe for concurrent use, a background goroutine that outlives the test will panic if it reports after the test function returns, and the failure may otherwise be attributed incorrectly. Propagate the error back and report it from the main test goroutine instead.
🔒 Proposed fix

Capture the error in the goroutine and check it in the main test flow:

```diff
 	// Make request in background
-	done := make(chan struct{})
+	done := make(chan error)
 	go func() {
 		resp, err := http.Get(ln.URL + "/slow")
 		if err != nil {
-			t.Errorf("request failed: %v", err)
-			return
+			done <- err
+			return
 		}
 		resp.Body.Close()
-		close(done)
+		done <- nil
 	}()

 	// Give request time to start
 	time.Sleep(50 * time.Millisecond)

 	// Shutdown server
 	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
 	defer cancel()

 	start := time.Now()
 	if err := srv.Shutdown(ctx); err != nil {
 		t.Fatalf("shutdown failed: %v", err)
 	}

 	WaitForInFlightRequests()
 	elapsed := time.Since(start)

-	<-done
+	if err := <-done; err != nil {
+		t.Fatalf("request failed: %v", err)
+	}
```

Committable suggestion skipped: line range outside the PR's diff.
```go
// Give request time to start
time.Sleep(50 * time.Millisecond)
```
Sleep-based synchronization creates a race condition.
The 50ms sleep (line 47) assumes the background request will have started by then, but this is not guaranteed. The request might:
- Not have started yet (slow goroutine scheduling)
- Have already completed (if it runs faster than expected)
This makes the test flaky.
⏱️ Proposed fix using proper synchronization

Use a channel to signal when the request has actually started:

```diff
 	// Make request in background
-	done := make(chan struct{})
+	requestStarted := make(chan struct{})
+	requestDone := make(chan struct{})
+
 	go func() {
+		// Signal that we're about to make the request
+		close(requestStarted)
 		resp, err := http.Get(ln.URL + "/slow")
 		if err != nil {
 			t.Errorf("request failed: %v", err)
 			return
 		}
 		resp.Body.Close()
-		close(done)
+		close(requestDone)
 	}()

-	// Give request time to start
-	time.Sleep(50 * time.Millisecond)
+	// Wait for request to actually start
+	<-requestStarted
+	// Give it a moment to enter the handler
+	time.Sleep(10 * time.Millisecond)
```

Or better yet, modify the slow handler to signal when it starts:

```diff
+	handlerStarted := make(chan struct{})
 	r.GET("/slow", func(c *gin.Context) {
+		close(handlerStarted)
 		time.Sleep(200 * time.Millisecond)
 		c.Status(http.StatusOK)
 	})

 	// ... later ...
-	// Give request time to start
-	time.Sleep(50 * time.Millisecond)
+	// Wait for handler to start
+	<-handlerStarted
```

Committable suggestion skipped: line range outside the PR's diff.
Additional Comments (4)

- gateway/main.go, line 111 (logic): TrackInFlightRequests() middleware is not registered, so in-flight requests are not tracked.
- gateway/main.go, lines 193-194 (logic): Missing call to WaitForInFlightRequests() before shutdown completes; in-flight requests won't be waited for.
- gateway/main.go, lines 147-151 (logic): Rate limiter cleanup goroutine leak; Stop() is never called on the rate limiters during shutdown. Store limiters in a variable accessible to the shutdown code and call Stop() on each.
- gateway/main.go, line 172 (style): extra indentation before addr.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

3 files reviewed, 4 comments
…ateway
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
gateway/main.go (1)

151-155: Add rate limiter cleanup during shutdown to prevent goroutine leaks.

The rate limiters are initialized but never stopped during the shutdown sequence. Each rate limiter likely spawns a cleanup goroutine for expired entries, and without calling Stop() on shutdown, these goroutines will leak.

♻️ Suggested fix to add cleanup before shutdown

Store the limiters in a variable accessible during shutdown and call Stop() on each:

```diff
 // Initialize rate limiters if enabled
+var limiters map[string]RateLimiter
 if getRateLimitEnabled() {
-	limiters := initRateLimiters()
+	limiters = initRateLimiters()
 	r.Use(RateLimitMiddleware(limiters))
 	log.Println("Rate limiting enabled")
 }
```

Then in the shutdown section (after Line 211), add cleanup:

```go
log.Println("[INFO] Shutdown signal received, draining connections...")

// Stop rate limiter cleanup goroutines
if limiters != nil {
	for tier, limiter := range limiters {
		limiter.Stop()
		log.Printf("[INFO] Stopped rate limiter for tier: %s", tier)
	}
}
```
🤖 Fix all issues with AI agents
In @gateway/go.mod:
- Line 14: Remove the stray indirect module entry for
github.com/ProjectZKM/Ziren/crates/go-runtime/zkvm_runtime from go.mod; delete
the line referencing v0.0.0-20251001021608-1fe7b43fc4d6, then run `go mod tidy`
to update go.sum and ensure no other transitive references remain, and finally
run `go build` or `go test ./...` to verify the module compiles without that
dependency.
In @gateway/main.go:
- Around line 194-197: The http.Server instance named srv is missing timeout
configs; update the srv creation to set ReadHeaderTimeout (e.g., 5s),
ReadTimeout (e.g., 10s), WriteTimeout (e.g., 60s) and IdleTimeout (e.g., 120s)
to protect against slow clients and resource exhaustion—add these fields to the
&http.Server{ Addr: addr, Handler: r, ReadHeaderTimeout: 5*time.Second,
ReadTimeout: 10*time.Second, WriteTimeout: 60*time.Second, IdleTimeout:
120*time.Second } ensuring the time package is imported where the srv variable
is defined.
- Line 192: The assignment to addr (addr := ":" + port) has extra leading
whitespace causing indentation inconsistency; locate the addr := ":" + port
statement (in the HTTP server setup block) and remove the extra leading spaces
so the line aligns with the surrounding statements (match indentation of nearby
variable assignments and function calls).
- Around line 192-221: Register the request-tracking middleware by wrapping your
router with TrackInFlightRequests() (use the middleware returned by
gateway/request_tracker.go) before creating the http.Server so all handlers
increment/decrement in-flight counts; on shutdown, after receiving the quit
signal and before calling srv.Shutdown(ctx), call WaitForInFlightRequests(ctx)
to block until active requests finish (or context timeout) and log the count via
GetActiveRequestCount() to report remaining active requests; ensure you still
call srv.Shutdown(ctx) afterwards and handle its error as before.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
gateway/go.mod
gateway/main.go
🔇 Additional comments (2)
gateway/main.go (2)

23-24: LGTM!

The os/signal and syscall imports are correctly added to support graceful shutdown with signal handling.

177-185: LGTM!

The receipt cleanup goroutine is properly managed with context cancellation and a final cleanup on shutdown to prevent leaks. This is a good pattern.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
gateway/main.go (1)

151-155: Missing rate limiter cleanup causes goroutine leak.

Rate limiters are initialized here, and each spawns a cleanup goroutine, but no corresponding cleanup is performed during the shutdown sequence (lines 213-234). This results in goroutine leaks on every shutdown.

🔧 Proposed fix

- Store the limiters map in a package-level variable or make it accessible in the shutdown sequence:

```diff
+var rateLimiters map[string]RateLimiter
+
 func main() {
 	// ...
 	// Initialize rate limiters if enabled
 	if getRateLimitEnabled() {
-		limiters := initRateLimiters()
+		rateLimiters = initRateLimiters()
-		r.Use(RateLimitMiddleware(limiters))
+		r.Use(RateLimitMiddleware(rateLimiters))
 		log.Println("Rate limiting enabled")
 	}
```

- Add cleanup in the shutdown sequence before final shutdown:

```diff
 	<-quit
 	log.Println("[INFO] Shutdown signal received, draining connections...")

+	// Stop rate limiter cleanup goroutines
+	if rateLimiters != nil {
+		for tier, limiter := range rateLimiters {
+			limiter.Stop()
+			log.Printf("[INFO] Stopped rate limiter for tier: %s", tier)
+		}
+	}
+
 	active := GetActiveRequestCount()
 	if active > 0 {
 		log.Printf("[INFO] Waiting for %d in-flight request(s)...", active)
 		WaitForInFlightRequests()
 		log.Println("[INFO] All in-flight requests completed")
 	}
```

Note: This assumes the RateLimiter interface includes a Stop() method. If not, you'll need to add it to the interface and implement it in the token bucket implementation.
🤖 Fix all issues with AI agents
In @gateway/main.go:
- Around line 213-234: The shutdown sequence currently calls
GetActiveRequestCount() and WaitForInFlightRequests() before srv.Shutdown(),
which allows the server to keep accepting new connections and creates a race;
fix by removing the manual wait and invoking srv.Shutdown(ctx) first (so the
server stops accepting new connections and waits for in-flight requests), and if
you need visibility keep the GetActiveRequestCount() log but do not call
WaitForInFlightRequests() prior to srv.Shutdown(); ensure you still use
context.WithTimeout(...) and handle the error from srv.Shutdown(ctx) as before.
🧹 Nitpick comments (2)
gateway/main.go (2)
193-193: Fix indentation inconsistency. Line 193 lacks proper indentation (it should have a leading tab to match the surrounding code).
♻️ Proposed fix
-addr := ":" + port + addr := ":" + port
195-202: Consider alignment between WriteTimeout and the graceful shutdown timeout. The server's WriteTimeout is 60 seconds (line 200), but the graceful shutdown timeout is 30 seconds (line 226). If a request is still within its write deadline (less than 60s) when shutdown begins, the 30-second shutdown timeout may forcefully close the connection before the write completes, undermining graceful shutdown guarantees.
♻️ Recommended adjustment
Set the shutdown timeout to at least match the WriteTimeout:
```diff
-	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
```

Alternatively, document why a shorter shutdown timeout is acceptable for this service's operational requirements.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
gateway/main.go
🧰 Additional context used
🧬 Code graph analysis (1)
gateway/main.go (1)
gateway/request_tracker.go (3)
- TrackInFlightRequests (16-28)
- GetActiveRequestCount (36-38)
- WaitForInFlightRequests (31-33)
🔇 Additional comments (3)
gateway/main.go (3)
23-24: LGTM! The signal handling imports are correctly added and necessary for the graceful shutdown implementation.
162-162: LGTM! The in-flight request tracking middleware is correctly registered globally, ensuring all requests are tracked during the graceful shutdown sequence.
205-210: LGTM! The server is correctly started in a goroutine with appropriate error handling. The check for `http.ErrServerClosed` ensures graceful shutdown doesn't trigger false alarms.
```go
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

<-quit
log.Println("[INFO] Shutdown signal received, draining connections...")

active := GetActiveRequestCount()
if active > 0 {
	log.Printf("[INFO] Waiting for %d in-flight request(s)...", active)
	WaitForInFlightRequests()
	log.Println("[INFO] All in-flight requests completed")
}

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

if err := srv.Shutdown(ctx); err != nil {
	log.Printf("[ERROR] Server forced to shutdown: %v", err)
} else {
	log.Println("[OK] Server shutdown completed")
}
```
Race condition: wait for in-flight requests after stopping new connections.
The current sequence waits for in-flight requests (lines 219-223) before calling srv.Shutdown() (line 229), which means the server continues accepting new connections during the wait. This creates a race:
- Line 219 checks active count (e.g., 1 request)
- A new request arrives and is accepted by the still-running server
- Lines 220-223 wait for the original request to complete
- Line 229 calls `srv.Shutdown()`, which must now wait for the new request
The standard graceful shutdown pattern is to call srv.Shutdown() first—it stops accepting new connections and waits for existing ones to complete (up to the timeout). The manual WaitForInFlightRequests() wait before shutdown undermines this.
🔧 Recommended fix
Either remove the manual wait and rely on srv.Shutdown() alone:
```diff
 <-quit
 log.Println("[INFO] Shutdown signal received, draining connections...")
-active := GetActiveRequestCount()
-if active > 0 {
-	log.Printf("[INFO] Waiting for %d in-flight request(s)...", active)
-	WaitForInFlightRequests()
-	log.Println("[INFO] All in-flight requests completed")
-}
-
 ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
 defer cancel()
 if err := srv.Shutdown(ctx); err != nil {
 	log.Printf("[ERROR] Server forced to shutdown: %v", err)
 } else {
 	log.Println("[OK] Server shutdown completed")
 }
```

Or, if logging the active count is important, just log it without the manual wait:
```diff
 <-quit
 log.Println("[INFO] Shutdown signal received, draining connections...")
 active := GetActiveRequestCount()
 if active > 0 {
 	log.Printf("[INFO] %d in-flight request(s) detected, waiting for completion...", active)
-	WaitForInFlightRequests()
-	log.Println("[INFO] All in-flight requests completed")
 }
 ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
 defer cancel()
 if err := srv.Shutdown(ctx); err != nil {
 	log.Printf("[ERROR] Server forced to shutdown: %v", err)
 } else {
 	log.Println("[OK] Server shutdown completed")
 }
```

`srv.Shutdown()` internally waits for connections to idle, making the explicit `WaitForInFlightRequests()` redundant and potentially harmful.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

```go
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
<-quit
log.Println("[INFO] Shutdown signal received, draining connections...")
active := GetActiveRequestCount()
if active > 0 {
	log.Printf("[INFO] Waiting for %d in-flight request(s)...", active)
	WaitForInFlightRequests()
	log.Println("[INFO] All in-flight requests completed")
}
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
	log.Printf("[ERROR] Server forced to shutdown: %v", err)
} else {
	log.Println("[OK] Server shutdown completed")
}
```

Suggested:

```go
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
<-quit
log.Println("[INFO] Shutdown signal received, draining connections...")
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
	log.Printf("[ERROR] Server forced to shutdown: %v", err)
} else {
	log.Println("[OK] Server shutdown completed")
}
```
🤖 Prompt for AI Agents
In @gateway/main.go around lines 213 - 234, The shutdown sequence currently
calls GetActiveRequestCount() and WaitForInFlightRequests() before
srv.Shutdown(), which allows the server to keep accepting new connections and
creates a race; fix by removing the manual wait and invoking srv.Shutdown(ctx)
first (so the server stops accepting new connections and waits for in-flight
requests), and if you need visibility keep the GetActiveRequestCount() log but
do not call WaitForInFlightRequests() prior to srv.Shutdown(); ensure you still
use context.WithTimeout(...) and handle the error from srv.Shutdown(ctx) as
before.
@greptile review the PR
8 files reviewed, 8 comments
```
)

require (
	github.com/ProjectZKM/Ziren/crates/go-runtime/zkvm_runtime v0.0.0-20251001021608-1fe7b43fc4d6 // indirect
```
This dependency is unrelated to graceful shutdown functionality and is not used anywhere in the codebase. This should be removed from this PR as it:
- Adds unnecessary bloat to the project
- Could introduce security vulnerabilities or supply chain risks
- Makes code review more difficult by mixing unrelated changes
Verified with: `grep -r "zkvm_runtime" --include="*.go" .` (no matches found)
Prompt To Fix With AI
This is a comment left during a code review.
Path: gateway/go.mod
Line: 14:14
Comment:
This dependency is unrelated to graceful shutdown functionality and is not used anywhere in the codebase. This should be removed from this PR as it:
1. Adds unnecessary bloat to the project
2. Could introduce security vulnerabilities or supply chain risks
3. Makes code review more difficult by mixing unrelated changes
Verified with: `grep -r "zkvm_runtime" --include="*.go" .` (no matches found)
How can I resolve this? If you propose a fix, please make it concise.

```go
 "os/signal"
 "syscall"
```
Import formatting is incorrect - these imports have leading spaces and should align with the other imports above.
| "os/signal" | |
| "syscall" | |
| "os/signal" | |
| "syscall" |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: gateway/main.go
Line: 23:24
Comment:
Import formatting is incorrect - these imports have leading spaces and should align with the other imports above.
```suggestion
"os/signal"
"syscall"
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.

```go
active := GetActiveRequestCount()
if active > 0 {
	log.Printf("[INFO] Waiting for %d in-flight request(s)...", active)
	WaitForInFlightRequests()
	log.Println("[INFO] All in-flight requests completed")
}
```
CRITICAL: Redundant and race-prone shutdown logic
This manual request tracking and waiting is both redundant and architecturally incorrect:
1. **`http.Server.Shutdown()` already waits for active connections** - From Go docs: "Shutdown works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down." So the call on line 229 will wait anyway.
2. **Race condition**: New requests can arrive between line 222 (when WaitForInFlightRequests returns) and line 229 (when srv.Shutdown is called). The server is still accepting connections during this window.
3. **Misleading logs**: Line 223 logs "All in-flight requests completed" but the server is still running and accepting new requests until line 229.
4. **No timeout**: `WaitForInFlightRequests()` blocks forever if a request hangs. The 30-second timeout on line 226 only applies to `srv.Shutdown()`, not this manual wait.
Correct approach: Remove lines 219-224 entirely and rely solely on srv.Shutdown() to wait for active connections, OR redesign to stop accepting new connections before waiting (which is what Shutdown already does).
Prompt To Fix With AI
This is a comment left during a code review.
Path: gateway/main.go
Line: 219:224
Comment:
**CRITICAL: Redundant and race-prone shutdown logic**
This manual request tracking and waiting is both redundant and architecturally incorrect:
1. **`http.Server.Shutdown()` already waits for active connections** - From Go docs: "Shutdown works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down." So the call on line 229 will wait anyway.
2. **Race condition**: New requests can arrive between line 222 (when WaitForInFlightRequests returns) and line 229 (when srv.Shutdown is called). The server is still accepting connections during this window.
3. **Misleading logs**: Line 223 logs "All in-flight requests completed" but the server is still running and accepting new requests until line 229.
4. **No timeout**: `WaitForInFlightRequests()` blocks forever if a request hangs. The 30-second timeout on line 226 only applies to `srv.Shutdown()`, not this manual wait.
**Correct approach**: Remove lines 219-224 entirely and rely solely on `srv.Shutdown()` to wait for active connections, OR redesign to stop accepting new connections before waiting (which is what Shutdown already does).
How can I resolve this? If you propose a fix, please make it concise.| var ( | ||
| activeRequestsWG sync.WaitGroup | ||
| activeRequestCnt int64 | ||
| ) |
Global state with no reset mechanism creates issues in tests and server restarts
These global variables are never reset, which causes problems:
- Test pollution: If multiple tests create servers (like in shutdown_test.go), the counters persist between tests, causing incorrect counts
- Server restart issues: If the server is stopped and restarted within the same process, the counts will be wrong
- The test in shutdown_test.go only works by accident because it's the only test, but running multiple shutdown tests would fail
Better approach: Either make these per-server instance variables, or provide a reset function for tests. Example:
```go
// For tests
func ResetRequestTracking() {
	activeRequestsWG = sync.WaitGroup{}
	atomic.StoreInt64(&activeRequestCnt, 0)
}
```

Prompt To Fix With AI
This is a comment left during a code review.
This is a comment left during a code review.
Path: gateway/request_tracker.go
Line: 10:13
Comment:
**Global state with no reset mechanism creates issues in tests and server restarts**
These global variables are never reset, which causes problems:
1. **Test pollution**: If multiple tests create servers (like in shutdown_test.go), the counters persist between tests, causing incorrect counts
2. **Server restart issues**: If the server is stopped and restarted within the same process, the counts will be wrong
3. **The test in shutdown_test.go only works by accident** because it's the only test, but running multiple shutdown tests would fail
**Better approach**: Either make these per-server instance variables, or provide a reset function for tests. Example:
```go
// For tests
func ResetRequestTracking() {
activeRequestsWG = sync.WaitGroup{}
atomic.StoreInt64(&activeRequestCnt, 0)
}
```
How can I resolve this? If you propose a fix, please make it concise.| start := time.Now() | ||
| if err := srv.Shutdown(ctx); err != nil { | ||
| t.Fatalf("shutdown failed: %v", err) | ||
| } | ||
|
|
||
| WaitForInFlightRequests() | ||
| elapsed := time.Since(start) |
Test doesn't actually verify what it claims to test
This test has a fundamental flaw: srv.Shutdown(ctx) on line 54 already waits for active connections to complete (this is documented behavior of http.Server.Shutdown). So by the time line 58 calls WaitForInFlightRequests(), the request has already finished.
The test is measuring that http.Server.Shutdown() works (which is a given), not that our middleware tracking works correctly.
What should be tested instead:
1. Verify that `GetActiveRequestCount()` returns the correct count WHILE requests are in-flight
2. Test that requests complete successfully (check response status code)
3. Test the interaction between the tracking middleware and actual shutdown
Current test: Measures Shutdown() waiting for requests ✓ (not our code)
Should test: Our tracking middleware correctly counts requests ✗ (not tested)
Prompt To Fix With AI
This is a comment left during a code review.
Path: gateway/shutdown_test.go
Line: 53:59
Comment:
**Test doesn't actually verify what it claims to test**
This test has a fundamental flaw: `srv.Shutdown(ctx)` on line 54 already waits for active connections to complete (this is documented behavior of http.Server.Shutdown). So by the time line 58 calls `WaitForInFlightRequests()`, the request has already finished.
The test is measuring that `http.Server.Shutdown()` works (which is a given), not that our middleware tracking works correctly.
**What should be tested instead:**
1. Verify that `GetActiveRequestCount()` returns the correct count WHILE requests are in-flight
2. Test that requests complete successfully (check response status code)
3. Test the interaction between the tracking middleware and actual shutdown
Current test: Measures Shutdown() waiting for requests ✓ (not our code)
Should test: Our tracking middleware correctly counts requests ✗ (not tested)
How can I resolve this? If you propose a fix, please make it concise.| // Give request time to start | ||
| time.Sleep(50 * time.Millisecond) |
Race condition: Sleep-based synchronization is unreliable
Using time.Sleep(50 * time.Millisecond) to "give request time to start" is a classic test race condition:
- On slow systems (CI, loaded machines), 50ms may not be enough and the test will fail
- On fast systems, the request might already be complete by then
- This makes tests flaky and unpredictable
Better approach: Use proper synchronization with channels or wait for the request to actually be in-flight. Example:
```go
started := make(chan struct{})
go func() {
	// Signal when request enters the handler
	close(started)
	resp, err := http.Get(ln.URL + "/slow")
	// ...
}()
<-started // Wait for request to actually start
```

Or check `GetActiveRequestCount() > 0` in a loop with timeout instead of sleeping.
Prompt To Fix With AI
This is a comment left during a code review.
Path: gateway/shutdown_test.go
Line: 46:47
Comment:
**Race condition: Sleep-based synchronization is unreliable**
Using `time.Sleep(50 * time.Millisecond)` to "give request time to start" is a classic test race condition:
- On slow systems (CI, loaded machines), 50ms may not be enough and the test will fail
- On fast systems, the request might already be complete by then
- This makes tests flaky and unpredictable
**Better approach**: Use proper synchronization with channels or wait for the request to actually be in-flight. Example:
```go
started := make(chan struct{})
go func() {
// Signal when request enters the handler
close(started)
resp, err := http.Get(ln.URL + "/slow")
// ...
}()
<-started // Wait for request to actually start
```
Or check `GetActiveRequestCount() > 0` in a loop with timeout instead of sleeping.
How can I resolve this? If you propose a fix, please make it concise.

```go
go func() {
	resp, err := http.Get(ln.URL + "/slow")
	if err != nil {
		t.Errorf("request failed: %v", err)
		return
	}
	resp.Body.Close()
	close(done)
```
The test should verify that the request completed successfully by checking the response status code, not just that it didn't error. A connection could be closed mid-request and still not return an error, but the response would be incomplete.
Add verification:
```suggestion
go func() {
	resp, err := http.Get(ln.URL + "/slow")
	if err != nil {
		t.Errorf("request failed: %v", err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Errorf("expected status 200, got %d", resp.StatusCode)
	}
	close(done)
}()
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: gateway/shutdown_test.go
Line: 36:43
Comment:
The test should verify that the request completed successfully by checking the response status code, not just that it didn't error. A connection could be closed mid-request and still not return an error, but the response would be incomplete.
Add verification:
```suggestion
go func() {
resp, err := http.Get(ln.URL + "/slow")
if err != nil {
t.Errorf("request failed: %v", err)
return
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
t.Errorf("expected status 200, got %d", resp.StatusCode)
}
close(done)
}()
```
How can I resolve this? If you propose a fix, please make it concise.
Additional Comments (1)
**CRITICAL: Rate limiter cleanup goroutine leak is NOT fixed**

The PR description claims "Fixed rate limiter cleanup goroutine leak during shutdown", but this is incorrect. Each call to `NewTokenBucket()` (in `initRateLimiters()`) spawns a cleanup goroutine via `go tb.cleanup()` (ratelimit.go:56), but there's no code anywhere in main.go that calls the `Stop()` method on these limiters during shutdown. This means:
1. The cleanup goroutines continue running after shutdown begins
2. They will never be cleaned up properly
3. In tests or repeated server starts, goroutines accumulate

Fix required: Store the limiters in a variable accessible during shutdown and call `Stop()` on each limiter before the server shuts down, similar to how the receipt cleanup is handled with `cleanupCancel()`.

Prompt To Fix With AI
This is a comment left during a code review.
Path: gateway/main.go
Line: 151:155
Comment:
**CRITICAL: Rate limiter cleanup goroutine leak is NOT fixed**
The PR description claims "Fixed rate limiter cleanup goroutine leak during shutdown", but this is incorrect. Each call to `NewTokenBucket()` (in `initRateLimiters()`) spawns a cleanup goroutine via `go tb.cleanup()` (ratelimit.go:56), but there's no code anywhere in main.go that calls the `Stop()` method on these limiters during shutdown.
This means:
1. The cleanup goroutines continue running after shutdown begins
2. They will never be cleaned up properly
3. In tests or repeated server starts, goroutines accumulate
**Fix required:** Store the limiters in a variable accessible during shutdown and call `Stop()` on each limiter before the server shuts down, similar to how the receipt cleanup is handled with `cleanupCancel()`.
How can I resolve this? If you propose a fix, please make it concise.
AnkanMisra left a comment
Changes Requested
Thanks for implementing graceful shutdown! The core functionality is there, but there are a few issues that need to be fixed before merging.
1. BUG (High): Shutdown sequence is inverted
File: gateway/main.go lines 214-225
```go
// Current code (WRONG order):
<-quit
log.Println("[INFO] Shutdown signal received, draining connections...")
active := GetActiveRequestCount()
if active > 0 {
	WaitForInFlightRequests() // ❌ Waits here FIRST
}
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
srv.Shutdown(ctx) // ❌ But server still accepts new connections!
```

Problem: You're waiting for in-flight requests BEFORE stopping the server from accepting new connections. This means:
- New requests can keep arriving while you wait
- Under load, shutdown may never complete
Fix: Call srv.Shutdown() FIRST, then wait for in-flight requests:
```go
<-quit
log.Println("[INFO] Shutdown signal received...")
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

// Stop accepting new connections FIRST
if err := srv.Shutdown(ctx); err != nil {
	log.Printf("[ERROR] Server shutdown: %v", err)
}

// Then wait for existing requests
WaitForInFlightRequests()
log.Println("[OK] Server shutdown completed")
```

2. BUG (Medium): Rate limiter goroutine leak NOT fixed
File: gateway/main.go lines 151-155
The PR description claims "Fixed rate limiter cleanup goroutine leak during shutdown" but TokenBucket.Stop() is never called.
Each NewTokenBucket() spawns a cleanup goroutine (go tb.cleanup() in ratelimit.go:56). These are never stopped during shutdown.
Fix: Either:
- Store limiters and call `Stop()` during shutdown, OR
- Remove the claim from the PR description
3. Style (Low): Formatting issues
File: gateway/main.go line 193
addr := ":" + port // ❌ Missing leading tabFile: gateway/main.go lines 23-24
"os/signal" // ❌ Spaces instead of tabs
"syscall"Fix: Run go fmt ./... in the gateway directory.
Summary
| Issue | Severity | Action |
|---|---|---|
| Shutdown sequence inverted | High | Reorder: srv.Shutdown() before WaitForInFlightRequests() |
| Rate limiter leak not fixed | Medium | Fix or remove claim from PR description |
| Formatting | Low | Run go fmt ./... |
Please address these issues and push a new commit. The core implementation is good - just needs these fixes!
|
@Arpita01-work any updates?
Summary
Closes #19
This PR implements full graceful shutdown support for the Gateway service.
The server now handles SIGINT/SIGTERM signals and shuts down cleanly using `http.Server.Shutdown()` with a context timeout, ensuring that in-flight requests are allowed to complete instead of being terminated abruptly.
Changes Included
- Replaced `gin.Run()` with an explicit `http.Server`
- Added graceful shutdown via `http.Server.Shutdown()` with timeout
- Added in-flight request tracking with `sync.WaitGroup`

Testing