Ensure graceful shutdown in GenericContainer.stop() (#10365) #11350
+46
−1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Issue #10365
Motivation
This PR addresses Issue #10365 where the trap signal in a container script is ignored when GenericContainer#stop() is called.
The root cause is that GenericContainer#stop() currently delegates the stop process directly to ResourceReaper#stopAndRemoveContainer(), which issues a killContainerCmd (SIGKILL). This prevents the container from performing any graceful shutdown tasks (e.g., closing connections, flushing logs, executing hooks).
This PR modifies GenericContainer#stop() to attempt a dockerClient.stopContainerCmd() (SIGTERM) before delegating removal to the ResourceReaper.
Changes
Modified GenericContainer#stop() to issue stopContainerCmd before calling ResourceReaper.
Added testPostHookExecutionOnStop in GenericContainerTest to verify that trap hooks are correctly executed upon stopping.
Discussion / Concerns (Reason for Draft)
I am submitting this as a Draft because changing the default behavior to stopContainerCmd implies that stop() will now wait for the container to exit.
Pros: Correctness. Containers can handle lifecycle hooks properly.
Cons: Potential performance regression in tests where containers ignore SIGTERM, causing a delay of up to 10s (default timeout) per test.
I have verified locally that well-behaved containers (which handle SIGTERM) stop almost instantly, so the performance impact should be minimal for correct implementations. However, I would like to hear the maintainers' opinion on whether this should be the default behavior or if we should introduce an opt-in configuration (e.g., .withShutdownTimeout()).