Ensure that bad HTTP status codes result in the producer retrying #21
Conversation
SemMulder left a comment:
Awesome! Two small comments.
I think you might have missed this one? :)
I have now added an .error_for_status() call to the version endpoint call, but intentionally left out any retry logic there, as that endpoint is intended to (only) be used by hand. (There is also no parsing of the response result going on.)
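A minimal sketch of what that could look like, assuming reqwest as the HTTP client (the is_connect/is_timeout/is_decode calls in the diff below match reqwest's Error API); the function name and URL layout here are made up for illustration:

```rust
// Hypothetical sketch; the real opsqueue client code may differ.
// The version endpoint gets an `.error_for_status()` so a 4xx/5xx surfaces as an Err,
// but deliberately has no retry logic, since it is only meant to be called by hand.
async fn server_version(
    client: &reqwest::Client,
    base_url: &str,
) -> Result<String, reqwest::Error> {
    client
        .get(format!("{base_url}/version"))
        .send()
        .await?
        .error_for_status()? // turn bad HTTP status codes into errors
        .text()              // no JSON parsing of the response body
        .await
}
```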
opsqueue/src/producer/client.rs (outdated)

      // Maybe a different HTTP client library is nicer in this regard?
      Self::HTTPClientError(inner) => {
  -       inner.is_connect() || inner.is_timeout() || inner.is_decode()
  +       inner.is_status() || inner.is_connect() || inner.is_timeout() || inner.is_decode()
This marks everything from 400 to 599 inclusive as ephemeral.
Do we really want to include the whole 400 range? E.g. what if we post a bad body because of some version mismatch or something? I'm too unfamiliar with the code to judge whether it matters; what do you think?
You're right, only catching 5xx is the correct approach.
4xx errors can (only?) come up if e.g. a Traefik or other proxy in the middle is misconfigured, in which case we'd like to see that early.
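A sketch of what restricting the check to 5xx could look like, again assuming reqwest's Error API; ProducerClientError and is_ephemeral are placeholder names for illustration, with only the HTTPClientError variant taken from the diff above:

```rust
use reqwest::Error as HttpError;

// Placeholder error type; the real opsqueue enum has more variants.
enum ProducerClientError {
    HTTPClientError(HttpError),
    Other(String),
}

impl ProducerClientError {
    /// Sketch: only 5xx responses (plus connect/timeout/decode failures) count as
    /// ephemeral and therefore retryable; 4xx responses are surfaced immediately.
    fn is_ephemeral(&self) -> bool {
        match self {
            Self::HTTPClientError(inner) => {
                let server_error = inner
                    .status()
                    .map_or(false, |status| status.is_server_error()); // 500..=599 only
                server_error || inner.is_connect() || inner.is_timeout() || inner.is_decode()
            }
            Self::Other(_) => false,
        }
    }
}
```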
One edge case is 429 Too Many Requests though; in that case you likely want to retry?
Possibly. Let's see if we end up triggering that case in production.
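For reference, if 429 does turn out to matter in production, the status check could be widened roughly like this (a sketch, not part of the PR):

```rust
use reqwest::StatusCode;

// Treat server errors and 429 Too Many Requests as retryable.
fn status_is_retryable(status: StatusCode) -> bool {
    status.is_server_error() || status == StatusCode::TOO_MANY_REQUESTS
}
```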
@OpsBotPrime merge and deploy to production

Unknown or invalid command found:

@OpsBotPrime merge and tag

Rebased as 74328de, waiting for CI …
Ensure that bad HTTP status codes result in the producer retrying
Approved-by: Qqwy
Priority: Normal
Auto-deploy: false
CI job 🟡 started.
The build failed ❌. If this is the result of a flaky test, then tag me again with the retry command.

@OpsBotPrime retry
Before, this would result in a non-ephemeral `InternalProducerClientError` as the client would still attempt to decode JSON from the response body even on a non-200 HTTP status.
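A minimal sketch of the ordering the fix describes, assuming reqwest and serde; get_json is a made-up helper name for illustration:

```rust
use serde::de::DeserializeOwned;

// Sketch: check the HTTP status first, so e.g. a 502 from a proxy becomes a
// (retryable) status error instead of a non-ephemeral JSON decode error.
async fn get_json<T: DeserializeOwned>(
    client: &reqwest::Client,
    url: &str,
) -> Result<T, reqwest::Error> {
    client
        .get(url)
        .send()
        .await?
        .error_for_status()? // bail out on 4xx/5xx before touching the body
        .json::<T>()
        .await
}
```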
Ensure that bad HTTP status codes result in the producer retrying
Approved-by: Qqwy
Priority: Normal
Auto-deploy: false
Rebased as b4fb822, waiting for CI …

CI job 🟡 started.
Force-pushed from 3b8bae3 to b4fb822.