Ensure that bad HTTP status codes result in the producer retrying #21
Conversation
SemMulder left a comment:
Awesome! Two small comments.
I think you might have missed this one? :)
I have now added an .error_for_status() call to the version endpoint call, but intentionally left out any retry logic there, as that endpoint is intended to (only) be used by hand. (There is also no parsing of the response result going on.)
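A minimal sketch of what that could look like, assuming reqwest as the HTTP client (the is_connect/is_timeout/is_decode calls in the diff below match reqwest's Error API); the function name and URL layout here are made up for illustration:

```rust
// Hypothetical sketch; the real opsqueue client code may differ.
// The version endpoint gets an `.error_for_status()` so a 4xx/5xx surfaces as an Err,
// but deliberately has no retry logic, since it is only meant to be called by hand.
async fn server_version(
    client: &reqwest::Client,
    base_url: &str,
) -> Result<String, reqwest::Error> {
    client
        .get(format!("{base_url}/version"))
        .send()
        .await?
        .error_for_status()? // turn bad HTTP status codes into errors
        .text()              // no JSON parsing of the response body
        .await
}
```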
opsqueue/src/producer/client.rs (outdated)

      // Maybe a different HTTP client library is nicer in this regard?
      Self::HTTPClientError(inner) => {
  -       inner.is_connect() || inner.is_timeout() || inner.is_decode()
  +       inner.is_status() || inner.is_connect() || inner.is_timeout() || inner.is_decode()
This marks everything from 400 to 599 inclusive as ephemeral.
Do we really want to include the whole 400 range? E.g. what if we post a bad body because of some version mismatch or something? I'm too unfamiliar with the code to judge whether it matters; what do you think?
You're right, only catching 5xx is the correct approach.
4xx errors can (only?) come up if e.g. a Traefik or other proxy in the middle is misconfigured, in which case we'd like to see that early.
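A sketch of what restricting the check to 5xx could look like, again assuming reqwest's Error API; ProducerClientError and is_ephemeral are placeholder names for illustration, with only the HTTPClientError variant taken from the diff above:

```rust
use reqwest::Error as HttpError;

// Placeholder error type; the real opsqueue enum has more variants.
enum ProducerClientError {
    HTTPClientError(HttpError),
    Other(String),
}

impl ProducerClientError {
    /// Sketch: only 5xx responses (plus connect/timeout/decode failures) count as
    /// ephemeral and therefore retryable; 4xx responses are surfaced immediately.
    fn is_ephemeral(&self) -> bool {
        match self {
            Self::HTTPClientError(inner) => {
                let server_error = inner
                    .status()
                    .map_or(false, |status| status.is_server_error()); // 500..=599 only
                server_error || inner.is_connect() || inner.is_timeout() || inner.is_decode()
            }
            Self::Other(_) => false,
        }
    }
}
```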
One edge case is 429 Too Many Requests though; in that case you likely want to retry?
Possibly. Let's see if we end up triggering that case in production.
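For reference, if 429 does turn out to matter in production, the status check could be widened roughly like this (a sketch, not part of the PR):

```rust
use reqwest::StatusCode;

// Treat server errors and 429 Too Many Requests as retryable.
fn status_is_retryable(status: StatusCode) -> bool {
    status.is_server_error() || status == StatusCode::TOO_MANY_REQUESTS
}
```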
@OpsBotPrime merge and deploy to production

Unknown or invalid command found:

@OpsBotPrime merge and tag

Rebased as 74328de, waiting for CI …
Ensure that bad HTTP status codes result in the producer retrying
Approved-by: Qqwy
Priority: Normal
Auto-deploy: false
CI job 🟡 started.
The build failed ❌. If this is the result of a flaky test, then tag me again with the retry command.

@OpsBotPrime retry
Before, this would result in a non-ephemeral `InternalProducerClientError` as the client would still attempt to decode JSON from the response body even on a non-200 HTTP status.
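A minimal sketch of the ordering the fix describes, assuming reqwest and serde; get_json is a made-up helper name for illustration:

```rust
use serde::de::DeserializeOwned;

// Sketch: check the HTTP status first, so e.g. a 502 from a proxy becomes a
// (retryable) status error instead of a non-ephemeral JSON decode error.
async fn get_json<T: DeserializeOwned>(
    client: &reqwest::Client,
    url: &str,
) -> Result<T, reqwest::Error> {
    client
        .get(url)
        .send()
        .await?
        .error_for_status()? // bail out on 4xx/5xx before touching the body
        .json::<T>()
        .await
}
```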
Ensure that bad HTTP status codes result in the producer retrying
Approved-by: Qqwy
Priority: Normal
Auto-deploy: false
Rebased as b4fb822, waiting for CI …

CI job 🟡 started.
Force-pushed from 3b8bae3 to b4fb822.