PYTHON-5536 Avoid clearing the connection pool when the server connection rate limiter triggers #2509
base: backpressure
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…b#2507) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ction rate limiter triggers
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Steven Silvester <steve.silvester@mongodb.com>
pymongo/asynchronous/pool.py
Outdated
    conn.conn.get_conn.read(1)
except Exception as _:
    # TODO: verify the exception
    close_conn = False
2 comments:
- I believe this logic needs to move to connection checkout. Here, at connection check-in, we already know the connection is usable because we're checking it back in after a successful command.
- Instead of a 1ms read, can we reuse the existing _perished() + conn_closed() methods?
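A rough sketch of what checking at checkout (rather than at check-in) could look like. This is a toy pool, not PyMongo's implementation: `FakeConn`, `ToyPool`, `perished()` and `max_idle_seconds` are hypothetical names that only mirror the idea behind `_perished()` and `conn_closed()`.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FakeConn:
    """Hypothetical stand-in for a pooled connection (not PyMongo's class)."""

    closed: bool = False
    last_checkin: float = field(default_factory=time.monotonic)

    def conn_closed(self) -> bool:
        # Assumption: a real connection would inspect its socket here.
        return self.closed


class ToyPool:
    """Toy pool that validates idle connections when they are checked out."""

    def __init__(self, max_idle_seconds: float = 60.0) -> None:
        self.max_idle_seconds = max_idle_seconds
        self.idle: deque = deque()

    def perished(self, conn: FakeConn) -> bool:
        # A connection is unusable if it idled too long or its socket is closed.
        idle_for = time.monotonic() - conn.last_checkin
        return idle_for > self.max_idle_seconds or conn.conn_closed()

    def checkin(self, conn: FakeConn) -> None:
        # No liveness probe here: the caller just completed a command on it.
        conn.last_checkin = time.monotonic()
        self.idle.append(conn)

    def checkout(self) -> FakeConn:
        # Discard perished connections until a usable one is found.
        while self.idle:
            conn = self.idle.pop()
            if not self.perished(conn):
                return conn
        # Nothing usable is left, so hand out a fresh connection.
        return FakeConn()


pool = ToyPool()
live = FakeConn()
dead = FakeConn(closed=True)
pool.checkin(live)
pool.checkin(dead)
got = pool.checkout()
```

In this sketch the closed connection is dropped during checkout and the live one is returned, so check-in never needs a probing read.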
Done
Nice work!
(cherry picked from commit 0d4c84e)
Encryption failure is unrelated: https://jira.mongodb.org/browse/PYTHON-5521
…MiB error codes (mongodb#2515) (cherry picked from commit c0e0554)
This reverts commit 532c1b8.
pymongo/asynchronous/pool.py
Outdated
if not self.is_sdam and type(e) == AutoReconnect:
    self._backoff += 1
    e._add_error_label("SystemOverloaded")
    e._add_error_label("Retryable")
We need to move this logic so that it also covers the TCP+TLS handshake, which happens above.
I set a breakpoint in the TCP+TLS handshake error handler and confirmed that handshakes are succeeding. The error only occurs on hello/auth.
Okay, I'm actually surprised by this, since the design (SPM-4319) indicates the rate limiter rejection happens before the TLS handshake.
Ideally we'd like to detect
@@ -338,8 +338,11 @@ async def read(self, request_id: Optional[int], max_message_size: int) -> tuple[
        if self._done_messages:
            message = await self._done_messages.popleft()
        else:
            if self._closing_exception:
                raise self._closing_exception
            if self._closed.done():
Is calling is_closing here better? It'll catch more edge cases in theory.
Hmm let me try that.
No, with is_closing it is ambiguous as to whether connection_lost has been called yet. Since connection_lost is synchronous, checking for self._closed.done() ensures that we have actually lost the connection.
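A small standalone sketch of that distinction, using plain asyncio rather than PyMongo's protocol class. `TrackingProtocol` and its `closed` future are hypothetical stand-ins for the protocol and `self._closed` discussed above.

```python
import asyncio
import socket


class TrackingProtocol(asyncio.Protocol):
    """Hypothetical protocol that resolves a future once the connection is lost."""

    def __init__(self) -> None:
        self.transport = None
        self.closed = asyncio.get_running_loop().create_future()

    def connection_made(self, transport) -> None:
        self.transport = transport

    def connection_lost(self, exc) -> None:
        # Called synchronously by the event loop after the transport is closed.
        if not self.closed.done():
            self.closed.set_result(None)


async def observe_close() -> dict:
    loop = asyncio.get_running_loop()
    left, right = socket.socketpair()
    transport, protocol = await loop.create_connection(TrackingProtocol, sock=left)

    # close() marks the transport as closing right away, but connection_lost
    # only runs on a later loop iteration.
    transport.close()
    before = (transport.is_closing(), protocol.closed.done())

    # Waiting on the future guarantees connection_lost has run.
    await protocol.closed
    after = (transport.is_closing(), protocol.closed.done())

    right.close()
    return {"before": before, "after": after}


result = asyncio.run(observe_close())
print(result)
```

Right after `close()`, `is_closing()` already reports true while the future is still pending, which is the window where it cannot tell us whether `connection_lost` has run.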
Currently testing with this script for async:
and this one for sync: