
Conversation

@blink1073 blink1073 commented Aug 26, 2025

Currently testing with this script for async:

import asyncio
from pymongo import AsyncMongoClient

client = AsyncMongoClient(maxConnecting=100)

async def target():
    await client.admin.command("ping")

async def main():
    # Enable the server's ingress connection rate limiter with a low rate,
    # small burst capacity, and a tiny queue so new connections get rejected.
    await client.admin.command('setParameter', 1, ingressConnectionEstablishmentRateLimiterEnabled=True)
    await client.admin.command('setParameter', 1, ingressConnectionEstablishmentRatePerSec=30)
    await client.admin.command('setParameter', 1, ingressConnectionEstablishmentBurstCapacitySecs=1)
    await client.admin.command('setParameter', 1, ingressConnectionEstablishmentMaxQueueDepth=1)

    # Warm the pool so there are existing connections.
    tasks = []
    for i in range(10):
        tasks.append(asyncio.create_task(target()))
    await asyncio.wait(tasks)

    # Spawn enough concurrent tasks to exceed the configured rate limit and
    # force new connection establishment to be throttled.
    tasks = []
    for i in range(200):
        tasks.append(asyncio.create_task(target()))

    await asyncio.wait(tasks)


asyncio.run(main())

and this one for sync:

from pymongo import MongoClient
from concurrent.futures import ThreadPoolExecutor, wait

client = MongoClient(maxConnecting=100)

def target():
    client.admin.command("ping")

def main():
    # Same rate limiter configuration as the async script.
    client.admin.command('setParameter', 1, ingressConnectionEstablishmentRateLimiterEnabled=True)
    client.admin.command('setParameter', 1, ingressConnectionEstablishmentRatePerSec=30)
    client.admin.command('setParameter', 1, ingressConnectionEstablishmentBurstCapacitySecs=1)
    client.admin.command('setParameter', 1, ingressConnectionEstablishmentMaxQueueDepth=1)

    # Warm the pool so there are existing connections.
    print("warming the pool")
    tasks = []
    pool = ThreadPoolExecutor(200)
    for i in range(10):
        tasks.append(pool.submit(target))
    wait(tasks)
    # Spawn enough concurrent threads to exceed the configured rate limit.
    print("spawning the burst")
    tasks = []
    for i in range(200):
        tasks.append(pool.submit(target))
    wait(tasks)

main()

blink1073 and others added 15 commits August 19, 2025 11:23
    conn.conn.get_conn.read(1)
except Exception as _:
    # TODO: verify the exception
    close_conn = False
@ShaneHarvey ShaneHarvey Aug 26, 2025

2 comments:

  1. I believe this logic needs to move to connection checkout. Here in connection check-in we already know the connection is usable because we're checking it back in after a successful command.
  2. Instead of a 1ms read, can we reuse the existing _perished() + conn_closed() methods?
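
For illustration only, a minimal sketch of what validating connections at checkout (rather than probing at check-in) could look like; this is a toy pool, and the health check is a stand-in for the existing _perished()/conn_closed() helpers, not pymongo's actual API:

from __future__ import annotations

import select
import socket
from collections import deque

class SimplePool:
    """Toy pool: validate idle connections at checkout instead of check-in."""

    def __init__(self) -> None:
        self._idle: deque[socket.socket] = deque()

    def _perished(self, sock: socket.socket) -> bool:
        # Stand-in for pymongo's _perished()/conn_closed() checks: a socket is
        # considered dead if it is readable but a peek returns EOF, or if
        # peeking raises an OS error.
        readable, _, _ = select.select([sock], [], [], 0)
        if not readable:
            return False
        try:
            return sock.recv(1, socket.MSG_PEEK) == b""
        except OSError:
            return True

    def checkout(self) -> socket.socket | None:
        while self._idle:
            sock = self._idle.popleft()
            if self._perished(sock):
                # Discard the dead connection and try the next idle one
                # instead of clearing the whole pool.
                sock.close()
                continue
            return sock
        return None  # caller establishes a new connection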

@blink1073 (Member Author)

Done

@ShaneHarvey ShaneHarvey left a comment

Nice work!

@blink1073

Encryption failure is unrelated: https://jira.mongodb.org/browse/PYTHON-5521

@ShaneHarvey ShaneHarvey changed the title DRIVERS-3218 Avoid clearing the connection pool when the server connection rate limiter triggers PYTHON-5536 Avoid clearing the connection pool when the server connection rate limiter triggers Sep 9, 2025
@ShaneHarvey ShaneHarvey marked this pull request as ready for review September 9, 2025 20:08
@ShaneHarvey ShaneHarvey requested a review from a team as a code owner September 9, 2025 20:08
if not self.is_sdam and type(e) == AutoReconnect:
    self._backoff += 1
    e._add_error_label("SystemOverloaded")
    e._add_error_label("Retryable")
Member

We need to move this logic so that it covers the TCP+TLS handshake, which happens up above.
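
For illustration only (not the PR's code), a rough sketch of the suggested shape, where the same overload handling wraps both the TCP+TLS handshake and the hello/auth step; connect_with_backoff, _create_connection, and _hello_and_auth are assumed placeholder helpers:

from pymongo.errors import AutoReconnect

def connect_with_backoff(pool, address):
    # Hypothetical: both stages run inside the same handler so that a
    # rate-limiter rejection during either the TCP+TLS handshake or
    # hello/auth triggers backoff instead of clearing the pool.
    try:
        conn = pool._create_connection(address)  # assumed: TCP + TLS handshake
        pool._hello_and_auth(conn)               # assumed: hello + authentication
        return conn
    except AutoReconnect as e:
        if not pool.is_sdam:
            pool._backoff += 1
            e._add_error_label("SystemOverloaded")
            e._add_error_label("Retryable")
        raise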

@blink1073 blink1073 Sep 10, 2025

I set a breakpoint in the TCP+TLS handshake error handler and confirmed that handshakes are succeeding. The error only occurs on hello/auth.

Member

Okay, I'm actually surprised by this, since the design (SPM-4319) indicates the rate limiter rejection happens before the TLS handshake.

@blink1073

Ideally we'd like to detect connection-reset-by-peer errors and only back off for those, but in my testing, during backoff we can sometimes get a plain EOF instead of a reset error, which triggers the raise OSError("connection closed") path.
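
For background, a standalone sketch (not pymongo code) of why the two cases are hard to distinguish: a peer that resets the connection raises ConnectionResetError, while a peer that closes it cleanly just returns empty bytes, which the reader has to surface as its own "connection closed" error:

import socket

def read_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, distinguishing a reset from a clean EOF."""
    buf = b""
    while len(buf) < n:
        try:
            chunk = sock.recv(n - len(buf))
        except ConnectionResetError:
            # Connection reset by peer (ECONNRESET): the server actively
            # killed the connection, e.g. because it is overloaded.
            raise
        if not chunk:
            # Clean EOF: the peer closed the socket without an error, so all
            # the reader can report is a generic "connection closed".
            raise OSError("connection closed")
        buf += chunk
    return buf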

@@ -338,8 +338,11 @@ async def read(self, request_id: Optional[int], max_message_size: int) -> tuple[
        if self._done_messages:
            message = await self._done_messages.popleft()
        else:
            if self._closing_exception:
                raise self._closing_exception
            if self._closed.done():
Contributor

Is calling is_closing here better? It'll catch more edge cases in theory.

@blink1073 (Member Author)

Hmm let me try that.

@blink1073 (Member Author)

No, it is ambiguous whether connection_lost has been called yet. Since connection_lost is synchronous, checking self._closed.done() ensures that we have actually lost the connection.
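
For illustration, a standalone asyncio sketch (not pymongo's actual protocol class) of the distinction: transport.is_closing() can already be True while connection_lost() has not run yet, whereas a future resolved inside connection_lost() only completes once the loss has actually been delivered:

import asyncio
from typing import Optional

class ClosedAwareProtocol(asyncio.Protocol):
    def __init__(self) -> None:
        # Created inside the running loop (e.g. by loop.create_connection).
        self._closed: asyncio.Future = asyncio.get_running_loop().create_future()

    def connection_made(self, transport: asyncio.BaseTransport) -> None:
        self.transport = transport

    def connection_lost(self, exc: Optional[Exception]) -> None:
        # connection_lost is synchronous, so once this future is done the
        # connection is definitively gone.
        if not self._closed.done():
            self._closed.set_result(None)

    def is_definitely_lost(self) -> bool:
        # transport.is_closing() flips to True as soon as close() is called,
        # possibly before connection_lost() runs; _closed.done() does not.
        return self._closed.done()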
