PYTHON-2390 - Retryable reads use the same implicit session #2544

NoahStapp · 2025-09-17T18:26:35Z

No description provided.

ShaneHarvey · 2025-09-17T19:18:05Z

test/test_retryable_reads.py

+            retryReads=True,
+        )
+
+        set_fail_point(client, fail_command)


Let's use self.fail_point() here.
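For context, a fail-point helper written as a context manager switches the fail point off even when the test body raises. The sketch below is not pymongo's test helper: `fail_point` and `FakeAdmin` are hypothetical stand-ins, and only the `configureFailPoint` command document with `mode: "off"` mirrors the real server command.

```python
from contextlib import contextmanager


class FakeAdmin:
    """Hypothetical stand-in for client.admin that just records commands."""

    def __init__(self):
        self.commands = []

    def command(self, cmd):
        self.commands.append(cmd)


@contextmanager
def fail_point(admin, command_args):
    """Enable a fail point for the duration of a with-block, then disable it."""
    cmd = {"configureFailPoint": "failCommand", **command_args}
    admin.command(cmd)
    try:
        yield
    finally:
        # Always turn the fail point off, even if the test body raised.
        admin.command({"configureFailPoint": cmd["configureFailPoint"], "mode": "off"})


admin = FakeAdmin()
args = {"mode": {"times": 1}, "data": {"failCommands": ["count"], "errorCode": 6}}
with fail_point(admin, args):
    pass  # the retryable read under test would run here
```

Compared with a bare set-and-forget call, this shape keeps setup and cleanup in one place, which is presumably why the reviewer prefers it.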

ShaneHarvey · 2025-09-17T19:20:32Z

test/test_retryable_reads.py

+
+        set_fail_point(client, fail_command)
+
+        client.t.t.estimated_document_count()


Can we extend this test to cover a few other operations as well?

ShaneHarvey · 2025-09-17T19:22:46Z

test/test_retryable_reads.py

+            if event.command_name == "count"
+        ]
+        self.assertEqual(len(lsids), 2)
+        self.assertEqual(lsids[0], lsids[1])


Should we fix PYTHON-2391 first? Otherwise this test doesn't prove the fix works correctly since first.command is the same dict instance as second.command.

Yes, we should merge the PR for PYTHON-2391 (#2545) first to verify this works correctly.
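The aliasing concern can be illustrated with plain dictionaries. This sketch assumes nothing about pymongo's event classes; `Event` is a hypothetical stand-in for a command-started event.

```python
import copy


class Event:
    """Hypothetical stand-in for a command-started monitoring event."""

    def __init__(self, command):
        self.command = command


cmd = {"count": "t", "lsid": {"id": 1}}

# Both events hold the SAME dict, so a change made for the retry shows up in both.
first = Event(cmd)
cmd["lsid"] = {"id": 2}  # pretend the retry used a different session
second = Event(cmd)
aliased_equal = first.command["lsid"] == second.command["lsid"]

# With a copy per event, the comparison reflects what was actually sent.
cmd = {"count": "t", "lsid": {"id": 1}}
first = Event(copy.deepcopy(cmd))
cmd["lsid"] = {"id": 2}
second = Event(copy.deepcopy(cmd))
copied_equal = first.command["lsid"] == second.command["lsid"]
```

In the aliased case the equality check passes even though the two attempts used different sessions, so an `assertEqual(lsids[0], lsids[1])` on such events cannot distinguish a fix from a bug.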

NoahStapp · 2025-09-22T17:09:08Z

Some additional context:

Our current code uses explicit_session in two different ways that have the same result (don't close this session after the cursor that uses it is done):

1. Actual explicit sessions passed by the user. This is the "intended" behavior and is consistent with the parameter name. However, the same functionality can be achieved with better separation of concerns by using the existing ClientSession.implicit property.
2. Implicit sessions that are used across multiple sub-operations within a single user-level operation. For example, create_collection first calls list_collection_names if supported, using the same implicit session for both operations. list_collection_names uses a CommandCursor, which by default closes any implicit session passed to it when done. To avoid this behavior, we pretended that the implicit session was explicit. This pattern exists in multiple places and has been replaced by the new ClientSession.leave_alive property for the same purpose.
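The two cases above, plus the ordinary single-cursor case, can be sketched with a toy model. This is not pymongo's implementation: `ToySession`, `ToyCursor`, and `ended` are hypothetical, and the `_implicit` and `_leave_alive` attribute names simply follow the names used in this thread.

```python
class ToySession:
    """Hypothetical session with the two ownership flags discussed here."""

    def __init__(self, implicit):
        self._implicit = implicit
        self._leave_alive = False
        self.ended = False

    def end_session(self):
        self.ended = True


class ToyCursor:
    """Hypothetical cursor that may end its session when it closes."""

    def __init__(self, session):
        self._session = session

    def close(self):
        s = self._session
        # End only sessions the driver created and that nothing else still needs.
        if s is not None and s._implicit and not s._leave_alive:
            s.end_session()
        self._session = None


# Implicit session used by a single cursor: ended when the cursor closes.
s1 = ToySession(implicit=True)
ToyCursor(s1).close()

# Implicit session shared by several sub-operations: survives the cursor.
s2 = ToySession(implicit=True)
s2._leave_alive = True
ToyCursor(s2).close()

# Explicit (user-supplied) session: never ended by the cursor.
s3 = ToySession(implicit=False)
ToyCursor(s3).close()
```

Keeping "who created the session" and "should the cursor end it" as separate flags avoids having to pretend an implicit session is explicit.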

ShaneHarvey · 2025-09-22T17:58:50Z

pymongo/synchronous/client_session.py

+
+    @leave_alive.setter
+    def leave_alive(self, value: bool) -> None:
+        self._leave_alive = value


None of these new apis should be public since implicit sessions are only used internally by pymongo.

ShaneHarvey · 2025-09-22T18:00:34Z

pymongo/synchronous/change_stream.py

+        with self._client._tmp_session(self._session) as s:
+            if s:
+                s.leave_alive = True
+            return self._run_aggregation_cmd(session=s)


Wouldn't leaving the session alive here leak a session every time the change stream cursor is closed?

Good catch. This shouldn't have leave_alive set.

ShaneHarvey · 2025-09-22T18:02:46Z

pymongo/synchronous/client_session.py

+        return self._implicit
+
+    @property
+    def attached_to_cursor(self) -> bool:


Do we need two attributes to track ownership?

Can you be more specific? As in we need a second attribute to track a different axis of ownership?

Eh I was thinking tmp_session would work via the existing implicit=True/False attribute + a new "owner" attribute but your attached_to_cursor+leave_alive implementation seems simpler.

ShaneHarvey · 2025-09-22T18:40:36Z

Looks like there's one test failure to fix:

__________________________ TestSession.test_database ___________________________

self = <test.asynchronous.test_session.TestSession testMethod=test_database>

    async def asyncTearDown(self):
        monitoring._SENSITIVE_COMMANDS.update(self.sensitive_commands)
        await self.client.drop_database("pymongo_test")
        used_lsids = self.initial_lsids.copy()
        for event in self.session_checker_listener.started_events:
            if "lsid" in event.command:
                used_lsids.add(event.command["lsid"]["id"])
    
        current_lsids = {s["id"] for s in session_ids(self.client)}
>       self.assertLessEqual(used_lsids, current_lsids)
E       AssertionError: {Binary(b'\x92\xf1~\xfa\\\xbdI\xed\x96m\xba\x12\xdd\xba\x92\xbf', 4), Binary(b'\xbd\x86\xc1\xbc\xd35L\xb6\x80"\xbc\xb1D\x0f:a', 4)} not less than or equal to {Binary(b'\xbd\x86\xc1\xbc\xd35L\xb6\x80"\xbc\xb1D\x0f:a', 4)}

test/asynchronous/test_session.py:116: AssertionError
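The assertion in this traceback is a set-subset check: `assertLessEqual` on two sets passes only when every lsid seen in a started event also appears in the ids returned by `session_ids(self.client)` (presumably the sessions the client still tracks). A minimal sketch with made-up ids in place of the `Binary` values:

```python
# Made-up stand-ins for the two session ids in the traceback.
used_lsids = {"92f1", "bd86"}
current_lsids = {"bd86"}

# For sets, <= means "is a subset of".
is_subset = used_lsids <= current_lsids

# The difference shows which session was used but is no longer tracked.
missing = used_lsids - current_lsids
```

Here one session id was used by a command but is absent from the current set, which is the shape of the failure above.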

ShaneHarvey · 2025-09-22T18:42:56Z

pymongo/asynchronous/client_session.py

+    @property
+    def _is_implicit(self) -> bool:
+        """Whether this session was implicitly created by the driver."""
+        return self._implicit


This is personal preference, but do we really need these @property helpers? Usually we just access the private attribute directly, e.g.:

if session._implicit: ...
if session._attached_to_cursor: ...

This way there's less indirection and boilerplate code.

For internal attributes it makes more sense to not have @property, agreed.

NoahStapp · 2025-09-22T20:20:57Z

Looks like there's one test failure to fix:

But only on PyPy 🫠

NoahStapp · 2025-09-23T17:47:08Z

Looks like there's one test failure to fix:

The failure was caused by PyPy garbage collecting differently than CPython: CPython GC'd the CommandCursor created by _list_collection_names earlier than PyPy does. Since the session._attached_to_cursor flag is only unset when a cursor is closed, this caused the PyPy test to fail due to the session still being attached at test teardown.

I've fixed this by explicitly closing the cursor rather than relying on garbage collection. Should we make doing so a standard pattern in the codebase?
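The difference between relying on garbage collection and closing explicitly can be sketched as follows. This is a toy model: `ToySession`, `ToyCursor`, and `list_names` are hypothetical, and only the `_attached_to_cursor` name follows this thread.

```python
class ToySession:
    """Hypothetical session that records whether a cursor still holds it."""

    def __init__(self):
        self._attached_to_cursor = False


class ToyCursor:
    """Hypothetical cursor that detaches from its session on close."""

    def __init__(self, session, docs):
        self._session = session
        self._docs = docs
        session._attached_to_cursor = True

    def __iter__(self):
        return iter(self._docs)

    def close(self):
        if self._session is not None:
            self._session._attached_to_cursor = False
            self._session = None

    def __del__(self):
        # Runs whenever the interpreter collects the object: typically at once
        # under CPython's reference counting, possibly much later on PyPy.
        self.close()


def list_names(session):
    cursor = ToyCursor(session, [{"name": "a"}, {"name": "b"}])
    try:
        return [doc["name"] for doc in cursor]
    finally:
        # Deterministic on every interpreter; does not wait for __del__.
        cursor.close()


session = ToySession()
names = list_names(session)
```

With the `try`/`finally`, the session is detached by the time `list_names` returns regardless of when the cursor object is collected.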

ShaneHarvey · 2025-09-23T17:53:07Z

Good find. The bug sounds a little off. We should always be ending the session once the cursor is fully iterated. And _list_collection_names returns a list, so the cursor will have already been fully iterated and therefore an explicit close() call is not required.

NoahStapp · 2025-09-23T19:51:29Z

Currently, we only end sessions associated with cursors in GC or on explicit closure. The fix here solves the issue of PyPy's GC behaving differently than expected.

Do we want cursor-associated sessions to be either closed or untagged as soon as the cursor is no longer alive? That would remove the need for modifying code for PyPy GC issues, but might add other complexity or bugs.

ShaneHarvey · 2025-09-23T20:05:50Z

I believe this PR has introduced that case as a regression. Cursor.close() is always called after receiving the final batch which will return the session:

mongo-python-driver/pymongo/synchronous/command_cursor.py

Lines 290 to 291 in bbb6f88

    
            if self._id == 0:
                self.close()
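The behaviour referenced here, closing the cursor as soon as the cursor id becomes 0, can be sketched with a toy cursor. Everything except the `_id == 0` check is a hypothetical stand-in, including `ToySession`, `ToyCursor`, and `next_batch`.

```python
class ToySession:
    """Hypothetical session that records whether it has been ended."""

    def __init__(self):
        self.ended = False

    def end_session(self):
        self.ended = True


class ToyCursor:
    """Hypothetical cursor that closes itself after the final batch."""

    def __init__(self, session, batches):
        self._session = session
        self._batches = list(batches)
        self._id = 1  # non-zero: more batches remain on the (pretend) server

    def close(self):
        if self._session is not None:
            self._session.end_session()
            self._session = None

    def next_batch(self):
        batch = self._batches.pop(0)
        if not self._batches:
            self._id = 0  # exhaustion is signalled by cursor id 0
        if self._id == 0:
            self.close()
        return batch


session = ToySession()
cursor = ToyCursor(session, [[1, 2], [3]])
first = cursor.next_batch()
ended_after_first = session.ended
last = cursor.next_batch()
```

Under this model a fully iterated cursor has already released its session, so no explicit `close()` and no garbage collection is needed for cleanup.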

ShaneHarvey

One comment otherwise LGTM

ShaneHarvey · 2025-09-23T22:53:24Z

pymongo/asynchronous/client_bulk.py

                    session._start_retryable_write()
                    self.started_retryable_write = True
                session._apply_to(cmd, retryable, ReadPreference.PRIMARY, conn)
+                session._leave_alive = True


Would it make more sense to move this to _process_results_cursor where we actually create a cursor? It seems out of place here.

PYTHON-2390 - Retryable reads use the same implicit session

979208b

NoahStapp requested a review from ShaneHarvey September 17, 2025 18:26

NoahStapp requested a review from a team as a code owner September 17, 2025 18:26

ShaneHarvey reviewed Sep 17, 2025

NoahStapp added 6 commits September 17, 2025 16:14

Add other retryable reads to test

382a94c

Merge branch 'master' into PYTHON-2390

d423bc3

Fix test

e79290e

WIP

daf5f84

WIP

d65fe45

Flag implicit sessions used by retryable operations that also use CommandCursors

8026e7a

NoahStapp marked this pull request as draft September 22, 2025 14:25

Remove debugging

af572bd

NoahStapp marked this pull request as ready for review September 22, 2025 17:05

Fix typing

55aaee3

ShaneHarvey requested changes Sep 22, 2025

NoahStapp added 2 commits September 22, 2025 14:23

Remove leave_alive from changestream

9a27445

Make new APIS private

cee6c5a

ShaneHarvey requested changes Sep 22, 2025

NoahStapp added 4 commits September 22, 2025 16:45

Remove property decorators

6f5bda7

Try to reproduce on GHA

9479af6

revert workflow changes

d95ce5d

Explicitly close _list_collection_names cursor

f5495ca

Don't check session._leave_alive twice

bfc9a70

NoahStapp requested a review from ShaneHarvey September 23, 2025 21:16

ShaneHarvey approved these changes Sep 23, 2025
