NoahStapp
Contributor

No description provided.

@NoahStapp NoahStapp requested a review from a team as a code owner September 17, 2025 18:26
retryReads=True,
)

set_fail_point(client, fail_command)
Member

Let's use self.fail_point() here.


set_fail_point(client, fail_command)

client.t.t.estimated_document_count()
Member

Can we extend this test to cover a few other operations as well?

if event.command_name == "count"
]
self.assertEqual(len(lsids), 2)
self.assertEqual(lsids[0], lsids[1])
Member

Should we fix PYTHON-2391 first? Otherwise this test doesn't prove the fix works correctly since first.command is the same dict instance as second.command.

Contributor Author

Yes, we should merge the PR for PYTHON-2391 (#2545) first to verify this works correctly.
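The aliasing concern raised above can be illustrated with a minimal sketch. Everything here is hypothetical (`RecordingListener`, `run_with_retry`, the session ids); it is not PyMongo's monitoring API, only a model of what happens when two recorded events reference the same dict instance versus independent copies.

```python
import copy


class RecordingListener:
    """Hypothetical stand-in for a command listener; not PyMongo's API."""

    def __init__(self, snapshot):
        self.snapshot = snapshot
        self.commands = []

    def started(self, command):
        # Either keep a reference to the caller's dict or take a deep copy.
        self.commands.append(copy.deepcopy(command) if self.snapshot else command)


def run_with_retry(listener):
    """Simulate one attempt plus a retry that reuses the same command dict."""
    command = {"count": "t", "lsid": {"id": "session-1"}}
    listener.started(command)               # first attempt
    command["lsid"] = {"id": "session-2"}   # a buggy retry swaps the session
    listener.started(command)               # retry
    return [c["lsid"]["id"] for c in listener.commands]


aliased = run_with_retry(RecordingListener(snapshot=False))
copied = run_with_retry(RecordingListener(snapshot=True))
print(aliased)  # ['session-2', 'session-2']
print(copied)   # ['session-1', 'session-2']
```

With aliased events the two lsids always compare equal, so an equality assertion passes even when the retry used a different session. Only the copied events expose the mismatch.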

@NoahStapp NoahStapp marked this pull request as draft September 22, 2025 14:25
@NoahStapp NoahStapp marked this pull request as ready for review September 22, 2025 17:05
@NoahStapp
Contributor Author

NoahStapp commented Sep 22, 2025

Some additional context:

Our current code uses explicit_session in two different ways that have the same result (don't close this session after the cursor that uses it is done):

  1. Actual explicit sessions passed by the user. This is the "intended" behavior and is consistent with the parameter name. However, the same functionality can be achieved with better separation of concerns by using the existing ClientSession.implicit property.
  2. Implicit sessions that are used across multiple sub-operations within a single user-level operation. For example, create_collection first calls list_collection_names if supported, using the same implicit session for both operations. list_collection_names uses a CommandCursor, which by default closes any implicit session passed to it when done. To avoid this behavior, we pretended that the implicit session was explicit. This pattern exists in multiple places and has been replaced by the new ClientSession.leave_alive property for the same purpose.
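As a rough illustration of the two cases above, here is a toy model. `ToySession`, `ToyCursor`, and this `create_collection` are hypothetical; the attribute names follow the discussion, but none of this is PyMongo's actual code.

```python
class ToySession:
    """Hypothetical model of a session; not PyMongo's ClientSession."""

    def __init__(self, implicit):
        self._implicit = implicit
        self._leave_alive = False
        self._attached_to_cursor = False
        self.ended = False

    def end_session(self):
        self.ended = True


class ToyCursor:
    """Hypothetical cursor that may end its session when closed."""

    def __init__(self, session):
        self._session = session
        session._attached_to_cursor = True

    def close(self):
        session = self._session
        session._attached_to_cursor = False
        # Only driver-created sessions that nobody asked to keep are ended here.
        if session._implicit and not session._leave_alive:
            session.end_session()


def create_collection(session):
    """Two sub-operations sharing one implicit session."""
    session._leave_alive = True    # keep the session across the listing step
    ToyCursor(session).close()     # stands in for list_collection_names
    still_usable = not session.ended
    session._leave_alive = False
    session.end_session()          # the outer operation owns the cleanup
    return still_usable


explicit = ToySession(implicit=False)
ToyCursor(explicit).close()        # case 1: user-owned session survives

implicit = ToySession(implicit=True)
ToyCursor(implicit).close()        # plain implicit session is ended

shared = ToySession(implicit=True)
survived = create_collection(shared)  # case 2: shared implicit session
```

In this model the cursor never needs to be told a session is "explicit": the `_implicit` flag covers user-owned sessions, and `_leave_alive` covers implicit sessions that outlive one sub-operation.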


@leave_alive.setter
def leave_alive(self, value: bool) -> None:
    self._leave_alive = value
Member

None of these new apis should be public since implicit sessions are only used internally by pymongo.

with self._client._tmp_session(self._session) as s:
    if s:
        s.leave_alive = True
    return self._run_aggregation_cmd(session=s)
Member

Wouldn't leaving the session alive here leak a session every time the change stream cursor is closed?

Contributor Author

Good catch. This shouldn't have leave_alive set.

    return self._implicit

@property
def attached_to_cursor(self) -> bool:
Member

Do we need two attributes to track ownership?

Contributor Author

Can you be more specific? As in we need a second attribute to track a different axis of ownership?

Member

Eh I was thinking tmp_session would work via the existing implicit=True/False attribute + a new "owner" attribute but your attached_to_cursor+leave_alive implementation seems simpler.

@ShaneHarvey
Member

Looks like there's one test failure to fix:

__________________________ TestSession.test_database ___________________________

self = <test.asynchronous.test_session.TestSession testMethod=test_database>

    async def asyncTearDown(self):
        monitoring._SENSITIVE_COMMANDS.update(self.sensitive_commands)
        await self.client.drop_database("pymongo_test")
        used_lsids = self.initial_lsids.copy()
        for event in self.session_checker_listener.started_events:
            if "lsid" in event.command:
                used_lsids.add(event.command["lsid"]["id"])
    
        current_lsids = {s["id"] for s in session_ids(self.client)}
>       self.assertLessEqual(used_lsids, current_lsids)
E       AssertionError: {Binary(b'\x92\xf1~\xfa\\\xbdI\xed\x96m\xba\x12\xdd\xba\x92\xbf', 4), Binary(b'\xbd\x86\xc1\xbc\xd35L\xb6\x80"\xbc\xb1D\x0f:a', 4)} not less than or equal to {Binary(b'\xbd\x86\xc1\xbc\xd35L\xb6\x80"\xbc\xb1D\x0f:a', 4)}

test/asynchronous/test_session.py:116: AssertionError
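For context on the assertion itself: `assertLessEqual` on two sets checks the subset relation, so the failure means one used session id is missing from the server's current session list. A tiny sketch with made-up ids in place of the `Binary` values:

```python
# Made-up ids standing in for the Binary lsid values in the traceback.
used_lsids = {"session-a", "session-b"}
current_lsids = {"session-b"}

# `<=` on sets is the subset test that assertLessEqual relies on.
is_subset = used_lsids <= current_lsids

# The set difference names the session id that is unaccounted for.
unaccounted = used_lsids - current_lsids
print(is_subset, unaccounted)  # False {'session-a'}
```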

@property
def _is_implicit(self) -> bool:
    """Whether this session was implicitly created by the driver."""
    return self._implicit
Member

This is personal preference but do we really need these @property helpers? Usually we just access the private attribute directly, eg:

if session._implicit:...
if session._attached_to_cursor:...

This way there's less indirection and boilerplate code.

Contributor Author

For internal attributes it makes more sense to not have @property, agreed.

@NoahStapp
Contributor Author

The TestSession.test_database failure quoted above only occurs on PyPy 🫠

@NoahStapp
Contributor Author

NoahStapp commented Sep 23, 2025

The failure was caused by PyPy garbage collecting differently than CPython: CPython GC'd the CommandCursor created by _list_collection_names earlier than PyPy does. Since the session._attached_to_cursor flag is only unset when a cursor is closed, this caused the PyPy test to fail due to the session still being attached at test teardown.

I've fixed this by explicitly closing the cursor rather than relying on garbage collection. Should we make doing so a standard pattern in the codebase?
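To make the GC point concrete, here is a minimal sketch contrasting cleanup that depends on finalization with cleanup that is explicit. `ToyCursor` and both helper functions are hypothetical and unrelated to PyMongo's `CommandCursor`. The only assumption about interpreters is the well-known one: CPython's reference counting usually finalizes an unreferenced object immediately, while PyPy's tracing collector may do so later.

```python
from contextlib import closing


class ToyCursor:
    """Hypothetical cursor that records when it is closed."""

    def __init__(self, log):
        self._log = log
        self._closed = False

    def close(self):
        if not self._closed:
            self._closed = True
            self._log.append("closed")

    def __del__(self):
        # Fallback cleanup; when this runs depends on the interpreter's GC.
        self.close()

    def __iter__(self):
        return iter(["a", "b"])


def names_relying_on_gc(log):
    # The cursor is closed only when the collector finalizes it.
    cursor = ToyCursor(log)
    return list(cursor)


def names_with_explicit_close(log):
    # closing() calls cursor.close() on exit, regardless of GC timing.
    with closing(ToyCursor(log)) as cursor:
        return list(cursor)


log = []
names = names_with_explicit_close(log)
```

After `names_with_explicit_close` returns, `log` already contains `"closed"` on any interpreter, whereas the GC-based variant gives no such guarantee at the point of return.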

@ShaneHarvey
Member

ShaneHarvey commented Sep 23, 2025

Good find. The bug sounds a little off. We should always be ending the session once the cursor is fully iterated. And _list_collection_names returns a list, so the cursor will have already been fully iterated and therefore an explicit close() call is not required.

@NoahStapp
Contributor Author

Currently, we only end sessions associated with cursors in GC or on explicit closure. The fix here solves the issue of PyPy's GC behaving differently than expected.

Do we want cursor-associated sessions to be either closed or untagged as soon as the cursor is no longer alive? That would remove the need for modifying code for PyPy GC issues, but might add other complexity or bugs.

@ShaneHarvey
Member

I believe this PR has introduced that case as a regression. Cursor.close() is always called after receiving the final batch which will return the session:

if self._id == 0:
    self.close()
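The behaviour described here (a cursor closing itself once the final batch arrives) can be sketched with a toy class. `ToyCursor`, its pre-batched input, and the `on_close` callback are hypothetical; only the `self._id == 0` check mirrors the snippet above.

```python
class ToyCursor:
    """Hypothetical cursor over pre-batched data; not PyMongo's implementation."""

    def __init__(self, batches, on_close):
        self._batches = list(batches)
        self._on_close = on_close
        self._closed = False
        # A cursor id of 0 models "the server has no more batches".
        self._id = 1 if self._batches else 0
        if self._id == 0:
            self.close()

    def close(self):
        if not self._closed:
            self._closed = True
            self._on_close()   # e.g. hand the session back

    def _next_batch(self):
        batch = self._batches.pop(0)
        if not self._batches:
            self._id = 0       # that was the final batch
        if self._id == 0:
            self.close()
        return batch

    def __iter__(self):
        while self._id != 0:
            yield from self._next_batch()


events = []
cursor = ToyCursor([[1, 2], [3]], lambda: events.append("session returned"))
docs = list(cursor)
print(docs, events)  # [1, 2, 3] ['session returned']
```

In this model, fully iterating the cursor is enough to trigger `close()`, so no separate explicit call or GC finalization is needed to return the session.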

Member

@ShaneHarvey ShaneHarvey left a comment

One comment otherwise LGTM

session._start_retryable_write()
self.started_retryable_write = True
session._apply_to(cmd, retryable, ReadPreference.PRIMARY, conn)
session._leave_alive = True
Member

Would it make more sense to move this to _process_results_cursor where we actually create a cursor? It seems out of place here.
