Conversation

@aclark4life
Collaborator

No description provided.

@aclark4life aclark4life changed the title from "INTPYTHON-752 Integrate pymongo-vectorsearch-utils" to "INTPYTHON-752 Integrate pymongo-search-utils" Oct 6, 2025
@aclark4life aclark4life marked this pull request as ready for review October 6, 2025 15:29
@aclark4life aclark4life requested a review from Copilot October 6, 2025 15:30

This comment was marked as resolved.

@aclark4life
Collaborator Author

aclark4life commented Oct 6, 2025

@NoahStapp Re: test failures: should driver_info be optional, or should it be passed in at the call sites where it isn't currently passed to append_client_metadata?

@NoahStapp
Collaborator

> @NoahStapp Re: test failures: should driver_info be optional, or should it be passed in at the call sites where it isn't currently passed to append_client_metadata?

It should be passed in. The previous version of append_client_metadata we used here used a DriverInfo set at the package level. The new one from pymongo-search-utils is generic for flexibility so we need to pass in the package-level DriverInfo here.
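
A minimal sketch of the pattern described above: a generic helper that takes the `DriverInfo` as an argument, with the caller supplying its package-level constant. `DriverInfo`, `FakeClient`, the helper body, and the `DRIVER_METADATA` value are illustrative stand-ins, not the actual PyMongo or pymongo-search-utils code.

```python
from typing import NamedTuple, Optional


class DriverInfo(NamedTuple):
    """Stand-in for a driver-info record (name, version, platform)."""

    name: str
    version: Optional[str] = None
    platform: Optional[str] = None


class FakeClient:
    """Hypothetical client that just records the metadata it is given."""

    def __init__(self) -> None:
        self.metadata: list[DriverInfo] = []

    def append_metadata(self, driver_info: DriverInfo) -> None:
        self.metadata.append(driver_info)


def append_client_metadata(client: FakeClient, driver_info: DriverInfo) -> None:
    """Generic helper: the caller decides which DriverInfo to attach."""
    # Only call the hook if the client actually provides it.
    if callable(getattr(client, "append_metadata", None)):
        client.append_metadata(driver_info)


# Package-level constant owned by the integration (made-up values).
DRIVER_METADATA = DriverInfo(name="example-integration", version="0.0.0")

client = FakeClient()
append_client_metadata(client, DRIVER_METADATA)
```

Because the helper no longer reads a module-level constant, every call site has to pass the package's own `DriverInfo` explicitly.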

@aclark4life aclark4life force-pushed the INTPYTHON-752 branch 5 times, most recently from e863f2a to 4f417d6 October 15, 2025 15:09
@blink1073
Collaborator

error: The requested interpreter resolved to Python 3.9.24, which is incompatible with the project's Python requirement: >=3.10 (from project.requires-python)

You'll need to update CI to target 3.10 instead of 3.9. The 3.13 lint job is failing due to a ruff format error. I'm not sure whether the Python 3.13 test failures are caused by INTPYTHON-798.

docs.append(Document(page_content=text, id=oid_to_str(_id), metadata=doc))
return docs

def bulk_embed_and_insert_texts(
Collaborator

We cannot remove public functions of our classes. Instead, keep them as wrappers around the pymongo-search-utils functions.

Collaborator Author

I'm not sure how to wrap the method with the function from pymongo-search-utils. @NoahStapp?

Collaborator

We'll need to re-export the pymongo-search-utils bulk_embed_and_insert_texts and other public methods moved there. We do something similar in PyMongo for the synchronous modules: https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/collection.py
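
One way the wrapping could look, as a rough sketch: the public method stays on the class and delegates to the shared helper. The helper below is a hypothetical stand-in (its name, signature, and body are assumptions, not the real pymongo-search-utils function), and a plain list stands in for the collection.

```python
from typing import Any, Callable, Optional, Sequence


def _utils_bulk_embed_and_insert_texts(
    texts: Sequence[str],
    metadatas: Sequence[dict],
    embedding_func: Callable[[list], list],
    collection: list,
    text_key: str,
    embedding_key: str,
    ids: Optional[Sequence[Any]] = None,
) -> list:
    """Hypothetical stand-in for the helper that moved to the utils package."""
    embeddings = embedding_func(list(texts))
    if ids is None:
        ids = list(range(len(texts)))
    docs = [
        {"_id": i, text_key: t, embedding_key: e, **m}
        for i, t, e, m in zip(ids, texts, embeddings, metadatas)
    ]
    collection.extend(docs)  # a real collection would do a bulk write here
    return [d["_id"] for d in docs]


class VectorStore:
    """Illustrative class that keeps its public method after the move."""

    def __init__(self, collection: list, embedding_func: Callable[[list], list]):
        self.collection = collection
        self._embedding_func = embedding_func
        self._text_key = "text"
        self._embedding_key = "embedding"

    def bulk_embed_and_insert_texts(self, texts, metadatas, ids=None) -> list:
        """Public API preserved; the work is delegated to the shared helper."""
        return _utils_bulk_embed_and_insert_texts(
            texts=texts,
            metadatas=metadatas,
            embedding_func=self._embedding_func,
            collection=self.collection,
            text_key=self._text_key,
            embedding_key=self._embedding_key,
            ids=ids,
        )


store = VectorStore([], lambda ts: [[float(len(t))] for t in ts])
inserted = store.bulk_embed_and_insert_texts(["a", "bb"], [{}, {"k": 1}], ids=["x", "y"])
```

Existing callers of `store.bulk_embed_and_insert_texts(...)` keep working, while the logic lives in one place.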

self._indexes_in_coll_info = indexes_in_collection_info

- _append_client_metadata(self._client)
+ _append_client_metadata(self._client, DRIVER_METADATA)
Collaborator

Adding DRIVER_METADATA is what _append_client_metadata does. It doesn't have any arguments.

Collaborator Author

See above. That said, it's possible that DRIVER_METADATA is the wrong arg.

to_insert = [
- {"_id": i, self._text_key: t, **m} for i, t, m in zip(ids, texts, metadatas)
+ {"_id": i, self._text_key: t, **m}
+ for i, t, m in zip(ids, texts, metadatas, strict=False)
Collaborator

Did a test fail? Removing this check changes behavior considerably.

if ids:
- batch_res = self.bulk_embed_and_insert_texts(
- texts_batch, metadatas_batch, ids[i : j + 1]
+ batch_res = bulk_embed_and_insert_texts(
Collaborator

This API will change with auto-embedding coming up. It should be insert_texts, perhaps with embed as a boolean kwarg.

Collaborator

bulk_embed_and_insert_texts lives in pymongo-search-utils now; we'll update it there when needed.

"langchain-tests==0.3.22",
"pip>=25.0.1",
"typing-extensions>=4.12.2",
"pymongo-search-utils@git+https://github.com/mongodb-labs/pymongo-search-utils.git",
Collaborator

Has pymongo-search-utils not been published?

Collaborator Author

Once this integration is tested, we can publish, according to @blink1073 and @NoahStapp.

"I", # isort
]
- lint.ignore = ["E501", "B008", "UP007", "UP006", "UP035"]
+ lint.ignore = ["E501", "B008", "UP007", "UP006", "UP035", "UP038"]
Collaborator

I believe UP038 doesn't exist anymore: https://docs.astral.sh/ruff/rules/non-pep604-isinstance/

4 participants