-
Notifications
You must be signed in to change notification settings - Fork 34
INTPYTHON-752 Integrate pymongo-search-utils #208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@NoahStapp Re: test failures Should |
It should be passed in. The previous version of |
1941706 to
b75d316
Compare
e863f2a to
4f417d6
Compare
You'll need to update CI to target 3.10 instead of 3.9. 3.13 list is failing due to a ruff format. I'm not sure if the Python 3.13 test failures are caused by INTPYTHON-798 or not. |
4f417d6 to
010f9a7
Compare
| docs.append(Document(page_content=text, id=oid_to_str(_id), metadata=doc)) | ||
| return docs | ||
|
|
||
| def bulk_embed_and_insert_texts( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot remove public functions of our classes. Instead, wrap those in pymongo-search-utils.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to wrap the method with the function from pymongo-search-utils ? @NoahStapp ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to re-export the pymongo-search-utils bulk_embed_and_insert_texts and other public methods moved there. We do something similar in PyMongo for the synchronous modules: https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/collection.py
| self._indexes_in_coll_info = indexes_in_collection_info | ||
|
|
||
| _append_client_metadata(self._client) | ||
| _append_client_metadata(self._client, DRIVER_METADATA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding DRIVER_METADATA is what _append_client_metadata does. It doesn't have any arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above. That said, it's possible that DRIVER_METADATA is the wrong arg.
| to_insert = [ | ||
| {"_id": i, self._text_key: t, **m} for i, t, m in zip(ids, texts, metadatas) | ||
| {"_id": i, self._text_key: t, **m} | ||
| for i, t, m in zip(ids, texts, metadatas, strict=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a test fail? Removing this check changes behavior considerably.
| if ids: | ||
| batch_res = self.bulk_embed_and_insert_texts( | ||
| texts_batch, metadatas_batch, ids[i : j + 1] | ||
| batch_res = bulk_embed_and_insert_texts( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API will change with auto-embedding coming up. It should be insert_texts with perhaps embed as a boolea kwarg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bulk_embed_and_insert_texts lives in pymongo-search-utils now, we'll update it there once needed.
| "langchain-tests==0.3.22", | ||
| "pip>=25.0.1", | ||
| "typing-extensions>=4.12.2", | ||
| "pymongo-search-utils@git+https://github.com/mongodb-labs/pymongo-search-utils.git", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has pymongo-search-utils not been published?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once this integration is tested we can publish, according to @blink1073 and @NoahStapp
| "I", # isort | ||
| ] | ||
| lint.ignore = ["E501", "B008", "UP007", "UP006", "UP035"] | ||
| lint.ignore = ["E501", "B008", "UP007", "UP006", "UP035", "UP038"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe UP038 doesn't exist anymore: https://docs.astral.sh/ruff/rules/non-pep604-isinstance/
No description provided.