Skip to content

feat: UUID (and bson.Binary of UUID subtype) is accepted as a document _id field value#40

Open
Guibod wants to merge 1 commit intoscottrogowski:masterfrom
Guibod:feat-binary-ids
Open

feat: UUID (and bson.Binary of UUID subtype) is accepted as a document _id field value#40
Guibod wants to merge 1 commit intoscottrogowski:masterfrom
Guibod:feat-binary-ids

Conversation

@Guibod
Copy link
Contributor

@Guibod Guibod commented Mar 12, 2023

As stated in #41 , anything can be used as an _id in mongo, in this PR, i’ve lifted the validation limitations on bson.Binary as a proper data type for _id types. This can then be used to store UUID values.

UUID and bson.Binary with subtypes described in bson.binary.UuidRepresentation are two common ways of describing UUID in pymongo. UUID is the python type and bson.Binary is the storage type.

I introduce a new method mongita.engines.engine_common.Engine.hash_id that transforms an eligible type to string. It replace the str(doc_id) found in numerous places. It will support the transformation from bson.Binary to string using bson.binary.Binary.as_uuid.

I’ve also needed to put a very specific comparison in mongita.collection._doc_matches_slow_filters in case of a user inputed bson.Binary and a decoded binary bybson.decode that is cast as a UUID.

@Guibod Guibod changed the title feat: bson.Binary is accepted as a document _id field value feat: UUID (and bson.Binary of UUID subtype) is accepted as a document _id field value Mar 12, 2023
@scottrogowski
Copy link
Owner

Thanks for the PR!

Looks plausible. Will take a little while to review. I'm working on a SQLite implementation right now and luckily this looks like it would slide right in.

@scottrogowski
Copy link
Owner

@Guibod looks like this is failing unit tests

make test

ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED. UUIDs can be manually converted to bson.Binary instances using bson.Binary.from_uuid() or a different UuidRepresentation can be configured. See the documentation for UuidRepresentation for more information.

@Guibod
Copy link
Contributor Author

Guibod commented Apr 3, 2023

Thanks you @scottrogowski , i’ll try to check this as soon as possible. ❤️

@Guibod Guibod force-pushed the feat-binary-ids branch from add9d1d to e0a1c57 Compare April 3, 2023 07:13
@Guibod
Copy link
Contributor Author

Guibod commented Apr 3, 2023

Well, this issue was bound to the absence of uuidRepresentation in mongoengine tests.

for instance:

mongoengine.connect('mongita_test_db')

Was replaced by:

mongoengine.connect('mongita_test_db', uuidRepresentation=UuidRepresentation.STANDARD)

I hesitated at first to add a note in the README.md, but in the end mongoengine related tests only validates the behavior of a mongita dependency.

You are all set and can review the PR once more !

@Guibod
Copy link
Contributor Author

Guibod commented Apr 3, 2023

And of course, i failed. I’ve only solved the warnings in my previous commit, and those warnings were bound to pymongo 3. 😅
This time, that’s pymongo 4 (from #45 ) that is much more watchful regarding the usage of UUID.

This is related to the fact that there was legacy implementation of the UUID in BSON between Java, Python and other. And that those legacy behaviors were unified under a new standard name. If you don’t specify your uuid_representation you run a risk of saving incompatible data. Until recently, it fell back automatically to legacy python implementation.

In my PR, i’ll need either to setup a default uuid_representation for the mongita disk client, or to enforce the type in each UUID we generate.

@Guibod Guibod force-pushed the feat-binary-ids branch from e0a1c57 to 94109bb Compare April 3, 2023 09:23
@Guibod
Copy link
Contributor Author

Guibod commented Apr 3, 2023

The solution is to enhance the DiskEngine to add a bson.CodecOptions instance, and use it in each bson.encode and bson.decode instead of factory settings that don’t decide with UUID storage protocol to use.

My PR decide that mongita will only handle UUID using the new standard, and cannot be configured to any other value. It might be an issue for pre-existing database were the uuid was stored using legacy format (python), were they should change this codec option to UuidRepresentation.PYTHON_LEGACY

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants