INTPYTHON-749: Fix bug in null matching on queryoptimizer #401

Jibola · 2025-09-11T17:26:11Z

Summary

We recently released the #373, which converts $expr binary operators to their direct $match equivalents. However, there exists subtle differences between $expr binary operators and $match binary operators. Handling of null values (e.g. {"field": null} vs {"$expr": { "$eq" : ["field", null]}}) differ because in find() operations null can also mean the "lack of a fields existence" whereas in $expr it means the field must also exist.
As a consequence, to convert from express binary operator to a match operator, we need to include $exists: False in our operations.

Changes in this PR

NULL checks have to explicitly check for both field existence and equality. As such, anytime a NoneType is used a lookup, the optimization will wrap the query query with an {$and: [ {$exists: false}, ...]} to maintain $expr logical expectations.
Added test cases to validate the null-wrapping behavior occurs

Screenshots (grabbed text)

# Before optimizations
>>> NullableJSONModel.objects.filter(value=None)
{
                "$match": {
                        "$expr": { "$eq" : [ "$value", null ] }
                }
            }

# With optimizations but before fix
>>> NullableJSONModel.objects.filter(value=None)
{
                "$match": {
                        "value": null          # Not equivalent queries
                }
            }

# After 
>>> NullableJSONModel.objects.filter(value=None)
{
                "$match": {
                    "$and": [
                        {
                            "value": {
                                  "$exists": true
                             }
                        }
                        {
                            "value": null
                        },
                    ]
                }
            }

Callouts

This case only occurs in on JSONField queries. I'm only adding the specific test case for that in this PR. Adding newer test cases that validate against using the Model object will be set for a separate PR. (INTPYTHON-750: Support converting $expr with $getField paths #392 being one of them)

timgraham · 2025-09-12T00:04:54Z

tests/expression_converter_/models.py

+from django.db import models
+
+
+class NullableJSONModel(models.Model):
+    value = models.JSONField(blank=True, null=True)
+
+    class Meta:
+        required_db_features = {"supports_json_field"}
+
+
+class Tag(models.Model):
+    name = models.CharField(max_length=10)
+
+    def __str__(self):
+        return self.name


Please try to use existing test apps and existing tests. The expression_converter_ app will be removed when query generation is refactored. In the case of JSONField, since Django's tests/model_fields/test_jsonfield.py doesn't seem to catch the issue, we can add a file of the same name to this project's tests/model_fields_.

My pushback here is that it they're placed here since we're testing a direct consequence of the QueryOptimizer and a bugfix to the QueryOptimizer; the original code has no bug. From my perspective the query refactor is larger lift and it makes more sense there to then fold in models -- especially if this will be temporary.

The tests are organized by behavior, not by what part of the code caused the problem. Presumably we want any future query generation algorithm to generate the same query, or at least a query that behaves the same. From my perspective, there is no advantage to regression tests like this in expression_converter_, as it creates more work down the line to move them elsewhere. Looking at this again, tests/lookup_ might be more appropriate than model_fields_. InConverter is also modified but I don't see a regression test for that.

Sure. I'll also include a test for $in as well since it doesn't behave the same way and also move them to lookup_.

timgraham · 2025-09-12T00:07:57Z

tests/expression_converter_/test_filter_conversion.py

+class MQLTests(MongoTestCaseMixin, TestCase):
+    def test_none_filter_nullable_json(self):
+        with self.assertNumQueries(1) as ctx:
+            list(NullableJSONModel.objects.filter(value=None))


Please also add the test data and assert the query results. It's unclear to me why existing tests don't catch the problem unless, in the case of Tag, this is about polymorphic data where the "name" field doesn't exist in the data.

As for JSONField, perhaps it's separate from this regression (if there is indeed one), but there are limitations about null/nonexistent queries.

Sure I can add data.

I'll also say JSONField isn't separate and was the main way I found the issue. (I'll add the details to the PR description), but it's specifically that the un-optimized JSONFIeld call would generate this:

{ $match: { $expr: { $eq: [ '$field', null] } } }

This explicitly checks for the presence of a field + the field being null.

The optimization made it:

{ $match: { field: null } }

Which checks for null OR the field as undefined

So it's the same regression simply because we've broken the underlying expectation same as before.

Honestly, in cases outside of JSONField the data integrity remains the same because we still enforce the $exists: false check redundantly.

For the case of Tag, we cant add polymorphic tests yet because testing Polymorphic data won't yield any results since polymorphic data requires querying an embedded field. That triggers a $getField which doesn't get converted in this PR. It gets converted in #392 . So the test will need to wait til that change.

Caveat: To add data that would render some worthwhile test results, required injecting data outside of the ORM -- since the only way to get to the current case (outside of $getField) stems from someone injecting information outside of the bounds of the ORM CRUD operations.

timgraham · 2025-09-13T00:30:00Z

tests/lookup_/tests.py

+    def _test_with_raw_data(self, model, test_function, field):
+        collection = connection.database.get_collection(model._meta.db_table)
+        try:
+            collection.insert_one({"_id": self.unique_id})


We need to decide to what extent we'll support "sparse" data with missing columns (another example: #275). We have only a few tests with this sort of data, so the current status is basically "ad-hoc, best effort, react to bug reports." I feel we should decide on an official policy and document it so that users have clear expectations. If we move forward with supporting this, we should at least have these tests omit optional fields, so that we don't have the test assertions fail to generate the usual error message (in this case, Book.__str__() fails because title is None.

====================================================================== ERROR: test_none_filter_binary_operator (lookup_.tests.NullValueLookupTests.test_none_filter_binary_operator) (op='exact') ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tim/code/django-mongodb/tests/lookup_/tests.py", line 105, in _test_with_raw_data test_function(op, predicate, field) File "/home/tim/code/django-mongodb/tests/lookup_/tests.py", line 80, in _test_none_filter_binary_operator self.assertQuerySetEqual( File "/home/tim/code/django/django/test/testcases.py", line 1290, in assertQuerySetEqual return self.assertEqual(list(items), values, msg=msg) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/unittest/case.py", line 885, in assertEqual assertion_func(first, second, msg=msg) File "/usr/lib/python3.12/unittest/case.py", line 1091, in assertListEqual self.assertSequenceEqual(list1, list2, msg, seq_type=list) File "/usr/lib/python3.12/unittest/case.py", line 1068, in assertSequenceEqual difflib.ndiff(pprint.pformat(seq1).splitlines(), ^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/pprint.py", line 62, in pformat underscore_numbers=underscore_numbers).pformat(object) ^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/pprint.py", line 161, in pformat self._format(object, sio, 0, 0, {}, 0) File "/usr/lib/python3.12/pprint.py", line 178, in _format rep = self._repr(object, context, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/pprint.py", line 458, in _repr repr, readable, recursive = self.format(object, context.copy(), ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/pprint.py", line 471, in format return self._safe_repr(object, context, maxlevels, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/pprint.py", line 622, in _safe_repr orepr, oreadable, orecur = self.format( ^^^^^^^^^^^^ File "/usr/lib/python3.12/pprint.py", line 471, in format return self._safe_repr(object, context, maxlevels, level) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/pprint.py", line 632, in _safe_repr rep = repr(object) ^^^^^^^^^^^^ File "/home/tim/code/django/django/db/models/base.py", line 590, in __repr__ return "<%s: %s>" % (self.__class__.__name__, self) ^^^^ TypeError: __str__ returned non-string (type NoneType)

I'm actually of the mind -- for this case specifically -- that we do not include this test. In order to reach this case, you have to do ORM-breaking behavior that we don't have a firm policy on. The real "bugs" for this case stem from optimizing embedded documents in a flow that only leverages the Django ORM.

Like you've also just stated we don't have a clear "limit" on what is a valid experience. Let's loop in product and get their take.

timgraham · 2025-09-13T00:32:07Z

tests/lookup_/tests.py

+    def test_none_filter_nullable_json(self):
+        self._test_with_raw_data(NullableJSONModel, self._test_none_filter_nullable_json, "value")


This style of writing tests with generic methods calling other generic methods doesn't make my heart sing. It seems extremely tedious to understand and confirm that the test actually does what it's designed to do. I saw some of this in the converter patch and let it slide there, but I think it's best to avoid going forward. There can be a role for helper functions and assertions, but when there's special casing like **{f"{field}__{op}": [None] if op == "in" else None}, I think it's time to have separate tests, even if it's a bit more repetitive. I find even subTest fails sometimes very convoluted to debug.

I can agree I don't need to shoe-horn "in" into the test, so I don't mind separating them. I will say, though, it is a matter of what makes the most sense. If readability is the issue then that's more of a lack of explicit documentation on authorship.

For context on this; I originally had more operators, but now that it went back down to 2, it's less valuable. In the case where the same rote operation became needlessly repetitious, I do think it still holds value.

timgraham · 2025-09-13T00:33:39Z

tests/lookup_/models.py

+    class Meta:
+        required_db_features = {"supports_json_field"}


This project's tests aren't meant to be run by other databases, so this isn't needed.

timgraham · 2025-09-13T00:35:59Z

tests/lookup_/tests.py

+    def _test_with_raw_data(self, model, test_function, field):
+        collection = connection.database.get_collection(model._meta.db_table)
+        try:
+            collection.insert_one({"_id": self.unique_id})


It seems to be this data could be inserted in setupTestData. Django automatically empties the collection after each test, so manual cleanup isn't needed.

Ah, that was my misunderstanding. I assumed setUpTestData was for the entire class.

Fix bug in null matching on queryoptimizer

6f2052a

Jibola marked this pull request as ready for review September 11, 2025 20:34

Jibola requested review from aclark4life, timgraham and WaVEV September 11, 2025 20:46

include none tests for objects.filter

139d0b4

timgraham reviewed Sep 12, 2025

View reviewed changes

Jibola added 4 commits September 12, 2025 13:57

migrated tests to lookup_ and added data

fafe479

call filter once in each test

50177bc

include missing field specifier

e8011b9

fix field resolution and value lookup

0ca2a5e

timgraham reviewed Sep 13, 2025

View reviewed changes

Jibola added 3 commits September 15, 2025 11:40

refactored the tests to remove generics

4af823a

remove the CONSTANT dictionary

0161dfb

update tests to support ordering and have a default for self.title

0b29263

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

INTPYTHON-749: Fix bug in null matching on queryoptimizer #401

INTPYTHON-749: Fix bug in null matching on queryoptimizer #401

Jibola commented Sep 11, 2025 •

edited

Loading

Uh oh!

timgraham Sep 12, 2025

Uh oh!

Jibola Sep 12, 2025

Uh oh!

timgraham Sep 12, 2025

Uh oh!

Jibola Sep 12, 2025

Uh oh!

timgraham Sep 12, 2025

Uh oh!

Jibola Sep 12, 2025 •

edited

Loading

Uh oh!

Jibola Sep 12, 2025

Uh oh!

Jibola Sep 12, 2025

Uh oh!

timgraham Sep 13, 2025

Uh oh!

Jibola Sep 15, 2025 •

edited

Loading

Uh oh!

timgraham Sep 13, 2025

Uh oh!

Jibola Sep 15, 2025 •

edited

Loading

Uh oh!

timgraham Sep 13, 2025

Uh oh!

timgraham Sep 13, 2025

Uh oh!

Jibola Sep 15, 2025

Uh oh!

Uh oh!

		def test_none_filter_nullable_json(self):
		self._test_with_raw_data(NullableJSONModel, self._test_none_filter_nullable_json, "value")

INTPYTHON-749: Fix bug in null matching on queryoptimizer #401

Are you sure you want to change the base?

INTPYTHON-749: Fix bug in null matching on queryoptimizer #401

Conversation

Jibola commented Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes in this PR

Screenshots (grabbed text)

Callouts

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jibola Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jibola Sep 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jibola Sep 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Jibola commented Sep 11, 2025 •

edited

Loading

Jibola Sep 12, 2025 •

edited

Loading

Jibola Sep 15, 2025 •

edited

Loading

Jibola Sep 15, 2025 •

edited

Loading