Skip to content

Conversation

@cpjulia
Copy link
Contributor

@cpjulia cpjulia commented Nov 25, 2025

Scope & Purpose

In AqlValue.cpp, there's a hashing function and a comparator. An AqlValue object can have different types of values, inline, slice pointers, managed slices and managed strings (and, in the future, supervised slice). Inline values are non-allocated values that are distributed within the AqlValue's 16 bytes of available space, slice pointers are pointers to values not owned by the AqlValue object, and managed slices and strings are dynamically allocated payloads owned by the AqlValue object. Even if multiple aql values point to the same managed slice, still, it's owned by AqlValue, and the last AqlValue that points to it to be destroy has to free this allocated memory. The hashing and comparator are used in AqlItemBlock for deduplication, when inserting the AqlValue on a block, so it has to compare if two aql values are equal.

Now onto the hashing, before, the hash was made with the pointers, not the data pointed to by the pointers. The data that the pointer stores is the address of the payload it points to, hence, if two aql values that own memory were compared using this approach, they would never be considered equal, and they could be, because, even though having memory allocated in different addresses, the payload that each store could be semantically the same. This would lead to the comparison used in AqlItemBlock not working properly, and wrongfully distinguishing values that should be considered the same in practice.
The new approach is to hash by the value stored in the address in cases where there's dynamic allocation.
When the value is inline, it's safe to maintain the original approach.
When the value is of type slice pointer, it means AqlValue doesn't own it, hence, the pointers should be hashed. If they point to the same address, then they could be considered the same.

  • 💩 Bugfix
  • 🍕 New feature
  • 🔥 Performance improvement
  • 🔨 Refactoring/simplification

Checklist

  • Tests -> PROBABLY TO BE ADDED
    • Regression tests
    • C++ Unit tests
    • integration tests
    • resilience tests
  • 📖 CHANGELOG entry made
  • 📚 documentation written (release notes, API changes, ...)
  • Backports
    • Backport for 3.12.0: (Please link PR)
    • Backport for 3.11: (Please link PR)
    • Backport for 3.10: (Please link PR)

Related Information

(Please reference tickets / specification / other PRs etc)

  • Docs PR:
  • Enterprise PR:
  • GitHub issue / Jira ticket:
  • Design document:

Note

Normalize AqlValue hashing and equality to be value/content-based (independent of storage/pointers), with dedicated handling for ranges and extensive unit tests validating deduplication behavior.

  • Core (AqlValue):
    • std::hash<AqlValue> now uses AqlValue::hash(0) (normalized/content-based) and maps zero to 1.
    • std::equal_to<AqlValue> reworked to compare by content via VelocyPackHelper::equal(...), with fast-path for inline long numbers and structural compare for RANGE (_low/_high).
  • Tests:
    • Add comprehensive suites validating hash/equality correctness and deduplication behavior in AqlItemBlock::toVelocyPack and InputAqlItemRow::cloneToBlock:
      • tests/Aql/AqlValueHashAlgorithmCorrectnessTest.cpp
      • tests/Aql/AqlValueHashEqualTest.cpp
      • tests/Aql/AqlValueHashTest.cpp
  • Build:
    • Register new tests in tests/CMakeLists.txt.

Written by Cursor Bugbot for commit 8d4f533. This will update automatically on new commits. Configure here.

@cla-bot cla-bot bot added the cla-signed label Nov 25, 2025
@cpjulia cpjulia added this to the devel milestone Nov 25, 2025
Copy link
Member

@mchacki mchacki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some clarifying questions

return std::hash<void const*>()(x._data.rangeMeta.range);
using T = AqlValue::AqlValueType;
auto aqlValueType = x.type();
// as this is non owning, we hash by the pointer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the owning make for a difference?

Hashing is a quick look of the content is equal.
In my opinion it should not take into account if the value is owned or not.

Comment on lines 1383 to 1386
if (h == 0) { // fallback to avoid collision with the marker that uses h ==
// 0, very unlikely to happen
h = 1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hö what is this?

Comment on lines 1407 to 1408
return a._data.slicePointerMeta.pointer ==
b._data.slicePointerMeta.pointer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this pointer comparison?
And not real equality comparison like the default case?

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is being reviewed by Cursor Bugbot

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

// Different allocations, different pointers -> different hashes
// This is CORRECT: managed slices compare by pointer, so different pointers = different values
EXPECT_NE(hasher(v1), hasher(v2));
EXPECT_FALSE(equal(v1, v2)); // Pointer comparison
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Tests expect pointer comparison but implementation uses value comparison

The test AqlValueHash_ManagedSlices expects EXPECT_NE(hasher(v1), hasher(v2)) and EXPECT_FALSE(equal(v1, v2)) for managed slices with identical content. However, the new equal_to implementation uses VPackNormalizedCompare::equals() for VPACK_MANAGED_SLICE types (content comparison), and the hash uses x.hash(0) which produces content-based hashes. Two values with the same content "same content" will have equal hashes and compare equal, causing these assertions to fail. The same issue affects AqlValueHash_RangeValues and the RANGE section of AqlValueHash_VerifyPointerVsPayload_Hashing, where the implementation compares _low and _high values rather than pointers, making r1(1,10) and r2(1,10) equal with the same hash.

Additional Locations (2)

Fix in Cursor Fix in Web

<< "Supervised and managed slices with same content are equal";

supervised.destroy();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Missing destroy call for managed value causes memory leak

The test AqlValueHash_SupervisedVsManaged_ContentEqual creates two managed AqlValue objects (supervised and managed) using the same constructor pattern, but only calls supervised.destroy() at line 211. The managed value is never destroyed, causing a memory leak. Other tests in this file consistently destroy all managed values they create. The test would fail its implicit memory tracking checks if they were added.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants