vector index maintainer #3738

normen662 · 2025-11-11T15:25:24Z

No description provided.

Merge remote-tracking branch 'upstream/main' into vector-index-maintainence

alecgrieser · 2025-11-17T10:06:39Z

fdb-extensions/src/main/java/com/apple/foundationdb/async/hnsw/CompactStorageAdapter.java

                .thenApply(valueBytes -> {
                    if (valueBytes == null) {
-                        throw new IllegalStateException("cannot fetch node");
+                        return null;


It's a bit weird that this is only going to affect the CompactStorageAdapter, especially as it means that we sort of have to audit where we call fetchNodeInternal for possible null checks--but only some of the time.

The other way to do this would be to add a new method like fetchNodeIfExists, and then have this call that one (reintroducing the null check). We could either make the two storage adapters symmetric, or only expose the method on the CompactStorageAdapter. (I'm personally fine with making them symmetrical even if we only ever call this for the purposes of checking for existence on the CompactStorageAdapter).

I did do the audit for null checks. It only affects the CompactStorageAdapter since a non-existing node and a node without any neighbors cannot be differentiated in the inline storage layout (at least not currently).
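For illustration, the fetchNodeIfExists split suggested above might look roughly like the following. This is a minimal sketch, not the actual adapter code: `SketchStorageAdapter`, its in-memory map, and the string keys are hypothetical stand-ins, and only the exception message is taken from the diff.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a storage adapter; NOT the real CompactStorageAdapter API.
final class SketchStorageAdapter {
    // In-memory map standing in for the database reads (assumption for the sketch).
    private final Map<String, byte[]> store = new HashMap<>();

    void put(String primaryKey, byte[] valueBytes) {
        store.put(primaryKey, valueBytes);
    }

    // Existence-tolerant variant: a missing node is a normal outcome and yields null.
    CompletableFuture<byte[]> fetchNodeIfExists(String primaryKey) {
        return CompletableFuture.completedFuture(store.get(primaryKey));
    }

    // Strict variant: delegates to the tolerant one and reintroduces the null check,
    // so existing callers never have to handle a null node.
    CompletableFuture<byte[]> fetchNode(String primaryKey) {
        return fetchNodeIfExists(primaryKey).thenApply(valueBytes -> {
            if (valueBytes == null) {
                throw new IllegalStateException("cannot fetch node");
            }
            return valueBytes;
        });
    }
}
```

With this shape, only the callers that want an existence check use fetchNodeIfExists, which keeps the null handling in one place.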

alecgrieser · 2025-11-17T10:16:25Z

fdb-extensions/src/main/java/com/apple/foundationdb/async/hnsw/HNSW.java

+                        })
+                .thenCompose(accessInfoAndNodeExistence -> {
+                    if (accessInfoAndNodeExistence.isNodeExists()) {
+                        return AsyncUtil.DONE;


If I'm reading this correctly, this means that if the same primary key is used with multiple vectors, whichever is inserted first will be the only one stored in the index for that primary key. There's some niceness to this, as it makes the index idempotent (inserting the same vector + primary key multiple times results in exactly one index insert), but it also means that if there's a bug and we re-use primary keys, we'll get this weird result. I think we'd discussed at one point trying to detect whether the newVector matches the existing vector. (I guess that information actually isn't in the index if, somehow, we used a compact encoding on level 0, which we currently ban but could theoretically allow at some point.)

In any case, it would probably be a good idea to add tests of what happens if the same primary key is used for multiple inserts.

Added testing to ensure the index behaves in an idempotent way. I suppose we can change the behavior if needed.
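As a rough illustration of the first-writer-wins behavior discussed here, consider the sketch below. `IdempotentIndexSketch` and all of its members are hypothetical, and a plain map stands in for the HNSW structure. The `matchesExisting` helper only illustrates the vector comparison floated in the review; nothing in the thread says the PR implements it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an index keyed by primary key; NOT the real HNSW code.
final class IdempotentIndexSketch {
    private final Map<String, double[]> vectorsByPrimaryKey = new HashMap<>();
    private int physicalInserts = 0;

    // Returns true only if the call actually changed the index.
    boolean insert(String primaryKey, double[] vector) {
        if (vectorsByPrimaryKey.containsKey(primaryKey)) {
            // Node already exists: skip, regardless of what the new vector is.
            return false;
        }
        vectorsByPrimaryKey.put(primaryKey, vector.clone());
        physicalInserts++;
        return true;
    }

    // Optional stricter check: does the new vector equal the stored one?
    boolean matchesExisting(String primaryKey, double[] vector) {
        double[] existing = vectorsByPrimaryKey.get(primaryKey);
        return existing != null && Arrays.equals(existing, vector);
    }

    int physicalInserts() {
        return physicalInserts;
    }
}
```

A test along these lines would insert the same primary key twice, once with the same vector and once with a different one, and check that exactly one insert took effect.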

alecgrieser · 2025-11-17T10:19:48Z

fdb-extensions/src/main/java/com/apple/foundationdb/async/hnsw/Config.java

                   final int statsThreshold, final boolean useRaBitQ, final int raBitQNumExBits,
                   final int maxNumConcurrentNodeFetches, final int maxNumConcurrentNeighborhoodFetches) {
+        Preconditions.checkArgument(m <= mMax);
+        Preconditions.checkArgument(mMax <= mMax0);


Hm. Maybe it's worth asserting that mMax <= mMax0. If I'm understanding those parameters correctly, we want a larger mMax0 to increase the connectivity of the bottom layer. In theory, mMax0 could be less than mMax; the index would just be maladapted, in that level zero would be unhelpfully less connected than the upper levels. Maybe warning the user to also adjust mMax0 makes sense. I could also see the argument for just setting this.mMax0 = Math.max(mMax, mMax0). Maybe that makes more sense in the Builder than in the object constructor.

I would say that mMax0 should be prevented from being smaller than mMax. Under normal circumstances the default is mMax0 = 2*mMax, so if mMax0 somehow ends up smaller than mMax, I would consider that an error.
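The two options in this exchange can be contrasted in a small sketch. `HnswConfigSketch` is a hypothetical class that models only the three parameters under discussion, and it uses plain `IllegalArgumentException` checks instead of the Guava `Preconditions` calls in the diff so that it stays dependency-free.

```java
// Hypothetical, reduced model of the config; NOT the real Config class.
final class HnswConfigSketch {
    final int m;
    final int mMax;
    final int mMax0;

    // Strict option: reject any configuration that violates m <= mMax <= mMax0.
    HnswConfigSketch(int m, int mMax, int mMax0) {
        if (m > mMax) {
            throw new IllegalArgumentException("m must be <= mMax");
        }
        if (mMax > mMax0) {
            throw new IllegalArgumentException("mMax must be <= mMax0");
        }
        this.m = m;
        this.mMax = mMax;
        this.mMax0 = mMax0;
    }

    // Lenient option floated in the review: silently raise mMax0 to at least mMax.
    static HnswConfigSketch withClampedMMax0(int m, int mMax, int mMax0) {
        return new HnswConfigSketch(m, mMax, Math.max(mMax, mMax0));
    }
}
```

The strict constructor surfaces a misconfiguration immediately, while the clamping factory hides it; that trade-off is what the two comments above weigh.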

fdb-extensions/src/main/java/com/apple/foundationdb/linear/RealVector.java

...st/java/com/apple/foundationdb/record/provider/foundationdb/indexes/VectorIndexTestBase.java

alecgrieser · 2025-11-17T18:09:16Z

...c/test/java/com/apple/foundationdb/record/provider/foundationdb/indexes/VectorIndexTest.java

+                VectorRecord.Builder recordBuilder =
+                        VectorRecord.newBuilder();
+                recordBuilder.mergeFrom(loadedRecord.getRecord());
+                Assertions.assertThat(loadedRecord.getRecord()).isEqualTo(savedRecords.get((int)l).getRecord());


What is this intended to test? That having a vector index doesn't disrupt the read/write path in some way that causes errors unrelated to querying?

Yeah, in a way. It makes sure that the vectors are properly indexed and read back when multiple vector indexes are present. It also makes sure (although this currently cannot happen at all) that the vector we see in the record on load is the vector that is stored with the record and not the potentially encoded (RaBitQ) vector from the index entries. I think this test will make more sense later when we have covering and non-covering scans. We don't want the RaBitQ vectors to leak out here. I agree, though, that the use case for this test is somewhat limited right now. I think I took it from the POC and modified it a bit without scrutinizing it too much.

alecgrieser · 2025-11-17T18:13:09Z

...c/test/java/com/apple/foundationdb/record/provider/foundationdb/indexes/VectorIndexTest.java

+                            VectorRecord.newBuilder()
+                                    .mergeFrom(Objects.requireNonNull(rec).getRecord())
+                                    .build();
+                    allCounters[record.getGroupId()] ++;


Should this assert that the group id is in {0, 1}?

I think asserts should check the non-test code and not the test case itself. I would consider the case you pointed out to be a bug in the test case itself, and I am fine with it failing with an out-of-bounds exception.
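To make the point concrete: with a fixed-size counter array, an unexpected group id already fails loudly without an explicit assertion. The sketch below uses a hypothetical `GroupCounterSketch` helper and plain `int` group ids rather than the test's actual record types.

```java
// Hypothetical helper mirroring the counter pattern in the test; NOT the real test code.
final class GroupCounterSketch {
    // Counts how many ids fall into each group in [0, numGroups).
    static int[] countByGroup(int[] groupIds, int numGroups) {
        int[] counters = new int[numGroups];
        for (int groupId : groupIds) {
            // Throws ArrayIndexOutOfBoundsException if groupId is outside [0, numGroups).
            counters[groupId]++;
        }
        return counters;
    }
}
```

With two groups, an id of 2 raises `ArrayIndexOutOfBoundsException` at the increment, so a malformed test fixture still fails the test, just with a less descriptive message than an assertion would give.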

normen662 changed the title ~~initial version from poc~~ vector index maintainer Nov 13, 2025

normen662 added 2 commits November 13, 2025 20:01

initial version from poc

51f2225

various fixes

7f1d54e

normen662 force-pushed the vector-index-maintainence branch from 71cc760 to 7f1d54e November 13, 2025 19:01

normen662 added the enhancement label Nov 13, 2025

normen662 force-pushed the vector-index-maintainence branch 6 times, most recently from bdcc506 to aa1d6d4 November 19, 2025 09:18

adding vector index scan options and more tests

15e7df6

normen662 force-pushed the vector-index-maintainence branch from aa1d6d4 to 15e7df6 November 19, 2025 10:57

normen662 added 2 commits November 19, 2025 12:21

merging main

5112427

Merge remote-tracking branch 'upstream/main' into vector-index-maintainence

improve tests

f384b4c

alecgrieser requested changes Nov 19, 2025

normen662 added 6 commits November 19, 2025 18:27

addressing some comments

d8ab84f

addressing more comments

ce325ed

addressing more comments

3342d90

addressing more comments

901329d

merging upstream/main

6ae405a

Merge remote-tracking branch 'upstream/main' into vector-index-maintainence

addressing more comments

1f9a870
