@azevaykin
Collaborator

Changelog entry

Add optimization rule KqpApplyVectorTopKToReadTable that pushes down
ORDER BY Knn::*Distance/Similarity(...) LIMIT N queries to datashards,
enabling brute force vector search without requiring a vector index.

Changelog category

  • Performance improvement

Description for reviewers

...

Copilot AI review requested due to automatic review settings November 26, 2025 17:09
@azevaykin azevaykin requested review from a team as code owners November 26, 2025 17:09
@github-actions
github-actions bot commented Nov 26, 2025

2025-11-26 17:10:12 UTC Pre-commit check linux-x86_64-release-asan for 30c416d has started.
2025-11-26 17:10:52 UTC Artifacts will be uploaded here
2025-11-26 17:13:08 UTC ya make is running...
🟡 2025-11-26 19:07:48 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
13478 13404 0 54 11 9

🟢 2025-11-26 19:07:56 UTC Build successful.
🟡 2025-11-26 19:08:21 UTC ydbd size 3.8 GiB changed* by +999.3 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: 37ffbd0 merge: 30c416d diff diff %
ydbd size 4 111 666 864 Bytes 4 112 690 192 Bytes +999.3 KiB +0.025%
ydbd stripped size 1 528 107 248 Bytes 1 528 327 600 Bytes +215.2 KiB +0.014%

*Please be aware that the difference is based on comparing your commit with the last completed post-commit build; check the comparison.

@github-actions
github-actions bot commented Nov 26, 2025

🟢 2025-11-27 06:07:29 UTC The validation of the Pull Request description is successful.

@github-actions
github-actions bot commented Nov 26, 2025

2025-11-26 17:13:01 UTC Pre-commit check linux-x86_64-relwithdebinfo for 30c416d has started.
2025-11-26 17:13:17 UTC Artifacts will be uploaded here
2025-11-26 17:15:27 UTC ya make is running...
🟡 2025-11-26 19:28:34 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
41638 38754 0 31 2836 17

2025-11-26 19:28:46 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-11-26 19:41:06 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
90 (only retried tests) 46 0 30 0 14

2025-11-26 19:41:12 UTC ya make is running... (failed tests rerun, try 3)
🔴 2025-11-26 19:50:09 UTC Some tests failed, follow the links below.

Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
86 (only retried tests) 44 0 30 0 12

🟢 2025-11-26 19:50:16 UTC Build successful.
🟡 2025-11-26 19:50:38 UTC ydbd size 2.3 GiB changed* by +324.7 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: 37ffbd0 merge: 30c416d diff diff %
ydbd size 2 456 394 840 Bytes 2 456 727 328 Bytes +324.7 KiB +0.014%
ydbd stripped size 523 434 112 Bytes 523 476 064 Bytes +41.0 KiB +0.008%

*Please be aware that the difference is based on comparing your commit with the last completed post-commit build; check the comparison.

Copilot finished reviewing on behalf of azevaykin November 26, 2025 17:14
Contributor

Copilot AI left a comment

Pull request overview

This PR implements brute force vector search pushdown to datashards by adding a new optimization rule KqpApplyVectorTopKToReadTable. This enables efficient ORDER BY Knn::*Distance/Similarity(...) LIMIT N queries without requiring a vector index.
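
To make the idea of the pushdown concrete, here is a minimal C++ sketch; it is not the code from this PR. It assumes each shard keeps only its N closest rows in a bounded max-heap and that the caller merges the per-shard lists. `Row`, `SquaredL2`, `ShardTopK`, and `MergeTopK` are hypothetical names, and squared Euclidean distance stands in for whichever Knn function the query uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical row type: a primary key plus an embedding.
struct Row {
    std::uint64_t key;
    std::vector<float> embedding;
};

// (distance, key); pairs compare by distance first, then by key.
using Scored = std::pair<float, std::uint64_t>;

// Squared Euclidean distance; a stand-in for any distance function.
float SquaredL2(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// What a single shard might do after the pushdown: scan every row,
// keep at most `limit` candidates in a max-heap, return them ascending.
std::vector<Scored> ShardTopK(const std::vector<Row>& rows,
                              const std::vector<float>& target,
                              std::size_t limit) {
    std::priority_queue<Scored> heap; // largest distance on top
    for (const Row& row : rows) {
        heap.emplace(SquaredL2(row.embedding, target), row.key);
        if (heap.size() > limit) {
            heap.pop(); // evict the current worst candidate
        }
    }
    std::vector<Scored> result;
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
}

// Merge the per-shard partial results into a global top-K.
std::vector<Scored> MergeTopK(const std::vector<std::vector<Scored>>& partials,
                              std::size_t limit) {
    std::vector<Scored> all;
    for (const auto& part : partials) {
        all.insert(all.end(), part.begin(), part.end());
    }
    std::sort(all.begin(), all.end());
    if (all.size() > limit) {
        all.resize(limit);
    }
    return all;
}
```

With this shape, each shard returns at most N rows rather than its full contents, which is where the saving from a pushdown of this kind would come from.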

Key Changes:

  • New optimization rule pushes vector top-K operations down to datashard read operations
  • Auto-detection of vector type and dimension from target vectors when no index is present
  • Comprehensive test coverage for various distance metrics and edge cases
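
The auto-detection bullet can be illustrated with a small sketch. It assumes, purely for illustration, that a serialized vector consists of its raw element bytes followed by a one-byte type tag. The tag values, `EVectorType`, `TVectorInfo`, and `DetectVectorInfo` are hypothetical and are not taken from this PR or from CreateClustersAutoDetect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical element types and tag values, for illustration only.
enum class EVectorType : std::uint8_t { Float = 1, Uint8 = 2, Int8 = 3 };

struct TVectorInfo {
    EVectorType Type;
    std::size_t Dimension;
};

// Infers element type and dimension from an assumed layout:
// <element bytes...><one-byte type tag>.
// Returns std::nullopt when the input does not fit that layout.
std::optional<TVectorInfo> DetectVectorInfo(const std::string& serialized) {
    if (serialized.size() < 2) {
        return std::nullopt; // need at least one payload byte and the tag
    }
    const std::size_t payload = serialized.size() - 1;
    assert(payload > 0);
    const auto tag = static_cast<std::uint8_t>(serialized.back());
    switch (tag) {
        case static_cast<std::uint8_t>(EVectorType::Float):
            if (payload % sizeof(float) != 0) {
                return std::nullopt; // payload is not a whole number of floats
            }
            return TVectorInfo{EVectorType::Float, payload / sizeof(float)};
        case static_cast<std::uint8_t>(EVectorType::Uint8):
            return TVectorInfo{EVectorType::Uint8, payload};
        case static_cast<std::uint8_t>(EVectorType::Int8):
            return TVectorInfo{EVectorType::Int8, payload};
        default:
            return std::nullopt; // unknown tag
    }
}
```

Under these assumptions, a shard that receives only the target vector could derive the type and dimension without consulting index metadata.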

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

File Description
ydb/core/kqp/opt/physical/kqp_opt_phy_limit.cpp Implements KqpApplyVectorTopKToReadTable optimization rule with pattern matching for Knn distance functions
ydb/core/kqp/query_compiler/kqp_query_compiler.cpp Adds helper functions to populate VectorTopK protobuf settings from query AST
ydb/core/base/kmeans_clusters.cpp Implements CreateClustersAutoDetect for auto-detecting vector type/dimension from target vectors
ydb/core/tx/datashard/datashard__read_iterator.cpp Handles VectorTopK settings in datashard read operations with auto-detection support
ydb/core/kqp/runtime/kqp_read_actor.cpp Propagates VectorTopK settings from source to datashard read requests
ydb/core/kqp/executer_actor/kqp_tasks_graph.cpp Extracts and sets VectorTopK parameters for scan tasks
ydb/core/kqp/opt/kqp_opt_build_txs.cpp Handles precompute nodes in stage program bodies for VectorTopK settings
ydb/core/kqp/common/kqp_yql.h/.cpp Adds VectorTopK settings to TKqpReadTableSettings structure
ydb/core/protos/*.proto Adds VectorTopK message fields to support pushdown configuration
ydb/core/kqp/ut/knn/* Comprehensive test suite for vector search pushdown across various metrics and configurations

@azevaykin azevaykin changed the title from "Implement brute force vector search pushdown" to "Brute force KNN vector search pushdown" Nov 27, 2025
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@github-actions
github-actions bot commented Nov 27, 2025

2025-11-27 06:12:32 UTC Pre-commit check linux-x86_64-release-asan for 8ed54cf has started.
2025-11-27 06:13:20 UTC Artifacts will be uploaded here
2025-11-27 06:15:22 UTC ya make is running...
2025-11-27 07:27:09 UTC Check cancelled

@github-actions
github-actions bot commented Nov 27, 2025

2025-11-27 06:13:33 UTC Pre-commit check linux-x86_64-relwithdebinfo for 8ed54cf has started.
2025-11-27 06:13:54 UTC Artifacts will be uploaded here
2025-11-27 06:16:12 UTC ya make is running...
2025-11-27 07:27:15 UTC Check cancelled

@github-actions
github-actions bot commented Nov 27, 2025

2025-11-27 07:30:49 UTC Pre-commit check linux-x86_64-relwithdebinfo for 5c1ecae has started.
2025-11-27 07:31:07 UTC Artifacts will be uploaded here
2025-11-27 07:33:14 UTC ya make is running...
2025-11-27 08:21:51 UTC Check cancelled

@github-actions
github-actions bot commented Nov 27, 2025

2025-11-27 07:31:37 UTC Pre-commit check linux-x86_64-release-asan for 5c1ecae has started.
2025-11-27 07:31:54 UTC Artifacts will be uploaded here
2025-11-27 07:34:00 UTC ya make is running...
2025-11-27 08:21:49 UTC Check cancelled

@github-actions
github-actions bot commented Nov 27, 2025

2025-11-27 08:22:30 UTC Pre-commit check linux-x86_64-release-asan for 4e123d4 has started.
2025-11-27 08:23:21 UTC Artifacts will be uploaded here
2025-11-27 08:25:13 UTC ya make is running...
🟡 2025-11-27 09:31:39 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
11673 11607 0 50 7 9

🟢 2025-11-27 09:31:47 UTC Build successful.
🟡 2025-11-27 09:32:09 UTC ydbd size 3.8 GiB changed* by +1.0 MiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: aceee3a merge: 4e123d4 diff diff %
ydbd size 4 111 911 672 Bytes 4 112 993 272 Bytes +1.0 MiB +0.026%
ydbd stripped size 1 528 188 976 Bytes 1 528 431 376 Bytes +236.7 KiB +0.016%

*Please be aware that the difference is based on comparing your commit with the last completed post-commit build; check the comparison.

@github-actions
github-actions bot commented Nov 27, 2025

2025-11-27 08:35:01 UTC Pre-commit check linux-x86_64-relwithdebinfo for 4e123d4 has started.
2025-11-27 08:35:17 UTC Artifacts will be uploaded here
2025-11-27 08:37:27 UTC ya make is running...
🟡 2025-11-27 10:06:53 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
38983 36113 0 8 2840 22

2025-11-27 10:07:05 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-11-27 10:15:26 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
50 (only retried tests) 34 0 3 0 13

2025-11-27 10:15:33 UTC ya make is running... (failed tests rerun, try 3)
🔴 2025-11-27 10:23:21 UTC Some tests failed, follow the links below.

Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
32 (only retried tests) 18 0 2 0 12

🟢 2025-11-27 10:23:28 UTC Build successful.
🟡 2025-11-27 10:23:53 UTC ydbd size 2.3 GiB changed* by +341.0 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: aceee3a merge: 4e123d4 diff diff %
ydbd size 2 456 534 168 Bytes 2 456 883 384 Bytes +341.0 KiB +0.014%
ydbd stripped size 523 457 344 Bytes 523 502 624 Bytes +44.2 KiB +0.009%

*Please be aware that the difference is based on comparing your commit with the last completed post-commit build; check the comparison.

@github-actions
github-actions bot commented Nov 27, 2025

2025-11-27 12:29:14 UTC Pre-commit check linux-x86_64-relwithdebinfo for 078585a has started.
2025-11-27 12:29:32 UTC Artifacts will be uploaded here
2025-11-27 12:31:41 UTC ya make is running...
🟡 2025-11-27 14:51:13 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
41653 38781 0 2 2843 27

2025-11-27 14:51:29 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-11-27 15:01:48 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
54 (only retried tests) 41 0 1 0 12

2025-11-27 15:01:55 UTC ya make is running... (failed tests rerun, try 3)
🟢 2025-11-27 15:09:43 UTC Tests successful.

Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
26 (only retried tests) 14 0 0 0 12

🟢 2025-11-27 15:09:51 UTC Build successful.
🟡 2025-11-27 15:10:16 UTC ydbd size 2.3 GiB changed* by +341.0 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: a83ca84 merge: 078585a diff diff %
ydbd size 2 457 056 136 Bytes 2 457 405 320 Bytes +341.0 KiB +0.014%
ydbd stripped size 523 556 960 Bytes 523 602 208 Bytes +44.2 KiB +0.009%

*Please be aware that the difference is based on comparing your commit with the last completed post-commit build; check the comparison.

@github-actions
github-actions bot commented Nov 27, 2025

2025-11-27 12:32:08 UTC Pre-commit check linux-x86_64-release-asan for 078585a has started.
2025-11-27 12:32:25 UTC Artifacts will be uploaded here
2025-11-27 12:34:33 UTC ya make is running...
🟡 2025-11-27 14:21:09 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
13489 13384 0 90 7 8

🟢 2025-11-27 14:21:17 UTC Build successful.
🟡 2025-11-27 14:21:40 UTC ydbd size 3.8 GiB changed* by +1.0 MiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: a83ca84 merge: 078585a diff diff %
ydbd size 4 112 807 976 Bytes 4 113 885 608 Bytes +1.0 MiB +0.026%
ydbd stripped size 1 528 432 080 Bytes 1 528 670 512 Bytes +232.8 KiB +0.016%

*Please be aware that the difference is based on comparing your commit with the last completed post-commit build; check the comparison.

@azevaykin azevaykin requested a review from vitalif November 27, 2025 15:52
targetProto->MutableParamValue()->SetParamName(expr.Cast<TCoParameter>().Name().StringValue());
} else if (auto maybeBinding = expr.Maybe<TKqpTxResultBinding>()) {
// TKqpTxResultBinding should have been replaced with TCoParameter by kqp_opt_build_txs,
// but handle it defensively by constructing the expected parameter name
Collaborator

This part is a little worrying: where does TKqpTxResultBinding come from here? The analogous index search pushdown did not have it at this point.

Collaborator Author

argsMap.emplace(inputArg.Raw(), makeParameterBinding(maybeBinding.Cast(), input.Pos()).Ptr());
}

// Also scan the program body for TKqpTxResultBinding (for VectorTopK precompute settings)
Collaborator

Same here: why was separate handling needed? It did not seem to be necessary in the index search pushdown. Could a shared implementation be used, so that the handling of TxResultBinding nodes is not copy-pasted?

Collaborator Author

}
}

// Also scan the program body for precomputes in read settings (for VectorTopK pushdown)
Collaborator

And here too: somewhere above there was a search for these precomputes that simply filtered nodes. Maybe a filter by node type could be added there, so that it also picks up the precomputes of the parameters we need?

Collaborator Author

checkDistance(results[0].second, 0.000882f);
checkDistance(results[1].second, 0.000985f);
checkDistance(results[2].second, 0.001070f);
}
Collaborator

@vitalif vitalif Nov 28, 2025

A case could also be added here where the target itself is selected from another table,

since there are 3 cases above that, as I understand it, mirror the ones I had in kqp_opt_log_indexes, and there were 3 cases there:

  1. a literal
  2. a variable equal to a function call
  3. a variable equal to a subquery

…DqPhyPrecompute and TKqpTxResultBinding collection
@github-actions
github-actions bot commented Nov 28, 2025

2025-11-28 16:00:54 UTC Pre-commit check linux-x86_64-release-asan for 1fe68f5 has started.
2025-11-28 16:01:44 UTC Artifacts will be uploaded here
2025-11-28 16:04:07 UTC ya make is running...
🟡 2025-11-28 17:57:42 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
13494 13419 0 57 7 11

🟢 2025-11-28 17:57:51 UTC Build successful.
🟡 2025-11-28 17:58:15 UTC ydbd size 3.8 GiB changed* by +937.1 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: 795ad02 merge: 1fe68f5 diff diff %
ydbd size 4 115 914 064 Bytes 4 116 873 656 Bytes +937.1 KiB +0.023%
ydbd stripped size 1 529 362 800 Bytes 1 529 548 432 Bytes +181.3 KiB +0.012%

*Please be aware that the difference is based on comparing your commit with the last completed post-commit build; check the comparison.

@github-actions
github-actions bot commented Nov 28, 2025

2025-11-28 16:04:01 UTC Pre-commit check linux-x86_64-relwithdebinfo for 1fe68f5 has started.
2025-11-28 16:04:32 UTC Artifacts will be uploaded here
2025-11-28 16:06:47 UTC ya make is running...
🟡 2025-11-28 18:21:29 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
41670 38737 0 9 2899 25

2025-11-28 18:21:42 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-11-28 18:31:16 UTC Tests successful.

Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
114 (only retried tests) 102 0 0 0 12

🟢 2025-11-28 18:31:22 UTC Build successful.
🟡 2025-11-28 18:31:46 UTC ydbd size 2.3 GiB changed* by +287.6 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: 795ad02 merge: 1fe68f5 diff diff %
ydbd size 2 458 990 816 Bytes 2 459 285 296 Bytes +287.6 KiB +0.012%
ydbd stripped size 523 812 840 Bytes 523 845 288 Bytes +31.7 KiB +0.006%

*Please be aware that the difference is based on comparing your commit with the last completed post-commit build; check the comparison.

2 participants