Skip to content

Conversation

@AlexFilipImproving
Copy link

No description provided.

@AlexFilipImproving AlexFilipImproving marked this pull request as draft November 21, 2025 15:36
Copy link

@Jonathan-Improving Jonathan-Improving left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link

@allenss-amazon allenss-amazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important to handle the case where both SORTBY and LIMIT are specified. In this scenario, rather than a full sort of the neighbors, a heap should be used to retain only the required number of entries.

Copy link

@allenss-amazon allenss-amazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure of the main data path of this code is wrong.

  1. It's on the wrong thread and thus looking at potentially stale data. The data for the sort needs to come out of fetched data, not the indexes.
  2. In cluster mode it's only operating on the local data, not the post-merged global result.

It needs to be placed in the call chain of "SearchCommand::SendReply" after the current contents of the candidate keys has been fetched and is available. This will probably require a logic change in the "NOCONTENT" option (since it's suppressing fetching of per-key fields) as well as ensuring that the content fetch code includes the sortby field.

Perhaps some 1:1 time with me will help here.

return results;
}

absl::StatusOr<std::deque<indexes::Neighbor>> ApplySorting(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this can fail. The only possible failure is if the sortby field isn't found, and that's probably a crash since you've already validated it in the command itself long before you get here....

@AlexFilipImproving
Copy link
Author

AlexFilipImproving commented Nov 27, 2025

The structure of the main data path of this code is wrong.

  1. It's on the wrong thread and thus looking at potentially stale data. The data for the sort needs to come out of fetched data, not the indexes.
  2. In cluster mode it's only operating on the local data, not the post-merged global result.

It needs to be placed in the call chain of "SearchCommand::SendReply" after the current contents of the candidate keys has been fetched and is available. This will probably require a logic change in the "NOCONTENT" option (since it's suppressing fetching of per-key fields) as well as ensuring that the content fetch code includes the sortby field.

Perhaps some 1:1 time with me will help here.

@allenss-amazon I generally understand the idea but yes, I think a 1:1 would be a good idea. I'm looking at the code now and I think I need a bit of guidance on the implementation. How would you like the meeting set up?

@AlexFilipImproving AlexFilipImproving force-pushed the feature/sortby-implementation branch from 0f89d05 to 8e0de57 Compare November 28, 2025 17:12
Copy link

@allenss-amazon allenss-amazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Two things. 1/ I'd like to see a heap-oriented sort for cases where LIMIT is in effect and 2/ need some data oriented actual search cases.

Comment on lines 192 to 198
double num_a, num_b;
if (!absl::SimpleAtod(val_a, &num_a)) return false;
if (!absl::SimpleAtod(val_b, &num_b)) return true;
result = num_a < num_b;
} else {
result = val_a < val_b;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to have these kinds of data comparison centralized so that all of the various ways to compare data get the same answers. I'm not familiar with out SimpleAtoD deals with infinity and NaN, etc. Please take a look at the files in the src/expr/* directory.

int k{0};
std::optional<unsigned> ef;
LimitParameter limit;
SortByParameter sortby;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this belongs in "struct SearchCommand", not here. This parameter is private to the search command, not something that's shared across general queries. (Sorry the command naming here is legacy and therefore confusing).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried doing this and I'm not sure how this would fit into SearchCommand since it needs to be located by the parser and I would have to change the implementation of ConstructSortByParser to take a SearchCommand &, which would propagate to all other parsers in ft.search.
It would also be inconsistent with other parsers, like ft.aggregate.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legacy issue. Originally, there wasn't a difference between SearchParameters and SearchCommand. When this object hierarchy was introduced it looks like the parsing logic for ft.search didn't get the update. Please update it, it shouldn't require much more than changing the parameters definitions on a few functions in the dedicate ft.search parsing logic.

.sortby_parameters_str = "SORTBY attribute_identifier_2",
.sortby_field = "attribute_identifier_2",
.sortby_order = query::SortOrder::kAscending,
.sortby_enabled = true,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A parser test for a field NOT in the schema might be useful

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test where the attribute is not found and found that I actually wasn't testing for this case so that's in here too now.

"embedding",
"$embedding",
"SORTBY",
"vector",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs some data oriented tests. In theory, the compatibility framework that's under integration/compatibility would be the best place to do this. It used to work for ft.search commands, but I see that those tests have been commented out, which leads me to think that it's not working for those commands. But fixing it should be relatively easy.

Copy link

@allenss-amazon allenss-amazon Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the latest version should work with ft.search commands. I fixed some bugs and I've been using it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the place to test numeric conversion errors.

@AlexFilipImproving AlexFilipImproving force-pushed the feature/sortby-implementation branch 2 times, most recently from 021dec9 to d758c86 Compare December 10, 2025 19:49
@AlexFilipImproving AlexFilipImproving force-pushed the feature/sortby-implementation branch 2 times, most recently from 7349606 to bff42e1 Compare January 8, 2026 20:47
Copy link

@allenss-amazon allenss-amazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seem to be several changes in the PR that are either just formatting or perhaps some temporary changes for debugging purposes. Let's get those cleaned up. Otherwise, the basic core functionality looks correct to me.

Also, I'd definitely want to see some test cases in the compatibility test suite

data_set = do_answer(answers[i], data_set)
data_set = do_answer(self.server.get_new_client(), answers[i], data_set)

print(f"Correct answers: {correct_answers} out of {len(answers)}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are changes here intentional? If so, can you explain?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the part of compatibility tests that is commented out. I am looking into writing those numeric tests that you asked for in this file, but I have been having some trouble with the build environment for a few days.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this change in the latest commit

Comment on lines 250 to 257
// Support non-vector queries
if (IsNonVectorQuery()) {
query::ProcessNonVectorNeighborsForReply(
ctx, index_schema->GetAttributeDataType(), neighbors, *this);
ApplySorting(neighbors, *this);
SerializeNonVectorNeighbors(ctx, search_result, *this);
return;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why restrict this to non-vector?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't catch this when resolving a merge conflict. It was supposed to be a single added line to the if statement above. I'll go over the entire change again to make sure nothing else slipped through.

<< "Error parsing vector similarity parameters: ";
}

if (auto searchCommand = dynamic_cast<SearchCommand *>(&parameters);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anytime I see a dynamic_cast, I wonder if this isn't a band-aid on some incorrect object structure. Here, it looks like the preparse and postparse functions should be virtual, that would eliminate the need for the dynamic cast.

struct SortByParameter {
std::string field;
SortOrder order{SortOrder::kAscending};
bool enabled{false};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than an explicit enabled flag, perhaps this should be std::optional ?

Comment on lines 767 to 955
.k = 0,
.ef = 0,
.score_as = "",
.vector_query = false,
.sortby_parameters_str = "SORTBY attribute_identifier_1 DESC",
.sortby_field = "attribute_identifier_1",
.sortby_order = SortOrder::kDescending,
.sortby_enabled = true,
},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's tweak some of the keywords with one or more lower case letters to ensure that we're getting the case-invariant matching correct.

"embedding",
"$embedding",
"SORTBY",
"vector",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the place to test numeric conversion errors.

Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
@AlexFilipImproving AlexFilipImproving force-pushed the feature/sortby-implementation branch from 8defc22 to 62f13e1 Compare January 28, 2026 20:02
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants