-
Notifications
You must be signed in to change notification settings - Fork 5.7k
SERVER-41854 Add meta projection $shardName. #1317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thanks for the pull request! We'll look into it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @krk ,
Thanks for submitting this pull request! I've left a few comments inline, but I'll try to summarize the major points of feedback here:
- The implementation of projection execution is undergoing a complete rewrite at the moment. Much of this change affects files that are going to be deleted soon, such as
ParsedProjection
andProjectionExec
. Therefore, I think this change should be rewritten to be built only into the agg projection machinery (i.e.ParsedAggregationProjection
). The agg implementation is being generalized so that it can replace classes likeParsedProjection
andProjectionExec
. In order to support $meta:"shardName" evaluation in agg projection, you would extendExpressionMeta' and leave
ProjectionExec` unchanged. - Instead of building a new
PlanStage
for the purpose of attaching the "shardName" metadata, we'd like to explore implementations that attach this metadata elsewhere. This will also be easier to implement once our rewrite of the projection code is finished. - There should be some tests showing that this new feature behaves correctly.
I'd be happy to work with you in order to get these changes in shape for merge, if you're interested! As I mentioned in my message in SERVER-14423, I'd also be happy to suggest other tickets that may be more straightforward to implement and therefore easier for us to accept.
Please feel free to reach out if you have any questions about my comments, or if anything is unclear!
Best,
Dave
src/mongo/db/exec/projection.cpp
Outdated
} | ||
} | ||
|
||
StringData shardName(const WorkingSetMember& member) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are currently rewriting the way that projections are executed for find commands, so that find commands and aggregate commands share a projection executor implementation. Unfortunately, that means that there has been a lot of code change in this file, and I'm afraid this part of the change won't merge.
_needsGeoNearDistance = true; | ||
} else if (e2.valuestr() == QueryRequest::metaIndexKey) { | ||
_hasReturnKey = true; | ||
} else if (e2.valuestr() == QueryRequest::metaShardName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another example of code that is going to be deleted soon, as part of our efforts to consolidate the implementations of find projection and agg projection. Ideally, the only place where we would have to change projection execution code would in the $meta agg expression:
mongo/src/mongo/db/pipeline/expression.cpp
Lines 2601 to 2634 in 97ffc69
Value ExpressionMeta::evaluate(const Document& root, Variables* variables) const { | |
const auto& metadata = root.metadata(); | |
switch (_metaType) { | |
case MetaType::kTextScore: | |
return metadata.hasTextScore() ? Value(metadata.getTextScore()) : Value(); | |
case MetaType::kRandVal: | |
return metadata.hasRandVal() ? Value(metadata.getRandVal()) : Value(); | |
case MetaType::kSearchScore: | |
return metadata.hasSearchScore() ? Value(metadata.getSearchScore()) : Value(); | |
case MetaType::kSearchHighlights: | |
return metadata.hasSearchHighlights() ? Value(metadata.getSearchHighlights()) : Value(); | |
case MetaType::kGeoNearDist: | |
return metadata.hasGeoNearDistance() ? Value(metadata.getGeoNearDistance()) : Value(); | |
case MetaType::kGeoNearPoint: | |
return metadata.hasGeoNearPoint() ? Value(metadata.getGeoNearPoint()) : Value(); | |
case MetaType::kRecordId: | |
// Be sure that a RecordId can be represented by a long long. | |
static_assert(RecordId::kMinRepr >= std::numeric_limits<long long>::min()); | |
static_assert(RecordId::kMaxRepr <= std::numeric_limits<long long>::max()); | |
return metadata.hasRecordId() | |
? Value{static_cast<long long>(metadata.getRecordId().repr())} | |
: Value(); | |
case MetaType::kIndexKey: | |
return metadata.hasIndexKey() ? Value(metadata.getIndexKey()) : Value(); | |
case MetaType::kSortKey: | |
return metadata.hasSortKey() | |
? Value(DocumentMetadataFields::serializeSortKey(metadata.isSingleElementKey(), | |
metadata.getSortKey())) | |
: Value(); | |
default: | |
MONGO_UNREACHABLE; | |
} | |
MONGO_UNREACHABLE; | |
} |
We would need new code there that would know how to handle a $meta:"shardName" expression.
src/mongo/db/exec/shard_name.cpp
Outdated
@@ -0,0 +1,104 @@ | |||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The design of the execution engine is to have as few PlanStages as possible for only the most fundamental query execution concepts. (On the other hand, the number of expressions and accumulators should be quite large and should grow to meet the needs of applications.) Therefore, the query team is reluctant to add a new PlanStage just for the purpose of attaching shard name metadata.
This brings up an interesting question of who should be responsible for attaching the shardName metadata. One idea is to attach this in the ShardFilterStage. We should only attach this metadata, however, if static analysis of the query shows that there is a subsequent $meta:"shardName" expression. There are some changes in the works as part of SERVER-43146 that should make it easier to determine whether the query has this "shardName" metadata dependency.
namespace mongo { | ||
|
||
ShardNamerImpl::ShardNamerImpl(ScopedCollectionMetadata md) : _metadata(std::move(md)), | ||
_shardName(isCollectionSharded() ? _metadata->shardId() : StringData("")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have consulted with one of the lead engineers on the sharding team, and we think that the way in which we obtain the shard name should be slightly different. Rather than getting the name from the ScopedCollectionMetadata, it can be obtained directly from the mongod's sharding state:
mongo/src/mongo/db/s/sharding_state.h
Line 99 in 6978f67
ShardId shardId(); |
This will work so long as the code which attaches the "shardName" always runs on a mongod, which it should.
src/mongo/db/exec/shard_namer_impl.h
Outdated
} | ||
|
||
private: | ||
ScopedCollectionMetadata _metadata; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another reason to avoid the ShardNamer
interface is that it results in the query plan holding two copies of the ScopedCollectionMetadata
. As initially conceived, the design is for a query plan to hold a single ScopedCollectionMetadata
inside of the ShardFilterStage
.
while (projIt.more()) { | ||
BSONElement projElt = projIt.next(); | ||
if (isTextScoreMeta(projElt)) { | ||
if (isTextScoreMeta(projElt) || isShardNameMeta(projElt)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that this part of the change is needed. If I'm reading the code correctly, this PR does not add the ability to perform a $meta-sort by "shardName", as in {$sort: {foo: {$meta: "shardName"}}}. Furthermore, the query team is interested in relaxing the rule for "textScore" which says that a "textScore" $meta sort must have a corresponding "textScore" $meta projection. See SERVER-43816.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have duplicated the implementation of textScore
in most places, that is why this condition was added, removing.
Hi @dstorch, Thank you for the detailed and helpful review. I am interested in making the necessary changes to this PR after the rewriting of the implementation of projection execution is complete. This PR was already created when you offered to suggest areas to work on in SERVER-14423. I will try to get this mergeable if there are not too many conflicts. Otherwise, I would prefer to wait until the rewrite is complete. Also, I am open to other mentored tasks that can be completed in less time than the current PR. Cheers, |
This comment has been minimized.
This comment has been minimized.
Hi @dstorch, I have removed the The In It is not clear to me if Also, if you could give some pointers on what kind of tests to write first, it would be very helpful. |
Maybe this year this PR will get some love :) @dstorch |
I have added a new stage called
ShardNameStage
to add shard name to a$meta
projection. It made sense to meld the functionality of the existing stagesShardFilterStage
andTextOrStage
intoShardNameStage
and to modify it to output the current shard name. This might be an incorrect route for implementation, please provide guidance.To start sharded instances:
I have tested it locally with mlaunch. There are no unit tests in this PR as of now, as I might be completely lost while implementing it. If functionality is as expected and modifications fit or be made to fit the expected architecture, I can add some unit tests.