Skip to content

Conversation

krk
Copy link

@krk krk commented Jul 6, 2019

I have added a new stage called ShardNameStage to add shard name to a $meta projection. It made sense to meld the functionality of the existing stages ShardFilterStage and TextOrStage into ShardNameStage and to modify it to output the current shard name. This might be an incorrect route for implementation, please provide guidance.

To start sharded instances:

mlaunch --replicaset --sharded tic tac toe --port 27027 --binarypath ~/src/mongo/
mlaunch start
mlaunch kill
use example;
sh.enableSharding("example");

db.items.ensureIndex({ a: "hashed" });
sh.shardCollection("example.items", { a: "hashed" });

for (var i = 0; i < 100; i++) db.items.insert({ a: i });

db.items.find({}, { shard: { $meta: "shardName" } }).sort({ a: 1 });

{ "_id" : 1, "a" : 1, "foo" : 1, "shard" : "toe" }
{ "_id" : 5, "a" : 5, "foo" : 5, "shard" : "toe" }
{ "_id" : 2, "a" : 2, "foo" : 2, "shard" : "tic" }
{ "_id" : 3, "a" : 3, "foo" : 3, "shard" : "tic" }
{ "_id" : 4, "a" : 4, "foo" : 4, "shard" : "tic" }

I have tested it locally with mlaunch. There are no unit tests in this PR as of now, as I might be completely lost while implementing it. If functionality is as expected and modifications fit or be made to fit the expected architecture, I can add some unit tests.

@dhatcher42
Copy link
Contributor

Thanks for the pull request! We'll look into it.

@krk krk force-pushed the meta-shard-name branch from 87307c7 to 22fec98 Compare August 4, 2019 17:55
@krk krk force-pushed the meta-shard-name branch from 22fec98 to f5ad625 Compare August 30, 2019 19:36
@dstorch dstorch self-requested a review October 9, 2019 21:51
Copy link
Contributor

@dstorch dstorch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @krk ,

Thanks for submitting this pull request! I've left a few comments inline, but I'll try to summarize the major points of feedback here:

  1. The implementation of projection execution is undergoing a complete rewrite at the moment. Much of this change affects files that are going to be deleted soon, such as ParsedProjection and ProjectionExec. Therefore, I think this change should be rewritten to be built only into the agg projection machinery (i.e. ParsedAggregationProjection). The agg implementation is being generalized so that it can replace classes like ParsedProjection and ProjectionExec. In order to support $meta:"shardName" evaluation in agg projection, you would extend ExpressionMeta' and leave ProjectionExec` unchanged.
  2. Instead of building a new PlanStage for the purpose of attaching the "shardName" metadata, we'd like to explore implementations that attach this metadata elsewhere. This will also be easier to implement once our rewrite of the projection code is finished.
  3. There should be some tests showing that this new feature behaves correctly.

I'd be happy to work with you in order to get these changes in shape for merge, if you're interested! As I mentioned in my message in SERVER-14423, I'd also be happy to suggest other tickets that may be more straightforward to implement and therefore easier for us to accept.

Please feel free to reach out if you have any questions about my comments, or if anything is unclear!

Best,
Dave

}
}

StringData shardName(const WorkingSetMember& member) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are currently rewriting the way that projections are executed for find commands, so that find commands and aggregate commands share a projection executor implementation. Unfortunately, that means that there has been a lot of code change in this file, and I'm afraid this part of the change won't merge.

_needsGeoNearDistance = true;
} else if (e2.valuestr() == QueryRequest::metaIndexKey) {
_hasReturnKey = true;
} else if (e2.valuestr() == QueryRequest::metaShardName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another example of code that is going to be deleted soon, as part of our efforts to consolidate the implementations of find projection and agg projection. Ideally, the only place where we would have to change projection execution code would in the $meta agg expression:

Value ExpressionMeta::evaluate(const Document& root, Variables* variables) const {
const auto& metadata = root.metadata();
switch (_metaType) {
case MetaType::kTextScore:
return metadata.hasTextScore() ? Value(metadata.getTextScore()) : Value();
case MetaType::kRandVal:
return metadata.hasRandVal() ? Value(metadata.getRandVal()) : Value();
case MetaType::kSearchScore:
return metadata.hasSearchScore() ? Value(metadata.getSearchScore()) : Value();
case MetaType::kSearchHighlights:
return metadata.hasSearchHighlights() ? Value(metadata.getSearchHighlights()) : Value();
case MetaType::kGeoNearDist:
return metadata.hasGeoNearDistance() ? Value(metadata.getGeoNearDistance()) : Value();
case MetaType::kGeoNearPoint:
return metadata.hasGeoNearPoint() ? Value(metadata.getGeoNearPoint()) : Value();
case MetaType::kRecordId:
// Be sure that a RecordId can be represented by a long long.
static_assert(RecordId::kMinRepr >= std::numeric_limits<long long>::min());
static_assert(RecordId::kMaxRepr <= std::numeric_limits<long long>::max());
return metadata.hasRecordId()
? Value{static_cast<long long>(metadata.getRecordId().repr())}
: Value();
case MetaType::kIndexKey:
return metadata.hasIndexKey() ? Value(metadata.getIndexKey()) : Value();
case MetaType::kSortKey:
return metadata.hasSortKey()
? Value(DocumentMetadataFields::serializeSortKey(metadata.isSingleElementKey(),
metadata.getSortKey()))
: Value();
default:
MONGO_UNREACHABLE;
}
MONGO_UNREACHABLE;
}

We would need new code there that would know how to handle a $meta:"shardName" expression.

@@ -0,0 +1,104 @@
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design of the execution engine is to have as few PlanStages as possible for only the most fundamental query execution concepts. (On the other hand, the number of expressions and accumulators should be quite large and should grow to meet the needs of applications.) Therefore, the query team is reluctant to add a new PlanStage just for the purpose of attaching shard name metadata.

This brings up an interesting question of who should be responsible for attaching the shardName metadata. One idea is to attach this in the ShardFilterStage. We should only attach this metadata, however, if static analysis of the query shows that there is a subsequent $meta:"shardName" expression. There are some changes in the works as part of SERVER-43146 that should make it easier to determine whether the query has this "shardName" metadata dependency.

namespace mongo {

ShardNamerImpl::ShardNamerImpl(ScopedCollectionMetadata md) : _metadata(std::move(md)),
_shardName(isCollectionSharded() ? _metadata->shardId() : StringData("")) {
Copy link
Contributor

@dstorch dstorch Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have consulted with one of the lead engineers on the sharding team, and we think that the way in which we obtain the shard name should be slightly different. Rather than getting the name from the ScopedCollectionMetadata, it can be obtained directly from the mongod's sharding state:

ShardId shardId();

This will work so long as the code which attaches the "shardName" always runs on a mongod, which it should.

}

private:
ScopedCollectionMetadata _metadata;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reason to avoid the ShardNamer interface is that it results in the query plan holding two copies of the ScopedCollectionMetadata. As initially conceived, the design is for a query plan to hold a single ScopedCollectionMetadata inside of the ShardFilterStage.

while (projIt.more()) {
BSONElement projElt = projIt.next();
if (isTextScoreMeta(projElt)) {
if (isTextScoreMeta(projElt) || isShardNameMeta(projElt)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this part of the change is needed. If I'm reading the code correctly, this PR does not add the ability to perform a $meta-sort by "shardName", as in {$sort: {foo: {$meta: "shardName"}}}. Furthermore, the query team is interested in relaxing the rule for "textScore" which says that a "textScore" $meta sort must have a corresponding "textScore" $meta projection. See SERVER-43816.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have duplicated the implementation of textScore in most places, that is why this condition was added, removing.

@krk
Copy link
Author

krk commented Oct 16, 2019

Hi @dstorch,

Thank you for the detailed and helpful review.

I am interested in making the necessary changes to this PR after the rewriting of the implementation of projection execution is complete. This PR was already created when you offered to suggest areas to work on in SERVER-14423.

I will try to get this mergeable if there are not too many conflicts. Otherwise, I would prefer to wait until the rewrite is complete.

Also, I am open to other mentored tasks that can be completed in less time than the current PR.

Cheers,
Kerem

@krk krk force-pushed the meta-shard-name branch from f5ad625 to 52bba63 Compare October 16, 2019 23:02
@krk

This comment has been minimized.

@krk krk force-pushed the meta-shard-name branch from 8bc9cff to 6ce6063 Compare October 18, 2019 21:29
@krk
Copy link
Author

krk commented Oct 19, 2019

Hi @dstorch,

I have removed the ShardNameStage and moved the functionality to the ShardFilterStage. Now the ShardingFilterNode constructor has a wantShardName parameter - not sure if this is allowed.

The ShardNamer interface and implementation are removed, new code does not use a ScopedCollectionMetadata instance anymore.

In pipeline_d.cpp, the createRandomCursorExecutor function, I could not find an object that would give access to CanonicalQuery::metadataDeps or ShardingFilterNode::wantShardName so wantShardName parameter is set to false in ShardFilterStage constructors.

It is not clear to me if ProjectionExec code could be moved to ParsedAggregationProjection now or only after the rewriting by the mongo team.

Also, if you could give some pointers on what kind of tests to write first, it would be very helpful.

@krk
Copy link
Author

krk commented Jan 2, 2020

Maybe this year this PR will get some love :) @dstorch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants