Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SERVER-41854 Add meta projection $shardName. #1317

Open
wants to merge 7 commits into
base: master
from

Conversation

@krk
Copy link

krk commented Jul 6, 2019

I have added a new stage called ShardNameStage to add shard name to a $meta projection. It made sense to meld the functionality of the existing stages ShardFilterStage and TextOrStage into ShardNameStage and to modify it to output the current shard name. This might be an incorrect route for implementation, please provide guidance.

To start sharded instances:

mlaunch --replicaset --sharded tic tac toe --port 27027 --binarypath ~/src/mongo/
mlaunch start
mlaunch kill
use example;
sh.enableSharding("example");

db.items.ensureIndex({ a: "hashed" });
sh.shardCollection("example.items", { a: "hashed" });

for (var i = 0; i < 100; i++) db.items.insert({ a: i });

db.items.find({}, { shard: { $meta: "shardName" } }).sort({ a: 1 });

{ "_id" : 1, "a" : 1, "foo" : 1, "shard" : "toe" }
{ "_id" : 5, "a" : 5, "foo" : 5, "shard" : "toe" }
{ "_id" : 2, "a" : 2, "foo" : 2, "shard" : "tic" }
{ "_id" : 3, "a" : 3, "foo" : 3, "shard" : "tic" }
{ "_id" : 4, "a" : 4, "foo" : 4, "shard" : "tic" }

I have tested it locally with mlaunch. There are no unit tests in this PR as of now, as I might be completely lost while implementing it. If functionality is as expected and modifications fit or be made to fit the expected architecture, I can add some unit tests.

@dhatcher42
Copy link
Contributor

dhatcher42 commented Jul 8, 2019

Thanks for the pull request! We'll look into it.

@krk krk force-pushed the krk:meta-shard-name branch from 87307c7 to 22fec98 Aug 4, 2019
@krk krk force-pushed the krk:meta-shard-name branch from 22fec98 to f5ad625 Aug 30, 2019
@dstorch dstorch self-requested a review Oct 9, 2019
Copy link
Contributor

dstorch left a comment

Hi @krk ,

Thanks for submitting this pull request! I've left a few comments inline, but I'll try to summarize the major points of feedback here:

  1. The implementation of projection execution is undergoing a complete rewrite at the moment. Much of this change affects files that are going to be deleted soon, such as ParsedProjection and ProjectionExec. Therefore, I think this change should be rewritten to be built only into the agg projection machinery (i.e. ParsedAggregationProjection). The agg implementation is being generalized so that it can replace classes like ParsedProjection and ProjectionExec. In order to support $meta:"shardName" evaluation in agg projection, you would extend ExpressionMeta' and leave ProjectionExec` unchanged.
  2. Instead of building a new PlanStage for the purpose of attaching the "shardName" metadata, we'd like to explore implementations that attach this metadata elsewhere. This will also be easier to implement once our rewrite of the projection code is finished.
  3. There should be some tests showing that this new feature behaves correctly.

I'd be happy to work with you in order to get these changes in shape for merge, if you're interested! As I mentioned in my message in SERVER-14423, I'd also be happy to suggest other tickets that may be more straightforward to implement and therefore easier for us to accept.

Please feel free to reach out if you have any questions about my comments, or if anything is unclear!

Best,
Dave

@@ -77,6 +77,18 @@ double textScore(const WorkingSetMember& member) {
}
}

StringData shardName(const WorkingSetMember& member) {

This comment has been minimized.

@dstorch

dstorch Oct 9, 2019

Contributor

We are currently rewriting the way that projections are executed for find commands, so that find commands and aggregate commands share a projection executor implementation. Unfortunately, that means that there has been a lot of code change in this file, and I'm afraid this part of the change won't merge.

@@ -119,6 +119,9 @@ ProjectionExec::ProjectionExec(OperationContext* opCtx,
_needsGeoNearDistance = true;
} else if (e2.valuestr() == QueryRequest::metaIndexKey) {
_hasReturnKey = true;
} else if (e2.valuestr() == QueryRequest::metaShardName) {

This comment has been minimized.

@dstorch

dstorch Oct 9, 2019

Contributor

This is another example of code that is going to be deleted soon, as part of our efforts to consolidate the implementations of find projection and agg projection. Ideally, the only place where we would have to change projection execution code would in the $meta agg expression:

Value ExpressionMeta::evaluate(const Document& root, Variables* variables) const {
const auto& metadata = root.metadata();
switch (_metaType) {
case MetaType::kTextScore:
return metadata.hasTextScore() ? Value(metadata.getTextScore()) : Value();
case MetaType::kRandVal:
return metadata.hasRandVal() ? Value(metadata.getRandVal()) : Value();
case MetaType::kSearchScore:
return metadata.hasSearchScore() ? Value(metadata.getSearchScore()) : Value();
case MetaType::kSearchHighlights:
return metadata.hasSearchHighlights() ? Value(metadata.getSearchHighlights()) : Value();
case MetaType::kGeoNearDist:
return metadata.hasGeoNearDistance() ? Value(metadata.getGeoNearDistance()) : Value();
case MetaType::kGeoNearPoint:
return metadata.hasGeoNearPoint() ? Value(metadata.getGeoNearPoint()) : Value();
case MetaType::kRecordId:
// Be sure that a RecordId can be represented by a long long.
static_assert(RecordId::kMinRepr >= std::numeric_limits<long long>::min());
static_assert(RecordId::kMaxRepr <= std::numeric_limits<long long>::max());
return metadata.hasRecordId()
? Value{static_cast<long long>(metadata.getRecordId().repr())}
: Value();
case MetaType::kIndexKey:
return metadata.hasIndexKey() ? Value(metadata.getIndexKey()) : Value();
case MetaType::kSortKey:
return metadata.hasSortKey()
? Value(DocumentMetadataFields::serializeSortKey(metadata.isSingleElementKey(),
metadata.getSortKey()))
: Value();
default:
MONGO_UNREACHABLE;
}
MONGO_UNREACHABLE;
}

We would need new code there that would know how to handle a $meta:"shardName" expression.

@@ -0,0 +1,104 @@
/**

This comment has been minimized.

@dstorch

dstorch Oct 9, 2019

Contributor

The design of the execution engine is to have as few PlanStages as possible for only the most fundamental query execution concepts. (On the other hand, the number of expressions and accumulators should be quite large and should grow to meet the needs of applications.) Therefore, the query team is reluctant to add a new PlanStage just for the purpose of attaching shard name metadata.

This brings up an interesting question of who should be responsible for attaching the shardName metadata. One idea is to attach this in the ShardFilterStage. We should only attach this metadata, however, if static analysis of the query shows that there is a subsequent $meta:"shardName" expression. There are some changes in the works as part of SERVER-43146 that should make it easier to determine whether the query has this "shardName" metadata dependency.

namespace mongo {

ShardNamerImpl::ShardNamerImpl(ScopedCollectionMetadata md) : _metadata(std::move(md)),
_shardName(isCollectionSharded() ? _metadata->shardId() : StringData("")) {

This comment has been minimized.

@dstorch

dstorch Oct 9, 2019

Contributor

I have consulted with one of the lead engineers on the sharding team, and we think that the way in which we obtain the shard name should be slightly different. Rather than getting the name from the ScopedCollectionMetadata, it can be obtained directly from the mongod's sharding state:

ShardId shardId();

This will work so long as the code which attaches the "shardName" always runs on a mongod, which it should.

}

private:
ScopedCollectionMetadata _metadata;

This comment has been minimized.

@dstorch

dstorch Oct 9, 2019

Contributor

Another reason to avoid the ShardNamer interface is that it results in the query plan holding two copies of the ScopedCollectionMetadata. As initially conceived, the design is for a query plan to hold a single ScopedCollectionMetadata inside of the ShardFilterStage.

@@ -622,9 +623,10 @@ Status QueryRequest::validate() const {
BSONObjIterator projIt(_proj);
while (projIt.more()) {
BSONElement projElt = projIt.next();
if (isTextScoreMeta(projElt)) {
if (isTextScoreMeta(projElt) || isShardNameMeta(projElt)) {

This comment has been minimized.

@dstorch

dstorch Oct 9, 2019

Contributor

I'm not sure that this part of the change is needed. If I'm reading the code correctly, this PR does not add the ability to perform a $meta-sort by "shardName", as in {$sort: {foo: {$meta: "shardName"}}}. Furthermore, the query team is interested in relaxing the rule for "textScore" which says that a "textScore" $meta sort must have a corresponding "textScore" $meta projection. See SERVER-43816.

This comment has been minimized.

@krk

krk Oct 18, 2019

Author

I have duplicated the implementation of textScore in most places, that is why this condition was added, removing.

@krk
Copy link
Author

krk commented Oct 16, 2019

Hi @dstorch,

Thank you for the detailed and helpful review.

I am interested in making the necessary changes to this PR after the rewriting of the implementation of projection execution is complete. This PR was already created when you offered to suggest areas to work on in SERVER-14423.

I will try to get this mergeable if there are not too many conflicts. Otherwise, I would prefer to wait until the rewrite is complete.

Also, I am open to other mentored tasks that can be completed in less time than the current PR.

Cheers,
Kerem

@krk krk force-pushed the krk:meta-shard-name branch from f5ad625 to 52bba63 Oct 16, 2019
@krk

This comment was marked as outdated.

@krk krk force-pushed the krk:meta-shard-name branch from 8bc9cff to 6ce6063 Oct 18, 2019
@krk
Copy link
Author

krk commented Oct 19, 2019

Hi @dstorch,

I have removed the ShardNameStage and moved the functionality to the ShardFilterStage. Now the ShardingFilterNode constructor has a wantShardName parameter - not sure if this is allowed.

The ShardNamer interface and implementation are removed, new code does not use a ScopedCollectionMetadata instance anymore.

In pipeline_d.cpp, the createRandomCursorExecutor function, I could not find an object that would give access to CanonicalQuery::metadataDeps or ShardingFilterNode::wantShardName so wantShardName parameter is set to false in ShardFilterStage constructors.

It is not clear to me if ProjectionExec code could be moved to ParsedAggregationProjection now or only after the rewriting by the mongo team.

Also, if you could give some pointers on what kind of tests to write first, it would be very helpful.

@krk
Copy link
Author

krk commented Jan 2, 2020

Maybe this year this PR will get some love :) @dstorch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.