-
Notifications
You must be signed in to change notification settings - Fork 4k
sql: disable DistSQL when txn buffered writes and MVCC decoding is required #152201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
9b6623c
to
6e864ce
Compare
6e864ce
to
0c79fa5
Compare
It seems like this approach is less fragile than #152048, so I like it more. PTAL. |
0c79fa5
to
b5b4b8d
Compare
b5b4b8d
to
a45a7d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @stevendanna)
-- commits
line 40 at r5:
New randomized testing from #151983 helped uncover the streamer case I initially missed.
This will simplify the following commit a bit. Release note: None
…quired Previously, if we buffered some writes on the RootTxn and then issued a query that touches system columns that require MVCC decoding (like `crdb_internal_mvcc_timestamp`) that is executed via DistSQL, it would crash the process. The reason is that MVCC decoding is currently not supported by the txnWriteBuffer, so we disable buffered writes, which is only allowed on the RootTxn but we'd have the LeafTxn due to the plan being distributed. This commit fixes this issue by disabling DistSQL if - the txn has buffered some writes, and - one of the system columns that requires MVCC decoding is requested. Executing the query will proceed with the "local" execution model which will flush the buffer since we'll still disable buffered writes because of MVCC decoding. There are a couple of other complications to keep in mind: - we're deciding on the plan distribution for the main query _before_ we execute any of the subqueries. Thus, to be safe in that case, if the query as a whole has a mutation AND it has at least one subquery, we'll "assume the worst" and will treat the subquery as if it's guaranteed to buffer some writes. - usage of LeafTxn can also be forced due to parallel processors. One such case is "parallelize scans if local", and now we'll prohibit parallelization if one of the system columns is requested. Another case is when we use the streamer - we now also prohibit its usage if one of the system columns is requested. I decided to omit the release note given it's an edge case in disabled by default feature. Release note: None
a45a7d8
to
6b42a17
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I liked this strategy. It does push more knowledge of buffered writes into SQL, which is a downside, but it seems safe than the alternative.
I may not be the best reviewer for this as I don't know much about plan construction to verify those parts of the changes beyond a surface analysis.
@stevendanna reviewed 3 of 22 files at r1, 2 of 19 files at r2, 17 of 17 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @michae2)
pkg/sql/distsql_spec_exec_factory.go
line 986 at r7 (raw file):
fetch.requiresMVCCDecoding() // TODO(yuzefovich): only require MVCC decoding when txn has buffered // writes.
For these cases where we aren't currently looking for txnHasBufferedWrites, is the work to do "plumbing" (i.e. getting the information about the write buffer to this location) or are there additional complications as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @stevendanna)
pkg/sql/distsql_spec_exec_factory.go
line 986 at r7 (raw file):
Previously, stevendanna (Steven Danna) wrote…
For these cases where we aren't currently looking for txnHasBufferedWrites, is the work to do "plumbing" (i.e. getting the information about the write buffer to this location) or are there additional complications as well?
No additional complications - just needed to do the plumbing, but this code is experimental (and has been like this for several years), so I didn't bother doing it to reduce the scope of changes.
friendly ping @michae2, or maybe @DrewKimball you could take a look at this |
sql: make
getPlanDistribution
a method onplanner
This will simplify the following commit a bit.
sql: disable DistSQL when txn buffered writes and MVCC decoding is required
Previously, if we buffered some writes on the RootTxn and then issued
a query that touches system columns that require MVCC decoding (like
crdb_internal_mvcc_timestamp
) that is executed via DistSQL, it wouldcrash the process. The reason is that MVCC decoding is currently not
supported by the txnWriteBuffer, so we disable buffered writes, which is
only allowed on the RootTxn but we'd have the LeafTxn due to the plan
being distributed.
This commit fixes this issue by disabling DistSQL if
Executing the query will proceed with the "local" execution model which
will flush the buffer since we'll still disable buffered writes because
of MVCC decoding.
There are a couple of other complications to keep in mind:
execute any of the subqueries. Thus, to be safe in that case, if the
query as a whole has a mutation AND it has at least one subquery, we'll
"assume the worst" and will treat the subquery as if it's guaranteed to
buffer some writes.
such case is "parallelize scans if local", and now we'll prohibit
parallelization if one of the system columns is requested. Another case
is when we use the streamer - we now also prohibit its usage if one of
the system columns is requested.
I decided to omit the release note given it's an edge case in disabled
by default feature.
Fixes: #151325