Skip to content

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Aug 20, 2025

sql: make getPlanDistribution a method on planner

This will simplify the following commit a bit.

sql: disable DistSQL when txn buffered writes and MVCC decoding is required

Previously, if we buffered some writes on the RootTxn and then issued
a query that touches system columns that require MVCC decoding (like
crdb_internal_mvcc_timestamp) that is executed via DistSQL, it would
crash the process. The reason is that MVCC decoding is currently not
supported by the txnWriteBuffer, so we disable buffered writes, which is
only allowed on the RootTxn but we'd have the LeafTxn due to the plan
being distributed.

This commit fixes this issue by disabling DistSQL if

  • the txn has buffered some writes, and
  • one of the system columns that requires MVCC decoding is requested.

Executing the query will proceed with the "local" execution model which
will flush the buffer since we'll still disable buffered writes because
of MVCC decoding.

There are a couple of other complications to keep in mind:

  • we're deciding on the plan distribution for the main query before we
    execute any of the subqueries. Thus, to be safe in that case, if the
    query as a whole has a mutation AND it has at least one subquery, we'll
    "assume the worst" and will treat the subquery as if it's guaranteed to
    buffer some writes.
  • usage of LeafTxn can also be forced due to parallel processors. One
    such case is "parallelize scans if local", and now we'll prohibit
    parallelization if one of the system columns is requested. Another case
    is when we use the streamer - we now also prohibit its usage if one of
    the system columns is requested.

I decided to omit the release note given it's an edge case in disabled
by default feature.

Fixes: #151325

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the bw-mvcc-timestamp-v2 branch 3 times, most recently from 9b6623c to 6e864ce Compare August 27, 2025 19:02
@yuzefovich yuzefovich changed the title WIP on disabling DistSQL after we buffered writes and request sys col sql: disable DistSQL when txn buffered writes and MVCC decoding is required Aug 27, 2025
@yuzefovich yuzefovich force-pushed the bw-mvcc-timestamp-v2 branch from 6e864ce to 0c79fa5 Compare August 27, 2025 19:20
@yuzefovich yuzefovich marked this pull request as ready for review August 27, 2025 19:24
@yuzefovich yuzefovich requested review from a team as code owners August 27, 2025 19:24
@yuzefovich yuzefovich added backport-25.2.x Flags PRs that need to be backported to 25.2 backport-25.3.x labels Aug 27, 2025
@yuzefovich
Copy link
Member Author

It seems like this approach is less fragile than #152048, so I like it more. PTAL.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @stevendanna)


-- commits line 40 at r5:
New randomized testing from #151983 helped uncover the streamer case I initially missed.

This will simplify the following commit a bit.

Release note: None
…quired

Previously, if we buffered some writes on the RootTxn and then issued
a query that touches system columns that require MVCC decoding (like
`crdb_internal_mvcc_timestamp`) that is executed via DistSQL, it would
crash the process. The reason is that MVCC decoding is currently not
supported by the txnWriteBuffer, so we disable buffered writes, which is
only allowed on the RootTxn but we'd have the LeafTxn due to the plan
being distributed.

This commit fixes this issue by disabling DistSQL if
- the txn has buffered some writes, and
- one of the system columns that requires MVCC decoding is requested.

Executing the query will proceed with the "local" execution model which
will flush the buffer since we'll still disable buffered writes because
of MVCC decoding.

There are a couple of other complications to keep in mind:
- we're deciding on the plan distribution for the main query _before_ we
execute any of the subqueries. Thus, to be safe in that case, if the
query as a whole has a mutation AND it has at least one subquery, we'll
"assume the worst" and will treat the subquery as if it's guaranteed to
buffer some writes.
- usage of LeafTxn can also be forced due to parallel processors. One
such case is "parallelize scans if local", and now we'll prohibit
parallelization if one of the system columns is requested. Another case
is when we use the streamer - we now also prohibit its usage if one of
the system columns is requested.

I decided to omit the release note given it's an edge case in disabled
by default feature.

Release note: None
@yuzefovich yuzefovich force-pushed the bw-mvcc-timestamp-v2 branch from a45a7d8 to 6b42a17 Compare August 28, 2025 02:20
Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I liked this strategy. It does push more knowledge of buffered writes into SQL, which is a downside, but it seems safe than the alternative.

I may not be the best reviewer for this as I don't know much about plan construction to verify those parts of the changes beyond a surface analysis.

@stevendanna reviewed 3 of 22 files at r1, 2 of 19 files at r2, 17 of 17 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2)


pkg/sql/distsql_spec_exec_factory.go line 986 at r7 (raw file):

			fetch.requiresMVCCDecoding()
		// TODO(yuzefovich): only require MVCC decoding when txn has buffered
		// writes.

For these cases where we aren't currently looking for txnHasBufferedWrites, is the work to do "plumbing" (i.e. getting the information about the write buffer to this location) or are there additional complications as well?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @stevendanna)


pkg/sql/distsql_spec_exec_factory.go line 986 at r7 (raw file):

Previously, stevendanna (Steven Danna) wrote…

For these cases where we aren't currently looking for txnHasBufferedWrites, is the work to do "plumbing" (i.e. getting the information about the write buffer to this location) or are there additional complications as well?

No additional complications - just needed to do the plumbing, but this code is experimental (and has been like this for several years), so I didn't bother doing it to reduce the scope of changes.

@yuzefovich
Copy link
Member Author

friendly ping @michae2, or maybe @DrewKimball you could take a look at this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-25.2.x Flags PRs that need to be backported to 25.2 backport-25.3.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: sqlsmith: SetBufferedWritesEnabled() called on leaf txn
3 participants