executor: fix analyze cannot be killed #65249

hawkingrei · 2025-12-25T04:26:23Z

What problem does this PR solve?

Issue Number: close #65818

Problem Summary:

Analyze does not propagate cancellation context into RPC/NextRaw; killing the query can leave analyze workers blocked.

What changed and how does it work?

Pass SQLKiller-derived context into analyze workers and all V1 analyze column paths.
Replace context.TODO() with the propagated ctx in analyze V1 build/consume flow.
Add a failpoint-gated test to ensure analyze exits promptly after ctx cancellation.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

hawkingrei · 2025-12-25T04:40:25Z

/retest

codecov · 2025-12-25T05:05:47Z

Codecov Report

❌ Patch coverage is 72.88136% with 112 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.1469%. Comparing base (5de6f55) to head (ed61b41).
⚠️ Report is 30 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #65249        +/-   ##
================================================
+ Coverage   77.7888%   78.1469%   +0.3580%     
================================================
  Files          2000       1922        -78     
  Lines        545038     535993      -9045     
================================================
- Hits         423979     418862      -5117     
+ Misses       119397     116660      -2737     
+ Partials       1662        471      -1191

Flag	Coverage Δ
integration	`44.1556% <36.8038%> (-4.0150%)`	⬇️
unit	`76.4101% <72.8813%> (-0.0186%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`56.7974% <ø> (ø)`
parser	`∅ <ø> (∅)`
br	`48.8228% <ø> (-12.1487%)`	⬇️

hawkingrei · 2025-12-29T02:30:27Z

/retest

Copilot

Pull request overview

This PR aims to make ANALYZE responsive to query kill/cancellation by propagating a SQLKiller-derived cancellation context into analyze workers and DistSQL/NextRaw paths, and adds a failpoint-based test to validate prompt exit on context cancellation.

Changes:

Add a SQLKiller-provided cancelable context (GetKillEventCtx) and propagate it through analyze execution paths.
Replace context.TODO() with a propagated ctx in several analyze V1/V2 build/consume flows and add ctx-aware worker loops.
Add a failpoint-gated unit test ensuring ANALYZE exits quickly after ctx cancellation.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

File	Description
pkg/util/sqlkiller/sqlkiller.go	Introduces kill-event context support for cancellation propagation.
pkg/util/sqlkiller/BUILD.bazel	Adds dependency needed for new sqlkiller error usage.
pkg/executor/analyze.go	Threads kill-derived ctx into analyze workers.
pkg/executor/analyze_col.go	Propagates ctx into analyze V1 build/NextRaw flow.
pkg/executor/analyze_col_v2.go	Adds ctx plumbing/cancellation to analyze V2 sampling workers and NextRaw usage.
pkg/executor/analyze_idx.go	Adds ctx parameters through index analyze call chain and cancellation checks.
pkg/distsql/distsql.go	Adds failpoint to block until ctx cancellation for testing.
pkg/executor/test/analyzetest/analyze_test.go	Adds unit test validating analyze cancellation behavior.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Copilot · 2026-01-26T10:27:13Z

pkg/executor/analyze_col_v2.go

+			statsHandle.FinishAnalyzeJob(results.Job, nil, statistics.TableAnalysisJob)
+			totalResult.results[results.Ars[0].Hist[0].ID] = results
+		case <-ctx.Done():
+			err = ctx.Err()


On cancellation, this path sets err = ctx.Err(), which loses the cancellation cause from SQLKiller.GetKillEventCtx (set via cancelFn(errKilled)). Consider using context.Cause(ctx) (falling back to ctx.Err()) so the caller can differentiate SQL kill vs plain context cancellation.

Suggested change

-			err = ctx.Err()
+			err = context.Cause(ctx)
+			if err == nil {
+				err = ctx.Err()
+			}

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Copilot · 2026-01-26T14:05:33Z

pkg/executor/analyze_col_v2.go

+			// Unmarshal the data.
+			dataSize := int64(cap(data))
+			colResp := &tipb.AnalyzeColumnsResp{}
+			err := colResp.Unmarshal(data)
+			if err != nil {
+				resultCh <- &samplingMergeResult{err: err}
+				return
+			}


subMergeWorker returns early on Unmarshal error and sends the error to resultCh, but it doesn't clean up retCollector before returning. This can leak collector resources/memory on decode errors. Consider destroying/returning retCollector to the pool before returning in these early-exit error cases.

Copilot · 2026-01-26T14:05:33Z

pkg/distsql/distsql.go

+	failpoint.Inject("mockAnalyzeRequestWaitForCancel", func(val failpoint.Value) {
+		if val.(bool) {
+			<-ctx.Done()
+			failpoint.Return(nil, ctx.Err())


The failpoint 'mockAnalyzeRequestWaitForCancel' returns ctx.Err(), which discards any cancellation cause set via context.WithCancelCause (e.g. SQLKiller’s specific interrupt error). Returning context.Cause(ctx) (falling back to ctx.Err() if nil) would preserve the intended kill reason in tests and callers using cancel causes.

Suggested change

-			failpoint.Return(nil, ctx.Err())
+			err := context.Cause(ctx)
+			if err == nil {
+				err = ctx.Err()
+			}
+			failpoint.Return(nil, err)

Copilot · 2026-01-26T14:05:34Z

pkg/executor/analyze_col_v2.go

+		case <-ctx.Done():
+			err := context.Cause(ctx)
+			if err != nil {
+				resultCh <- &samplingMergeResult{err: err}
+				return
+			}
+			err = ctx.Err()
+			if err != nil {
+				resultCh <- &samplingMergeResult{err: err}
+				return
+			}
+			if intest.InTest {
+				panic("this ctx should be canceled with the error")
+			}
+			resultCh <- &samplingMergeResult{err: errors.New("context canceled without error")}
 			return


In subMergeWorker, the ctx.Done() branch returns after sending an error to resultCh, but the local retCollector is never DestroyAndPutToPool()'d (and any memory it accumulated is never released). This can leak collector objects/memory on cancellation. Ensure retCollector is cleaned up before returning on ctx cancellation (and similarly for other early-return error paths).

ti-chi-bot · 2026-01-29T04:10:26Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cfzjywxk, terry1purcell for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Copilot · 2026-01-29T14:02:28Z

pkg/executor/analyze_col_v2.go

+LOOP:
 	for panicCnt < samplingStatsConcurrency {
-		results, ok := <-resultsCh
-		if !ok {
-			break
-		}
-		if results.Err != nil {
-			err = results.Err
-			statsHandle.FinishAnalyzeJob(results.Job, err, statistics.TableAnalysisJob)
-			if isAnalyzeWorkerPanic(err) {
-				panicCnt++
+		select {
+		case results, ok := <-resultsCh:
+			if !ok {
+				break LOOP
 			}
-			continue
+			if results.Err != nil {
+				err = results.Err
+				statsHandle.FinishAnalyzeJob(results.Job, err, statistics.TableAnalysisJob)
+				if isAnalyzeWorkerPanic(err) {
+					panicCnt++
+				}
+				continue LOOP
+			}
+			statsHandle.FinishAnalyzeJob(results.Job, nil, statistics.TableAnalysisJob)
+			totalResult.results[results.Ars[0].Hist[0].ID] = results
+		case <-ctx.Done():
+			err = context.Cause(ctx)
+			if err == nil {
+				err = ctx.Err()
+			}
+			break LOOP
 		}


In handleNDVForSpecialIndexes, the new <-ctx.Done() branch breaks out of the result-consumption loop without finishing all sub-index analyze jobs. Because jobs are inserted up-front (AddNewAnalyzeJob) and only finished when their result is read, an early break can leave rows in mysql.analyze_jobs stuck in pending/running for the remaining tasks. Consider removing the early break and letting the loop drain resultsCh until it is closed (the workers should return quickly because analyzeIndexNDVPushDown now uses the same ctx), or explicitly finishing any remaining jobs with the ctx error before returning.

Copilot · 2026-01-29T14:02:28Z

pkg/executor/analyze.go

+			}
+		}
+	}()
+	return ctx, func() {


buildAnalyzeKillCtx creates a cancelable context but the returned stop() only closes stopCh; it never calls the context's cancel function. Calling cancel (e.g., cancel(nil)) in stop() helps release context resources and ensures any ctx-derived work that might still be observing ctx.Done() can terminate promptly if stop() is invoked.

Suggested change

-	return ctx, func() {
+	return ctx, func() {
+		cancel(nil)

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>

ti-chi-bot · 2026-01-29T18:31:30Z

@hawkingrei: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-unit-test-next-gen	`ed61b41`	link	true	`/test pull-unit-test-next-gen`

Full PR test history. Your PR dashboard.

ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 25, 2025

hawkingrei changed the title ~~executor: fix analyze cannot be killed~~ [WIP]executor: fix analyze cannot be killed Dec 25, 2025

ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 25, 2025

ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 31, 2025

hawkingrei changed the title ~~[WIP]executor: fix analyze cannot be killed~~ executor: fix analyze cannot be killed Jan 26, 2026

ti-chi-bot bot removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-tests-checked labels Jan 26, 2026

hawkingrei force-pushed the fix_cannot_kill_analyze branch from e93e052 to 253f0d7 Compare January 26, 2026 08:52

Copilot AI review requested due to automatic review settings January 26, 2026 08:52

Copilot started reviewing on behalf of hawkingrei January 26, 2026 08:52 View session

Copilot AI reviewed Jan 26, 2026

ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 26, 2026

Copilot AI review requested due to automatic review settings January 26, 2026 10:16

Copilot started reviewing on behalf of hawkingrei January 26, 2026 10:16 View session

Copilot AI reviewed Jan 26, 2026

ti-chi-bot bot removed the do-not-merge/needs-linked-issue label Jan 26, 2026

Copilot AI review requested due to automatic review settings January 26, 2026 13:08

Copilot started reviewing on behalf of hawkingrei January 26, 2026 13:08 View session

Copilot AI reviewed Jan 26, 2026

pkg/executor/analyze_col_v2.go (outdated, resolved)

pkg/util/sqlkiller/sqlkiller.go (outdated, resolved)

Copilot AI review requested due to automatic review settings January 26, 2026 13:54

Copilot started reviewing on behalf of hawkingrei January 26, 2026 13:55 View session

Copilot AI reviewed Jan 26, 2026

statistics: set analyze start_time on early kill

6c00c38

ti-chi-bot bot added component/statistics sig/planner SIG: Planner labels Jan 29, 2026

hawkingrei added 15 commits January 29, 2026 12:41

executor: refresh stats after auto analyze in kill tests

6d079c8

executor: use SQLKiller error in analyze kill ctx

4ce1787

executor: dedup analyze cancel errors

b1c989f

executor: make analyze slow failpoints cancelable

8a65154

executor: propagate analyze cancel cause in merge worker

fcec32f

executor: propagate analyze cancel cause from ctx

2f2151e

executor: map analyze canceled err to cause

c230cba

executor: use std errors in analyze utils

db82f8b

update

7ce1fb6

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>

update

51a8a9a

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>

executor: wrap analyze cancel debug log

f3d22dd

executor: wrap distsql cancel log in analyze col

d956ee8

executor: wrap analyze v2 cancel debug logs

2cd97ec

executor: wrap analyze index cancel debug logs

9692d53

update

75db65a

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>

hawkingrei requested a review from Copilot January 29, 2026 13:52

Copilot started reviewing on behalf of hawkingrei January 29, 2026 13:52 View session

Copilot AI reviewed Jan 29, 2026

hawkingrei added 7 commits January 29, 2026 22:49

executor: map NextRaw cancel to cause

39d5aaf

executor: rename ctx cancel normalizer

74e8483

executor: finish pending ndv analyze jobs on cancel

eb9bccc

executor: cleanup ndv merge collector on early exit

f0f54c1

executor: comment cleanup collector on early exit

cb4c60a

update

8a79bad

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>

update

ed61b41

Signed-off-by: Weizhen Wang <wangweizhen@pingcap.com>
