Update packages by ewianda · Pull Request #19 · benchsci/rules_python_gazelle

ewianda · 2026-02-20T22:27:11Z

Optimize Python parser: 3.5x speedup (46.7s → 13.3s)
Migrate the Python file parser from recursive AST traversal to
pre-compiled tree-sitter queries, add object pooling, and batch-parse
all files into a lookup table to eliminate redundant work.

Changes:

- file_parser.go: Replace recursive parse() with a query-based approach
  using pre-compiled tree-sitter queries (init-time), sync.Pool for
  parser and cursor reuse, and TYPE_CHECKING block detection.
- parser.go: Add parseAllToLUT() for concurrent batch parsing with
  errgroup limited to NumCPU, and parseFromLUT() to process pre-parsed
  results without re-parsing.
- generate.go: Parse all .py files once upfront into a LUT before
  target generation. Replace per-target parser.parse() calls with
  parseFromLUT() lookups. Replace per-call regexp.MustCompile in
  isDjangoTestFile with strings.Contains.
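
As a rough illustration of the batch-parse-into-a-LUT idea, here is a minimal sketch. It is not the PR's implementation: the PR uses errgroup, while this sketch swaps in a stdlib semaphore channel plus sync.WaitGroup so it runs without extra modules. The parsedFile type and parseOne function are hypothetical stand-ins for the real tree-sitter parse, and this parseAllToLUT signature is an assumption.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// parsedFile is a hypothetical stand-in for the per-file parse result.
type parsedFile struct {
	Imports []string
}

// parseOne is a hypothetical stand-in for the real tree-sitter parse.
// It only collects module names from lines that start with "import ".
func parseOne(src string) parsedFile {
	var out parsedFile
	for _, line := range strings.Split(src, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "import ") {
			out.Imports = append(out.Imports, strings.TrimPrefix(line, "import "))
		}
	}
	return out
}

// parseAllToLUT parses every source exactly once, with at most NumCPU
// parses in flight, and returns a path -> result lookup table.
func parseAllToLUT(sources map[string]string) map[string]parsedFile {
	lut := make(map[string]parsedFile, len(sources))
	var mu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, runtime.NumCPU())
	for path, src := range sources {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot before starting the worker
		go func(path, src string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			res := parseOne(src)
			mu.Lock()
			lut[path] = res
			mu.Unlock()
		}(path, src)
	}
	wg.Wait()
	return lut
}

func main() {
	lut := parseAllToLUT(map[string]string{
		"a.py": "import os\nimport sys\n",
		"b.py": "x = 1\n",
	})
	// Later consumers look results up by path instead of re-parsing.
	fmt.Println(lut["a.py"].Imports, lut["b.py"].Imports)
}
```

The point of the design is that each file is parsed once, and every target that references the file afterwards pays only for a map lookup.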

Profiled on production codebase (cpu pprof):

| Metric      | Before | After  | Improvement |
|-------------|--------|--------|-------------|
| Wall time   | 46.72s | 13.25s | 3.5x faster |
| CPU samples | 53.04s | 31.69s | 40% less    |
| cgocall     | 28.13s | 23.68s | 16% less    |

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR modernizes and accelerates Python dependency parsing in the Gazelle extension by switching from recursive AST walking to pre-compiled tree-sitter queries, pooling parser/cursor objects, and parsing all Python files once into a shared lookup table (LUT) that is reused during rule generation.

Changes:

- Add concurrent batch parsing (parseAllToLUT) and LUT-based consumption (parseFromLUT) to avoid repeated parsing across multiple targets.
- Rewrite FileParser.Parse to use pre-compiled tree-sitter queries plus pooling for parsers/cursors, including TYPE_CHECKING-block handling.
- Update rule generation to pre-parse all .py files into a LUT and replace regex-based Django test detection with substring checks.
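
The pooling mentioned above can be illustrated with a small sync.Pool sketch. This is an assumption-laden example, not code from the PR: bytes.Buffer stands in for the parser and cursor objects, and bufPool and render are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers. bytes.Buffer is only a stand-in
// for the expensive-to-create parser and cursor objects.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render borrows a buffer, uses it, then resets it and returns it to
// the pool so the next caller can reuse the allocation.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()
		bufPool.Put(buf)
	}()
	buf.WriteString("import ")
	buf.WriteString(name)
	return buf.String() // copies the bytes before the deferred Reset runs
}

func main() {
	fmt.Println(render("os"))
	fmt.Println(render("sys"))
}
```

Resetting the object before Put matters: a pooled object that still carries state from its previous use would leak that state into the next caller.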

Reviewed changes

Copilot reviewed 7 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| python/parser.go | Introduces LUT-based concurrent parsing and LUT consumption APIs. |
| python/generate.go | Parses all Python files once into a LUT; switches per-target parsing to LUT lookups; simplifies Django marker detection. |
| python/file_parser.go | Replaces recursive traversal with query-based extraction and adds pooling + TYPE_CHECKING handling. |
| patches/go-tree-sitter.diff | Removes the previously maintained patch file for go-tree-sitter vendoring/patching. |
| go.mod | Bumps Go version and updates dependency versions (Gazelle, rules_go, x/sync, etc.). |
| go.sum | Updates module checksums to match the new dependency set. |
| MODULE.bazel | Updates Bazel module deps and removes custom http_archive patching for go-tree-sitter, relying on go_deps instead. |
| MODULE.bazel.lock | Regenerates lockfile reflecting updated Bazel deps and transitive graph. |

Comments suppressed due to low confidence (2)

python/generate.go:479

After the scan loop, scanner.Err() isn’t checked. If scanning fails (including bufio.ErrTooLong for long lines), this will silently return false and can misclassify Django tests. Consider checking scanner.Err() and handling/reporting the error consistently with the rest of this function.

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, "pytest.mark.django_db") {
			return true
		}
		if strings.Contains(line, "gazelle: django_test") {
			return true
		}
		if strings.Contains(line, "django.test") && strings.Contains(line, "TestCase") {
			return true
		}
	}
	return false

python/generate.go:464

log.Fatalf already terminates the process, so the subsequent panic(err) is unreachable and can be removed for clarity.

	file, err := os.Open(path)
	if err != nil {
		log.Fatalf("ERROR: %v\n", err)
		panic(err)
	}

python/file_parser.go

Copilot · 2026-03-06T15:45:57Z

@ewianda I've opened a new pull request, #20, to work on those changes. Once the pull request is ready, I'll request review from you.

- parseMain: add ChildCount and nil checks before indexing into comparison operator children to prevent panics on malformed ASTs
- parseImportStatements: guard import_from_statement with ChildCount check to handle incomplete/invalid syntax safely
- Add TestParseTypeChecking covering TYPE_CHECKING, typing.TYPE_CHECKING, and non-TYPE_CHECKING conditional import blocks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
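
The defensive pattern described in this commit can be sketched as follows. The node type here is a hypothetical, minimal stand-in for a syntax-tree node, not the go-tree-sitter API, and comparisonOperands is an illustrative helper rather than a function from the PR.

```go
package main

import "fmt"

// node is a hypothetical minimal syntax-tree node used for illustration.
type node struct {
	kind     string
	children []*node
}

// ChildCount returns the number of direct children.
func (n *node) ChildCount() int { return len(n.children) }

// Child returns the i-th child, or nil when i is out of range.
func (n *node) Child(i int) *node {
	if i < 0 || i >= len(n.children) {
		return nil
	}
	return n.children[i]
}

// comparisonOperands returns the left and right operands of a
// comparison such as `__name__ == "__main__"`. It reports ok=false for
// malformed nodes instead of indexing blindly.
func comparisonOperands(n *node) (left, right *node, ok bool) {
	if n == nil || n.ChildCount() < 3 {
		return nil, nil, false
	}
	left, right = n.Child(0), n.Child(2)
	if left == nil || right == nil {
		return nil, nil, false
	}
	return left, right, true
}

func main() {
	complete := &node{kind: "comparison_operator", children: []*node{
		{kind: "identifier"}, {kind: "=="}, {kind: "string"},
	}}
	truncated := &node{kind: "comparison_operator", children: []*node{
		{kind: "identifier"},
	}}
	_, _, ok := comparisonOperands(complete)
	fmt.Println(ok)
	_, _, ok = comparisonOperands(truncated)
	fmt.Println(ok)
}
```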

Copilot

Pull request overview

Copilot reviewed 8 out of 11 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (3)

python/generate.go:479

isDjangoTestFile ignores scanner.Err() after the scan loop, so I/O errors while reading the file will be silently treated as “not a django test”. Check scanner.Err() and handle it (e.g., return false with logging, or propagate/terminate consistently with the rest of this package).

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, "pytest.mark.django_db") {
			return true
		}
		if strings.Contains(line, "gazelle: django_test") {
			return true
		}
		if strings.Contains(line, "django.test") && strings.Contains(line, "TestCase") {
			return true
		}
	}
	return false

python/file_parser.go:99

parseCode always calls parser.ParseCtx(context.Background(), ...), so cancellations/timeouts passed into FileParser.Parse(ctx) won’t stop the expensive tree-sitter parse step. Consider threading ctx through parseCode and using it in ParseCtx so batch parsing can be canceled promptly on errors/timeouts.

func parseCode(code []byte) (*sitter.Node, error) {
	parser := parserPool.Get().(*sitter.Parser)
	defer parserPool.Put(parser)

	tree, err := parser.ParseCtx(context.Background(), nil, code)
	if err != nil {
		return nil, err

python/generate.go:464

log.Fatalf already terminates the process, so the subsequent panic(err) is dead code and can be removed. If you want to return an error instead of exiting, replace Fatalf with error propagation.

	file, err := os.Open(path)
	if err != nil {
		log.Fatalf("ERROR: %v\n", err)
		panic(err)
	}

python/file_parser.go

Migrate the Python file parser from recursive AST traversal to pre-compiled tree-sitter queries, add object pooling, and batch-parse all files into a lookup table to eliminate redundant work.

Changes:

- file_parser.go: Replace recursive parse() with query-based approach using pre-compiled tree-sitter queries (init-time), sync.Pool for parser and cursor reuse, TYPE_CHECKING block detection, and defensive ChildCount/nil checks on AST node access.
- parser.go: Add parseAllToLUT() for concurrent batch parsing with errgroup limited to NumCPU, and parseFromLUT() to process pre-parsed results without re-parsing.
- generate.go: Parse all .py files once upfront into a LUT before target generation. Replace per-target parser.parse() calls with parseFromLUT() lookups. Replace per-call regexp.MustCompile in isDjangoTestFile with strings.Contains.
- file_parser_test.go: Add TYPE_CHECKING test coverage for bare TYPE_CHECKING, typing.TYPE_CHECKING, and non-TYPE_CHECKING blocks.

Profiled on production codebase (cpu pprof):

| Metric      | Before | After  | Improvement |
|-------------|--------|--------|-------------|
| Wall time   | 46.72s | 13.25s | 3.5x faster |
| CPU samples | 53.04s | 31.69s | 40% less    |
| cgocall     | 28.13s | 23.68s | 16% less    |

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update packages

e66ec94

ewianda requested a review from Copilot February 20, 2026 22:27

Copilot started reviewing on behalf of ewianda February 20, 2026 22:28

ewianda force-pushed the Update-packages branch 2 times, most recently from e408931 to b574ffb, February 20, 2026 22:33

Copilot AI reviewed Feb 20, 2026

Copilot AI mentioned this pull request Mar 6, 2026

Add defensive ChildCount() guards in parseMain to prevent AST index-out-of-bounds panics #20

Closed

ewianda requested a review from Copilot March 9, 2026 04:24

Copilot started reviewing on behalf of ewianda March 9, 2026 04:25

Copilot AI reviewed Mar 9, 2026

ewianda force-pushed the Update-packages branch from 3d44d01 to a0f872a, March 9, 2026 04:36

ewianda merged commit aa24131 into main Mar 10, 2026
1 check passed

ewianda merged 2 commits into main from Update-packages

3 participants
