Update packages #19

Merged
ewianda merged 2 commits into main from Update-packages
Mar 10, 2026
@ewianda ewianda commented Feb 20, 2026

Optimize Python parser: 3.5x speedup (46.7s → 13.3s)
Migrate the Python file parser from recursive AST traversal to
pre-compiled tree-sitter queries, add object pooling, and batch-parse
all files into a lookup table to eliminate redundant work.

Changes:

  • file_parser.go: Replace recursive parse() with query-based approach
    using pre-compiled tree-sitter queries (init-time), sync.Pool for
    parser and cursor reuse, and TYPE_CHECKING block detection.
  • parser.go: Add parseAllToLUT() for concurrent batch parsing with
    errgroup limited to NumCPU, and parseFromLUT() to process pre-parsed
    results without re-parsing.
  • generate.go: Parse all .py files once upfront into a LUT before
    target generation. Replace per-target parser.parse() calls with
    parseFromLUT() lookups. Replace per-call regexp.MustCompile in
    isDjangoTestFile with strings.Contains.

Profiled on production codebase (cpu pprof):

| Metric      | Before  | After   | Improvement |
|-------------|---------|---------|-------------|
| Wall time   | 46.72s  | 13.25s  | 3.5x faster |
| CPU samples | 53.04s  | 31.69s  | 40% less    |
| cgocall     | 28.13s  | 23.68s  | 16% less    |

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
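
For readers who want to check the Improvement column, the figures follow from the Before and After values by a ratio and a relative decrease. A small sketch, with `speedup` and `reduction` as illustrative helper names:

```go
package main

import "fmt"

// speedup returns how many times faster "after" is compared to "before".
func speedup(before, after float64) float64 {
	return before / after
}

// reduction returns the relative decrease from before to after, in percent.
func reduction(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	fmt.Printf("wall time:   %.1fx faster\n", speedup(46.72, 13.25))
	fmt.Printf("CPU samples: %.0f%% less\n", reduction(53.04, 31.69))
	fmt.Printf("cgocall:     %.0f%% less\n", reduction(28.13, 23.68))
}
```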

@ewianda ewianda requested a review from Copilot February 20, 2026 22:27
@ewianda ewianda force-pushed the Update-packages branch 2 times, most recently from e408931 to b574ffb on February 20, 2026 22:33
Copilot AI left a comment

Pull request overview

This PR modernizes and accelerates Python dependency parsing in the Gazelle extension by switching from recursive AST walking to pre-compiled tree-sitter queries, pooling parser/cursor objects, and parsing all Python files once into a shared lookup table (LUT) that is reused during rule generation.

Changes:

  • Add concurrent batch parsing (parseAllToLUT) and LUT-based consumption (parseFromLUT) to avoid repeated parsing across multiple targets.
  • Rewrite FileParser.Parse to use pre-compiled tree-sitter queries plus pooling for parsers/cursors, including TYPE_CHECKING-block handling.
  • Update rule generation to pre-parse all .py files into a LUT and replace regex-based Django test detection with substring checks.

Reviewed changes

Copilot reviewed 7 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| python/parser.go | Introduces LUT-based concurrent parsing and LUT consumption APIs. |
| python/generate.go | Parses all Python files once into a LUT; switches per-target parsing to LUT lookups; simplifies Django marker detection. |
| python/file_parser.go | Replaces recursive traversal with query-based extraction and adds pooling + TYPE_CHECKING handling. |
| patches/go-tree-sitter.diff | Removes the previously maintained patch file for go-tree-sitter vendoring/patching. |
| go.mod | Bumps Go version and updates dependency versions (Gazelle, rules_go, x/sync, etc.). |
| go.sum | Updates module checksums to match the new dependency set. |
| MODULE.bazel | Updates Bazel module deps and removes custom http_archive patching for go-tree-sitter, relying on go_deps instead. |
| MODULE.bazel.lock | Regenerates lockfile reflecting updated Bazel deps and transitive graph. |

Comments suppressed due to low confidence (2)

python/generate.go:479

  • After the scan loop, scanner.Err() isn’t checked. If scanning fails (including bufio.ErrTooLong for long lines), this will silently return false and can misclassify Django tests. Consider checking scanner.Err() and handling/reporting the error consistently with the rest of this function.
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, "pytest.mark.django_db") {
			return true
		}
		if strings.Contains(line, "gazelle: django_test") {
			return true
		}
		if strings.Contains(line, "django.test") && strings.Contains(line, "TestCase") {
			return true
		}
	}
	return false

python/generate.go:464

  • log.Fatalf already terminates the process, so the subsequent panic(err) is unreachable and can be removed for clarity.
	file, err := os.Open(path)
	if err != nil {
		log.Fatalf("ERROR: %v\n", err)
		panic(err)
	}

Copilot AI commented Mar 6, 2026

@ewianda I've opened a new pull request, #20, to work on those changes. Once the pull request is ready, I'll request review from you.

ewianda added a commit that referenced this pull request Mar 9, 2026
- parseMain: add ChildCount and nil checks before indexing into
  comparison operator children to prevent panics on malformed ASTs
- parseImportStatements: guard import_from_statement with ChildCount
  check to handle incomplete/invalid syntax safely
- Add TestParseTypeChecking covering TYPE_CHECKING, typing.TYPE_CHECKING,
  and non-TYPE_CHECKING conditional import blocks
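
The defensive checks described in this commit message could look roughly like the sketch below. The `node` type is a hypothetical, minimal stand-in for a syntax-tree node, and `operands` is an invented helper, so none of this is the go-tree-sitter API or the PR's code.

```go
package main

import "fmt"

// node is a hypothetical, minimal stand-in for a syntax-tree node.
type node struct {
	kind     string
	children []*node
}

// ChildCount returns the number of children.
func (n *node) ChildCount() int { return len(n.children) }

// Child returns the i-th child, or nil when i is out of range.
func (n *node) Child(i int) *node {
	if i < 0 || i >= len(n.children) {
		return nil
	}
	return n.children[i]
}

// operands returns the left and right operand of a comparison node. ok is
// false when the node is nil, has fewer than three children, or one of the
// operands is missing, so callers never index into a malformed tree.
func operands(n *node) (left, right *node, ok bool) {
	if n == nil || n.ChildCount() < 3 {
		return nil, nil, false
	}
	left, right = n.Child(0), n.Child(2)
	if left == nil || right == nil {
		return nil, nil, false
	}
	return left, right, true
}

func main() {
	// Shape of: __name__ == "__main__"
	wellFormed := &node{kind: "comparison_operator", children: []*node{
		{kind: "identifier"}, {kind: "=="}, {kind: "string"},
	}}
	_, _, ok := operands(wellFormed)
	fmt.Println("well-formed:", ok)

	// Incomplete syntax such as a dangling "__name__ ==".
	malformed := &node{kind: "comparison_operator", children: []*node{
		{kind: "identifier"}, {kind: "=="},
	}}
	_, _, ok = operands(malformed)
	fmt.Println("malformed:", ok)
}
```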

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ewianda ewianda requested a review from Copilot March 9, 2026 04:24
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 11 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (3)

python/generate.go:479

  • isDjangoTestFile ignores scanner.Err() after the scan loop, so I/O errors while reading the file will be silently treated as “not a django test”. Check scanner.Err() and handle it (e.g., return false with logging, or propagate/terminate consistently with the rest of this package).
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, "pytest.mark.django_db") {
			return true
		}
		if strings.Contains(line, "gazelle: django_test") {
			return true
		}
		if strings.Contains(line, "django.test") && strings.Contains(line, "TestCase") {
			return true
		}
	}
	return false

python/file_parser.go:99

  • parseCode always calls parser.ParseCtx(context.Background(), ...), so cancellations/timeouts passed into FileParser.Parse(ctx) won’t stop the expensive tree-sitter parse step. Consider threading ctx through parseCode and using it in ParseCtx so batch parsing can be canceled promptly on errors/timeouts.
func parseCode(code []byte) (*sitter.Node, error) {
	parser := parserPool.Get().(*sitter.Parser)
	defer parserPool.Put(parser)

	tree, err := parser.ParseCtx(context.Background(), nil, code)
	if err != nil {
		return nil, err

python/generate.go:464

  • log.Fatalf already terminates the process, so the subsequent panic(err) is dead code and can be removed. If you want to return an error instead of exiting, replace Fatalf with error propagation.
	file, err := os.Open(path)
	if err != nil {
		log.Fatalf("ERROR: %v\n", err)
		panic(err)
	}
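
If error propagation is preferred over exiting, the open step might be shaped like the sketch below. `openSource` is a hypothetical helper name, and the surrounding function in generate.go would have to return an error as well, which this page does not show.

```go
package main

import (
	"fmt"
	"os"
)

// openSource is a hypothetical helper that returns the error to the caller
// instead of calling log.Fatalf followed by an unreachable panic.
func openSource(path string) (*os.File, error) {
	file, err := os.Open(path)
	if err != nil {
		return nil, fmt.Errorf("opening %s: %w", path, err)
	}
	return file, nil
}

func main() {
	f, err := openSource("does-not-exist.py")
	if err != nil {
		// The caller decides whether to log, skip the file, or abort.
		fmt.Println("ERROR:", err)
		return
	}
	defer f.Close()
}
```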

Migrate the Python file parser from recursive AST traversal to
pre-compiled tree-sitter queries, add object pooling, and batch-parse
all files into a lookup table to eliminate redundant work.

Changes:
- file_parser.go: Replace recursive parse() with query-based approach
  using pre-compiled tree-sitter queries (init-time), sync.Pool for
  parser and cursor reuse, TYPE_CHECKING block detection, and defensive
  ChildCount/nil checks on AST node access.
- parser.go: Add parseAllToLUT() for concurrent batch parsing with
  errgroup limited to NumCPU, and parseFromLUT() to process pre-parsed
  results without re-parsing.
- generate.go: Parse all .py files once upfront into a LUT before
  target generation. Replace per-target parser.parse() calls with
  parseFromLUT() lookups. Replace per-call regexp.MustCompile in
  isDjangoTestFile with strings.Contains.
- file_parser_test.go: Add TYPE_CHECKING test coverage for bare
  TYPE_CHECKING, typing.TYPE_CHECKING, and non-TYPE_CHECKING blocks.

Profiled on production codebase (cpu pprof):

  | Metric      | Before  | After   | Improvement |
  |-------------|---------|---------|-------------|
  | Wall time   | 46.72s  | 13.25s  | 3.5x faster |
  | CPU samples | 53.04s  | 31.69s  | 40% less    |
  | cgocall     | 28.13s  | 23.68s  | 16% less    |

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ewianda ewianda merged commit aa24131 into main Mar 10, 2026
1 check passed