Branch updated from e408931 to b574ffb.
Pull request overview
This PR modernizes and accelerates Python dependency parsing in the Gazelle extension by switching from recursive AST walking to pre-compiled tree-sitter queries, pooling parser/cursor objects, and parsing all Python files once into a shared lookup table (LUT) that is reused during rule generation.
Changes:
- Add concurrent batch parsing (`parseAllToLUT`) and LUT-based consumption (`parseFromLUT`) to avoid repeated parsing across multiple targets.
- Rewrite `FileParser.Parse` to use pre-compiled tree-sitter queries plus pooling for parsers/cursors, including TYPE_CHECKING-block handling.
- Update rule generation to pre-parse all `.py` files into a LUT and replace regex-based Django test detection with substring checks.
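A minimal, stdlib-only sketch of the parse-once-into-a-LUT idea described above. Everything in it is hypothetical: `toyParse` stands in for the tree-sitter parse (it only collects `import` lines), `buildLUT` is not the PR's `parseAllToLUT`, and a buffered channel caps concurrency at `runtime.NumCPU()` where the PR's commit message says it uses errgroup.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// parseResult is a hypothetical stand-in for the real per-file parse output.
type parseResult struct {
	Imports []string
}

// toyParse is a placeholder "parser": it collects lines starting with "import ".
func toyParse(src string) parseResult {
	var res parseResult
	for _, line := range strings.Split(src, "\n") {
		if strings.HasPrefix(line, "import ") {
			res.Imports = append(res.Imports, strings.TrimPrefix(line, "import "))
		}
	}
	return res
}

// buildLUT parses every file exactly once, with at most runtime.NumCPU()
// parses in flight, and returns a path -> result lookup table.
func buildLUT(files map[string]string) map[string]parseResult {
	lut := make(map[string]parseResult, len(files))
	var mu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, runtime.NumCPU()) // semaphore bounding concurrency

	for path, src := range files {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot before starting the goroutine
		go func(path, src string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			res := toyParse(src)
			mu.Lock()
			lut[path] = res
			mu.Unlock()
		}(path, src)
	}
	wg.Wait()
	return lut
}

func main() {
	files := map[string]string{
		"a.py": "import os\nimport sys\n",
		"b.py": "import json\n",
	}
	lut := buildLUT(files)
	// Each target reads from the LUT instead of re-parsing its sources.
	for _, target := range [][]string{{"a.py"}, {"a.py", "b.py"}} {
		var imports []string
		for _, f := range target {
			imports = append(imports, lut[f].Imports...)
		}
		fmt.Println(target, imports)
	}
}
```

Targets then read shared entries from the map, so a file that belongs to several targets is parsed once rather than once per target.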
Reviewed changes
Copilot reviewed 7 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| python/parser.go | Introduces LUT-based concurrent parsing and LUT consumption APIs. |
| python/generate.go | Parses all Python files once into a LUT; switches per-target parsing to LUT lookups; simplifies Django marker detection. |
| python/file_parser.go | Replaces recursive traversal with query-based extraction and adds pooling + TYPE_CHECKING handling. |
| patches/go-tree-sitter.diff | Removes the previously maintained patch file for go-tree-sitter vendoring/patching. |
| go.mod | Bumps Go version and updates dependency versions (Gazelle, rules_go, x/sync, etc.). |
| go.sum | Updates module checksums to match the new dependency set. |
| MODULE.bazel | Updates Bazel module deps and removes custom http_archive patching for go-tree-sitter, relying on go_deps instead. |
| MODULE.bazel.lock | Regenerates lockfile reflecting updated Bazel deps and transitive graph. |
Comments suppressed due to low confidence (2)
python/generate.go:479
- After the scan loop, `scanner.Err()` isn’t checked. If scanning fails (including `bufio.ErrTooLong` for long lines), this will silently return false and can misclassify Django tests. Consider checking `scanner.Err()` and handling/reporting the error consistently with the rest of this function.
```go
scanner := bufio.NewScanner(file)
for scanner.Scan() {
	line := scanner.Text()
	if strings.Contains(line, "pytest.mark.django_db") {
		return true
	}
	if strings.Contains(line, "gazelle: django_test") {
		return true
	}
	if strings.Contains(line, "django.test") && strings.Contains(line, "TestCase") {
		return true
	}
}
return false
```
python/generate.go:464
`log.Fatalf` already terminates the process, so the subsequent `panic(err)` is unreachable and can be removed for clarity.
```go
file, err := os.Open(path)
if err != nil {
	log.Fatalf("ERROR: %v\n", err)
	panic(err)
}
```
- parseMain: add ChildCount and nil checks before indexing into comparison operator children to prevent panics on malformed ASTs
- parseImportStatements: guard import_from_statement with ChildCount check to handle incomplete/invalid syntax safely
- Add TestParseTypeChecking covering TYPE_CHECKING, typing.TYPE_CHECKING, and non-TYPE_CHECKING conditional import blocks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
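The guard pattern from the first bullet can be illustrated on a toy tree. The `node` type, its `Text` field, and `isNameEqualsMain` are hypothetical: the PR's `parseMain` is not shown here, the real go-tree-sitter node type differs, and the sketch assumes `parseMain` is looking for an `if __name__ == "__main__":` comparison. Only the idea of checking the child count and each child for nil before use comes from the commit message.

```go
package main

import "fmt"

// node is a toy stand-in for a tree-sitter node (hypothetical type).
type node struct {
	Type     string
	Text     string
	Children []*node
}

func (n *node) ChildCount() uint32 { return uint32(len(n.Children)) }

// Child returns nil for out-of-range indices in this toy type.
func (n *node) Child(i int) *node {
	if i < 0 || i >= len(n.Children) {
		return nil
	}
	return n.Children[i]
}

// isNameEqualsMain checks for `__name__ == "__main__"`, verifying the node,
// its child count, and each child before use so a malformed tree cannot panic.
func isNameEqualsMain(cmp *node) bool {
	if cmp == nil || cmp.Type != "comparison_operator" || cmp.ChildCount() < 3 {
		return false
	}
	left, op, right := cmp.Child(0), cmp.Child(1), cmp.Child(2)
	if left == nil || op == nil || right == nil {
		return false
	}
	return left.Text == "__name__" && op.Type == "==" &&
		(right.Text == `"__main__"` || right.Text == `'__main__'`)
}

func main() {
	good := &node{Type: "comparison_operator", Children: []*node{
		{Type: "identifier", Text: "__name__"},
		{Type: "==", Text: "=="},
		{Type: "string", Text: `"__main__"`},
	}}
	truncated := &node{Type: "comparison_operator", Children: []*node{
		{Type: "identifier", Text: "__name__"},
	}}
	fmt.Println(isNameEqualsMain(good), isNameEqualsMain(truncated), isNameEqualsMain(nil))
	// true false false
}
```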
Pull request overview
Copilot reviewed 8 out of 11 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (3)
python/generate.go:479
`isDjangoTestFile` ignores `scanner.Err()` after the scan loop, so I/O errors while reading the file will be silently treated as “not a django test”. Check `scanner.Err()` and handle it (e.g., return false with logging, or propagate/terminate consistently with the rest of this package).
```go
scanner := bufio.NewScanner(file)
for scanner.Scan() {
	line := scanner.Text()
	if strings.Contains(line, "pytest.mark.django_db") {
		return true
	}
	if strings.Contains(line, "gazelle: django_test") {
		return true
	}
	if strings.Contains(line, "django.test") && strings.Contains(line, "TestCase") {
		return true
	}
}
return false
```
python/file_parser.go:99
`parseCode` always calls `parser.ParseCtx(context.Background(), ...)`, so cancellations/timeouts passed into `FileParser.Parse(ctx)` won’t stop the expensive tree-sitter parse step. Consider threading `ctx` through `parseCode` and using it in `ParseCtx` so batch parsing can be canceled promptly on errors/timeouts.
```go
func parseCode(code []byte) (*sitter.Node, error) {
	parser := parserPool.Get().(*sitter.Parser)
	defer parserPool.Put(parser)
	tree, err := parser.ParseCtx(context.Background(), nil, code)
	if err != nil {
		return nil, err
```
python/generate.go:464
`log.Fatalf` already terminates the process, so the subsequent `panic(err)` is dead code and can be removed. If you want to return an error instead of exiting, replace `Fatalf` with error propagation.
```go
file, err := os.Open(path)
if err != nil {
	log.Fatalf("ERROR: %v\n", err)
	panic(err)
}
```
Migrate the Python file parser from recursive AST traversal to pre-compiled tree-sitter queries, add object pooling, and batch-parse all files into a lookup table to eliminate redundant work.
Changes:
- file_parser.go: Replace recursive parse() with query-based approach using pre-compiled tree-sitter queries (init-time), sync.Pool for parser and cursor reuse, TYPE_CHECKING block detection, and defensive ChildCount/nil checks on AST node access.
- parser.go: Add parseAllToLUT() for concurrent batch parsing with errgroup limited to NumCPU, and parseFromLUT() to process pre-parsed results without re-parsing.
- generate.go: Parse all .py files once upfront into a LUT before target generation. Replace per-target parser.parse() calls with parseFromLUT() lookups. Replace per-call regexp.MustCompile in isDjangoTestFile with strings.Contains.
- file_parser_test.go: Add TYPE_CHECKING test coverage for bare TYPE_CHECKING, typing.TYPE_CHECKING, and non-TYPE_CHECKING blocks.
Profiled on production codebase (cpu pprof):
| Metric | Before | After | Improvement |
|-------------|---------|---------|-------------|
| Wall time | 46.72s | 13.25s | 3.5x faster |
| CPU samples | 53.04s | 31.69s | 40% less |
| cgocall | 28.13s | 23.68s | 16% less |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
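The pooling pattern visible in the `parseCode` snippet (`parserPool.Get()` followed by a deferred `parserPool.Put`) can be sketched with `sync.Pool` alone. `fakeParser`, `created`, and `withParser` are hypothetical; in the PR the pool holds `*sitter.Parser` values (and, per the commit message, query cursors).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fakeParser is a hypothetical stand-in for an expensive-to-create object
// such as a tree-sitter parser or query cursor.
type fakeParser struct{ id int64 }

var created atomic.Int64 // counts how many parsers were actually allocated

var parserPool = sync.Pool{
	New: func() any {
		return &fakeParser{id: created.Add(1)}
	},
}

// withParser borrows a parser for the duration of fn and puts it back
// afterwards so a later caller can reuse it.
func withParser(fn func(p *fakeParser)) {
	p := parserPool.Get().(*fakeParser)
	defer parserPool.Put(p)
	fn(p)
}

func main() {
	for i := 0; i < 1000; i++ {
		withParser(func(p *fakeParser) { _ = p.id })
	}
	// Sequential reuse usually needs very few allocations, but sync.Pool gives
	// no guarantee: idle items may be dropped during garbage collection.
	fmt.Println("parsers created:", created.Load())
}
```

Pooling reduces allocations of expensive objects across many parses; it does not cap how many objects exist at once.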
Optimize Python parser: 3.5x speedup (46.7s → 13.3s)
Migrate the Python file parser from recursive AST traversal to
pre-compiled tree-sitter queries, add object pooling, and batch-parse
all files into a lookup table to eliminate redundant work.
Changes:
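- file_parser.go: Replace recursive parse() with query-based approach
  using pre-compiled tree-sitter queries (init-time), sync.Pool for
  parser and cursor reuse, and TYPE_CHECKING block detection.
- parser.go: Add parseAllToLUT() for concurrent batch parsing with
  errgroup limited to NumCPU, and parseFromLUT() to process pre-parsed
  results without re-parsing.
- generate.go: Parse all .py files once upfront into a LUT before
  target generation. Replace per-target parser.parse() calls with
  parseFromLUT() lookups. Replace per-call regexp.MustCompile in
  isDjangoTestFile with strings.Contains.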
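Profiled on production codebase (cpu pprof):
| Metric | Before | After | Improvement |
|-------------|---------|---------|-------------|
| Wall time | 46.72s | 13.25s | 3.5x faster |
| CPU samples | 53.04s | 31.69s | 40% less |
| cgocall | 28.13s | 23.68s | 16% less |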
Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com
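The PR performs TYPE_CHECKING detection on tree-sitter nodes. As a simplified, text-based sketch (the function name and the approach of comparing the trimmed condition text are assumptions, not the PR's code), recognizing the two forms covered by `TestParseTypeChecking` could look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// isTypeCheckingCondition reports whether the text of an `if` condition is a
// bare TYPE_CHECKING or typing.TYPE_CHECKING. Compound conditions such as
// "TYPE_CHECKING and DEBUG" are deliberately not matched by this sketch.
func isTypeCheckingCondition(cond string) bool {
	switch strings.TrimSpace(cond) {
	case "TYPE_CHECKING", "typing.TYPE_CHECKING":
		return true
	default:
		return false
	}
}

func main() {
	for _, c := range []string{"TYPE_CHECKING", "typing.TYPE_CHECKING", "sys.version_info >= (3, 11)"} {
		fmt.Printf("%-30s %v\n", c, isTypeCheckingCondition(c))
	}
}
```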