perf: Compose dependency symbols on demand #222

Merged
jupblb merged 3 commits into scip-code:main from prathshenoy:on-demand
Apr 29, 2026

@prathshenoy
Contributor

At Uber, we use scip-go to index one of the largest Go monorepos in the world and found that package loading was the dominant cost in both time and space. The current implementation naively forces the loader to fully type-check every transitive dependency, even though only exported type information is needed and is already available.

This change switches dependency loading to use precompiled type export data instead of re-parsing source files. Symbols for dependency packages are now composed on demand from type metadata at resolution time, rather than being constructed by traversing dependency syntax trees. Implementation relationship extraction is also split into two paths: one for project packages, which have full source and type information, and another for dependencies, which rely only on type metadata.

These improvements significantly reduce both memory usage (~50-70%) and indexing time for packages in large repositories.

@jupblb jupblb self-requested a review April 20, 2026 13:26
@jupblb
Collaborator

jupblb commented Apr 20, 2026

I've changed the repo configuration. The GitHub actions should run automatically now. :)

@jupblb
Collaborator

jupblb commented Apr 20, 2026

@prathshenoy Could you please give me examples of OSS projects where the speedup is noticeable? Right now, looking at CI, I see a significant slowdown when indexing the root of the Kubernetes project.

I've also run a test myself on a local machine against prometheus/prometheus, kubernetes/kubernetes and hashicorp/terraform using the following script:

```bash
# scip-go-before was built from commit 9b1792410f3dc4f7791b28dce6f855eac939030d
# scip-go-after was built from commit 7f5a6793894f69eb1930eb2190e3ffa46208bbc5
rm -f index.scip
echo "=== BEFORE Run 1 ==="
go clean -cache
time /tmp/scip-go-before 2>&1

rm -f index.scip
echo "=== BEFORE Run 2 ==="
go clean -cache
time /tmp/scip-go-before 2>&1

rm -f index.scip
echo "=== AFTER Run 1 ==="
go clean -cache
time /tmp/scip-go-after 2>&1

rm -f index.scip
echo "=== AFTER Run 2 ==="
go clean -cache
time /tmp/scip-go-after 2>&1
```

The results are:

| Project | Before Run 1 | Before Run 2 | After Run 1 | After Run 2 | Avg Change |
|---|---|---|---|---|---|
| Prometheus | 8.07s | 7.73s | 37.80s | 38.10s | 4.8× slower |
| Terraform | 20.67s | 19.20s | 32.54s | 31.49s | 1.6× slower |
| Kubernetes | 20.70s | 19.50s | 61.40s | 60.60s | 3.0× slower |

@prathshenoy
Contributor Author

@jupblb, sorry for the delayed response. Thanks for taking the time to look at this.

I realized that this new approach performs worse than the old one when the cache is cold, but significantly better when the cache is warm.

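| Project | Binary | Cold 1 | Cold 2 | Warm 1 | Warm 2 |
|---|---|---|---|---|---|
| Prometheus | before | 18s | 17s | 17s | 16s |
| Prometheus | after | 62s | 62s | 9s | 8s |
| Terraform | before | 23s | 23s | 18s | 17s |
| Terraform | after | 47s | 47s | 9s | 8s |
| Kubernetes | before | 45s | 45s | 44s | 44s |
| Kubernetes | after | 89s | 90s | 29s | 30s |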

Internally, this tradeoff works in our favor because we rely heavily on Bazel’s remote cache, which usually has hits for most packages. Feel free to close this PR, as this tradeoff doesn’t seem acceptable in other use cases.

@jupblb
Collaborator

jupblb commented Apr 27, 2026

Will take a look soon. I'm definitely in favor of at least offering a choice at the CLI level between the new logic and the old one. Running with a warm cache is something I consider a very reasonable use case.

Could you please rebase your changes first on top of origin/main?

Collaborator

@jupblb jupblb left a comment

Overall, this looks good to me. I don't think we need to flag-guard this new logic. There is some CPU performance penalty on a cold run, but there is also a huge reduction in memory usage (nice!).
I'd appreciate it if you could do some final polishing. Once that's done, we can merge this PR. :)

Comment thread internal/symbols/composer.go Outdated
Comment thread internal/symbols/composer.go Outdated
Comment thread internal/symbols/composer.go Outdated
Comment thread internal/symbols/composer.go Outdated
Comment thread internal/implementations/extractor.go Outdated
Comment thread internal/testdata/snapshots/input/pr222/go.mod
@prathshenoy
Contributor Author

prathshenoy commented Apr 28, 2026

Thanks, @jupblb! I've addressed your comments. Please let me know if you have any other concerns. We would appreciate a new release once this merges.

@prathshenoy prathshenoy requested a review from jupblb April 28, 2026 23:45
Collaborator

@jupblb jupblb left a comment

Thank you for this contribution. Great impact! :)

@jupblb jupblb merged commit a72ee15 into scip-code:main Apr 29, 2026
10 checks passed
@prathshenoy prathshenoy deleted the on-demand branch April 29, 2026 15:54