perf: Compose dependency symbols on demand#222
Conversation
|
I've changed the repo configuration. The GitHub actions should run automatically now. :) |
|
@prathshenoy Could you please give me examples of OSS projects for which the speedup may be noticed? Right now looking at CI I see significant slowdown in indexing the root of Kubernetes project. I've also ran a test myself on a local machine against prometheus/prometheus, kubernetes/kubernetes and hashicorp/terraform using the following script: # scip-go-before was built from commit 9b1792410f3dc4f7791b28dce6f855eac939030d
# scip-go-after was built from commit 7f5a6793894f69eb1930eb2190e3ffa46208bbc5
rm -f index.scip
echo "=== BEFORE Run 1 ==="
go clean -cache
time /tmp/scip-go-before 2>&1
rm -f index.scip
echo "=== BEFORE Run 2 ==="
go clean -cache
time /tmp/scip-go-before 2>&1
rm -f index.scip
echo "=== AFTER Run 1 ==="
go clean -cache
time /tmp/scip-go-after 2>&1
rm -f index.scip
echo "=== AFTER Run 2 ==="
go clean -cache
time /tmp/scip-go-after 2>&1The results are:
|
|
@jupblb, sorry for the delayed response. Thanks for taking the time to look at this. I realized that this new approach performs worse than the old one when the cache is cold, but significantly better when the cache is warm.
Internally, this tradeoff works in our favor because we rely heavily on Bazel’s remote cache, which usually has hits for most packages. Feel free to close this PR, as this tradeoff doesn’t seem acceptable in other use cases. |
|
Will take a look soon. I'm definitely for at least giving a choice on CLI level to either use this changed logic or not. Running with warm cache is something that I consider to be a very reasonable use case. Could you please rebase your changes first on top of |
There was a problem hiding this comment.
Overall, this looks good to me. I don't think we need to flag-guard this new logic. There is some CPU performance penalty on a cold run but there's also actually a huge reduction of memory usage (nice!).
I'll appreciate if you can make some final polishes. When done, we can merge this PR. :)
|
Thanks, @jupblb! I've addressed your comments. Please let me know if you have any other concerns. We would appreciate a new release once this merges. |
jupblb
left a comment
There was a problem hiding this comment.
Thank you for this contribution. Great impact! :)
At Uber, we use scip-go to index one of the largest Go monorepos in the world and found that package loading was the dominant cost in both time and space. The current implementation naively forces the loader to fully type-check every transitive dependency, even though only exported type information is needed and is already available.
This change switches dependency loading to use precompiled type export data instead of re-parsing source files. Symbols for dependency packages are now composed on demand from type metadata at resolution time, rather than being constructed by traversing dependency syntax trees. Implementation relationship extraction is also split into two paths: one for project packages, which have full source and type information, and another for dependencies, which rely only on type metadata.
These improvements significantly reduce both memory usage (~50-70%) and indexing time for packages in large repositories.