Skip to content

Document the uplc memory leak#668

Open
Eh2406 wants to merge 6 commits intopragma-org:mainfrom
Eh2406:edr-20
Open

Document the uplc memory leak#668
Eh2406 wants to merge 6 commits intopragma-org:mainfrom
Eh2406:edr-20

Conversation

@Eh2406
Copy link
Contributor

@Eh2406 Eh2406 commented Feb 3, 2026

We found ourselves in an architecturally awkward situation with Bump in uplc. Let's document some of the decisions and implications.

Summary by CodeRabbit

  • Documentation
    • Added an architecture decision record detailing approaches for managing DAG-like data, recommending a bump-based arena allocator for certain contexts and outlining trade-offs (fast allocation/deallocation vs. skipped drops and potential leaks for some heap-stored values).
    • Introduces conceptual types (Arena, AppendOnlyVec), cross-thread safety considerations, and notes exploration of large-integer crate compatibility.

Signed-off-by: Jacob Finkelman <YeomanYaacov@gmail.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 3, 2026

Warning

Rate limit exceeded

@KtorZ has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 27 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

An ADR was added that documents strategies for managing DAG-like data in Rust/UPLC contexts, comparing Rc, DB-backed references, index-based Vecs, petgraph, Slab/Slotmap, and bumpalo arenas, and detailing lifetime, drop, and large-integer allocator considerations.

Changes

Cohort / File(s) Summary
Architecture Decision Record
engineering-decision-records/020-bump-arenas-and-workarounds.md
New ADR describing memory management approaches for DAG-like data in Rust/UPLC: Rc, database-backed refs, Vec indices, petgraph, Slab/Slotmap, and bumpalo::Bump; discusses lifetime and drop semantics, integer allocation implications, AppendOnlyVec/Arena types, cross-thread safety notes, and large-integer allocator considerations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🪄 An ADR penned under neon light,
From Rc to Bump, we pick our fight,
Arenas hum and integers hide,
Safe-ish unsafe makes memory glide,
Cheers — the plan's set, let’s ship it right.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Document the uplc memory leak' accurately reflects the main change—adding an ADR document that explains the memory leak situation and architectural decisions in uplc.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Feb 3, 2026

@KtorZ, Is this basically what you had in mind? If not to free to make changes both large and small.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@engineering-decision-records/020-bump-arenas-and-workarounds.md`:
- Line 22: The sentence contains a duplicated "is" around the crate name; update
the sentence referencing `nub-bigint` to remove the extra "is" and improve flow
(for example: "Our preferred large integer crate `nub-bigint` is probably only
interested in adding compatibility with custom allocators once std stabilizes
the API."). Ensure the crate name `nub-bigint` remains intact and the revised
sentence reads naturally.
- Line 18: Fix the grammar in the sentence describing integer drop behavior:
change "relying on the drop implementation free that backing data" to a correct
phrasing such as "relying on the drop implementation to free that backing data"
(or "relying on the drop implementation to free the backing data") in the
paragraph that references the integer type, its separate heap allocation, and
bumpalo::Bump so the description about drop behavior and the resulting memory
leak reads correctly.
- Line 10: Replace the typo "usege" with "usage" and hyphenate compound
adjectives used before nouns: change "garbage collected languages" to
"garbage-collected languages" and "user provided" to "user-provided" throughout
the document (search for the exact tokens "usege", "garbage collected", and
"user provided" and update them in the wording around the DAG/compatibility
explanation and later references).
- Line 20: Fix the typo "alterintive" to "alternative" in the sentence
describing AppendOnlyVec in the ADR: locate the paragraph that references
uplc-pr-36 and the types Arena, AppendOnlyVec, Bump, Slab, and Slotmap and
replace "alterintive" with the correct word "alternative" so the description
reads "...uses `AppendOnlyVec`, an alternative to `Slab` or `Slotmap`."
🧹 Nitpick comments (1)
engineering-decision-records/020-bump-arenas-and-workarounds.md (1)

14-14: That sentence is longer than a Tarantino monologue, mate!

Line 14 is doing some serious heavy lifting - it's covering Rc, databases, Vec indexing, petgraph, Slab, Slotmap, and uplc all in one breath. Breaking this up into a few shorter sentences would make it way easier to digest, especially for folks coming back to this doc later.

♻️ Suggested restructuring for readability

Consider breaking this into 3-4 sentences focusing on:

  1. Basic approaches (Rc, database-backed)
  2. Vec-based solutions and when they work
  3. The uplc-specific context and why Bump works best

Example:

We have several tools to deal with the situation. The most straightforward is to use `Rc` - this is obviously the right place to start prototyping but can become a performance bottleneck. Another alternative is to store data in a database and replace references to other nodes by the unique key of that row in the database. This can be appropriate when the data is long-lived and accessed on demand, but it's a terrible choice for short-lived, small, quickly accessed items.

You can also store the data in a `Vec` and replace pointers with indexes into the data structure. This works well when all your data is of the same type and there are clear patterns to where new entries are added. If you need to run generic graph algorithms on data stored in this format, `petgraph` is a good choice. Similarly, if you need to ensure IDs are only used in the correct data structure, `Slab` or `Slotmap` may work well.

For `uplc`, user-provided programs can create large amounts of heterogeneous-type data that only get accessed by the rest of their program. The best solution for this context was a `bumpalo::Bump`, which allows accessing entries using normal references through carefully written unsafe code, storing new items with only a shared reference, and very fast deallocation.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@engineering-decision-records/020-bump-arenas-and-workarounds.md`:
- Line 14: The sentence fragment "make sure that he is our only used in the
correct data structure Slab or Slotmap may work well" is garbled; update that
clause to a clear, concise sentence such as "Similarly, if you need to ensure
values have a single owner and are stored in the correct data structure, Slab or
Slotmap may work well." Locate the paragraph that mentions Rc, Vec, petgraph and
bumpalo and replace the broken phrase with the revised wording so the intent
about single ownership and appropriate use of Slab/Slotmap is explicit.
- Line 20: The sentence starting "In uplc-pr-36 adds a New Type for
`bumpalo::Bump` called `Arena`" is missing a subject; change it to include an
explicit subject such as "The PR" or "In PR `uplc-pr-36`" so it reads e.g. "In
PR `uplc-pr-36`, the author adds a new type for `bumpalo::Bump` called `Arena`,"
keeping the rest of the paragraph (references to `Arena`, `AppendOnlyVec`,
`Slab`, `Slotmap`, and `Bump`) unchanged.


Unfortunately, the very fast deallocation is achieved by ignoring any drop code required by the underlying type. Most types they get stored in a user provided program as processed by `uplc` either don't own data or only own data that itself is stored in the `Bump` arena. So generally this limitation turns out not to be a big deal. The exception is integers, which need to store their data on the heap because their specification does not mandate a largest potential value. The integer type stores its data in a separate allocation in the heap, relying on the drop implementation free that backing data. We put that integer in a `bumpalo::Bump`. Which does not run drop on its contents. The result is a pretty dramatic memory leak.

In [uplc-pr-36](https://github.com/pragma-org/uplc/pull/36) adds a New Type for `bumpalo::Bump` called `Arena`. This provides a place to handle more complicated logic. The PR also uses `AppendOnlyVec`, a alterintive to `Slab` or `Slotmap`. It is similar to `Bump` in that you can push/alloc with only a shared reference, and in that it never moves items once they've been allocated so it can have long lived shared references to items. It differs in that it can only store one type. It has extra overhead in that it can safely be used across multiple threads, which is functionality we do not need.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix the missing subject in the “uplc-pr-36” sentence.

Right now it reads like the opening line of a cutscene with the protagonist missing. Give it a subject and it’ll flow properly.

🎬 Suggested tweak
-In [uplc-pr-36](https://github.com/pragma-org/uplc/pull/36) adds a New Type for `bumpalo::Bump` called `Arena`.
+[uplc-pr-36](https://github.com/pragma-org/uplc/pull/36) adds a new type for `bumpalo::Bump` called `Arena`.
🧰 Tools
🪛 LanguageTool

[grammar] ~20-~20: Use a hyphen to join words.
Context: ...ey've been allocated so it can have long lived shared references to items. It dif...

(QB_NEW_EN_HYPHEN)


[style] ~20-~20: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ers in that it can only store one type. It has extra overhead in that it can safel...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🤖 Prompt for AI Agents
In `@engineering-decision-records/020-bump-arenas-and-workarounds.md` at line 20,
The sentence starting "In uplc-pr-36 adds a New Type for `bumpalo::Bump` called
`Arena`" is missing a subject; change it to include an explicit subject such as
"The PR" or "In PR `uplc-pr-36`" so it reads e.g. "In PR `uplc-pr-36`, the
author adds a new type for `bumpalo::Bump` called `Arena`," keeping the rest of
the paragraph (references to `Arena`, `AppendOnlyVec`, `Slab`, `Slotmap`, and
`Bump`) unchanged.


## Decision

We have several tools to deal with the situation. The most straightforward is to use `Rc`, this is obviously the right place to start prototyping but can become a performance bottleneck. Another alternative is to store data in a database and replace references to other nodes by the unique key of that row in the database. This can be appropriate when the data is long-lived and accessed on demand. It is a terrible choice for short-lived, small, quickly accessed items. You can store the data in a `Vec` and replace pointers with indexes into the data structure. This can work well as long as all of your data is of the same type, and there's fairly clear patterns to where new entries are added. If you need to run generic graph algorithms on data stored in this format `petgraph` is a good choice. Similarly if you need to make sure that he is our only used in the correct data structure `Slab` or `Slotmap` may work well. For `uplc` the user provided programs can create large amounts of data of heterogeneous type that only get accessed by the rest of their program. The best solution for this context was a `bumpalo::Bump`. By use of carefully written unsafe code it allows accessing entries using normal references, and storing new items with only a shared reference, and very fast deallocation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great explanation, though I feel like a lot of this should be under "discussion points" as it constitutes the thought process that led to the decision: use an append-only vec of integers alongside a bumpalo::Bump.

Also, it seems like this describes the first iterations of that, not the one we ended up committing to?

"By use of carefully written unsafe code it allows accessing entries using normal references" -> we don't use unsafe in the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm new to the EDR format. If content should go in different sections feel free to move it.

we don't use unsafe in the end.

Right, we do not use unsafe. bumpalo::Bump does. I seem to have not been clear about the object of that sentence.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants