
Concurrent Immix #1355

Merged
wks merged 67 commits into mmtk:master from tianleq:concurrent-immix
Sep 18, 2025

Conversation

@tianleq
Collaborator

@tianleq tianleq commented Jul 28, 2025

We add a concurrent Immix plan. It can do concurrent marking (non-moving), and falls back to stop-the-world Immix collection with opportunistic defragmentation.

We add a snap-at-the-beginning barrier to support concurrent Immix.

Plans now control when to clear the side unlog bits. This allows the same policy (specifically ImmixSpace) to be used by different plans, each clearing the unlog bits at a different time.
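The snap-at-the-beginning (SATB) barrier mentioned above can be sketched roughly as follows. This is a minimal illustration with made-up names, not the actual mmtk-core `Barrier` implementation: before a field is overwritten during concurrent marking, the old value is remembered so the snapshot stays complete.

```rust
use std::collections::VecDeque;

// Stand-in for a real object reference; illustrative only.
type ObjectReference = usize;

struct SatbBarrier {
    marking_active: bool,
    // Old values captured during concurrent marking; the collector
    // later traces them as if they were part of the initial snapshot.
    satb_buffer: VecDeque<ObjectReference>,
}

impl SatbBarrier {
    // Called *before* a field write. `old` is the value about to be
    // overwritten; remembering it preserves the snapshot taken at the
    // beginning of marking (a deletion barrier).
    fn pre_write(&mut self, old: Option<ObjectReference>) {
        if self.marking_active {
            if let Some(o) = old {
                self.satb_buffer.push_back(o);
            }
        }
    }
}
```

Outside of concurrent marking the barrier is a no-op, which is why clearing the unlog bits at the right time matters: the unlog bits gate whether the slow path runs at all.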

Comment thread src/policy/immix/mod.rs Outdated
Comment thread src/plan/barriers.rs Outdated
Comment thread src/lib.rs Outdated
Comment thread src/policy/space.rs Outdated
Comment thread src/policy/largeobjectspace.rs Outdated
Comment thread src/plan/global.rs Outdated
Comment thread src/scheduler/gc_work.rs Outdated
Comment thread src/policy/largeobjectspace.rs Outdated
Comment thread src/global_state.rs Outdated
Comment thread src/plan/concurrent/immix/global.rs Outdated
Comment thread src/plan/plan_constraints.rs Outdated
Comment thread src/plan/tracing.rs Outdated
Comment thread src/util/alloc/immix_allocator.rs
Comment thread src/vm/collection.rs Outdated
Comment thread src/lib.rs Outdated
Comment thread src/scheduler/work_bucket.rs Outdated
@wks
Collaborator

wks commented Jul 29, 2025

What will happen if a mutator wants to fork() while concurrent marking is in progress? There are subtle interactions between preparing for forking (MMTK::prepare_to_fork), the concurrently running marking task (including the dirty mark bits, etc.), and the WorkerGoal which is currently only designed for triggering GC and triggering "prepare-to-fork".

We may temporarily tell the VM binding that MMTk doesn't currently support forking when using concurrent GC. We may fix it later. One simple solution is postponing the forking until the current GC finishes.

The status quo is that only CRuby and Android need forking, but CRuby will not support concurrent GC in the short term.

@k-sareen
Collaborator

What will happen if a mutator wants to fork() while concurrent marking is in progress?

You can't let this happen. Either the binding needs to ensure that the mutator waits while concurrent marking is active, or you don't let a concurrent GC happen before forking (ART's method of dealing with this).

@wks
Collaborator

wks commented Jul 29, 2025

You can't let this happen. Either the binding needs to ensure that the mutator waits while concurrent marking is active, or you don't let a concurrent GC happen before forking (ART's method of dealing with this).

Agreed. Fortunately, the current API doc for prepare_to_fork says:

    /// This function sends an asynchronous message to GC threads and returns immediately, but it
    /// is only safe for the VM to call `fork()` after the underlying **native threads** of the GC
    /// threads have exited.  After calling this function, the VM should wait for their underlying
    /// native threads to exit in VM-specific manner before calling `fork()`.

So a well-behaving VM binding shall wait for all the GC worker threads (which are created by the binding via VMCollection::spawn_gc_thread anyway) to exit before calling fork(). That's VM-specific, but not hard. Extending this API to support concurrent GC should not require the VM binding to rewrite this part.
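The "wait for the native GC threads to exit" step described above can be sketched in a few lines. This is an assumption-laden illustration, not binding code: it supposes the binding kept the `JoinHandle`s of the worker threads it spawned (real bindings create them via `VMCollection::spawn_gc_thread`, and the handle bookkeeping is VM-specific).

```rust
use std::thread::JoinHandle;

// After calling MMTK::prepare_to_fork (which returns immediately),
// the binding must wait for the underlying native GC threads to exit
// before it is safe to call fork().
fn wait_for_gc_threads(handles: Vec<JoinHandle<()>>) {
    for h in handles {
        h.join().expect("GC worker thread panicked");
    }
}
```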

@k-sareen
Collaborator

Mentioning it before I forget: we need to change the name for the GC counter to say it's pauses or have a separate counter that counts pauses.

Comment thread src/scheduler/gc_work.rs Outdated
Comment thread src/scheduler/gc_work.rs Outdated
Comment thread src/plan/concurrent/mod.rs
Comment thread src/plan/global.rs Outdated
Comment thread src/plan/barriers.rs Outdated
Comment thread src/plan/barriers.rs Outdated
Comment thread src/plan/concurrent/concurrent_marking_work.rs Outdated
Comment thread src/plan/tracing.rs Outdated
Comment thread src/util/address.rs Outdated
Collaborator

@wks wks left a comment

The current "WorkerGoal" mechanism should be able to handle the case where mutators trigger another GC between InitialMark and FinalMark. We can remove WorkPacketStage::Initial and WorkPacketStage::ConcurrentSentinel, and use GCWorkScheduler::on_last_parked to transition the GC state from InitialMark to the concurrent marking to FinalMark and finally finish the GC. See inline comments for more details.

Comment thread src/scheduler/work_bucket.rs Outdated
Comment thread src/scheduler/scheduler.rs Outdated
Comment thread src/scheduler/work_bucket.rs Outdated
Comment thread src/plan/barriers.rs
Comment thread src/scheduler/scheduler.rs Outdated
Comment thread src/scheduler/work_bucket.rs Outdated
@wks
Collaborator

wks commented Jul 30, 2025

It looks like we keep newly allocated objects alive by marking the lines, but not marking those objects (actually the mark bits of all words in a line are set). This is OK. But what if we need VO bits? The default strategy for maintaining VO bits is copying from the mark bits. Since all words of the line are marked, the VO bits will also be all ones if we blindly copy VO bits from the mark bits. But because post_alloc already sets the VO bits, we just need to keep their VO bits as is.

To properly support VO bits, we need to do two things:

  1. Reset the BumpPointer of ImmixAllocator so that after the InitialMark, all mutators start allocation from empty lines (i.e. make sure a line never contains both objects allocated before InitialMark and objects allocated after InitialMark).
  2. In Block::sweep,
    • if a line only contains objects allocated after InitialMark, we keep the VO bits as is;
    • if it only contains live objects before InitialMark, we copy the VO bits over from mark bits;
    • otherwise the line must be empty. We clear its VO bits.

There are multiple ways to know if a line only contains new objects or old objects.

  • If a line is marked, but the mark bits of a line are all zero, it must be a line that only contains objects allocated after InitialMark. This doesn't need extra metadata. (Update: This is not true. If a mutator adds an edge from a live old object to the new object, the new object will still be reachable.)
  • If a line is marked, and the mark bits are a superset of the VO bits, it must be a line that only contains objects allocated after InitialMark. If a block only contains objects before the InitialMark, the mark bits must be a subset of the VO bits. Only all-ones is a superset of all possible VO-bit patterns. If the mark bits and VO bits are identical (every word is an individual object, and they are either all live or all new), it doesn't matter if we copy the mark bits or retain the VO bits. Currently each Line is 256 bytes, corresponding to 32 bits of mark bits or VO bits. It should be easy and efficient to do 32-bit bit operation.
  • We introduce another one-bit-per-line metadata to record if it contains objects allocated after InitialMark.
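The superset test from the second bullet is cheap precisely because of the sizes involved: each 256-byte line corresponds to one `u32` of mark bits and one `u32` of VO bits. A minimal sketch, assuming that metadata layout (the function name is made up for illustration):

```rust
/// Returns true if the mark bits are a superset of the VO bits, i.e.
/// every word that carries a VO bit is also marked. Per the discussion
/// above, a marked line with this property (in particular, all-ones
/// mark bits from bulk-setting) only contains objects allocated after
/// InitialMark; a line holding only pre-InitialMark objects has mark
/// bits that are a subset of its VO bits.
fn mark_is_superset_of_vo(mark_bits: u32, vo_bits: u32) -> bool {
    mark_bits & vo_bits == vo_bits
}
```

When the two bit patterns are identical, the superset test also holds, and as noted above it then does not matter whether we copy the mark bits or retain the VO bits.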

@tianleq
Collaborator Author

tianleq commented Jul 30, 2025

"It looks like we keep newly allocated objects alive by marking the lines, but not marking those objects." This is not true. Both the lines and every word within those lines are marked. So if one blindly copies mark bits to VO bits, the VO bit will be 1 even if that address is not an object.

@qinsoon
Member

qinsoon commented Jul 30, 2025

It looks like we keep newly allocated objects alive by marking the lines, but not marking those objects. This is OK. But what if we need VO bits?

I think we can just mark lines, and also mark each individual object. The bulk set only works for side mark bits anyway. Let's not get things entangled and complicated.

@tianleq
Collaborator Author

tianleq commented Jul 30, 2025

It looks like we keep newly allocated objects alive by marking the lines, but not marking those objects. This is OK. But what if we need VO bits?

I think we can just mark lines, and also mark each individual object. The bulk set only works for side mark bits anyway. Let's not get things entangled and complicated.

The problem is that we do not want to do the check in the fast path. If we want to mark each individual object, then in the allocation fast path we need to check whether concurrent marking is active and then set the mark bit, whereas the current bulk-setting approach only sets the mark bits in the slow path.
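The fast-path/slow-path trade-off being argued here can be sketched as follows. This is illustrative pseudologic, not the real `ImmixAllocator`: with side mark bits, the fast path stays a plain bump allocation, and the "is concurrent marking active?" check plus the bulk mark-bit setting are paid only when a fresh line is acquired in the slow path.

```rust
const LINE_BYTES: usize = 256;

struct Allocator {
    cursor: usize,
    limit: usize,
    marking_active: bool,
    // Lines whose mark bits were bulk-set (illustrative stand-in for
    // real side metadata).
    bulk_marked_lines: Vec<usize>,
}

impl Allocator {
    // Fast path: no concurrent-marking check at all.
    fn alloc(&mut self, size: usize) -> usize {
        if self.cursor + size <= self.limit {
            let addr = self.cursor;
            self.cursor += size;
            addr
        } else {
            self.alloc_slow(size)
        }
    }

    // Slow path: acquire the next line. Only here do we check for
    // concurrent marking and bulk-set the line's mark bits, so objects
    // allocated during marking are kept alive without fast-path work.
    fn alloc_slow(&mut self, size: usize) -> usize {
        let line = self.limit; // pretend the next line is free
        self.cursor = line;
        self.limit = line + LINE_BYTES;
        if self.marking_active {
            self.bulk_marked_lines.push(line / LINE_BYTES);
        }
        self.alloc(size)
    }
}
```

Marking each individual object instead would move that `marking_active` check into `alloc`, which is exactly the cost being avoided.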

@wks
Collaborator

wks commented Jul 30, 2025

"It looks like we keep newly allocated objects alive by marking the lines, but not marking those objects." This is not true. Both lines and every word within that line are marked. So if one blindly copy mark bits to vo bits, then vo bit will be 1 even if that address is not an object

I think we can just mark lines, and also mark each individual object. The bulk set only works for side mark bits anyway. Let's not get things entangled and complicated.

Actually I think for correctness, we must not set mark bits of individual objects when allocating. Suppose there are objects A and B when GC is triggered, and root -> ... -> A -> B. During concurrent marking, a mutator allocated C, and changed the object graph to root -> ... -> A -> C -> B. Then a GC worker visits A for the first time. If C already has the mark bit, GC will not enqueue C, and will not enqueue B, either. It will consider B as dead.

(Update: I forgot the SATB barrier. It will remember B when we remove the edge A -> B.)

@tianleq
Collaborator Author

tianleq commented Jul 30, 2025

"It looks like we keep newly allocated objects alive by marking the lines, but not marking those objects." This is not true. Both lines and every word within that line are marked. So if one blindly copy mark bits to vo bits, then vo bit will be 1 even if that address is not an object

I think we can just mark lines, and also mark each individual object. The bulk set only works for side mark bits anyway. Let's not get things entangled and complicated.

Actually I think for correctness, we must not set mark bits of individual objects when allocating. Suppose there are objects A and B when GC is triggered, and root -> ... -> A -> B. During concurrent marking, a mutator allocated C, and changed the object graph to root -> ... -> A -> C -> B. Then a GC worker visits A for the first time. If C already has the mark bit, GC will not enqueue C, and will not enqueue B, either. It will consider B as dead.

In your case, B will be captured by the SATB barrier, so it will not be considered dead. There is no need to scan those newly allocated objects, because any children of a newly allocated object must have been alive in the snapshot and are thus guaranteed to be traced.

@qinsoon
Member

qinsoon commented Jul 30, 2025

It looks like we keep newly allocated objects alive by marking the lines, but not marking those objects. This is OK. But what if we need VO bits?

I think we can just mark lines, and also mark each individual object. The bulk set only works for side mark bits anyway. Let's not get things entangled and complicated.

The problem is that we do not want to do the check in the fast path. If we want to mark each individual object, then in the allocation fast path we need to check whether concurrent marking is active and then set the mark bit, whereas the current bulk-setting approach only sets the mark bits in the slow path.

The mark bit could be in the header, and we would have to set it per object if it is in the header. We can differentiate between the header mark bit and the side mark bit, and deal with each differently.

But bulk-setting mark bits is still a bit hacky -- and this is why we would have issues with VO bits. VO bits are copied from mark bits, assuming the mark bits are only set for individual objects.

Comment thread src/plan/barriers.rs
Comment thread src/plan/generational/global.rs Outdated
Comment thread src/plan/concurrent/immix/global.rs Outdated
Comment thread src/util/alloc/immix_allocator.rs Outdated
}

fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
let new_object = self
Member

If we add a check to skip young blocks here, the more expensive mark table zeroing in immix allocator can be removed.

Collaborator

I guess you mean mark table bulk-setting (to 1s). If we can identify if the object is in a block that was completely free when concurrent marking started, we can be sure that the object must be newly allocated. But ConcurrentImmix can allocate into partially used blocks (i.e. into holes of lines). Checking blocks will not work here. And the trace_object here may be dispatched to other spaces, too, such as LOS.

}

fn object_probable_write_slow(&mut self, obj: ObjectReference) {
crate::plan::tracing::SlotIterator::<VM>::iterate_fields(obj, self.tls.0, |s| {
Member

Need to enqueue obj, not its fields. If I remember correctly, at the time this code is called, all fields are uninitialized.

Member

Probably I'm wrong, but I guess object_probable_write is not necessary at all. This obj should already be in the root set of the InitialMark pause, and will be marked eventually.

Collaborator

@wks wks Sep 12, 2025

I think you are right. The semantics of the MMTk-side API Barrier::object_probable_write says

    /// A pre-barrier indicating that some fields of the object will probably be modified soon.
    /// Specifically, the caller should ensure that:
    ///     * The barrier must called before any field modification.
    ///     * Some fields (unknown at the time of calling this barrier) might be modified soon, without a write barrier.
    ///     * There are no safepoints between the barrier call and the field writes.

If the fields are assigned during concurrent marking, the (new) values will either come from the snapshot at the beginning, or be a new object allocated during concurrent marking. In either case, they will be kept alive. (Update: But the old children of the fields that are overwritten by the assignments will not be kept alive.)

To put it another way, the SATB barrier is a deletion barrier. As long as no objects are disconnected from other objects, there is no need to apply the barrier. In the case of OpenJDK, it is assigning objects to fields that are not yet initialized, so no objects are disconnected.

I'll remove object_probable_write for SATBBarrier.
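The semantics quoted from the doc comment can be sketched as follows. Names and signature are illustrative, not the mmtk-core implementation: under a deletion (SATB) barrier, `object_probable_write` must conservatively remember all of the object's current children, since any of them may be overwritten soon without a per-field write barrier.

```rust
// `current_children` stands in for the object's field values at the
// time of the call; a real implementation would iterate the object's
// reference fields via the VM's scanning interface.
fn object_probable_write(
    current_children: &[usize],
    marking_active: bool,
    satb_buffer: &mut Vec<usize>,
) {
    if marking_active {
        // Conservatively treat every old child as potentially deleted.
        satb_buffer.extend_from_slice(current_children);
    }
}
```

For OpenJDK's specific use (writing into uninitialized fields), the children slice is empty, which is why the call could in principle be elided on the binding side.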

Collaborator

Wait. On second thought, I think if MMTk implements the semantics described in the doc comment of Barrier::object_probable_write, it should still remember the old field values. If another VM calls object_probable_write on an object that already has children, it will still need to remember the old children because they may well be overwritten. So Barrier::object_probable_write still has to be implemented as it currently is.

The OpenJDK binding can do VM-specific optimization by eliding the invocation of mmtk_object_probable_write in MMTkSATBBarrierSetRuntime::object_probable_write, making use of the knowledge of the SATBBarrier. Currently only the OpenJDK binding uses Barrier::object_probable_write. So it will not be disruptive to other bindings if we change the semantics of Barrier::object_probable_write to make it more specific to OpenJDK's use case and do more optimizations. But we need to change the semantics first.

Collaborator

@k-sareen k-sareen Sep 13, 2025

Please don't remove it or change semantics arbitrarily. I (kind of) depend on it in ART as well. ART has a barrier WriteBarrier::ForEveryFieldWrite whose semantics are essentially object_probable_write. Currently since I only have a generational post-write object remembering barrier implemented, I just call the normal post-write function, but this will be fixed in the future when I add concurrent GC support.

@wks wks added the PR-extended-testing Run extended tests for the pull request label Sep 15, 2025
@wks
Collaborator

wks commented Sep 15, 2025

I enabled extended testing. I didn't set the binding test repo and branch to the PR for the OpenJDK binding (mmtk/mmtk-openjdk#311) because we will test and merge that separately.

@qinsoon
Member

qinsoon commented Sep 15, 2025

I enabled extended testing. I didn't set the binding test repo and branch to the PR for the OpenJDK binding (mmtk/mmtk-openjdk#311) because we will test and merge that separately.

Why? We should test with the OpenJDK PR.

@wks
Collaborator

wks commented Sep 15, 2025

I enabled extended testing. I didn't set the binding test repo and branch to the PR for the OpenJDK binding (mmtk/mmtk-openjdk#311) because we will test and merge that separately.

Why? We should test with the OpenJDK PR.

I am confused. We've always been saying "we'll fix the binding later" or something.

@k-sareen
Collaborator

k-sareen commented Sep 15, 2025

But how will it work without the correct barrier etc? You need to tell it to use the binding PR so that the barrier works.

EDIT: I think the tests won't even run ConcurrentImmix actually because the CI doesn't have it added yet. So it'll (theoretically) pass but ConcurrentImmix will not have been tested.

@qinsoon
Member

qinsoon commented Sep 15, 2025

I enabled extended testing. I didn't set the binding test repo and branch to the PR for the OpenJDK binding (mmtk/mmtk-openjdk#311) because we will test and merge that separately.

Why? We should test with the OpenJDK PR.

I am confused. We've always been saying "we'll fix the binding later" or something.

It seems that we can only use the "we'll fix the binding later" approach when a change is non-breaking and a binding may choose whether to opt in. In most cases, we don't do that.

When introducing a new plan, we always test with at least one binding to make sure it works (at least for that binding). Otherwise, there is no way for us to tell whether the plan works.

@wks
Collaborator

wks commented Sep 17, 2025

binding-refs
OPENJDK_BINDING_REPO=tianleq/mmtk-openjdk
OPENJDK_BINDING_REF=concurrent-immix

Member

@qinsoon qinsoon left a comment

LGTM

@wks
Collaborator

wks commented Sep 18, 2025

Just in case this PR has any side effects on existing plans other than ConcurrentImmix, I'll run some benchmarks to test that.

@wks
Collaborator

wks commented Sep 18, 2025

lusearch from DaCapo Chopin MR2, mole.moma, 2.4x and 3.0x min heap w.r.t. G1, 20 invocations, 5 iterations, comparing master and this PR.

https://squirrel.anu.edu.au/plotty/wks/noproject/#0|mole-2025-09-18-Thu-025228&build^hfac^invocation^iteration^mmtk_gc&GC^time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&hfac^mmtk_gc&build;build1|40&Histogram%20(with%20CI)^build^mmtk_gc&

GenCopy, GenImmix and StickyImmix become slightly faster in terms of STW time and total time. No obvious difference for Immix. SemiSpace becomes slightly slower in terms of STW time and total time.

But something is strange when I test locally. It seems that GenCopy is using PlanProcessEdges for nursery GCs, too. (It should use GenNurseryProcessEdges.)

@wks
Collaborator

wks commented Sep 18, 2025

But something is strange when I test locally. It seems that GenCopy is using PlanProcessEdges for nursery GCs, too. (It should use GenNurseryProcessEdges.)

False alarm. I was observing the behavior using the capture.py script with -e 50, i.e. capturing every 50th GC. But under the given workload, the GCs alternate between nursery GC and full-heap GC. Because 50 is an even number, if the starting point is a nursery GC, every subsequent observation will also be a nursery GC, and the same is true for full-heap GCs. Coincidentally, I only observed full-heap GCs for this PR and thought it could not trigger nursery GCs. The harness_end output shows that about 50% of all GCs are nursery GCs. This is an illusion caused by a stroboscopic effect. Perhaps I should document it in the README file of the eBPF tracing tools.
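The stroboscopic effect is easy to reproduce numerically. A small illustration, under the simplifying assumption that GC kinds strictly alternate: sampling every 50th GC (an even period) always lands on the same kind, while an odd period reveals both.

```rust
// Model GC i as nursery when i is even, full-heap when i is odd
// (assumed strict alternation, as in the workload described above),
// then keep only every `every`-th observation.
fn sampled_kinds(total_gcs: usize, every: usize) -> Vec<&'static str> {
    (0..total_gcs)
        .step_by(every)
        .map(|i| if i % 2 == 0 { "nursery" } else { "full-heap" })
        .collect()
}
```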

@wks
Collaborator

wks commented Sep 18, 2025

I cannot reproduce the performance difference locally on my computer. This PR does not change GenCopy or CopySpace. So I can only explain this speed-up for GenXxxxx plans and the slight slow-down of SemiSpace by the nondeterminism caused by profile-guided optimization.

I'll merge this PR.

@wks wks added this pull request to the merge queue Sep 18, 2025
Merged via the queue into mmtk:master with commit a4dd70c Sep 18, 2025
37 of 39 checks passed
mmtkgc-bot added a commit to mmtk/mmtk-openjdk that referenced this pull request Sep 18, 2025
We added support for the new ConcurrentImmix plan introduced to mmtk-core in mmtk/mmtk-core#1355.

We implemented the SATB barrier fast paths in the OpenJDK binding, and refactored the barriers to support both pre- and post-barriers, as well as a (weak) reference loading barrier. The OpenJDK binding is now aware of concurrent marking, too.

---------

Co-authored-by: Yi Lin <qinsoon@gmail.com>
Co-authored-by: Kunshan Wang <wks1986@gmail.com>
Co-authored-by: mmtkgc-bot <mmtkgc.bot@gmail.com>