Concurrent Immix #1355
Conversation
What will happen if a mutator wants to fork() while concurrent GC is active?

We may temporarily tell the VM binding that MMTk doesn't currently support forking when using concurrent GC, and fix it later. One simple solution is postponing the fork until the current GC finishes. The status quo is that only CRuby and Android need forking, and CRuby will not support concurrent GC in the short term.
You can't let this happen. Either the binding needs to ensure that the mutator waits while concurrent marking is active, or you don't let a concurrent GC happen before forking (ART's method of dealing with this).
Agreed. Fortunately, the current API doc for the relevant function already says:

/// This function sends an asynchronous message to GC threads and returns immediately, but it
/// is only safe for the VM to call `fork()` after the underlying **native threads** of the GC
/// threads have exited. After calling this function, the VM should wait for their underlying
/// native threads to exit in a VM-specific manner before calling `fork()`.
So a well-behaving VM binding shall wait for all the GC worker threads (which are created by the binding via `Collection::spawn_gc_thread`) to exit before calling `fork()`.
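To make the contract concrete, here is a minimal sketch of such a binding-side fork path, assuming the binding keeps the `JoinHandle`s of the native GC worker threads it spawned; `request_gc_threads_to_exit` is only a stand-in for the MMTk function whose doc comment is quoted above.

```rust
use std::thread::JoinHandle;

/// Minimal sketch of a binding's fork preparation.
/// `request_gc_threads_to_exit` is a placeholder for the MMTk API quoted above.
fn prepare_to_fork(gc_thread_handles: Vec<JoinHandle<()>>) {
    // Asynchronously ask the GC threads to exit; this returns immediately.
    request_gc_threads_to_exit();
    // Wait for the underlying *native* threads to exit. Only after this
    // is it safe for the VM to call fork().
    for handle in gc_thread_handles {
        handle.join().expect("GC worker thread panicked");
    }
}

fn request_gc_threads_to_exit() {
    // In a real binding this calls into mmtk-core.
}
```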
Mentioning it before I forget: we need to change the name for the
wks left a comment:
The current "WorkerGoal" mechanism should be able to handle the case where mutators trigger another GC between InitialMark and FinalMark. We can remove WorkPacketStage::Initial and WorkPacketStage::ConcurrentSentinel, and use GCWorkScheduler::on_last_parked to transition the GC state from InitialMark to the concurrent marking to FinalMark and finally finish the GC. See inline comments for more details.
It looks like we keep newly allocated objects alive by marking the lines, but not marking those objects. To properly support VO bits, we need to do two things. There are multiple ways to know if a line only contains new objects or old objects.
"It looks like we keep newly allocated objects alive by marking the lines, but not marking those objects." This is not true. Both lines and every word within that line are marked. So if one blindly copy mark bits to vo bits, then vo bit will be 1 even if that address is not an object |
I think we can just mark lines, and also mark each individual object. The bulk set only works for side mark bits anyway. Let's not get things entangled and complicated.
The problem is that we do not want to do the check in the fast path. If we want to mark each individual object, then in the allocation fast path we need to check whether concurrent marking is active and then set the mark bit, whereas the current bulk-setting approach only sets mark bits in the slow path.
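To illustrate the trade-off with a sketch (all helper names hypothetical): marking each object puts a branch on every allocation, while bulk setting confines the cost to the slow path that acquires fresh lines.

```rust
/// Design A (hypothetical): mark each object individually.
/// This forces a check on *every* allocation, i.e. in the fast path.
fn alloc_with_per_object_marking(size: usize) -> usize {
    let cell = bump_pointer(size);
    if concurrent_marking_active() {
        set_mark_bit(cell); // extra fast-path work
    }
    cell
}

/// Design B (the bulk-setting approach described above): set the side mark
/// bits of freshly acquired lines in one go, so only the slow path pays.
fn acquire_lines_slow_path(size: usize) -> usize {
    let lines = acquire_lines(size);
    bulk_set_side_mark_bits(lines); // one metadata operation for many objects
    lines
}

// Stubs so the sketch is self-contained.
fn bump_pointer(_size: usize) -> usize { 0 }
fn concurrent_marking_active() -> bool { false }
fn set_mark_bit(_cell: usize) {}
fn acquire_lines(_size: usize) -> usize { 0 }
fn bulk_set_side_mark_bits(_lines: usize) {}
```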
I forgot the SATB barrier. It will remember B when we remove the edge.
In your case, B will be captured by the SATB barrier, so it will not be considered dead. There is no need to scan those newly allocated objects, because any children of those newly allocated objects must have been alive in the snapshot and are thus guaranteed to be traced.
The mark bit could be in the header, and we have to set it per object if it is in the header. We can differentiate the header mark bit from the side mark bit, and deal with each differently. But bulk-setting mark bits is still a bit hacky -- and this is why we would have issues with VO bits. The VO bits are copied from the mark bits, assuming a mark bit is only set for each individual object.
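A small sketch of that differentiation (the constant and helpers are hypothetical):

```rust
type ObjectRef = usize; // simplified stand-in for ObjectReference

/// Hypothetical: whether the mark bit is in side metadata or the header.
const MARK_BIT_ON_SIDE: bool = true;

/// Keep a newly allocated object alive under concurrent marking.
fn keep_new_object_alive(obj: ObjectRef) {
    if MARK_BIT_ON_SIDE {
        // Side mark bits can be (and were) bulk-set for the whole line
        // in the allocation slow path: nothing to do per object.
    } else {
        // A header mark bit lives inside each object, so it must be set
        // per object.
        set_header_mark_bit(obj);
    }
}

fn set_header_mark_bit(_obj: ObjectRef) {}
```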
fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
    let new_object = self
If we add a check to skip young blocks here, the more expensive mark table zeroing in the immix allocator can be removed.
I guess you mean mark table bulk-setting (to 1s). If we can identify that the object is in a block that was completely free when concurrent marking started, we can be sure that the object must be newly allocated. But ConcurrentImmix can allocate into partially used blocks (i.e. into holes of lines), so checking blocks will not work here. And the trace_object here may be dispatched to other spaces, too, such as LOS.
fn object_probable_write_slow(&mut self, obj: ObjectReference) {
    crate::plan::tracing::SlotIterator::<VM>::iterate_fields(obj, self.tls.0, |s| {
We need to enqueue obj, not its fields. If I remember correctly, at the time this code is called, all fields are uninitialized.
Probably I'm wrong, but I guess object_probable_write is not necessary at all. This obj should already be in the root set of the InitialMark pause, and will be marked eventually.
I think you are right. The semantics of the MMTk-side API Barrier::object_probable_write say:
/// A pre-barrier indicating that some fields of the object will probably be modified soon.
/// Specifically, the caller should ensure that:
/// * The barrier must be called before any field modification.
/// * Some fields (unknown at the time of calling this barrier) might be modified soon, without a write barrier.
/// * There are no safepoints between the barrier call and the field writes.
If the fields are assigned during concurrent marking, the (new) values will either come from the snapshot at the beginning, or be new objects allocated during concurrent marking. In either case, they will be kept alive. (Update: But the old children of the fields that are overwritten by the assignments will not be kept alive.)
To put it another way, the SATB barrier is a deletion barrier. As long as no objects are disconnected from other objects, there is no need to apply the barrier. In the case of OpenJDK, it is assigning objects to fields that are not yet initialized, so no objects are disconnected.
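For concreteness, here is a minimal sketch of a SATB deletion barrier with simplified types: the old target of a slot is remembered before the slot is overwritten, so everything reachable in the snapshot stays reachable.

```rust
type ObjectRef = usize; // simplified stand-in for ObjectReference

/// SATB pre-write barrier sketch: remember the value being deleted.
fn satb_pre_write(
    slot: &mut Option<ObjectRef>,
    new_value: Option<ObjectRef>,
    satb_buffer: &mut Vec<ObjectRef>,
) {
    if let Some(old) = *slot {
        // The object about to be disconnected; the collector will trace it.
        satb_buffer.push(old);
    }
    *slot = new_value; // the actual field write
}
```

Writing to an uninitialized field takes the `None` arm and remembers nothing, which is exactly why OpenJDK's use on fresh objects disconnects no one.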
I'll remove object_probable_write for SATBBarrier.
Wait. On second thought, I think if MMTk implements the semantics described in the doc comment of Barrier::object_probable_write, it should still remember the old field values. If another VM calls object_probable_write on an object that already has children, it will still need to remember the old children because they will probably be overwritten. So Barrier::object_probable_write still has to be implemented as it currently is.
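Under that general reading, a sketch of the implementation (helper names hypothetical) would conservatively remember every current field value, since any of them may be overwritten without a per-field barrier:

```rust
type ObjectRef = usize; // simplified stand-in for ObjectReference

/// Conservative object_probable_write sketch: before unknown fields are
/// modified without individual barriers, remember all old children.
fn object_probable_write(obj: ObjectRef, satb_buffer: &mut Vec<ObjectRef>) {
    for old_child in reference_fields_of(obj) {
        satb_buffer.push(old_child); // keep the snapshot's edges reachable
    }
}

/// Hypothetical helper: load the current values of obj's reference fields.
fn reference_fields_of(_obj: ObjectRef) -> Vec<ObjectRef> {
    Vec::new()
}
```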
The OpenJDK binding can do a VM-specific optimization by eliding the invocation of mmtk_object_probable_write in MMTkSATBBarrierSetRuntime::object_probable_write, making use of its knowledge of the SATBBarrier. Currently only the OpenJDK binding uses Barrier::object_probable_write, so it would not be disruptive to other bindings if we changed the semantics of Barrier::object_probable_write to make it more specific to OpenJDK's use case and do more optimizations. But we need to change the semantics first.
Please don't remove it or change its semantics arbitrarily. I (kind of) depend on it in ART as well. ART has a barrier, WriteBarrier::ForEveryFieldWrite, whose semantics are essentially those of object_probable_write. Currently, since I only have a generational post-write object-remembering barrier implemented, I just call the normal post-write function, but this will be fixed in the future when I add concurrent GC support.
I enabled extended testing. I didn't set the binding test repo and branch to the PR for the OpenJDK binding (mmtk/mmtk-openjdk#311) because we will test and merge that separately.
Why? We should test with the OpenJDK PR.
I am confused. We've always been saying "we'll fix the binding later" or something.
But how will it work without the correct barrier, etc.? You need to tell it to use the binding PR so that the barrier works. EDIT: I think the tests won't even run.
It seems that we can use the "we'll fix the binding later" approach only when a change is non-breaking and a binding may choose whether to opt in. In most cases, we don't do that. When introducing a new plan, we always test with at least one binding to make sure it works (at least for that binding). Otherwise, there is no way for us to tell whether the plan works or not.
binding-refs
Just in case this PR has any side effects on existing plans other than ConcurrentImmix, I'll run some benchmarks to test that.
lusearch from DaCapo Chopin MR2, mole.moma, 2.4x and 3.0x min heap w.r.t. G1, 20 invocations, 5 iterations, comparing master and this PR. GenCopy, GenImmix and StickyImmix become slightly faster in terms of STW time and total time. No obvious difference for Immix. SemiSpace becomes slightly slower in terms of STW time and total time. But something is strange when I test locally. It seems that GenCopy is using
False alarm. I was observing the behavior using the
I cannot reproduce the performance difference locally on my computer. This PR does not change GenCopy or CopySpace, so I can only explain the speed-up of the GenXxxxx plans and the slight slow-down of SemiSpace by the nondeterminism caused by profile-guided optimization. I'll merge this PR.
We added support for the new plan ConcurrentImmix added to mmtk-core in mmtk/mmtk-core#1355. We implemented the SATB barrier fast paths in the OpenJDK binding, and refactored the barriers to support both pre- and post-barriers, as well as the (weak) reference loading barrier. The OpenJDK binding is now aware of concurrent marking, too.

---------

Co-authored-by: Yi Lin <qinsoon@gmail.com>
Co-authored-by: Kunshan Wang <wks1986@gmail.com>
Co-authored-by: mmtkgc-bot <mmtkgc.bot@gmail.com>
We add a concurrent Immix plan. It can do concurrent marking (non-moving), and falls back to stop-the-world Immix collection with opportunistic defragmentation.
We add a snapshot-at-the-beginning (SATB) barrier to support concurrent Immix.
Now the plans control when to clear the side unlog bits. This allows the same policy (specifically the ImmixSpace) to be used by different plans that clear the unlog bits at different times.
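A rough sketch of that split (the trait and method names are illustrative, not the actual mmtk-core API):

```rust
/// Illustrative only: the plan, not the space, decides whether the shared
/// ImmixSpace should clear the side unlog bits during this collection.
trait UnlogBitPolicy {
    /// Queried by the space in its release phase (hypothetical hook).
    fn should_clear_unlog_bits(&self, full_heap_gc: bool) -> bool;
}

struct PlanA; // e.g. a plan that clears only after full-heap collections
impl UnlogBitPolicy for PlanA {
    fn should_clear_unlog_bits(&self, full_heap_gc: bool) -> bool {
        full_heap_gc
    }
}

struct PlanB; // e.g. a plan that clears after every collection
impl UnlogBitPolicy for PlanB {
    fn should_clear_unlog_bits(&self, _full_heap_gc: bool) -> bool {
        true
    }
}
```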