Non-object nodes in VMs #1448

-   This issue is partially addressed in https://github.com/mmtk/mmtk-core/issues/1137
-   This issue is important for merging the `lxr` branch into `master`, but can be addressed separately before merging `lxr`.

An object graph consists of GC objects as nodes, and of edges between nodes or from roots to nodes.  But in some VMs, there may be nodes that are not objects.  In https://github.com/mmtk/mmtk-core/issues/1137, we mentioned malloc buffers in CRuby.  Malloc-allocated buffers are attached to (owned by) GC objects.  They may contain references and therefore must be scanned, too.  In the current mmtk-ruby binding, such malloc buffers are reclaimed using finalizers when their owner (GC) objects die.

But there are also native things that are not uniquely owned by GC objects, which complicates the matter.

# "Claiming" in OpenJDK

In OpenJDK, there are off-heap native objects, such as `ClassLoaderData`.  Those objects are not managed by GC, and are *not* uniquely owned by any GC objects.  For `ClassLoaderData`, the pointer path is `object -> klass -> class_loader_data`.  Each `ClassLoaderData` instance has a `_claimed` field so that during GC (or other activities), when multiple threads reach a `ClassLoaderData` instance via different paths, the thread that atomically "claims" the instance will scan it.  The `_claimed` field is similar to the mark bit of GC objects, and prevents the same instance from being scanned more than once.
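The claiming pattern can be sketched in Rust as follows.  This is a minimal illustration, not OpenJDK's actual C++ code: `NativeNode`, `try_claim`, `clear_claim` and `claim_and_scan` are hypothetical names, and only the claim flag is modeled.  The key operation is an atomic compare-and-swap from `false` to `true`, so exactly one of the racing threads succeeds and scans the node.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for an off-heap native node such as `ClassLoaderData`.
/// Only the claim flag (and a scan counter for demonstration) is modeled.
pub struct NativeNode {
    claimed: AtomicBool,
    scan_count: AtomicUsize,
}

impl NativeNode {
    pub fn new() -> Self {
        NativeNode {
            claimed: AtomicBool::new(false),
            scan_count: AtomicUsize::new(0),
        }
    }

    /// Atomically claims the node. Returns `true` for exactly one caller
    /// between two calls to `clear_claim`.
    pub fn try_claim(&self) -> bool {
        self.claimed
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Relaxed)
            .is_ok()
    }

    /// Must be called before each trace, analogous to clearing mark bits.
    pub fn clear_claim(&self) {
        self.claimed.store(false, Ordering::Relaxed);
    }

    /// Scans the node only if the calling thread wins the claim.
    pub fn claim_and_scan(&self) {
        if self.try_claim() {
            // A real implementation would visit the node's outgoing references here.
            self.scan_count.fetch_add(1, Ordering::Relaxed);
        }
    }

    pub fn scan_count(&self) -> usize {
        self.scan_count.load(Ordering::Relaxed)
    }
}

fn main() {
    let node = Arc::new(NativeNode::new());
    // Eight worker threads reach the same node via different objects.
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let n = Arc::clone(&node);
            thread::spawn(move || n.claim_and_scan())
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("scanned {} time(s)", node.scan_count()); // scanned 1 time(s)
}
```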

In addition to `ClassLoaderData`, there are other types that use similar "claiming" patterns.  For example, an `nmethod`'s `_oops_do_mark_link` field is linked when the `nmethod` is "claimed".

# The problem

From mmtk-core's point of view, those instances are off-heap.  Currently, when the `lxr` branch scans an `object`, it also claims and scans the `ClassLoaderData` instance reachable via `object->klass->class_loader_data`.  As a result, mmtk-core sees the `object` as having many outgoing edges, including those that actually originate from the `ClassLoaderData`.  This behavior is controlled by a property of the `SlotVisitor` passed to `Scanning::scan_object(object, slot_visitor)`.  **It is confusing from the API design's point of view** because

1.  The `ClassLoaderData` is Java-specific, and the mechanism does not generalize to other bindings.
2.  The outgoing edges from the `ClassLoaderData` are not really part of the `object` being scanned, and those edges may also be reachable from other objects.
3.  "Claiming" a `ClassLoaderData` instance has side effects.  It is only valid in a certain context (such as a transitive closure, before which we perform initializations such as clearing the `_claimed` fields of all instances).  It is therefore inappropriate for the general `Scanning::scan_object` API function, which is supposed to have no side effects beyond reporting the `Slot` instances to be updated, and equally inappropriate for `Scanning::scan_object_and_trace_edges`, whose only intended side effect is updating outgoing slots.
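The following minimal sketch illustrates points 2 and 3.  It models a `scan_object` that claims loader data inline, as the `lxr` branch does; `LoaderData`, `Object` and `scan_object_with_claim` are hypothetical simplifications (references are plain integers, and a single-threaded `Cell` stands in for the atomic `_claimed` field).  Scanning the same object twice yields different slot lists, and whichever object happens to be scanned first gets the loader data's edges attributed to it.

```rust
use std::cell::Cell;

/// Hypothetical loader data: a claim flag plus outgoing references (object IDs).
struct LoaderData {
    claimed: Cell<bool>,
    refs: Vec<usize>,
}

/// Hypothetical object: its own reference fields plus an index into the
/// loader-data table (standing in for `object->klass->class_loader_data`).
struct Object {
    fields: Vec<usize>,
    loader_data: usize,
}

/// A `scan_object` that claims loader data as a side effect (the pattern
/// this issue objects to). It reports the object's own fields, plus the
/// loader data's references only if the claim succeeds.
fn scan_object_with_claim(obj: &Object, table: &[LoaderData]) -> Vec<usize> {
    let mut slots = obj.fields.clone();
    let cld = &table[obj.loader_data];
    if !cld.claimed.get() {
        cld.claimed.set(true);
        slots.extend(&cld.refs);
    }
    slots
}

fn main() {
    let table = vec![LoaderData { claimed: Cell::new(false), refs: vec![100, 101] }];
    let a = Object { fields: vec![1, 2], loader_data: 0 };
    let b = Object { fields: vec![3], loader_data: 0 };

    println!("{:?}", scan_object_with_claim(&a, &table)); // [1, 2, 100, 101]
    println!("{:?}", scan_object_with_claim(&a, &table)); // [1, 2]
    println!("{:?}", scan_object_with_claim(&b, &table)); // [3]
}
```

A caller that scans objects for any purpose other than a freshly initialized transitive closure (for example, a heap inspection) would silently miss the loader data's edges.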

# Related work

In https://github.com/mmtk/mmtk-core/issues/1137, we mentioned creating special work packets for off-heap non-object data structures.  The same idea may apply here.
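Applied to this issue, the idea might look like the sketch below, which treats native nodes as first-class nodes of the object graph.  Everything here is hypothetical and is not mmtk-core's API: `Node`, `Graph` and `trace` are illustrative names, the closure is single-threaded, and `HashSet`s stand in for mark bits and `_claimed` fields.  Object scanning only reports that a native node is reachable; the closure itself claims and scans it.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical node kinds: GC objects and off-heap native nodes.
#[derive(Clone, Copy)]
enum Node {
    Object(usize),
    Native(usize),
}

/// Hypothetical graph. `object_edges[i]`: object i's edges to objects.
/// `object_native[i]`: the native node reachable from object i (if any).
/// `native_edges[j]`: native node j's edges to objects.
struct Graph {
    object_edges: Vec<Vec<usize>>,
    object_native: Vec<Option<usize>>,
    native_edges: Vec<Vec<usize>>,
}

/// Transitive closure in which marking (objects) and claiming (native nodes)
/// are both done by the closure, not by object scanning.
fn trace(g: &Graph, roots: &[usize]) -> (HashSet<usize>, HashSet<usize>) {
    let mut marked = HashSet::new();
    let mut claimed = HashSet::new();
    let mut queue: VecDeque<Node> = roots.iter().map(|&r| Node::Object(r)).collect();
    while let Some(node) = queue.pop_front() {
        match node {
            Node::Object(o) => {
                if !marked.insert(o) {
                    continue; // already marked
                }
                // Side-effect-free scanning: just report outgoing edges.
                for &child in &g.object_edges[o] {
                    queue.push_back(Node::Object(child));
                }
                if let Some(n) = g.object_native[o] {
                    queue.push_back(Node::Native(n));
                }
            }
            Node::Native(n) => {
                if !claimed.insert(n) {
                    continue; // already claimed by an earlier visit
                }
                for &child in &g.native_edges[n] {
                    queue.push_back(Node::Object(child));
                }
            }
        }
    }
    (marked, claimed)
}

fn main() {
    // Objects 0 and 1 both reach native node 0 (e.g. a shared class loader data),
    // which in turn references object 2. Object 3 is unreachable.
    let g = Graph {
        object_edges: vec![vec![1], vec![], vec![], vec![]],
        object_native: vec![Some(0), Some(0), None, None],
        native_edges: vec![vec![2]],
    };
    let (marked, claimed) = trace(&g, &[0]);
    let mut marked: Vec<_> = marked.into_iter().collect();
    marked.sort();
    // prints: marked objects: [0, 1, 2], claimed native nodes: 1
    println!("marked objects: {:?}, claimed native nodes: {}", marked, claimed.len());
}
```

Both objects reach the native node, but its edges are traced exactly once and are not attributed to either object.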

The pull request https://github.com/mmtk/mmtk-core/pull/1437 adds a parameter `RefScanPolicy` to `Scanning::scan_object{,_and_trace_edges}` to control the behavior of object scanning, and also explicitly allows the side effect of "discovering references" in VM-specific ways.  In the `lxr` branch, claiming and scanning of `ClassLoaderData` are also controlled by similar flags.  This raises the question of whether the parameter should be more general.  For example, should we expose **the intention that "this object scanning operation is done for transitive closure"** to the VM binding?  With such a parameter, the operation is no longer mere "object scanning" but actually "node processing", which the old `TransitiveClosure` type was supposed to implement (but did not) (see [this blog](https://wks.github.io/blog/2022/05/16/fifteen-years-transitiveclosure.html)).  Should we then have something like "process node" in our API?  I hesitate to do so because that exposes too much of the MMTk internals to the VM binding and blurs the boundary of the MMTk-Binding API.
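One way to express that intention is sketched here under heavy assumptions.  A hypothetical `ScanPurpose` argument (not mmtk-core's `RefScanPolicy`, whose actual definition may differ) tells the binding whether the scan is part of a transitive closure.  Only in that context does the binding claim and scan the native node; any other call stays side-effect free and reports just the object's own fields.  `LoaderData`, `Object` and `scan_object` are illustrative names.

```rust
use std::cell::Cell;

/// Hypothetical: why the object is being scanned. Not part of mmtk-core.
#[derive(Clone, Copy, PartialEq, Eq)]
enum ScanPurpose {
    /// Part of a transitive closure; claim bits are assumed to be cleared beforehand.
    TransitiveClosure,
    /// Any other use (e.g. heap inspection); must have no side effects.
    Inspect,
}

struct LoaderData {
    claimed: Cell<bool>,
    refs: Vec<usize>,
}

struct Object<'a> {
    fields: Vec<usize>,
    loader_data: &'a LoaderData,
}

/// Reports the object's own fields; claims and scans the loader data
/// only when the scan is done for a transitive closure.
fn scan_object(obj: &Object, purpose: ScanPurpose, visit: &mut dyn FnMut(usize)) {
    for &f in &obj.fields {
        visit(f);
    }
    if purpose == ScanPurpose::TransitiveClosure {
        let cld = obj.loader_data;
        // `replace` returns the old value, so only the first claimer scans.
        if !cld.claimed.replace(true) {
            for &r in &cld.refs {
                visit(r);
            }
        }
    }
}

fn main() {
    let cld = LoaderData { claimed: Cell::new(false), refs: vec![100] };
    let obj = Object { fields: vec![1, 2], loader_data: &cld };

    let mut seen = Vec::new();
    scan_object(&obj, ScanPurpose::Inspect, &mut |s: usize| seen.push(s));
    scan_object(&obj, ScanPurpose::Inspect, &mut |s: usize| seen.push(s));
    println!("{:?} claimed={}", seen, cld.claimed.get()); // [1, 2, 1, 2] claimed=false

    let mut traced = Vec::new();
    scan_object(&obj, ScanPurpose::TransitiveClosure, &mut |s: usize| traced.push(s));
    println!("{:?} claimed={}", traced, cld.claimed.get()); // [1, 2, 100] claimed=true
}
```

The cost is the trade-off mentioned above: the binding now has to know that a transitive closure is in progress, which moves part of MMTk's internals across the API boundary.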

# The bottom line

The bottom line is, the API should be VM-neutral, but at least powerful enough to allow OpenJDK to implement class unloading.
