This issue is important for merging the lxr branch into master, but can be addressed separately before merging lxr.
An object graph consists of GC objects as nodes, and edges between nodes or from roots to nodes. But in some VMs, there may be nodes that are not objects. In #1137, we mentioned malloc buffers in CRuby. Malloc-allocated buffers are attached to (owned by) GC objects. They may contain references and therefore must be scanned, too. In the current mmtk-ruby binding, such malloc buffers are reclaimed using finalizers when their owner (GC) objects die.
But there are also native things that are not uniquely owned by GC objects, which complicates the matter.
"Claiming" in OpenJDK
In OpenJDK, there are off-heap native objects, such as ClassLoaderData. Those objects are not managed by GC, and are not uniquely owned by any GC objects. For ClassLoaderData, the pointer path is object -> klass -> class_loader_data. Each ClassLoaderData instance has a _claimed field so that during GC (or other activities), when multiple threads reach a ClassLoaderData instance via different paths, the thread that atomically "claims" the instance will scan it. The _claimed field is similar to the mark bit of GC objects, and is used to prevent re-scanning the same instance again and again.
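The claiming pattern described above can be sketched in Rust as follows. This is a minimal illustration, not OpenJDK's or mmtk-core's actual code: `OffHeapNode`, `try_claim`, `clear_claim`, `scan_if_unclaimed` and `race_to_scan` are hypothetical names, and the `_claimed` field is simplified to a single boolean flag.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for an off-heap node such as ClassLoaderData.
pub struct OffHeapNode {
    /// Plays the role of the `_claimed` field (simplified to a boolean).
    claimed: AtomicBool,
    /// Counts how many times the node has actually been scanned.
    scan_count: AtomicUsize,
}

impl OffHeapNode {
    pub fn new() -> Self {
        OffHeapNode {
            claimed: AtomicBool::new(false),
            scan_count: AtomicUsize::new(0),
        }
    }

    /// Atomically claim the node. Only one caller gets `true`
    /// until `clear_claim` is called again.
    pub fn try_claim(&self) -> bool {
        self.claimed
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Reset the claim, as would be done before a new transitive closure.
    pub fn clear_claim(&self) {
        self.claimed.store(false, Ordering::Release);
    }

    /// Scan the node only if this caller wins the claim.
    pub fn scan_if_unclaimed(&self) {
        if self.try_claim() {
            // A real VM would visit the reference slots of the node here.
            self.scan_count.fetch_add(1, Ordering::Relaxed);
        }
    }
}

/// Let `threads` threads reach the same node concurrently and
/// return the total number of scans performed on it so far.
pub fn race_to_scan(node: &Arc<OffHeapNode>, threads: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let node = Arc::clone(node);
            thread::spawn(move || node.scan_if_unclaimed())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    node.scan_count.load(Ordering::Relaxed)
}
```

However many threads reach the node, it is scanned once per claim cycle; it is scanned again only after the claim has been cleared.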
In addition to ClassLoaderData, there are other types that use similar "claiming" patterns. For example, the nmethod::_oops_do_mark_link is linked when "claimed".
The problem
From mmtk-core's point of view, those instances are off-heap. Currently, the lxr branch claims and scans the ClassLoaderData instance reached via object->klass->class_loader_data when scanning the object. To mmtk-core, it then appears that the object has many outgoing edges, including those from the ClassLoaderData. This behavior is controlled by a property of the SlotVisitor in Scanning::scan_object(object, slot_visitor). It is confusing from an API design point of view because:
The ClassLoaderData is Java-specific, and cannot be extended to other bindings.
The outgoing edges from the ClassLoaderData are not really part of the object to be scanned, and those edges can be reachable from other objects, too.
"Claiming" a ClassLoaderData instance has a side effect. It is only valid in a certain context (such as transitive closure, before which we perform initialization such as clearing the _claimed fields of all instances). It is inappropriate for a general Scanning::scan_object API function, which is supposed to have no side effects beyond reporting the Slot instances for updating, and also inappropriate for Scanning::scan_object_and_trace_edges, which is only supposed to have the side effect of updating outgoing slots.
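The points above can be made concrete with a toy model. The sketch below is not the mmtk-core `Scanning` API: `Slot`, `LoaderData`, `Object`, the `claim_loader_data` flag and `collect_slots` are all hypothetical, chosen only to show that a scan function which claims a shared off-heap node reports different slots for the same object depending on what was scanned before.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical slot identifier; a real binding would use field addresses.
pub type Slot = usize;

/// Toy model of a shared off-heap node (think ClassLoaderData).
pub struct LoaderData {
    claimed: AtomicBool,
    pub slots: Vec<Slot>,
}

impl LoaderData {
    pub fn new(slots: Vec<Slot>) -> Self {
        LoaderData { claimed: AtomicBool::new(false), slots }
    }

    /// Reset the claim, as done before a transitive closure.
    pub fn clear_claim(&self) {
        self.claimed.store(false, Ordering::Release);
    }
}

/// Toy model of a heap object that points to a shared LoaderData.
pub struct Object<'a> {
    pub fields: Vec<Slot>,
    pub loader_data: &'a LoaderData,
}

/// Toy scan function. With `claim_loader_data` set, it also claims the
/// shared LoaderData and reports its slots as if they belonged to `object`.
pub fn scan_object(
    object: &Object<'_>,
    claim_loader_data: bool,
    visit: &mut dyn FnMut(Slot),
) {
    for &slot in &object.fields {
        visit(slot);
    }
    // `swap` returns the previous value: `false` means we won the claim.
    if claim_loader_data && !object.loader_data.claimed.swap(true, Ordering::AcqRel) {
        for &slot in &object.loader_data.slots {
            visit(slot);
        }
    }
}

/// Helper that collects every slot reported for one scan of `object`.
pub fn collect_slots(object: &Object<'_>, claim_loader_data: bool) -> Vec<Slot> {
    let mut out = Vec::new();
    scan_object(object, claim_loader_data, &mut |slot: Slot| out.push(slot));
    out
}
```

In this model, the first object scanned with claiming enabled appears to own the slots of the `LoaderData`; scanning the same object again, or scanning another object that shares the `LoaderData`, reports only the object's own fields until the claim is cleared.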
Related work
In #1137, we mentioned creating special work packets for off-heap non-object data structures. This thought may be applicable in this case.
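One way to picture that idea is to treat off-heap data structures as first-class nodes of the graph, each visited at most once by its own unit of work. The following single-threaded sketch is only an assumption about what such a design could look like; `Node`, `Graph` and `trace` are hypothetical and do not exist in mmtk-core.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// A graph node is either a GC object or an off-heap structure.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Node {
    Object(u32),
    OffHeap(u32),
}

/// Toy graph: outgoing edges per node, regardless of node kind.
pub struct Graph {
    pub edges: HashMap<Node, Vec<Node>>,
}

/// Transitive closure from `roots`. Every node, on-heap or off-heap,
/// is enqueued as its own work item and visited at most once.
pub fn trace(graph: &Graph, roots: &[Node]) -> HashSet<Node> {
    let mut visited = HashSet::new();
    let mut queue = VecDeque::new();
    for &root in roots {
        // `insert` returns true only the first time a node is seen.
        if visited.insert(root) {
            queue.push_back(root);
        }
    }
    while let Some(node) = queue.pop_front() {
        // A real implementation would dispatch on the node kind here,
        // e.g. scan object fields vs. scan an off-heap structure.
        if let Some(children) = graph.edges.get(&node) {
            for &child in children {
                if visited.insert(child) {
                    queue.push_back(child);
                }
            }
        }
    }
    visited
}
```

In this model the edges of an off-heap node belong to that node rather than to whichever object reached it first, and an off-heap node that is never reached stays out of the visited set.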
The pull request #1437 adds a parameter RefScanPolicy to Scanning::scan_object{,_and_trace_edges} to control the behavior of object scanning, and also explicitly allows the side effect of "discovering references" in VM-specific ways. In the lxr branch, claiming and scanning of ClassLoaderData are also controlled by similar flags. This makes us wonder whether the parameter should be more general. For example, should we expose the intention that "this object scanning operation is done for transitive closure" to the VM binding? More precisely, with such a parameter, it is no longer merely "scanning an object", but actually "processing a node", which the old TransitiveClosure type was supposed to implement but did not (see this blog). Then should we have something like "process node" in our API? I hesitate to do so because that exposes too much of the MMTk internals to the VM binding, and blurs the boundary of the MMTk-Binding API.
The bottom line
The bottom line is, the API should be VM-neutral, but at least powerful enough to allow OpenJDK to implement class unloading.