Skip to content

WIP: Replace ProcessEdgesWork with Trace#1468

Draft
wks wants to merge 34 commits intommtk:masterfrom
wks:fix/split-process-edges-work2
Draft

WIP: Replace ProcessEdgesWork with Trace#1468
wks wants to merge 34 commits intommtk:masterfrom
wks:fix/split-process-edges-work2

Conversation

@wks
Copy link
Copy Markdown
Collaborator

@wks wks commented Mar 26, 2026

TODO:

  • Try to remove ObjectsClosure in favor for lambda expression |enqueued_object| { ... }.
  • Merge ScanObjects and PlanScanObjects. Remove the ScanObjectsWork trait.
  • Let TracePolicy customize the work packets for processing root slots and root nodes.
    • Also remove the TracePolicy::is_concurrent() method because it is ambiguous. We should let ConcurrentMarkingTracePolicy select a different work packet than the default ProcessRootNodes.
  • Change the name of some types/functions to make them more consistent.
    • Consider the name Trace instead of TracePolicy.
  • Test disabled-by-default features
  • Cross check with Zixian's branch when most refactoring is done.
  • Fix tests and mock tests
  • Fix eBPF tracing tools.
  • Fix comments, documentation and error/log messages
  • Fix tutorial.
  • Performance evaluation. There should be no performance difference.

Currently, many parts of mmtk-core rely on the ProcessEdgesWork trait merely for the capability of tracing objects, while the ProcessEdgesWork was historically designed mainly for processing slots. This has been one major problem that makes the mmtk-core code base hard to understand and hard to maintain. We refactor the code to eliminate such unnecessary dependency.

Adding the Trace trait

We introduce the Trace trait.

Each implementation provides essential functionalities of a trace, including:

  • trace_object: This is the most important function.
  • post_scan_object: Used by certain spaces such as ImmixSpace
  • may_move_objects: A hint for eliding store-back when tracing a slot.

After this change, there will be the following implementations of Trace. Different implementations mostly differ in the way how they dispatch trace_object, but some also customize other methods:

  • SFTTrace: The trace policy that dispatches trace_object calls via the SFT.
  • PlanTrace: The trace policy that dispatches trace_object via the PlanTraceObject::trace_object method of a plan.
  • UnsupportedTrace: A dummy implementation for unsupported cases, such as "tracing pinning roots" in plans that don't support pinning.
  • GenNurseryTrace: The trace policy that dispatches trace_object to GenerationalPlanExt::trace_object_nursery.
  • ConcurrentMarkingTrace: The trace policy for concurrent marking. It dispatches trace_policy like PlanTrace, but creates different work packets to support concurrent marking.

Some trace policies, such as PlanTrace, GenNurseryTrace and ConcurrentMarkingTrace, uses the const KIND: TraceKind generic parameter to support different behaviors in different traces.

GCWorkContext now lets traces select Trace instead of ProcessEdgesWork to customize the behavior of default and pinning traces.

Removing the ProcessEdgesWork trait

We removed the ProcessEdgesWork trait (fixes #599) and the ProcessEdgesBase struct (fixes #1040).

The SFTProcessEdges, PlanProcessEdges and GenNurseryProcessEdges work packets previously processed slots in different ways. They are replaced with one unified TracingProcessSlots<T: Trace> work packet that encapsulates an instance of policy: T and calls policy.trace_object to trace objects.

Similarly, we unified ScanObjects and PlanScanObjects to TracingProcessNodes. Also note that the ScanObjectsWork work packets not only scan objects, but also do node-enqueue tracing. We name it TracingProcessNodes to be consistent.

We still keep trait ObjectTracer and trait ObjectTracerContext as part of the VM binding API, allowing VMs to implement Scanning::process_weak_refs without changes. But internally, the old ProcessEdgesWorkTracerContext is replaced with TracingTracerContext<T: Trace>, and ProcessEdgesWorkTracer is replaced with TracingObjectTracer<T: Trace>. They implements the mechanism of expanding the transitive closure by tracing.

Existing work packets that depended on E: ProcessEdgesWork for the sole purpose of expanding the transitive closure (which includes most types and methods related to weak reference and finalization processing) are now parameterized with <T: Trace>. Such work packets can use T to instantiate TracingTracerContext<T: Trace>.

The Finalizable::keep_alive method used to have a type parameter <E: ProcessEdgesWork>. It is now parameterized with <OT: ObjectTracer>.

Fixing ConcurrentImmix work packets

The concurrent_marking_work::ProcessRootSlots work packet is removed in favor for the ConcurrentMarkingRootsWorkFactory. By allowing GCWorkContext to select a RootsWorkFactory implementation, concurrent marking can create ConcurrentTraceObjects work packets directly from roots (including both root slots and root nodes) and put the work packet in the Concurrent bucket. This fixes the issued mentioned in #1455 and makes the worked around in #1454 unnecessary.

By not relying on the ProcessEdgesWork trait, we are able to bypass the process_slots, process_slot and trace_object methods in ProcessEdgesWork which are not designed for node-enqueuing tracing, and implement a traditional depth-first searching algorithm using a simple loop and a deque. This greatly simplified the implementation of the ConcurrentTraceObjects work packet.

Rewriting the sanity checker

With the Trace trait extracted, it is trivial to rewrite the tracing loop in a two-level tracing/scanning loop using Trace::trace_object. However, since every invocation of trace_object requires a mutex lock on the SanityChecker itself, and the sanity checker does not depend on any plan or space, we completely removed its dependency on Trace or any TracingProcess{Slots,Nodes,PinningRoots} work packets. Instead, we implemented the whole sanity tracing loop in a single SanityClosure work packet.

The sanity checker now collects roots in the RootsWorkFactory implementations instead of TracingProcess{Slots,Nodes,PinningRoots} work packets.

The issue #1469 is no longer reproducible after this change.

Minor changes

Renaming work packets

Some work packets are renamed to reflect the fact that they are all related to tracing GC and more accurately reflect their true natures.

  • ProcessEdgesWork and {SFT,Plan}ProcessSlots -> TracingProcessSlots
  • ScanObjectsWork and {,Plan}ScanObjects -> TracingProcessNodes
    • It not only scans objects, but traces edges for "node-enqueuing tracing", too.
  • ProcessRootNodes -> TracingProcessPinningRoots
    • It is primarily designed for pinning roots, so let's emphasize that.
  • ProcessEdgesWorkObjectTracerContext -> TracingTracerContext
  • ProcessEdgesWorkObjectTracer -> TracingObjectTracer

Closure implements ObjectTracer

FnMut(ObjectReference) now implements ObjectTracer. This allows a closure to be used in trace_object:

policy.trace_object(object, ..., &mut |enqueued_object| {
    // Scan the `enqueued_object` now, or enqueue the `enqueued_object`.
});

OptionObjectQueue and SFT

Option<ObjectReference> now implements a one-element ObjectQueue, and it is aliased as OptionObjectQueue. This allows decoupling SFT::sft_trace_object from the concrete VectorObjectQueue type by saving the enqueued object to a temporary variable of Option<ObjectReference> type before forwarding it to a generic <Q: ObjectQueue> instance.

Extracting scanning_helper

We introduced the scanning_helper module. It contains functions that scan objects using scan_object or scan_object_and_trace_edges depending on the VM binding's preference (VM::VMScanning::support_slot_enqueuing). It is intended to help writing simple textbook-style tracing loops that scan objects and trace objects all in one function.

Lifetime in ObjectTracerContext

The method ObjectTracerContext::with_tracer now takes a lifetime parameter<'w> which is the lifetime during which the &'w mut GCWorker reference can be used. However, this doesn't break existing VM binding code. This fixes the last work item in #921

Things not done in this PR

As we mentioned above, we revealed that some types are closely related to tracing GC, and in fact they take up half of the source code lines in src/scheduler/gc_work.rs. We should move them to the Rust module for tracing GC, i.e. mmtk::plan::tracing or its sub-module. However, to make this PR easy to review, we let the code stay at the locations of their older counterparts (i.e. TracingProcessNodes near the old ScanObjectsWork). We shall make the change in a future PR and fix #1279

Extracting the Trace trait allows the DefaultProcessSlots::gc_work to be implemented using a very succinct loop. It is attempting to replace the double enqueuing strategy (Vec<Slot> -> Vec<ObjectReference> -> Vec<Slot>) with a traditional loop that visits slots and scans objects together. But we are not doing it now. By not making significant algorithmic changes to production plans, we expect this PR to have similar performance as before. We can leave that refactoring to the future.

@caizixian also proposed a change to make TraceKind a trait instead of an alias of u8. We do not apply that change in this PR. Note that there is an alternative way to refactor TraceKind. Instead of making TraceKind a trait, we remove TraceKind, but make one Trace implementations for each existing TraceKind. We need to discuss this further to decide which refactoring route to take.

Acknowledgement and Related work

@caizixian independently made an attempt to do similar refactorings. It uses the trait name TracePolicy.

Fixes: #599
Fixes: #921
Fixes: #1040
Fixes: #1455
Fixes: #1469

@qinsoon
Copy link
Copy Markdown
Member

qinsoon commented Mar 27, 2026

The refactoring looks clean. The following are some observations after quickly reading through the changes. Some are just observations and no action is needed for this PR.

  • I find the name TracePolicy confusing, given that we already have PolicyTraceObject.
  • We should be able to merge GenNurseryTracePolicy and ConcurrentRootTracePolicy into PlanTracePolicy. PlanTraceObject already support TraceKind, we should be able to do nursery trace and concurrent trace with PlanTraceObject. The fact that we are not doing it is that the procedural macro is not expressive enough to describe nursery trace or concurrent trace. But we plan to remove the procedural macro anyway (Procedural macros make it hard to understand the core tracing #1356), and replace with hand written dispatch. So we should be able to merge GenNurseryTracePolicy and ConcurrentRootTracePolicy with PlanTracePolicy. I don't propose to merge those types in this PR. I just mean that the existence of nursery trace and concurrent trace as separate types might be just accidental.
  • SFTTracePolicy dispatches trace_object based on the space, and does not have the information about TraceKind. So it only works for spaces It seems less and less useful for complex plans with multiple trace kinds. We could consider removing it at some point.

}

fn is_concurrent() -> bool {
true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type is also used in STW, right?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. src/plan/concurrent/immix/gc_work.rs defines two GCWorkContext implementations.

  • ConcurrentImmixSTWGCWorkContext which uses PlanTraceObject
  • ConcurrentImmixGCWorkContext<T: TracePolicy> which uses an externally supplied TracePolicy implementation.

ConcurrentImmix::schedule_concurrent_marking_{initial,final}_pause create instances of StopMutators::<ConcurrentImmixGCWorkContext<ConcurrentRootTracePolicy<VM, Self, TRACE_KIND_FAST>>>. That selects ConcurrentRootTracePolicy.

This means ConcurrentRootTracePolicy is only used in concurrent GC, but not STW GC.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kunshan and I talked about this. What I meant is that this type is used in both the actual concurrent marking, and the concurrent pauses.

The only use of this method right now is when we decide whether to push work with notifying workers, or not notifying workers. If we push work to the current pause, we should notify workers so they can pick up the work quick. If we push work to the concurrent stage, we don't need to notify workers. But I don't feel we can tell the difference from this method.

Comment thread src/scheduler/gc_work.rs Outdated
let work =
O2OTP::from_mmtk(mmtk).create_scan_work(root_objects_to_scan, mmtk, self.bucket);
if O2OTP::is_concurrent() {
worker.scheduler().work_buckets[self.bucket].add_no_notify(work);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find where this add_no_notify was from. Did the old code just use the normal add?

Comment thread src/scheduler/gc_work.rs
for slot in self.slots.iter() {
if let Some(object) = slot.load() {
let new_object = self.policy.trace_object(worker, object, &mut queue);
if T::may_move_objects() && new_object != object {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully the compiler can inline may_move_objects. It was a constant OVERWRITE_REFERENCE.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change it to a constant just to be safe.

@wks
Copy link
Copy Markdown
Collaborator Author

wks commented Mar 27, 2026

Alternative name of TracePolicy

The name TracePolicy is not perfect. MMTk already has the concept of "policy". Having another concept "trace policy" that is different from the old "policy" is ambiguous.

Trace is one good candidate. The JikesRVM already has the concept of Trace and TraceLocal. Every plan implements its TraceLocal, e.g. GenNurseryTraceLocal and ImmixDefragTraceLocal. TraceLocal customizes the trace_object method, but it is stateful because it also contains a marking queue. Our TracePolicy is stateless, so it is more similar to the global Trace object of the old JikesRVM.

Should TracePolicy implementations be based on dispatching mechanism?

Currently we have different TracePolicy implementations based on dispatching mechanisms, such as SFTTracePolicy (via SFT), PlanTracePolicy (via PlanTraceObject::trace_object), GenNurseryTracePolicy (via GenerationalPlanExt::trace_object_nursery). But ConcurrentRootTracePlan is different. It dispatches via PlanTraceObject::trace_object, too.

An alternative design is differentiating TracePolicy by the kind of traces. For example, we shall have

  • ImmixFastTracePolicy and ImmixDefragTracePolicy
  • GenNurseryTracePolicy and GenMatureTracePolicy
  • MarkTracePolicy and CompactTracePolicy

By doing this, we can abolish the TraceKind and completely depend on the TracePolicy to decide how to do trace_object.

Currently, each plan uses aliases and TraceKind to select TracePolicy implementations. For example, in src/plan/markcompact/gc_work.rs, we define

/// Marking trace
pub type MarkingTracePolicy<VM> = PlanTracePolicy<MarkCompact<VM>, TRACE_KIND_MARK>;
/// Forwarding trace
pub type ForwardingTracePolicy<VM> = PlanTracePolicy<MarkCompact<VM>, TRACE_KIND_FORWARD>;

wks added 8 commits March 27, 2026 15:52
And remove TracePolicy::is_concurrent
Now the DefaultProcessSlots and DefaultScanObjects work packets create
each other with cloned TracePolicy.  ConcurrentImmix uses a completely
different RootsWorkFactory to spawn all root-processing work packets.
@wks wks changed the title WIP: Replace ProcessEdgesWork with TracePolicy WIP: Replace ProcessEdgesWork with Trace Mar 31, 2026
Comment thread src/plan/global.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants