Concurrent immix #1355
base: master
Conversation
src/policy/largeobjectspace.rs
Outdated
```diff
 pub fn prepare(&mut self, full_heap: bool) {
     if full_heap {
-        debug_assert!(self.treadmill.is_from_space_empty());
+        // debug_assert!(self.treadmill.is_from_space_empty());
```
I have to comment out this assertion because a full stop-the-world GC might be triggered during concurrent marking, and the assertion would fire in that case.
src/vm/collection.rs
Outdated
```diff
@@ -162,4 +162,7 @@ pub trait Collection<VM: VMBinding> {
     fn create_gc_trigger() -> Box<dyn GCTriggerPolicy<VM>> {
         unimplemented!()
     }
+
+    /// Inform the VM of concurrent marking status
+    fn set_concurrent_marking_state(_active: bool) {}
```
The binding uses this to know whether SATB is active, and to decide whether to call the "soft reference load barrier". This flag could instead be stored as a global flag for SATB barriers, and the binding could access the barrier directly. In that case, we may not need this function.
The problem is that the barrier is an opaque pointer from the binding's perspective. If we want to store this flag in the barrier and let the binding access it, then we need to make sure all Rust barrier structs have the same size, and expose that to the binding. I do not think it is worthwhile to do so.
Why is it an opaque pointer? It is a `dyn Barrier` fat pointer, and it allows `Downcast`. We can always access the barrier from the binding-side Rust code and store an `active` flag in the same static for the C++ to use. E.g. at the end of a pause, check if SATB is active; if so, set the static variable and enable the load reference barrier.

But anyway, the proposal above is based on the assumption that we do not want the function `fn set_concurrent_marking_state(_active: bool)`. If it is okay to keep the function, then we don't need the change.
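For what it's worth, a minimal sketch of that alternative, assuming a hypothetical binding-side static and a hypothetical pause-end hook (none of these names exist in mmtk-core):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical binding-side static; the C++ runtime reads it via the exported symbol.
#[no_mangle]
pub static CONCURRENT_MARKING_ACTIVE: AtomicBool = AtomicBool::new(false);

// Hypothetical hook run at the end of a pause, before mutators are resumed:
// downcast the barrier (or query the plan) to learn whether SATB is active,
// then publish the state for the load reference barrier on the C++ side.
pub fn publish_satb_state(satb_active: bool) {
    CONCURRENT_MARKING_ACTIVE.store(satb_active, Ordering::SeqCst);
}
```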
This call seems necessary. I tried adding an `active` flag for barriers in MMTk core, setting the flag in the mutator prepare of initial marking and clearing it in the mutator release of final marking. On the binding side, I read the flag from MMTk core and set the binding-side value before resuming mutators. I assumed that we can set/clear the flag at any time during a pause, and the binding only needs to see the flag before a pause ends. However, after one concurrent GC (initial mark and final mark), I saw segfaults. So my assumption seems wrong. Maybe there are some threads that are running during a pause, and the barrier affects those threads.
If a mutator thread is doing IO or running (or blocked in) any native functions that can't touch the heap, the VM may consider it "stopped". So resuming mutators doesn't mean all mutators will see the state change promptly.

This is also the main headache of implementing `block_for_gc`, because one mutator may get blocked for multiple GC cycles.
This refactoring looks different from what @wks and I were thinking of. We thought the binding would start looking at the local flag in the barrier instance to find out whether the barrier is active or not. But in the current implementation, the binding still looks at the global variable; the only change is how the value gets updated.
Right. The core is using local flags, but the binding still uses the global variable. Checking a local flag may not be faster for bindings, as they need to dereference the barrier and then check the value.

I will just leave this as it is. If we do want the binding to check the local flag, I can do that.
I have a related question: the flag is only used for the weak reference load barrier, so why can't we use the log bit for the weak reference load barrier as well?
> I have a related question: the flag is only used for the weak reference load barrier, so why can't we use the log bit for the weak reference load barrier as well?

The log bit is for maintaining the "snapshot at the beginning". Because a write operation deletes an edge, the old target, which was part of the SATB, may not be reachable from roots after the write. So the log bit identifies the first modification of an object, and the write barrier records its old target. But the weak reference load barrier is different. The weak referent was not part of the SATB, but merely loading from the weak field makes the referent strongly reachable. It is analogous to an object allocated during concurrent marking being conservatively considered strongly reachable in this GC. Similarly, a weakly reachable object loaded from a `WeakReference` is conservatively considered strongly reachable in this GC, too. This is not related to "first modification", so the log bit is not applicable. Concretely, `WeakReference.get()` modifies neither the `WeakReference` itself nor the referent; it only modifies the object graph by inserting a strong edge from roots to the referent. So it is inappropriate to change the log bits of the `WeakReference` or the referent.
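To make the distinction concrete, here is a rough sketch; the `ObjectReference` stub and the helper functions are hypothetical stand-ins for illustration, not mmtk-core APIs:

```rust
// Hypothetical stand-ins, only to illustrate the two barriers.
#[derive(Clone, Copy)]
struct ObjectReference(usize);
fn is_unlogged(_o: ObjectReference) -> bool { true }   // log bit not yet set
fn log_object(_o: ObjectReference) {}                  // set the log bit
fn keep_alive_in_this_gc(_o: ObjectReference) {}       // enqueue for tracing
fn concurrent_marking_active() -> bool { true }

// SATB write barrier (pre-write): guarded by the log bit, so it fires on the *first*
// modification of an object and records the *old* target of the overwritten edge.
fn object_reference_write_pre(src: ObjectReference, old_target: Option<ObjectReference>) {
    if concurrent_marking_active() && is_unlogged(src) {
        log_object(src);
        if let Some(old) = old_target {
            keep_alive_in_this_gc(old); // the deleted edge's target was in the snapshot
        }
    }
}

// Weak reference load barrier: no modification happens, so the log bit does not apply.
// Loading the referent makes it strongly reachable, so it is conservatively kept alive.
fn weak_reference_load(referent: ObjectReference) {
    if concurrent_marking_active() {
        keep_alive_in_this_gc(referent);
    }
}
```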
It's a bit confusing to have a mix of local and global. I think it might be easier/simpler to do the global checking on both ends (mmtk-core and binding).
> What will happen if a mutator wants to fork while concurrent marking is active?

We may temporarily tell the VM binding that MMTk doesn't currently support forking when using concurrent GC. We may fix it later. One simple solution is postponing forking until the current GC finishes. The status quo is that only CRuby and Android need forking, and CRuby will not support concurrent GC in the short term.
You can't let this happen. Either the binding needs to ensure that the mutator waits while concurrent marking is active, or you don't let a concurrent GC happen before forking (this is how ART deals with it).
Agreed. Fortunately, the current API doc already says:

```rust
/// This function sends an asynchronous message to GC threads and returns immediately, but it
/// is only safe for the VM to call `fork()` after the underlying **native threads** of the GC
/// threads have exited. After calling this function, the VM should wait for their underlying
/// native threads to exit in VM-specific manner before calling `fork()`.
```

So a well-behaving VM binding shall wait for all the GC worker threads (which are created by the binding via …
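A minimal sketch of what a well-behaving binding would do, assuming it kept the `JoinHandle`s of the native threads it spawned as GC workers (illustrative only, not binding code from this PR):

```rust
use std::thread::JoinHandle;

// Illustrative only: honoring the documented contract before fork().
fn wait_for_gc_threads_then_fork(gc_thread_handles: Vec<JoinHandle<()>>) {
    // 1. Call the mmtk-core API quoted above; it sends an asynchronous message to the
    //    GC threads and returns immediately.
    // 2. Wait for the underlying native threads to actually exit.
    for handle in gc_thread_handles {
        handle.join().unwrap();
    }
    // 3. Only now is it safe to call fork() (e.g. libc::fork()).
}
```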
Mentioning it before I forget: we need to change the name for the …
```rust
#[repr(u8)]
#[derive(Debug, PartialEq, Eq, Copy, Clone, NoUninit)]
pub enum Pause {
    Full = 1,
```
Do we have a better name than `Full`? I think here it means doing a full GC in one STW (as opposed to doing part of a GC, such as initial mark, final mark, or an incremental part of a GC). But it may be confused with full-heap GC (as opposed to nursery GC).
Actually, I have the same concern, but I could not find a better term, so I simply left `Full` here as a placeholder.
src/plan/barriers.rs
Outdated
```rust
fn load_reference(&mut self, _o: ObjectReference) {}

fn object_reference_clone_pre(&mut self, _obj: ObjectReference) {}
```
This method only has a blank implementation in `BarrierSemantics::object_reference_clone_pre`, and `SATBBarrier::object_reference_clone_pre` calls that. What is this method intended for?
src/plan/tracing.rs
Outdated
```rust
struct SlotIteratorImpl<VM: VMBinding, F: FnMut(VM::VMSlot)> {
    f: F,
    // should_discover_references: bool,
    // should_claim_clds: bool,
    // should_follow_clds: bool,
    _p: PhantomData<VM>,
}

impl<VM: VMBinding, F: FnMut(VM::VMSlot)> SlotVisitor<VM::VMSlot> for SlotIteratorImpl<VM, F> {
    fn visit_slot(&mut self, slot: VM::VMSlot) {
        (self.f)(slot);
    }
}

pub struct SlotIterator<VM: VMBinding> {
    _p: PhantomData<VM>,
}

impl<VM: VMBinding> SlotIterator<VM> {
    pub fn iterate(
        o: ObjectReference,
```
We may not really need `SlotIterator` or `SlotIteratorImpl`. Note that in `mmtk-core/src/vm/scanning.rs` we already have `SlotVisitor` and `impl<SL: Slot, F: FnMut(SL)> SlotVisitor<SL> for F`. That's enough for us to pass a lambda (`&mut |slot| { ... }`) to `Scanning::scan_object` and use it as the `slot_visitor` argument.

I think the `lxr` branch introduced this struct to do some kind of filtering, and it multiplexed multiple methods of `Scanning`, including the newly introduced `Scanning::scan_object_with_klass`.

However, it is still nice to add a method `ObjectReference::iterate_fields`. It's convenient.
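For example, something along these lines should already work thanks to that blanket impl (a sketch using in-crate paths; not code from this PR):

```rust
use crate::util::{ObjectReference, VMWorkerThread};
use crate::vm::{Scanning, VMBinding};

// Sketch: a closure is a valid slot visitor because of
// `impl<SL: Slot, F: FnMut(SL)> SlotVisitor<SL> for F`.
fn visit_all_slots<VM: VMBinding>(tls: VMWorkerThread, object: ObjectReference) {
    <VM::VMScanning as Scanning<VM>>::scan_object(tls, object, &mut |slot: VM::VMSlot| {
        // ... process `slot` ...
        let _ = slot;
    });
}
```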
src/util/address.rs
Outdated
```rust
pub fn iterate_fields<VM: VMBinding, F: FnMut(VM::VMSlot)>(self, f: F) {
    SlotIterator::<VM>::iterate(self, f)
}
```
Since `FnMut(VM::VMSlot)` already implements the trait `SlotVisitor`, we can directly pass it to `scan_object`.

Suggested change:

```diff
-pub fn iterate_fields<VM: VMBinding, F: FnMut(VM::VMSlot)>(self, f: F) {
-    SlotIterator::<VM>::iterate(self, f)
-}
+pub fn iterate_fields<VM: VMBinding, F: FnMut(VM::VMSlot)>(self, mut f: F) {
+    <VM::VMScanning as Scanning<VM>>::scan_object(
+        VMWorkerThread(VMThread::UNINITIALIZED),
+        self,
+        &mut f,
+    )
+}
```
However, this is still imperfect.

- Not all VMs support `scan_object`. CRuby is one notable exception. If we only intend to visit the value of all (non-null) fields without updating (forwarding) any of them, I suggest we introduce another method in the `Scanning` trait, such as `Scanning::enumerate_children`. Otherwise, consider invoking `Scanning::support_slot_enqueuing` and `scan_object_and_trace_edges`, too.
- This method doesn't have `tls`. For CRuby, I am implementing object scanning by setting a thread-local variable (accessed via the `tls`) as a call-back, calling a C function to visit fields, and the C function will call the call-back to let the Rust code visit the field value. But because this method is called from the barrier, it may not always provide a `VMWorkerThread`. I suggest the newly introduced method `Scanning::enumerate_children` take a `VMThread` instead of a `VMWorkerThread` as its argument.
In summary, I am also suggesting we introduce a method:

```rust
pub trait Scanning<VM: VMBinding> {
    /// Visit the children of an object.
    ///
    /// This method may be called during mutator time, and is required by concurrent GC. Currently,
    /// we don't support concurrent copying GC, so this method can assume no objects are moved by GC
    /// while this method is running.
    fn enumerate_children(
        tls: VMThread,
        object: ObjectReference,
        child_visitor: &mut impl FnMut(ObjectReference),
    );
}
```
And if we make that change, then `ObjectReference::iterate_fields` will call `Scanning::enumerate_children` instead.
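For instance, under that proposal the forwarding could look roughly like this (a sketch only; `enumerate_children` is the proposed method above and does not exist yet):

```rust
impl ObjectReference {
    /// Sketch: visit the children of this object. Takes a `VMThread` (not a
    /// `VMWorkerThread`), so it can also be called from mutator-side barriers.
    pub fn iterate_fields<VM: VMBinding, F: FnMut(ObjectReference)>(self, tls: VMThread, mut f: F) {
        <VM::VMScanning as Scanning<VM>>::enumerate_children(tls, self, &mut f)
    }
}
```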
The current "WorkerGoal" mechanism should be able to handle the case where mutators trigger another GC between InitialMark and FinalMark. We can remove WorkPacketStage::Initial
and WorkPacketStage::ConcurrentSentinel
, and use GCWorkScheduler::on_last_parked
to transition the GC state from InitialMark to the concurrent marking to FinalMark and finally finish the GC. See inline comments for more details.
src/scheduler/scheduler.rs
Outdated
```rust
        notify = true;
    }
    if notify {
        self.wakeup_all_concurrent_workers();
```
This wakes up other workers. However, the current GC worker considers itself the "last parked worker". So after the `on_gc_finished` function returns, it will clear the current "goal" (`goals.on_current_goal_completed()`) and try to handle another "goal" (i.e. a GC request) in `self.respond_to_requests(worker, goals)`. Unless the mutator triggers another GC immediately, the current GC worker thread will find that there are no requests, and park itself (`return LastParkedResult::ParkSelf;`). If we set `MMTK_THREADS=1`, it will never wake up again.
So we need to make some changes to `GCWorkScheduler::on_last_parked` so that if this is the end of the InitialMark STW, the `on_last_parked` method returns `LastParkedResult::WakeAll`, so that all GC workers wake up again, including the current GC worker thread itself.

And we shouldn't clear the current "worker goal" if it is the end of InitialMark. The GC workers should still be working towards the "GC" goal. But it needs to be a bit stateful, knowing whether it is (1) during a "full GC", (2) during initial mark, (3) during final mark, or (4) between initial mark and final mark (we may just call it "concurrent tracing"). As long as the GC workers are still working on a goal (regardless of the concrete state), they will not accept other GC requests or requests for forking, which automatically solves the forking problem.
And when concurrent tracing finishes (all the postponed work packets and their children are drained), the last parked worker will reach `on_last_parked` again. This time, it can immediately start scheduling the FinalMark stage (or should we call it a "super stage" to distinguish it from a work bucket stage, or just call it an STW). When FinalMark finishes, the GC really finishes. GC workers can then accept requests for another GC or requests for forking.
```diff
@@ -250,3 +260,82 @@ impl<S: BarrierSemantics> Barrier<S::VM> for ObjectBarrier<S> {
         }
     }
 }
+
+pub struct SATBBarrier<S: BarrierSemantics> {
```
This seems to be a pre-write `ObjectBarrier`. The current `ObjectBarrier` only implements the post-write functions, and this implementation implements the pre-write functions.
I don't plan to do this refactoring in this PR.
It looks like we keep newly allocated objects alive by marking the lines, but not marking those objects. To properly support VO bits, we need to do two things: …

There are multiple ways to know if a line only contains new objects or old objects. …
"It looks like we keep newly allocated objects alive by marking the lines, but not marking those objects." This is not true. Both lines and every word within that line are marked. So if one blindly copy mark bits to vo bits, then vo bit will be 1 even if that address is not an object |
I think we can just mark lines, and also mark each individual object. The bulk set only works for side mark bits anyway. Let's not get things entangled and complicated.
The problem is that we do not want to do the check in the fast path. If we want to mark each individual object, then in the allocation fast path we need to check whether concurrent marking is active and then set the mark bit, whereas the current bulk-setting approach only sets the mark bits in the slow path.
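A sketch of the trade-off; everything below is hypothetical pseudocode, not the actual Immix allocator code:

```rust
// Hypothetical pseudocode to illustrate fast-path vs slow-path marking.
type Address = usize;
fn concurrent_marking_active() -> bool { false }
fn bump_allocate(_size: usize) -> Address { 0 }
fn set_mark_bit(_obj: Address) {}
fn bulk_set_side_mark_bits(_start: Address, _end: Address) {}

// Marking each individual object would add a branch to every allocation:
fn alloc_with_per_object_mark(size: usize) -> Address {
    let obj = bump_allocate(size); // existing fast path
    if concurrent_marking_active() {
        set_mark_bit(obj); // extra work in the fast path
    }
    obj
}

// The current bulk-setting approach keeps the fast path untouched: when the slow path
// acquires fresh lines, it bulk-sets the side mark bits for the whole range once.
fn acquire_lines_slow_path(start: Address, end: Address) {
    if concurrent_marking_active() {
        bulk_set_side_mark_bits(start, end);
    }
}
```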
I forgot the SATB barrier. It will remember B when we remove the edge.
In your case, B will be captured by the SATB barrier, so it will not be considered dead. There is no need to scan those newly allocated objects, because any children of those newly allocated objects must have been alive in the snapshot and thus are guaranteed to be traced.
The mark bit could be in the header, and we have to set it per object if it is in the header. We can differentiate the header mark bit and the side mark bit, and deal with each differently. But bulk-setting mark bits is still a bit hacky -- and this is why we would have issues with VO bits. VO bits are copied from mark bits, assuming the mark bit is only set for each individual object.
```rust
    }

    fn object_probable_write_slow(&mut self, obj: ObjectReference) {
        crate::plan::tracing::SlotIterator::<VM, _>::iterate_fields(obj, self.tls.0, |s| {
```
Is this scanning the object? Do we need to call `post_scan_object` here? @tianleq
This is a write barrier. The object is not being scanned by the GC in the sense that we've marked it gray, etc. So I don't think it's sound to call `post_scan_object` here.
@wks I don't have other intended changes for this PR. You can take a look. Also feel free to commit to the PR.
It is currently a no-op and not needed for the SATB barrier. It was intended for implementing OpenJDK's object cloning pre-barrier. We will reintroduce it into mmtk-core when we implement a plan that needs such a barrier, and at that point we need to design the API in a VM-agnostic way.
We instead use a boolean field `should_do_full_gc` to tell whether the user or the `collection_required` method thinks it is time to do a full GC. We let `schedule_collection` decide the actual pause type.
Currently it is unsafe to skip FinalMark and go directly to a full GC. We added a comment for that, and we postpone the full GC request to the next GC after FinalMark.
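A rough sketch of how that decision could look inside `schedule_collection`, using the `Pause` variants discussed above (the exact variant names and surrounding logic are illustrative, not the PR's actual code):

```rust
// Sketch only: pick the pause type for the GC that is about to start.
fn decide_pause(should_do_full_gc: bool, concurrent_marking_in_progress: bool) -> Pause {
    if concurrent_marking_in_progress {
        // It is unsafe to skip FinalMark and jump straight to a full GC, so finish
        // the current cycle; a pending full GC request is honored in the next GC.
        Pause::FinalMark
    } else if should_do_full_gc {
        Pause::Full
    } else {
        Pause::InitialMark
    }
}
```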
Yes. I did it. I refactored the code and removed the …
```rust
pub struct StopMutators<C: GCWorkContext> {
    /// If this is true, we skip creating [`ScanMutatorRoots`] work packets for mutators.
    /// By default, this is false.
    skip_mutator_roots: bool,
```
@tianleq suggests that we should just allow a closure as the mutator visitor, instead of having some flags.
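Roughly, that suggestion would replace the flag with a visitor closure, along these lines (a sketch; the field and bound choices are illustrative, not the final design):

```rust
// Sketch: let the creator of StopMutators decide what happens to each stopped mutator,
// instead of toggling a `skip_mutator_roots` flag.
pub struct StopMutators<C: GCWorkContext> {
    mutator_visitor: Box<dyn Fn(&'static mut Mutator<C::VM>) + Send>,
}

impl<C: GCWorkContext> StopMutators<C> {
    pub fn new(mutator_visitor: Box<dyn Fn(&'static mut Mutator<C::VM>) + Send>) -> Self {
        Self { mutator_visitor }
    }
}
```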
This PR adds a concurrent Immix plan.