Improve critical sections and stop-the-world interoperability #144513

@colesbury

Feature or enhancement

Currently, it's not safe to use critical sections within stop-the-world (STW) pauses because doing so risks deadlock. The deadlock risk isn't obvious and actual deadlocks happen infrequently, which makes our unit tests less effective at catching this sort of bug.

Deadlock

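Normally, when a thread pauses for a stop-the-world event, it releases all the critical sections that it holds when the thread state is detached (_PyThreadState_Detach). The problem is that when a lock has been contended for longer than a certain duration (TIME_TO_BE_FAIR_NS), ownership is handed off directly to a waiting thread rather than the lock being released. This handoff can happen after the waiting thread has detached and reached a safe point for a stop-the-world event, so the waiting thread can end up owning the lock while it is paused. A critical section that then tries to acquire that lock during the pause waits on a thread that cannot run until the pause ends.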
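To make the handoff scenario concrete, here is a minimal toy model in C. It is only a sketch: the `toy_*` types and functions are hypothetical and do not exist in CPython, the model tracks a single waiter, and the threshold constant is an assumed value for the model rather than a statement about the real `TIME_TO_BE_FAIR_NS`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in CPython. */
typedef struct toy_thread {
    bool detached;      /* parked on the lock, so counted as at a STW safe point */
} toy_thread;

typedef struct toy_mutex {
    toy_thread *owner;  /* NULL when unlocked */
    toy_thread *waiter; /* at most one parked waiter in this model */
    long waited_ns;     /* how long the waiter has been parked */
} toy_mutex;

/* Assumed threshold for the model, standing in for TIME_TO_BE_FAIR_NS. */
#define TOY_TIME_TO_BE_FAIR_NS 1000000L

/* Unlock: hand ownership straight to the waiter if it has waited long enough. */
static void toy_unlock(toy_mutex *m)
{
    assert(m->owner != NULL);
    if (m->waiter != NULL && m->waited_ns >= TOY_TIME_TO_BE_FAIR_NS) {
        m->owner = m->waiter;   /* handoff: the lock never becomes free */
        m->waiter = NULL;
        m->waited_ns = 0;
    }
    else {
        m->owner = NULL;        /* ordinary release */
    }
}

/* During a STW pause, a lock owned by a detached thread can never be
 * released, so a critical section on it would wait forever. */
static bool toy_stw_would_deadlock(const toy_mutex *m)
{
    return m->owner != NULL && m->owner->detached;
}

/* Thread A holds the lock, thread B is parked and detached, and a STW
 * pause begins just as A leaves its critical section. */
static bool toy_scenario(long waited_ns)
{
    toy_thread a = { .detached = false };
    toy_thread b = { .detached = true };
    toy_mutex m = { .owner = &a, .waiter = &b, .waited_ns = waited_ns };
    toy_unlock(&m);
    return toy_stw_would_deadlock(&m);
}
```

In this model, `toy_scenario` reports a deadlock only when the waiter has been parked for at least the fairness threshold, which mirrors why the real bug shows up infrequently.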

Proposal

The slow paths _PyCriticalSection_BeginSlow and _PyCriticalSection2_BeginSlow should check if the interpreter is in a stop-the-world pause. If so, they should return without acquiring the lock, as we do in the "optimisation for locking the same object recursively".
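The shape of the proposed check could look roughly like the following sketch. It is not the CPython implementation: every `sketch_*` name is a hypothetical stand-in, the `world_stopped` flag is an assumed way of asking "is this interpreter in a STW pause?", and a boolean replaces the real blocking mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real internal structures. */
typedef struct sketch_mutex { bool locked; } sketch_mutex;

typedef struct sketch_interp {
    bool world_stopped;  /* assumed flag: per-interpreter STW pause is active */
} sketch_interp;

typedef struct sketch_critical_section {
    sketch_mutex *mutex; /* NULL means "nothing to release on End" */
    struct sketch_critical_section *prev;
} sketch_critical_section;

typedef struct sketch_tstate {
    sketch_interp *interp;
    sketch_critical_section *top; /* innermost active critical section */
} sketch_tstate;

static void sketch_begin_slow(sketch_tstate *ts, sketch_critical_section *c,
                              sketch_mutex *m)
{
    /* Existing-style shortcut: the same object is locked recursively. */
    if (ts->top != NULL && ts->top->mutex == m) {
        c->mutex = NULL;
        c->prev = NULL;
        return;
    }
    /* Proposed check: all other threads are paused, so skip the lock. */
    if (ts->interp->world_stopped) {
        c->mutex = NULL;
        c->prev = NULL;
        return;
    }
    c->mutex = m;
    c->prev = ts->top;
    ts->top = c;
    m->locked = true;    /* stands in for a blocking lock acquisition */
}

static void sketch_end(sketch_tstate *ts, sketch_critical_section *c)
{
    if (c->mutex == NULL) {
        return;          /* Begin skipped the lock, so End is a no-op */
    }
    assert(c->mutex->locked);
    c->mutex->locked = false;
    ts->top = c->prev;
}

/* Returns true if Begin acquired the lock and End released it again. */
static bool sketch_demo_takes_lock(bool world_stopped)
{
    sketch_interp interp = { .world_stopped = world_stopped };
    sketch_tstate ts = { .interp = &interp, .top = NULL };
    sketch_mutex m = { .locked = false };
    sketch_critical_section c;
    sketch_begin_slow(&ts, &c, &m);
    bool took = m.locked;
    sketch_end(&ts, &c);
    return took && !m.locked;
}
```

Marking the critical section as inactive (a NULL mutex in this sketch) keeps the matching End call balanced without touching a lock that a paused thread may own.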

I think we only care about per-interpreter stop-the-world pauses here. We only use cross-interpreter STW pauses (_PyEval_StopTheWorldAll) in a few places, like before os.fork().

cc @mpage @Yhg1s
