Scalene occasionally hangs at startup on macOS hosted CI runners #1030

@emeryberger

Summary

On GitHub Actions macos-latest hosted runners, Scalene occasionally hangs at startup when launched as a subprocess with memory profiling enabled. The subprocess produces no stdout/stderr before it is SIGKILL'd by the parent's timeout. The hang has not been reproducible locally on macOS with Python 3.12 (10/10 runs pass in under 3s each); it has been seen only on hosted runners.

Symptom

A subprocess invocation like:

python -m scalene run --json --outfile out.json some_script.py

never returns. After the parent's timeout (we use 60s in tests), subprocess.TimeoutExpired fires and the runner reports returncode: -9 (SIGKILL). stdout_seq is empty in the captured exception, indicating Scalene wrote nothing before hanging.
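
For anyone trying to reproduce this, here is a minimal sketch of the parent side. The helper name `run_with_timeout` is made up for illustration and is not code from the test suite; it only relies on documented `subprocess.run` behaviour (on timeout the child is killed and `TimeoutExpired` is raised with whatever output was captured).

```python
import subprocess
import sys

# The command from this report; out.json and some_script.py are placeholders.
SCALENE_CMD = [sys.executable, "-m", "scalene", "run", "--json",
               "--outfile", "out.json", "some_script.py"]


def run_with_timeout(cmd, timeout_s=60.0):
    """Run cmd and return (returncode, stdout, stderr, timed_out).

    Hypothetical helper: returncode is None when the child had to be killed.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() kills the child before re-raising, so all we
        # can inspect here is the output captured up to that point.
        return None, exc.stdout or b"", exc.stderr or b"", True
    return proc.returncode, proc.stdout, proc.stderr, False


# Example (not executed here):
#   rc, out, err, timed_out = run_with_timeout(SCALENE_CMD)
#   if timed_out and not out and not err:
#       print("hung before writing anything")
```

An empty stdout and stderr together with `timed_out` being true corresponds to the symptom described above.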

Where it has been observed

What we ruled out

Plausible root causes (none verified)

  1. A race during DYLD_INSERT_LIBRARIES injection of libscalene.dylib.
  2. A sys.monitoring setup race on Python 3.12 specifically.
  3. A native heap hook ↔ GIL deadlock under Python 3.12's adaptive specialization.
  4. macOS hosted runner kernel quirks around library injection / signal delivery.

Workaround currently in place

Commit 4e3b318 in PR #1029 wraps subprocess.run calls in test/test_tracer.py with a retry helper that catches TimeoutExpired, retries up to 3× at 60s each, and skips the test rather than failing it when every attempt hangs. This keeps CI green on transient hangs, but it masks the underlying bug: a user actually running Scalene on macOS could hit the same hang.
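
For readers without the PR open, the shape of such a helper is roughly the following. This is a sketch of the behaviour described above, not the code from commit 4e3b318; the name `run_with_retry` and its parameters are assumptions.

```python
import subprocess
import unittest


def run_with_retry(cmd, attempts=3, timeout_s=60.0):
    """Run cmd, retrying on timeout; skip the test if every attempt hangs.

    Hypothetical helper illustrating the workaround, not the actual one.
    """
    for _ in range(attempts):
        try:
            return subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # The child has already been killed by subprocess.run();
            # fall through and try again.
            continue
    # Raising SkipTest marks the test as skipped under unittest and pytest.
    raise unittest.SkipTest(
        f"command hung on all {attempts} attempts ({timeout_s}s each)")
```

Skipping rather than failing is what keeps CI green, and it is also why the hang can go unnoticed: a skipped test is easy to overlook in the job summary.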

Suggested next steps

  • Add an opt-in SCALENE_DEBUG=1 env var that emits async-signal-safe stderr breadcrumbs at each major startup phase (preload init, pywhere import, sys.monitoring registration, signal handler install). Next time CI hangs, the logs will pinpoint the stuck phase.
  • Or: set up a tmate-enabled CI job that pauses on failure so we can SSH into a hung runner and attach lldb / sample to the Scalene process.
  • If reproducible, the fix likely lives in src/source/libscalene.cpp or scalene/scalene_tracer.py.
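
To make the first suggestion concrete, here is a rough sketch of what the Python-side breadcrumbs could look like. Everything in it is hypothetical: `SCALENE_DEBUG` is only proposed in this issue, and `breadcrumb` / `last_breadcrumb` are illustrative names. The native preload code would need to call write(2) directly to be async-signal-safe; the Python sketch only mirrors that by using `os.write`, which bypasses Python's buffered stderr.

```python
import os
import time

PREFIX = "[scalene-debug "


def breadcrumb(phase, fd=2):
    """Write one unbuffered line to fd (stderr by default) if opted in.

    Hypothetical: SCALENE_DEBUG is the env var proposed in this issue.
    """
    if os.environ.get("SCALENE_DEBUG") != "1":
        return
    line = f"{PREFIX}pid={os.getpid()} t={time.monotonic():.3f}] {phase}\n"
    # os.write() goes straight to the file descriptor, so the line is
    # visible even if the process hangs immediately afterwards.
    os.write(fd, line.encode("utf-8", "replace"))


def last_breadcrumb(stderr_bytes):
    """Return the phase named by the last breadcrumb line, or None."""
    last = None
    for raw in stderr_bytes.splitlines():
        text = raw.decode("utf-8", "replace")
        if text.startswith(PREFIX) and "] " in text:
            last = text.split("] ", 1)[1]
    return last
```

Calling `breadcrumb("sys.monitoring registration")` and similar at each startup phase, then running `last_breadcrumb` over the stderr captured from a timed-out run, would name the last phase that was reached before the hang.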
