Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-124567: Reduce overhead of cycle GC. #124717

Closed
wants to merge 3 commits into from

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Sep 27, 2024

This PR makes the following changes to cycle GC, to reduce the overhead observed in #124567:

  • Does not form the transitive closure of the young generation before collecting it. Avoids scanning too much of the heap that is likely reachable.
  • Performs an amount of work proportional to the young collection in the incremental collection, avoiding n**2 performance on large heaps (this was a regression w.r.t. 3.12)
  • At the start of each full scan, mark all objects reachable from sys.modules, so they are not scanned this scan.
  • Computes the delta in heap size during incremental collection, and accelerates incremental collection if the heap is growing
  • Adds a _testinternalcapi.get_heap_size() function for more precise testing of the GC

Before merging, I need to:

  • Get performance numbers and stats for this version
  • Likewise for an alternative version that does form the transitive closure of the young generation, but only does so after marking reachable objects.
    I still need benchmarks and stats for this before merging.

@Privat33r-dev
Copy link
Contributor

As a quick preliminary check, I would suggest running doc tests, since they indicated the issue at first and caused further findings (see #118891 for details).

Currently, it appears to me that the PR achieves the opposite of its purpose.

Doctest on main branch takes ~12-15 minutes
With PR it takes >40m (still running): https://github.com/Privat33r-dev/cpython/actions/runs/11082906901/job/30796516669?pr=4
Main (before PR): https://github.com/Privat33r-dev/cpython/actions/runs/11083066610/job/30796883045?pr=5

@AA-Turner
Copy link
Member

The test at https://github.com/Privat33r-dev/cpython/actions/runs/11082906901/job/30796516669?pr=4 was killed by the 60 minute timeout, less than half-way through the build (source files not fully read). This PR isn't running documentation tests as no files in Doc/ have changed and there's no NEWS entry currently.

A

@Privat33r-dev
Copy link
Contributor

This PR isn't running documentation tests as no files in Doc/ have changed.

That's how I triggered it in my repo. I suggest to add/remove space in Docs in this PR as well, so we can see the end result of the changes.

I probably should've added further context for clarity. I used this PR as a base to make a PR in my repository, with 1 more commit from me (space change in docs), so Doctest will start automatically. It took more than 1 hour and stopped halfway through, because the job had 1 hour limit set.
Privat33r-dev#4

At the same time, I made a control Doctest in my repo on main branch + minor doc change. It took 13 m:
Privat33r-dev#5

@@ -1467,7 +1487,12 @@ gc_collect_increment(PyThreadState *tstate, struct gc_collection_stats *stats)
gc_list_validate_space(&survivors, gcstate->visited_space);
gc_list_merge(&survivors, visited);
assert(gc_list_is_empty(&increment));
gcstate->work_to_do += gcstate->heap_size / SCAN_RATE_DIVISOR / scale_factor;
Py_ssize_t delta = (gcstate->heap_size - gcstate->prior_heap_size)*3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binary operations, where two operands require spaces, may be more standard.
(gcstate->heap_size - gcstate->prior_heap_size) * 3

@markshannon
Copy link
Member Author

Closing this, as we don't need an urgent fix (3.13 goes back to generational collection)

Longer term, I expect reduced reference counting operations to make faster-cpython/ideas#693 viable.

The pre-scan to find live objects is definitely worth further investigation, though.

@markshannon markshannon closed this Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants