Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor/eliminating traffic on root partition #6167

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

davidporter-id-au
Copy link
Contributor

@davidporter-id-au davidporter-id-au commented Jul 16, 2024

What changed?

This prevents normal traffic (polls from workers, tasks dispatched from history) from being directly dispatched to the root partition. Both pollers and tasks dispatched by matching through async-match are expected to be the only remaining traffic on the root partition.

Why?

The root partition is a contention point under heavy load if there are too many partitions relative to the number of pollers (they all end up falling back to polling on the root partition and matching has to forward tasks to the root partition. This contention is:

  • Made worse by normal traffic also being redirected to the root at a frequency of 1/N write partitions
  • Significantly more confusing to debug when there are multiple sources of traffic on that partition.

The underlying justification is one of trying to make an overly complex part of the system more conceptually simple, therefore making it significantly easier to debug, diagnose issues and reason about.

If this change lands, then:

  • All traffic on the root partition will only ever be the result of forwarding
  • The presence therefore, of any meaningful traffic on the root partition for any extended period of time can be taken to be a clear indication of a system configuration problem, it's easy to alert on, its easy to understand.

How did you test it?

Potential risks

This change reduces the number of all production partitions by 1, so this risks slightly worsening any contended tasklists

Release notes

Documentation Changes

Copy link

codecov bot commented Jul 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.67%. Comparing base (3248455) to head (3780895).
Report is 2 commits behind head on master.

Additional details and impacted files
Files Coverage Δ
client/matching/loadbalancer.go 89.74% <100.00%> (+19.74%) ⬆️

... and 7 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3248455...3780895. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant