Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Q&A] High priority thread restricted by low priority thread? #1626

Open
bluelasers opened this issue Feb 28, 2024 · 6 comments
Open

[Q&A] High priority thread restricted by low priority thread? #1626

bluelasers opened this issue Feb 28, 2024 · 6 comments

Comments

@bluelasers
Copy link

I was looking into the code for possible points of concern. To me this looks like a possible deadlock which results in a stall of the high priority thread. The front ground thread is capable of acquiring the lock and losing the CPU(s). What happens to the background thread during this?

FrameCanvas *SwapOnVSync(FrameCanvas *other, unsigned frame_fraction) {
MutexLock l(&frame_sync_);
FrameCanvas *previous = current_frame_;
next_frame_ = other;
requested_frame_multiple_ = frame_fraction;
frame_sync_.WaitOn(&frame_done_);
return previous;
}

{
MutexLock l(&frame_sync_);
// Do fast equality test first (likely due to frame_count reset).
if (frame_count == requested_frame_multiple_
|| frame_count % requested_frame_multiple_ == 0) {
// We reset to avoid frame hick-up every couple of weeks
// run-time iff requested_frame_multiple_ is not a factor of 2^32.
frame_count = 0;
if (next_frame_ != NULL) {
current_frame_ = next_frame_;
next_frame_ = NULL;
}
pthread_cond_signal(&frame_done_);
}
}

It would seem to me that you would try the lock and pass it over if held. Can you create critical section in user space?

bluelasers added a commit to bluelasers/rpi-rgb-led-matrix that referenced this issue May 29, 2024
@hzeller
Copy link
Owner

hzeller commented Jun 29, 2024

Can you explain where you see a deadlock ? A deadlock is if someone is blocking a mutex a and waits for a mutex b; and the someone else is blocking mutex b but waits for mutex a to succeed.

We don't have this situation here: just one mutex used in two critical sections that should be run exclusively. Yes you can temporarily use CPU in one of these, elongating the time the other has to wait, but you never stall.

Also note that each of the sections is short, does not do any external IO and just swaps around variables.

@bluelasers
Copy link
Author

OS cannot be predicted. If you lose the CPU you lose the background thread. Basically this means the multicore is single threaded. The cache can get cold during this time. Depending on the version of hardware this may matter more or less.

You are SMT program currently and I am trying to get you back to where you tried to be. I would recommend removing all but core zero from the scheduler. I will just leave you with the knowledge of that option as I am not sure you will agree with it.

By the way there is another way to write this which is more SMT. This works around this issue completely, but I have not written a full version of it. So I just leave you with the knowledge of its existence only, as it would be a good size rewrite.

@hzeller
Copy link
Owner

hzeller commented Jun 29, 2024

What you describe is essentially priority inversion, and yes, it can be problematic if there is only one core and a saturated realtime thread.

And yes, this will always be a theoretical issue (but not in practice).

@bluelasers
Copy link
Author

My concern is outlined in #1468. This issue applies on multicore also. We can mask out the background threads priority. This stalls the panels multiplexing. We may be forced to set out a full tick randomly. That is a relatively long period of time for multiplexing. Certain workloads may be more prone to this issue. (Multithreading for example, see #1678.)

Our only hope is for the OS to recognize the dependency. Which allows the kernel to pass the priority of the background thread to the drawing thread briefly. Otherwise we need to increase the priority of the drawing thread beyond a desired threshold to work around this issue.

@bluelasers
Copy link
Author

I recommend against this, but we could document and close.

@hzeller
Copy link
Owner

hzeller commented Jul 10, 2024

SwapOnVSync() only swaps a variable so does not do expensive work holding up the critical section. If it gets its CPU time taken away while it is doing that, then this is to be expected. The only other more important work is the update thread feeding the LEDs.
If that stops working e.g. because it waits for a new frame to arrive, then CPUs are free (even on a single core) for the suspended SwapOnVSync() to proceed. So it will work all as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants