[Q&A] High priority thread restricted by low priority thread? #1626

bluelasers · 2024-02-28T16:41:52Z

I was looking into the code for possible points of concern. To me this looks like a possible deadlock, which would stall the high-priority thread. The foreground thread can acquire the lock and then lose the CPU(s). What happens to the background thread during this?

rpi-rgb-led-matrix/lib/led-matrix.cc

Lines 224 to 231 in a3eea99

    
    FrameCanvas *SwapOnVSync(FrameCanvas *other, unsigned frame_fraction) {
      MutexLock l(&frame_sync_);
      FrameCanvas *previous = current_frame_;
      next_frame_ = other;
      requested_frame_multiple_ = frame_fraction;
      frame_sync_.WaitOn(&frame_done_);
      return previous;
    }

rpi-rgb-led-matrix/lib/led-matrix.cc

Lines 173 to 187 in a3eea99

    
    {
      MutexLock l(&frame_sync_);
      // Do fast equality test first (likely due to frame_count reset).
      if (frame_count == requested_frame_multiple_
          || frame_count % requested_frame_multiple_ == 0) {
        // We reset to avoid frame hick-up every couple of weeks
        // run-time iff requested_frame_multiple_ is not a factor of 2^32.
        frame_count = 0;
        if (next_frame_ != NULL) {
          current_frame_ = next_frame_;
          next_frame_ = NULL;
        }
        pthread_cond_signal(&frame_done_);
      }
    }

It would seem to me that you would try the lock and pass over it if it is held. Can you create a critical section in user space?
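The try-lock idea above could look roughly like the following. This is only a minimal sketch, not the library's code: `FrameState` and `TrySwapFrame` are hypothetical names, plain `int` values stand in for the `FrameCanvas` pointers, and `std::mutex` replaces the library's own mutex wrapper.

```cpp
#include <mutex>

// Hypothetical stand-in for the state guarded by frame_sync_ in the
// snippets above. Not the library's actual data structure.
struct FrameState {
  std::mutex frame_sync;
  int current_frame = 0;  // stand-in for current_frame_
  int next_frame = -1;    // stand-in for next_frame_; -1 means none pending
};

// Intended to be called by the refresh thread at the end of a frame.
// Returns true if the lock was acquired and the swap check ran,
// false if the lock was held and the check was skipped.
bool TrySwapFrame(FrameState &s) {
  std::unique_lock<std::mutex> l(s.frame_sync, std::try_to_lock);
  if (!l.owns_lock()) {
    return false;  // Lock is busy: do not block, retry on a later frame.
  }
  if (s.next_frame != -1) {
    s.current_frame = s.next_frame;
    s.next_frame = -1;
  }
  return true;
}
```

With this shape the refresh thread never blocks on the mutex. The cost is that a pending swap can be delayed by one or more frames whenever the lock happens to be held.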

hzeller · 2024-06-29T15:41:54Z

Can you explain where you see a deadlock? A deadlock occurs when one thread holds mutex a and waits for mutex b, while another thread holds mutex b and waits for mutex a.

We don't have this situation here: there is just one mutex, used in two critical sections that should run exclusively. Yes, you can temporarily use CPU in one of these, lengthening the time the other has to wait, but you never stall.

Also note that each of the sections is short, does not do any external IO, and just swaps variables around.

bluelasers · 2024-06-29T16:41:40Z

The OS cannot be predicted. If you lose the CPU, you lose the background thread. Basically this means the multicore system behaves as if it were single threaded. The cache can get cold during this time. Depending on the hardware version, this may matter more or less.

You are an SMT program currently, and I am trying to get you back to where you tried to be. I would recommend removing all but core zero from the scheduler. I will just leave you with the knowledge of that option, as I am not sure you will agree with it.

By the way, there is another way to write this which is more SMT. It works around this issue completely, but I have not written a full version of it. So I leave you only with the knowledge of its existence, as it would be a good-sized rewrite.

hzeller · 2024-06-29T16:58:18Z

What you describe is essentially priority inversion, and yes, it can be problematic if there is only one core and a saturated realtime thread.

And yes, this will always be a theoretical issue (but not in practice).

bluelasers · 2024-07-10T02:32:27Z

My concern is outlined in #1468. This issue applies on multicore also. We can mask out the background thread's priority. This stalls the panel's multiplexing. We may be forced to sit out a full tick at random. That is a relatively long period of time for multiplexing. Certain workloads may be more prone to this issue. (Multithreading, for example; see #1678.)

Our only hope is for the OS to recognize the dependency, which allows the kernel to briefly pass the priority of the background thread to the drawing thread. Otherwise we need to increase the priority of the drawing thread beyond a desired threshold to work around this issue.
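On POSIX systems, the priority hand-off described above is available as a priority-inheritance mutex. The following is a minimal sketch, assuming the mutex in question is ultimately a `pthread_mutex_t` and that the platform supports `PTHREAD_PRIO_INHERIT`. The helper name `InitPriorityInheritMutex` is hypothetical and not part of the library.

```cpp
#include <pthread.h>

// Hypothetical helper: initializes *m as a priority-inheritance mutex.
// While a low-priority thread holds such a mutex and a higher-priority
// thread blocks on it, the holder temporarily runs at the waiter's priority.
// Returns 0 on success, or the pthread error code of the failing call.
int InitPriorityInheritMutex(pthread_mutex_t *m) {
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init(&attr);
  if (rc != 0) return rc;
  rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
  if (rc == 0) {
    rc = pthread_mutex_init(m, &attr);
  }
  pthread_mutexattr_destroy(&attr);  // The attribute object is no longer needed.
  return rc;
}
```

Whether this helps in practice depends on the scheduling policies of the threads involved. Inheritance only has an effect when the waiting thread actually has a higher scheduling priority than the holder.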

bluelasers · 2024-07-10T02:45:13Z

I recommend against this, but we could document and close.

hzeller · 2024-07-10T03:18:16Z

SwapOnVSync() only swaps a variable, so it does not do expensive work that holds up the critical section. If it gets its CPU time taken away while it is doing that, then this is to be expected. The only other more important work is the update thread feeding the LEDs.
If that stops working, e.g. because it waits for a new frame to arrive, then CPUs are free (even on a single core) for the suspended SwapOnVSync() to proceed. So it will all work as expected.
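The hand-off between the two short critical sections can be modeled in isolation. The sketch below is an assumption-laden illustration, not the library's implementation: `Swapper`, `OnFrameDone`, and `RunDemo` are hypothetical names, `int` values stand in for `FrameCanvas` pointers, and, unlike the quoted code, the wait uses an explicit `pending_` predicate to guard against spurious wakeups.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the swap hand-off discussed in this thread.
class Swapper {
 public:
  // Drawing thread: queue a new frame and block until it has been taken.
  int SwapOnVSync(int other) {
    std::unique_lock<std::mutex> l(mu_);
    int previous = current_;
    next_ = other;
    pending_ = true;
    done_.wait(l, [this] { return !pending_; });  // Releases mu_ while waiting.
    return previous;
  }

  // Refresh thread: called at the end of a frame. Only swaps variables.
  void OnFrameDone() {
    std::lock_guard<std::mutex> l(mu_);
    if (pending_) {
      current_ = next_;
      pending_ = false;
    }
    done_.notify_one();
  }

  int current() {
    std::lock_guard<std::mutex> l(mu_);
    return current_;
  }

 private:
  std::mutex mu_;
  std::condition_variable done_;
  int current_ = 0;
  int next_ = 0;
  bool pending_ = false;
};

// Runs one swap with a drawing thread and a simulated refresh loop.
// Returns the frame that is current after the swap completed.
int RunDemo() {
  Swapper s;
  std::atomic<bool> finished{false};
  std::thread drawer([&] {
    s.SwapOnVSync(42);
    finished = true;
  });
  while (!finished) {  // Simulated refresh loop: one call per "frame".
    s.OnFrameDone();
    std::this_thread::yield();
  }
  drawer.join();
  return s.current();
}
```

Both sections hold the mutex only long enough to update a few variables, and the waiting side releases it while blocked on the condition variable, which is the property the comment above relies on.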

bluelasers added a commit to bluelasers/rpi-rgb-led-matrix that referenced this issue May 29, 2024

hzeller#1626 Use try lock method to avoid stalling real time IO thread.

9098717

hzeller mentioned this issue Jul 10, 2024

#1626 Use try lock method to avoid stalling real time IO thread. #1659

Closed
