[BUG] the semantics of enter_critical_section is unclear #14593
Comments
@anchao could you please help to shed some light here?
In the current implementation, Here are a few PRs under review, @acassis @yamt
It can be understood like this. However, due to the complexity of the enter_critical_section function, it can encompass any area. The current recommendation is that, if it is not related to protecting the tcb/group
enter_critical_section only disables local CPU interrupts and cannot disable remote CPU interrupts.
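To make the two points above concrete (one system-wide lock, and only the local CPU's interrupts masked), here is a minimal toy model in portable C. It is a sketch of the semantics as described in this thread, not the NuttX implementation: every `model_*` name, the two-CPU setup, and the boolean "interrupts enabled" flags are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NCPUS 2                 /* hypothetical CPU count */

/* Per-CPU "interrupts enabled" flag and one lock shared by all CPUs. */

static bool model_irq_enabled[MODEL_NCPUS] = { true, true };
static atomic_flag model_global_lock = ATOMIC_FLAG_INIT;

/* Try to enter on behalf of `cpu` without spinning.  Only that CPU's
 * interrupt flag is touched; on failure it is restored.
 */

static bool model_try_enter(int cpu, bool *prev)
{
  bool saved = model_irq_enabled[cpu];

  model_irq_enabled[cpu] = false;     /* mask local interrupts only */
  if (atomic_flag_test_and_set_explicit(&model_global_lock,
                                        memory_order_acquire))
    {
      model_irq_enabled[cpu] = saved; /* lock is held by another CPU */
      return false;
    }

  *prev = saved;
  return true;
}

/* Blocking variant: spin until the single global lock is free.
 * Returns the previous local interrupt state.
 */

static bool model_enter(int cpu)
{
  bool prev;

  while (!model_try_enter(cpu, &prev))
    {
      /* Busy-wait: some other CPU is inside the critical section. */
    }

  return prev;
}

/* Leave: drop the global lock, then restore the local interrupt state. */

static void model_leave(int cpu, bool prev)
{
  atomic_flag_clear_explicit(&model_global_lock, memory_order_release);
  model_irq_enabled[cpu] = prev;
}
```

In this model, after CPU 0 enters, `model_irq_enabled[1]` is still true, yet CPU 1 cannot enter any critical section, even one guarding unrelated data, until CPU 0 leaves.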
what do you recommend to use instead? what do you think about introducing somewhat higher-level primitives for the kernel, e.g. a Solaris-like adaptive mutex?
so, the current semantics of enter_critical_section is:
is it right?
Based on the program logic, deciding which synchronization strategy to use,
I don't think an adaptive mutex is a good idea. Using a general locking strategy can easily lead to abuse and a decline in code quality, just like critical sections in NuttX, which are used everywhere
yes
Yes
Yes
Yes
The release/re-acquire logic is a bit hard to follow, at least for me. I'm stuck with my SMP integration to MPFS + BUILD_KERNEL because I'm getting random crashes / DEBUGASSERT()s from irq_csection() (as well as deadlocks). There is something odd going on with this logic.
EDIT: It is somehow tied to the fact that we have two "types" of context switches: from ISR and from thread. There is some kind of hole in the recursive irqlock handling related to this. The issue might be the fact that we have a task specific
yes
when an interrupt occurs, g_cpu_nestcount is used as the recursive counter: a context switch may happen inside the handler and this_task() will change, so tcb->irqcount cannot be used.
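The reason for two counters can be sketched with a small model. This is only an illustration of the idea stated above, under the assumption that thread context counts nesting per task and interrupt context counts it per CPU; the real release/re-acquire logic is more involved. All `sketch_*` names are hypothetical, and `sketch_lock_holders` merely stands in for the global lock.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NCPUS 2                    /* hypothetical CPU count */

struct sketch_tcb
{
  int irqcount;                           /* per-task nesting depth */
};

static struct sketch_tcb *sketch_current[SKETCH_NCPUS]; /* "this_task()" */
static bool sketch_in_irq[SKETCH_NCPUS];  /* CPU is in a handler */
static int sketch_nestcount[SKETCH_NCPUS];/* per-CPU nesting depth */
static int sketch_lock_holders;           /* outstanding lock acquisitions */

static void sketch_enter(int cpu)
{
  if (sketch_in_irq[cpu])
    {
      /* Interrupt context: count on the CPU, not on the current task,
       * because the current task may change before the matching leave.
       */

      if (sketch_nestcount[cpu]++ == 0)
        {
          sketch_lock_holders++;
        }
    }
  else if (sketch_current[cpu]->irqcount++ == 0)
    {
      sketch_lock_holders++;              /* thread context, first entry */
    }
}

static void sketch_leave(int cpu)
{
  if (sketch_in_irq[cpu])
    {
      assert(sketch_nestcount[cpu] > 0);
      if (--sketch_nestcount[cpu] == 0)
        {
          sketch_lock_holders--;
        }
    }
  else
    {
      assert(sketch_current[cpu]->irqcount > 0);
      if (--sketch_current[cpu]->irqcount == 0)
        {
          sketch_lock_holders--;
        }
    }
}
```

If the handler counted on the task instead, an enter made while task A was current and a leave made after switching to task B would leave A's count stuck above zero and trip the assertion on B, which is the kind of imbalance a per-CPU counter avoids.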
i feel a bare spinlock is too primitive for many cases.
although it doesn't need to be an adaptive mutex (it was just an example)
well, a typical implementation of an adaptive mutex would be far better than a system global critical section
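For readers unfamiliar with the term, here is a rough user-space approximation of the idea behind an adaptive lock: spin briefly in the hope that the holder releases soon, then give up the CPU instead of burning it. This is not the Solaris design (which decides based on whether the owner is currently running) and not a proposal for NuttX. The `styl_*` names and `STYL_SPIN_LIMIT` are invented for this sketch, and `sched_yield()` stands in for a real blocking wait.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define STYL_SPIN_LIMIT 100     /* hypothetical tuning value */

struct styl_lock
{
  atomic_bool locked;           /* true while some thread holds the lock */
};

static void styl_init(struct styl_lock *l)
{
  atomic_init(&l->locked, false);
}

/* Single acquisition attempt; returns true if the lock was taken. */

static bool styl_try(struct styl_lock *l)
{
  bool expected = false;

  return atomic_compare_exchange_strong_explicit(&l->locked, &expected,
                                                 true,
                                                 memory_order_acquire,
                                                 memory_order_relaxed);
}

/* Spin for a bounded number of attempts, then yield and start over. */

static void styl_lock(struct styl_lock *l)
{
  for (; ; )
    {
      for (int i = 0; i < STYL_SPIN_LIMIT; i++)
        {
          if (styl_try(l))
            {
              return;
            }
        }

      sched_yield();            /* stand-in for sleeping on a wait queue */
    }
}

static void styl_unlock(struct styl_lock *l)
{
  atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

Unlike a single global critical section, each `struct styl_lock` instance covers only the data it is associated with, so unrelated subsystems do not contend with each other.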
Yes, this much I was able to figure out when I tried to remove g_cpu_nestcount. There is still a hole somewhere in the acquire / release logic, as I can see random DEBUGASSERTs and deadlocks which happen when:
And very likely a context switch is what triggers the issue. This happens more often under high-load scenarios, like system bootup. In our case the system boot uses a lot of CPU, and it also spawns a lot of processes that start, run and exit almost immediately. I'll keep investigating.
Description / Steps to reproduce the issue
https://github.com/apache/nuttx/blob/master/Documentation/implementation/critical_sections.rst
as users of enter_critical_section don't seem to have a way to limit the scope, i guess it implies enter_critical_section works as the single global lock covering the whole system. is it right?
also, to me it isn't clear if enter_critical_section is supposed to disable interrupts on the remote CPUs. the comment in irq_csection.c says
on the other hand, another comment in the file says
they seem to contradict each other.
On which OS does this issue occur?
[OS: Mac]
What is the version of your OS?
macOS 14.7
NuttX Version
master
Issue Architecture
[Arch: all]
Issue Area
[Area: Api], [Area: Kernel]
Verification