[BUG] Round robin scheduling when SMP is enabled causes crashes #14699

Closed
1 task done
pussuw opened this issue Nov 8, 2024 · 4 comments
Labels
Arch: all Issues that apply to all architectures Area: Kernel Kernel issues OS: Linux Issues related to Linux (building system, etc) Type: Bug Something isn't working

Comments

@pussuw
Contributor

pussuw commented Nov 8, 2024

Description / Steps to reproduce the issue

The round robin scheduling logic causes crashes via:

bool nxsched_remove_readytorun(FAR struct tcb_s *tcb)
{
  if (tcb->task_state == TSTATE_TASK_RUNNING)
    {
      DEBUGASSERT(tcb->cpu == this_cpu());
      nxsched_remove_running(tcb);
      return true;
    }

The crash occurs when the task being pre-empted is running on a different CPU / core. The call tree is as follows:

_assert() at assert.c:906 0xa001f690	
__assert() at lib_assert.c:38 0xa0005d4a	
nxsched_remove_readytorun() at sched_removereadytorun.c:291 0xa0027f2c	
nxsched_reprioritize_rtr() at sched_reprioritizertr.c:67 0xa00214f2	
nxsched_process_roundrobin() at sched_roundrobin.c:141 0xa003c800	
nxsched_cpu_scheduler() at sched_processtimer.c:77 0xa003c424	
nxsched_process_scheduler() at sched_processtimer.c:134 0xa003c424	
nxsched_process_timer() at sched_processtimer.c:189 0xa003c424	

This happens when the system tick advances. An arbitrary CPU will handle the timer interrupt and advance the system ticker. It will then try to run the round robin scheduling logic for every CPU.

  flags = enter_critical_section();
  /* Perform scheduler operations on all CPUs */
  for (i = 0; i < CONFIG_SMP_NCPUS; i++)
    {
      nxsched_cpu_scheduler(i);
    }
  leave_critical_section(flags);
}

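Pre-empting a task that is running on another CPU directly from the CPU that handles the tick does not work: the DEBUGASSERT(tcb->cpu == this_cpu()) check in nxsched_remove_readytorun() fails and the system crashes.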
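
To make the failure mode concrete, here is a minimal stand-alone model of it, not NuttX code. All `model_*` names and `NCPUS` are hypothetical. It assumes one running task per CPU and counts how many iterations of the "all CPUs" loop would trip the assertion when a single CPU handles the tick:

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4                     /* stands in for CONFIG_SMP_NCPUS */

/* Hypothetical, stripped-down task control block (not struct tcb_s). */
struct model_tcb
{
  int  cpu;                         /* CPU the task is currently assigned to */
  bool running;                     /* true if the task is in the running state */
};

/* Mirrors the check in nxsched_remove_readytorun(): a running task may
 * only be removed by the CPU it runs on. Returns false in the cases
 * where the real code would hit DEBUGASSERT.
 */
bool model_can_remove(const struct model_tcb *tcb, int this_cpu)
{
  return !tcb->running || tcb->cpu == this_cpu;
}

/* Models the tick handler: tick_cpu walks the current task of every CPU.
 * Returns the number of iterations that would trip the assertion.
 */
int model_count_assert_hits(const struct model_tcb tasks[NCPUS], int tick_cpu)
{
  int hits = 0;

  assert(tick_cpu >= 0 && tick_cpu < NCPUS);
  for (int i = 0; i < NCPUS; i++)
    {
      if (!model_can_remove(&tasks[i], tick_cpu))
        {
          hits++;
        }
    }

  return hits;
}
```

In this model, with a running task on each of the four CPUs, every iteration except the one for the tick-handling CPU itself fails the check. The real crash additionally requires the round robin time slice of the remote task to expire, which the model leaves out.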

What do you think, @hujun260: should this be handled by your SMP call logic, or by something else?
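
As a rough illustration of what "handled by SMP call logic" could mean, the sketch below has the tick-handling CPU process only its own round robin step and leave a request for each other CPU, which that CPU then services itself. This is an assumption about one possible shape, not a description of the actual fix. All `model_*` names are hypothetical, none of this is NuttX API, and locking, atomics and the inter-processor notification are deliberately left out:

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4                      /* stands in for CONFIG_SMP_NCPUS */

/* Hypothetical per-CPU state; none of these names exist in NuttX. */
struct model_cpu
{
  bool rr_pending;                   /* round robin work requested by another CPU */
  int  rr_runs;                      /* times this CPU ran its own round robin step */
};

/* Tick handler running on tick_cpu: do the local step directly, and only
 * record a request for every other CPU instead of touching its tasks.
 */
void model_tick(struct model_cpu cpus[NCPUS], int tick_cpu)
{
  assert(tick_cpu >= 0 && tick_cpu < NCPUS);
  for (int i = 0; i < NCPUS; i++)
    {
      if (i == tick_cpu)
        {
          cpus[i].rr_runs++;         /* safe: the task runs on this CPU */
        }
      else
        {
          cpus[i].rr_pending = true; /* real code would also have to notify CPU i */
        }
    }
}

/* Runs on the target CPU itself, e.g. from its cross-CPU call handler. */
void model_handle_request(struct model_cpu cpus[NCPUS], int this_cpu)
{
  assert(this_cpu >= 0 && this_cpu < NCPUS);
  if (cpus[this_cpu].rr_pending)
    {
      cpus[this_cpu].rr_pending = false;
      cpus[this_cpu].rr_runs++;      /* tcb->cpu == this_cpu() would hold here */
    }
}
```

The point of the design is that the removal of a running task only ever happens on the CPU that owns it, so the precondition asserted in nxsched_remove_readytorun() is met by construction.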

On which OS does this issue occur?

[OS: Linux]

What is the version of your OS?

Ubuntu

NuttX Version

master

Issue Architecture

[Arch: all]

Issue Area

[Area: Kernel]

Verification

  • I have verified before submitting the report.
@pussuw pussuw added the Type: Bug Something isn't working label Nov 8, 2024
@github-actions github-actions bot added Arch: all Issues that apply to all architectures Area: Kernel Kernel issues OS: Linux Issues related to Linux (building system, etc) labels Nov 8, 2024
@hujun260
Contributor

hujun260 commented Nov 8, 2024

Please verify again with the latest code (apache/master).

@pussuw
Contributor Author

pussuw commented Nov 8, 2024

Oh, I did not notice you had a patch for this already. I'll try it on Monday, but I'm sure this bug report is not valid anymore.

@pussuw
Contributor Author

pussuw commented Nov 8, 2024

For reference: fixed by #14611

@pussuw pussuw closed this as completed Nov 8, 2024
@pussuw
Contributor Author

pussuw commented Nov 8, 2024

The issue is verified to be fixed by #14611
