cpu/fe310: don't call thread_yield when sched_active_thread is invalid #12109
kaspar030 merged 1 commit into RIOT-OS:master
Conversation
As the comment above cpu_switch_context_exit notes: sched_active_thread is not valid when cpu_switch_context_exit() is called. Unfortunately, thread_yield(), which is called directly by cpu_switch_context_exit(), uses sched_active_thread, possibly resulting in a null pointer dereference. Solution: Trigger a software interrupt to perform a context switch and let sched_run() determine the next valid thread from there.
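The proposed control flow can be sketched as a small host-side model. This is only an illustration under stated assumptions, not the code from the PR: the thread_t layout, the threads array, the sw_irq_pending flag and handle_sw_irq() are hypothetical stand-ins, and sched_run() here simply picks the first thread instead of running RIOT's real scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified thread descriptor (not RIOT's thread_t). */
typedef struct {
    int pid;
} thread_t;

static thread_t threads[2] = { { 1 }, { 2 } };

/* NULL models "sched_active_thread is not valid". */
static thread_t *sched_active_thread = NULL;

/* Stand-in for the machine software interrupt pending bit. */
static bool sw_irq_pending = false;

/* Model of sched_run(): in this sketch it just selects the first thread. */
static void sched_run(void)
{
    sched_active_thread = &threads[0];
}

/* Model of the trap handler that services the software interrupt. */
static void handle_sw_irq(void)
{
    if (sw_irq_pending) {
        sw_irq_pending = false;
        sched_run();
    }
}

/* Model of the fixed cpu_switch_context_exit(): it does not dereference
 * sched_active_thread, it only requests a context switch. */
static void cpu_switch_context_exit(void)
{
    sw_irq_pending = true;  /* on hardware: raise the software interrupt */
    handle_sw_irq();        /* on hardware: the CPU takes the trap itself */
    assert(sched_active_thread != NULL);
}
```

The point of the design is that the only code touching sched_active_thread after the request is sched_run(), which assigns it before anything reads it.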
|
CC: @kenrabold (wrote the initial hifive1 support) |
|
I see what you are changing here and agree that thread_yield() should not be called from cpu_switch_context_exit(). However, there is something else going on with the context-switching code now, because thread tests that were passing previously (e.g. thread_msg_block_race, thread_race, etc.) are no longer working. This PR is fine by me to address the immediate issue raised, but there is still a problem that needs fixing. I'm just not sure what that is at the moment. |
|
Interesting, |
|
However, if I am not mistaken, the tests also fail on current git HEAD 999fffd without the changes proposed here. |
|
I dug into this issue deeper and found that the reason for the Your simulator would not have had this issue, as I'm sure you are not introducing a latency in the simulation of interrupts. To fix this, I added 8 nops after invoking the SW interrupt. I also addressed the thread_yield call and the issue referenced in #12110. My changes are here: https://github.com/kenrabold/RIOT/blob/pr_thread_yield_higher/cpu/fe310/cpu.c. You can update your PR with these changes, or I can request the PR to pull in these fixes. Thanks for finding this and calling it out. Good bug |
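The nop padding mentioned in this comment could look roughly like the following. It is a sketch under assumptions, not the code from the linked branch: trigger_sw_irq() and its msip parameter are hypothetical names, the register is passed in as a pointer so the helper can be exercised on a host, and the inline assembly relies on the GCC/Clang asm extension.

```c
#include <stdint.h>

/* Hypothetical helper: msip points at the software-interrupt pending
 * register (on a host test it can be any uint32_t variable). */
static inline void trigger_sw_irq(volatile uint32_t *msip)
{
    /* Request the software interrupt. */
    *msip = 1;

    /* Eight nops, as described in the comment above, to give the
     * interrupt time to be taken before execution continues. */
    __asm__ volatile (
        "nop\n\t" "nop\n\t" "nop\n\t" "nop\n\t"
        "nop\n\t" "nop\n\t" "nop\n\t" "nop"
    );
}
```

On real hardware the pointer would refer to the core-local interruptor's msip register; whether eight nops is the right amount of padding is taken from the comment, not verified here.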
|
Thanks for investigating this further and coming up with a fix.
Either way is fine with me. Maybe just submit a separate PR which depends on this one? |
|
@miri64 since this was ack'ed by the person who originally added the hifive board support, is there anything that prevents this from getting merged? |
I would call myself even less of an expert, especially when it comes to cpu/board internals, so not sure why I was tagged. @kaspar030 @aabadie can one of you maybe test this? The fix looks fine to my limited understanding. |
|
I'll run some tests. |
Contribution description
As the comment above cpu_switch_context_exit notes: sched_active_thread is not valid when cpu_switch_context_exit() is called. Unfortunately, thread_yield(), which is called directly by cpu_switch_context_exit(), uses sched_active_thread, possibly resulting in a null pointer dereference.
Solution: Trigger a software interrupt to perform a context switch and let sched_run() determine the next valid thread from there.
Testing procedure
I personally noticed this while trying to use RIOT with a RISC-V simulator. However, since this might as well be a bug in the simulator, it is probably best to "test this" by reading the code.
Disclaimer: I am not a RISC-V expert, but I think it's pretty obvious that calling thread_yield() with an invalid sched_active_thread is a bad idea, unless you want to rely on the CPU not raising a trap/exception.