Skip to content

cpu/cortexm: "free" SVC#13825

Merged
fjmolinas merged 3 commits intoRIOT-OS:masterfrom
kaspar030:cortexm_free_svc
Jun 16, 2020
Merged

cpu/cortexm: "free" SVC#13825
fjmolinas merged 3 commits intoRIOT-OS:masterfrom
kaspar030:cortexm_free_svc

Conversation

@kaspar030
Copy link
Contributor

@kaspar030 kaspar030 commented Apr 6, 2020

Contribution description

Currently, the Cortex-M "supervisor call exception" SVC (think system call entry point) is completely blocked by being defined to be tail-called by PendSV, in order to call sched_run() and restore the context from sched_active_thread. This makes it unusable for any other use but cpu_switch_context_exit().

This PR refactors PendSV to include the previous SVC routine (run scheduler, restore thread context), but before that, only conditionally stores the current context only if sched_active_thread is non-null.

An SVC dispatch function is added. SVC 1 maps to the original function by handing over to PendSV.

This is the first step to implement system calls using SVC, which can now be used in combination with the MPU to provide some memory protected "user space" that still interacts with the kernel.

Performance wise, context switches are slightly faster on Cortex-M3 onwards, and slightly slower on CM0(+), but both times less than 0.5%. Crude measuring shows ~50b increase in code size.

I've tested this only on particle-xenon (Cortex-M4F) and nucleo-f072rb (Cortex-M0+).

Testing procedure

Being quite central with a lot of opportunities for weird error cases, this should get somewhat extensive review and testing, especially anything involving context switching on different Cortex-M.

Issues/PRs references

@kaspar030 kaspar030 added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Platform: ARM Platform: This PR/issue effects ARM-based platforms CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Area: cpu Area: CPU/MCU ports labels Apr 6, 2020
@kaspar030
Copy link
Contributor Author

I've pushed a commit that makes the dispatching optional on selecting the module cortexm_svc, basically restoring the previous default behavior.

@fjmolinas fjmolinas self-assigned this Apr 28, 2020
Copy link
Contributor

@fjmolinas fjmolinas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, some nitpicks below. Will start testing.

Comment on lines +290 to +294
/* skip context saving if sched_active_thread == NULL */
"ldr r1, =sched_active_thread \n" /* r1 = &sched_active_thread */
"ldr r1, [r1] \n" /* r1 = sched_active_thread */
"cmp r1, #0 \n" /* if r1 == NULL: */
"beq context_restore \n" /* goto context_restore */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping context switching depending on sched_active_thread should go into its own commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will re-shuffle when squashing

@fjmolinas
Copy link
Contributor

fjmolinas commented Apr 28, 2020

Bench-marking tests/bench_thread_yield_pingpong'

core master pr pr + cortexm_svc
m0+ (usb-kw41z) 96632 95862 95862
m0 (nucle0-f091rc) 85683 85076 85076
m3 (openmote-b) 70123 70741 70741
m3 (iotlab-m3) 161035 158550 158550
m4 (frdm-k64f) 150376 149254 149254
m7 (nucleo-f767zi) 291104 289156 289156

In general they are all running faster. Only openmote-b had a .8% increase.

@fjmolinas
Copy link
Contributor

tests/thread_cooperation still working correctly for the above mentioned boards, any other testing suggestion? I could run a subset of tests on many boards?

@kaspar030
Copy link
Contributor Author

kaspar030 commented Apr 28, 2020

In general they are all running faster. Only openmote-b had a .8% increase.

Aehm, the benchmark counts context switch per second, so lower is better ;) edit sorry for adding my own confusion, higher numbers are better.

I'm seeing a slight improvement on nucleo-f401re.

@fjmolinas
Copy link
Contributor

Aehm, the benchmark counts context switch per second, so lower is better ;) edit sorry for adding my own confusion, higher numbers are better.

Well then iotlab-m3 is slower.

@kaspar030
Copy link
Contributor Author

Hm, I'm seeing different behavior gcc vs llvm, and gcc versions in our toolchain vs. 9.3.0, on both nucleo-f401re and samr21-xpro. llvm is always slightly slower on master (both current llvm and the one in our container). gcc in our container is slightly slower with this pr. current gcc is slightly faster with this pr. :-/

@fjmolinas
Copy link
Contributor

Hm, I'm seeing different behavior gcc vs llvm, and gcc versions in our toolchain vs. 9.3.0, on both nucleo-f401re and samr21-xpro. llvm is always slightly slower on master (both current llvm and the one in our container). gcc in our container is slightly slower with this pr. current gcc is slightly faster with this pr. :-/

When using llvm on iotlab-m3 I get more context switches on master 176432 vs 173876 on this PR.

@kaspar030
Copy link
Contributor Author

So, with llvm, PR is faster. With gcc, this PR is generally slower. It might get faster with PR on newer gcc releases.

As we don't have an actual use case ATM, I guess we have to wait. No need to slow down context switches for nothing.

@kaspar030 kaspar030 requested a review from cgundogan as a code owner June 5, 2020 12:47
@kaspar030
Copy link
Contributor Author

@leandrolanzieri @jia200x can you take a look if I added the feature to Kconfig correctly?

@leandrolanzieri
Copy link
Contributor

@leandrolanzieri @jia200x can you take a look if I added the feature to Kconfig correctly?

Yes, LGTM!

@kaspar030
Copy link
Contributor Author

This is now a requirement for #14224. IMO that makes the slight performance drop for some compilers / MCUs acceptable...

@maribu
Copy link
Member

maribu commented Jun 8, 2020

IMO that makes the slight performance drop for some compilers / MCUs acceptable...

Actually, now is the perfect time to add some features coming at the price of reduced performance: The inline-able IRQ API improved the performance so much, that the next release is still going to be faster than the current one. So users won't notice 😈

Copy link
Contributor

@fjmolinas fjmolinas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there is an agreement on that this feature is useful, and that its sideffects have been mitigated by the inline-able irq api. <1% decrease here vs 22% in the inline irq api. Also running the same bench mark on #14224 vs this PR also yields a ~1% increase on context switches, so IMO there is nothing blocking this anymore.

The code hasn't changed since the last time I tested except for the kconfig FEATURE part. Since this touches core I would still like to get a second ACK here, maybe @maribu @bergzand?

@fjmolinas fjmolinas added CI: run tests If set, CI server will run tests on hardware for the labeled PR CI: disable test cache If set, CI will always run all tests regardless of whether they have been run successfully before labels Jun 9, 2020
@fjmolinas
Copy link
Contributor

I would like to see a run of all tests here as well.

@kaspar030 kaspar030 added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Jun 9, 2020
@fjmolinas
Copy link
Contributor

Test failures are in esp32, so unrelated, I'll re-trigger without them

@fjmolinas fjmolinas added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: run tests If set, CI server will run tests on hardware for the labeled PR CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: disable test cache If set, CI will always run all tests regardless of whether they have been run successfully before labels Jun 9, 2020
@benpicco
Copy link
Contributor

Now a rebase is needed.

@fjmolinas
Copy link
Contributor

No one has said anything for while so GO!

@fjmolinas fjmolinas merged commit b041221 into RIOT-OS:master Jun 16, 2020
@kaspar030 kaspar030 deleted the cortexm_free_svc branch June 16, 2020 08:31
@kaspar030
Copy link
Contributor Author

No one has said anything for while so GO!

Thanks for reviewing!

@miri64 miri64 added this to the Release 2020.07 milestone Jun 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: cpu Area: CPU/MCU ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ARM Platform: This PR/issue effects ARM-based platforms Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants