-
Notifications
You must be signed in to change notification settings - Fork 13
asm clean up and stack size reduction #209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR cleans up and optimises the VX4B filter assembly implementations in lib_xcore_math, primarily by reducing stack usage, and adjusts tests to match the VX4B numerical behaviour and fix a shift-handling bug in the FIR S32 tests.
Changes:
- Reduce
NSTACKWORDSand adjust stack layouts for VX4B filter assembly routines (filter_fir_s32,filter_fir_s16, biquad variants, and support routines) to minimise stack usage and remove old tool-annotation clutter. - Fix the S32 FIR test’s expected-value computation for
shift == 0and relax several S16 FIR tests on__VX4B__to allow small LSB differences. - Clean up VX4B VPU usage in various filter assembly files (use of
lafor constants, updated mask and pointer handling, consistent.globl/metadata directives).
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/filter_tests/src/filter/test_filter_fir_s32.c | Adjusts expected-value logic to treat shift == 0 as a non-shift (avoiding undefined (1LL << -1)), fixing the S32 FIR test bug. |
| tests/filter_tests/src/filter/test_filter_fir_s16.c | For __VX4B__, replaces exact equality checks with small INT16_WITHIN tolerances to account for VX4B rounding/saturation differences. |
| lib_xcore_math/src/arch/vx4b/filter/vect_s32_convolve_valid.S | Reduces stack usage and removes translation warnings while keeping the convolution loop and tail handling logic intact. |
| lib_xcore_math/src/arch/vx4b/filter/push_sample_up_s16.S | Shrinks the stack frame, fixes prologue/epilogue register save/restore offsets, and updates comments and branch usage for VX4B. |
| lib_xcore_math/src/arch/vx4b/filter/push_sample_down_s16.S | Similar stack-frame reduction and cleanup for the “push sample down” helper, with corrected save/restore and VPU loop structure. |
| lib_xcore_math/src/arch/vx4b/filter/filter_fir_s32.S | Reduces stack usage and cleans up the VPU setup, tail processing, and accumulator-combination logic, but currently has an uninitialised s4 used as a shift amount in the left-shift path. |
| lib_xcore_math/src/arch/vx4b/filter/filter_fir_s16.S | Refactors stack layout and adds extra vector scratch regions for VX4B, but currently computes a stack-based pointer (s7) that underflows the frame and is then used by VPU load/store instructions. |
| lib_xcore_math/src/arch/vx4b/filter/filter_biquad_sat_s32.S | Tightens the stack frame and vector scratch allocation while preserving the biquad-saturated processing pipeline and state-update logic. |
| lib_xcore_math/src/arch/vx4b/filter/filter_biquad_s32.S | Minimises stack usage and simplifies the non-saturating biquad filter implementation, updating VPU constant loads and state propagation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@copilot please review |
|
@andrewstanfordjason I've opened a new pull request, #210, to work on those changes. Once the pull request is ready, I'll request review from you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This PR cleans up the lib_xcore_math filter code for vx4b and reduces the stack usage to its minimum. It also fixes a bug in the filter tests.