Skip to content

Conversation

@andrewstanfordjason
Copy link
Contributor

This PR cleans up the lib_xcore_math filter code for vx4b and reduces the stack usage to its minimum. It also fixes a bug in the filter tests.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR cleans up and optimises the VX4B filter assembly implementations in lib_xcore_math, primarily by reducing stack usage, and adjusts tests to match the VX4B numerical behaviour and fix a shift-handling bug in the FIR S32 tests.

Changes:

  • Reduce NSTACKWORDS and adjust stack layouts for VX4B filter assembly routines (filter_fir_s32, filter_fir_s16, biquad variants, and support routines) to minimise stack usage and remove old tool-annotation clutter.
  • Fix the S32 FIR test’s expected-value computation for shift == 0 and relax several S16 FIR tests on __VX4B__ to allow small LSB differences.
  • Clean up VX4B VPU usage in various filter assembly files (use of la for constants, updated mask and pointer handling, consistent .globl/metadata directives).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/filter_tests/src/filter/test_filter_fir_s32.c Adjusts expected-value logic to treat shift == 0 as a non-shift (avoiding undefined (1LL << -1)), fixing the S32 FIR test bug.
tests/filter_tests/src/filter/test_filter_fir_s16.c For __VX4B__, replaces exact equality checks with small INT16_WITHIN tolerances to account for VX4B rounding/saturation differences.
lib_xcore_math/src/arch/vx4b/filter/vect_s32_convolve_valid.S Reduces stack usage and removes translation warnings while keeping the convolution loop and tail handling logic intact.
lib_xcore_math/src/arch/vx4b/filter/push_sample_up_s16.S Shrinks the stack frame, fixes prologue/epilogue register save/restore offsets, and updates comments and branch usage for VX4B.
lib_xcore_math/src/arch/vx4b/filter/push_sample_down_s16.S Similar stack-frame reduction and cleanup for the “push sample down” helper, with corrected save/restore and VPU loop structure.
lib_xcore_math/src/arch/vx4b/filter/filter_fir_s32.S Reduces stack usage and cleans up the VPU setup, tail processing, and accumulator-combination logic, but currently has an uninitialised s4 used as a shift amount in the left-shift path.
lib_xcore_math/src/arch/vx4b/filter/filter_fir_s16.S Refactors stack layout and adds extra vector scratch regions for VX4B, but currently computes a stack-based pointer (s7) that underflows the frame and is then used by VPU load/store instructions.
lib_xcore_math/src/arch/vx4b/filter/filter_biquad_sat_s32.S Tightens the stack frame and vector scratch allocation while preserving the biquad-saturated processing pipeline and state-update logic.
lib_xcore_math/src/arch/vx4b/filter/filter_biquad_s32.S Minimises stack usage and simplifies the non-saturating biquad filter implementation, updating VPU constant loads and state propagation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@andrewstanfordjason
Copy link
Contributor Author

@copilot please review

Copy link
Contributor

Copilot AI commented Jan 30, 2026

@andrewstanfordjason I've opened a new pull request, #210, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants