asm clean up and stack size reduction #209

andrewstanfordjason · 2026-01-30T14:03:14Z

This PR cleans up the lib_xcore_math filter code for vx4b and reduces the stack usage to its minimum. It also fixes a bug in the filter tests.

Copilot

Pull request overview

This PR cleans up and optimises the VX4B filter assembly implementations in lib_xcore_math, primarily by reducing stack usage, and adjusts tests to match the VX4B numerical behaviour and fix a shift-handling bug in the FIR S32 tests.

Changes:

- Reduce `NSTACKWORDS` and adjust stack layouts for VX4B filter assembly routines (`filter_fir_s32`, `filter_fir_s16`, biquad variants, and support routines) to minimise stack usage and remove old tool-annotation clutter.
- Fix the S32 FIR test’s expected-value computation for `shift == 0` and relax several S16 FIR tests on `__VX4B__` to allow small LSB differences.
- Clean up VX4B VPU usage in various filter assembly files (use of `la` for constants, updated mask and pointer handling, consistent `.globl`/metadata directives).
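As an illustration of the relaxed S16 comparison, here is a minimal sketch in plain C. The helper names `int16_within` and `fir_s16_output_ok` and the tolerance of 1 LSB are assumptions made for this example; the actual tests use an `INT16_WITHIN` style assertion, and the tolerances chosen in the PR are not shown on this page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true when |expected - actual| <= delta.
   Operands are widened to int so the difference cannot overflow int16_t. */
static bool int16_within(int delta, int16_t expected, int16_t actual)
{
    assert(delta >= 0);
    int diff = (int)expected - (int)actual;
    return abs(diff) <= delta;
}

/* Hypothetical wrapper: exact match by default, a small tolerance when
   building for VX4B. The value 1 is an assumed tolerance, not the PR's. */
static bool fir_s16_output_ok(int16_t expected, int16_t actual)
{
#if defined(__VX4B__)
    return int16_within(1, expected, actual);
#else
    return expected == actual;
#endif
}
```

Keeping the exact comparison on other targets means the relaxation cannot hide regressions there.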

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| `tests/filter_tests/src/filter/test_filter_fir_s32.c` | Adjusts expected-value logic to treat `shift == 0` as a non-shift (avoiding undefined `(1LL << -1)`), fixing the S32 FIR test bug. |
| `tests/filter_tests/src/filter/test_filter_fir_s16.c` | For `__VX4B__`, replaces exact equality checks with small `INT16_WITHIN` tolerances to account for VX4B rounding/saturation differences. |
| `lib_xcore_math/src/arch/vx4b/filter/vect_s32_convolve_valid.S` | Reduces stack usage and removes translation warnings while keeping the convolution loop and tail handling logic intact. |
| `lib_xcore_math/src/arch/vx4b/filter/push_sample_up_s16.S` | Shrinks the stack frame, fixes prologue/epilogue register save/restore offsets, and updates comments and branch usage for VX4B. |
| `lib_xcore_math/src/arch/vx4b/filter/push_sample_down_s16.S` | Similar stack-frame reduction and cleanup for the “push sample down” helper, with corrected save/restore and VPU loop structure. |
| `lib_xcore_math/src/arch/vx4b/filter/filter_fir_s32.S` | Reduces stack usage and cleans up the VPU setup, tail processing, and accumulator-combination logic, but currently has an uninitialised `s4` used as a shift amount in the left-shift path. |
| `lib_xcore_math/src/arch/vx4b/filter/filter_fir_s16.S` | Refactors stack layout and adds extra vector scratch regions for VX4B, but currently computes a stack-based pointer (`s7`) that underflows the frame and is then used by VPU load/store instructions. |
| `lib_xcore_math/src/arch/vx4b/filter/filter_biquad_sat_s32.S` | Tightens the stack frame and vector scratch allocation while preserving the biquad-saturated processing pipeline and state-update logic. |
| `lib_xcore_math/src/arch/vx4b/filter/filter_biquad_s32.S` | Minimises stack usage and simplifies the non-saturating biquad filter implementation, updating VPU constant loads and state propagation. |
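The `shift == 0` issue in the S32 FIR test can be sketched as follows. This is not the test's actual code: the function name `apply_shift_rounded` and the round-half-up convention are assumptions for illustration. It only shows why a rounding term of the form `1LL << (shift - 1)` needs a guard, since with `shift == 0` it becomes `1LL << -1`, which is undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper for a test's expected value.
   shift > 0 : rounding right shift (adds half an LSB before shifting)
   shift == 0: value returned unchanged, so (shift - 1) is never used
   shift < 0 : left shift, written as a multiply to stay defined for
               negative accumulators
   Assumes acc is small enough that neither the rounding add nor the
   multiply overflows, and that >> on a negative value is arithmetic
   (implementation-defined in C, but the common behaviour). */
static int64_t apply_shift_rounded(int64_t acc, int shift)
{
    assert(shift > -63 && shift < 63);

    if (shift > 0) {
        int64_t half = 1LL << (shift - 1);  /* safe: shift - 1 >= 0 */
        return (acc + half) >> shift;
    }
    if (shift < 0) {
        return acc * (1LL << -shift);
    }
    return acc;  /* shift == 0: no rounding term, no shift */
}
```

For example, `apply_shift_rounded(5, 1)` gives 3 and `apply_shift_rounded(5, 0)` gives 5, with no negative shift count evaluated in either case.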


lib_xcore_math/src/arch/vx4b/filter/filter_fir_s32.S

lib_xcore_math/src/arch/vx4b/filter/filter_fir_s16.S

lib_xcore_math/src/arch/vx4b/filter/filter_fir_s32.S

andrewstanfordjason · 2026-01-30T15:58:29Z

@copilot please review

Copilot · 2026-01-30T15:58:38Z

@andrewstanfordjason I've opened a new pull request, #210, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.


lib_xcore_math/src/arch/vx4b/filter/filter_fir_s16.S

asm clean up and stack size reduction

a6107f9

andrewstanfordjason requested a review from Copilot January 30, 2026 14:03

Copilot started reviewing on behalf of andrewstanfordjason January 30, 2026 14:03

Copilot AI reviewed Jan 30, 2026


lib_xcore_math/src/arch/vx4b/filter/filter_fir_s32.S (outdated, resolved)

lib_xcore_math/src/arch/vx4b/filter/filter_fir_s16.S (resolved)

lib_xcore_math/src/arch/vx4b/filter/filter_fir_s32.S (outdated, resolved)

vx4a filter asm now uses only ABI reg names

ca30542

Copilot AI mentioned this pull request Jan 30, 2026

Update stale register name comments in filter_fir_s32.S #210

Closed

andrewstanfordjason requested a review from Copilot January 30, 2026 16:06

Copilot started reviewing on behalf of andrewstanfordjason January 30, 2026 16:06

Copilot AI reviewed Jan 30, 2026


lib_xcore_math/src/arch/vx4b/filter/filter_fir_s16.S (resolved)

Made filter_fir_s16.S stack safe in all conditions

577d84c

andrewstanfordjason requested a review from uvvpavel January 30, 2026 16:21
