Skip to content

Conversation

@colinmcardell
Copy link
Contributor

@colinmcardell colinmcardell commented Nov 18, 2025

what

adds c/c++ testing infrastructure with doctest and trompeloeil frameworks, automatic test discovery in tests/wscript, c/c++ test workflow to github actions, documentation and contribution notes, and the beginnings of test coverage across matron and crone components.

the unit tests included in this pr showcase multiple approaches of test coverage (mocks, stubs, test seams, etc.) and exercising code to ensure the expected behavior. these initial tests establish testing patterns and providing a nice reference along with the documentation.

also included in this pr are various important fixes for clock time tracking (see details below), as well as minor threading, and args parsing issues discovered during test development.

fixes #1863

how

see included tests/README.md for detailed documentation on how the testing infrastructure works.

testing infrastructure

  • add doctest (assertions/runner) and trompeloeil (c++ mocking) as submodules
  • implement automatic test discovery in tests/wscript that mirrors test paths to source tree
  • add test.sh wrapper script to build and run tests
  • integrate c/c++ tests into github actions workflow
  • add tests path to clang_format.sh

included coverage

  • crone: window lookup table, peakmeter rise/decay, vu meter taper, jack client time tracking, utilities
  • matron: args parsing, clock implementations (internal/midi/crow/link/scheduler)
  • often using NORNS_TEST preprocessor seams for exposing private interfaces during tests without changes to the core code

documentation

  • add tests/README.md with detailed examples and patterns
  • update CONTRIBUTING.md with testing guidelines

improvements discovered during initial test coverage implementation

clock internal

issue
clock loop enters catchup spiral when experiencing lag (potentially scheduler delays, JACK stalls, system suspend, NTP adjustments), causing timing jitter for minor lag, burst publishing for moderate lag, or complete CPU lockup for severe lag. when time jumps forward, next_tick_time - current_time becomes large and negative, producing invalid timespec values, stalled thread sleeps, and frozen tempo updates.

mentioned here #1865.

fix
clamp negative sleep values to zero before timespec conversion. rebase next_tick_time forward when schedule falls more than one tick behind.

test
stub clock_nanosleep for virtual time control, jump forward by hours, verify valid sleep calculations and continued tempo publishing.

jack_client

issue
JACK's 32-bit frame counter wraps after ~25 hours (48kHz), causing time to jump backward 25 hours and breaking all tempo/scheduling. out-of-order reads between calls can also cause time to decrease slightly.

a simple repro for this is to load up the firstlight script, let it run for ~25 hours, and watch the animation hang.

related to clock internal above and also mentioned here #1865.

fix
maintain 64-bit monotonic frame counter tracking high/low 32 bits separately. detect wraps when low bits cross from high to low range and increment high bits. enforce monotonicity with mutex-protected read-modify-write operations.

test
stub JACK frame values, simulate wraparound (0xFFFFFFF00x00000020), verify monotonic time across boundary and consistent multi-threaded reads.

additional improvements

  • args: switch to snprintf for safe truncation, wire up -f framebuffer flag which wasn't connected previously
  • event_types.h: decouple heavy includes, add opaque lo_message type

notes

  • mocking requires c++14 standard (updated in main wscript), this is a safe change.
  • tested changes on device, functional on aluminum norns cm3 and pi3 shield.
  • this is a pretty big pr! consider this as foundational work. more components can be tested using this infrastructure. i think the pr makes sense together, but if it's preferred as smaller changes we can always break it up a bit. i also keep combing through the details of the changes to ensure that it's sound... at this point i would like to get this large pr into review so that the work can be built on top of as a follow up.

@colinmcardell
Copy link
Contributor Author

looks like a GitHub action error when checking out the incoming changes. may need to manually retry.

@colinmcardell
Copy link
Contributor Author

hm. something is up with the addition of the test framework submodules within the GitHub test.yml. it doesn't seem to be able to update init the added submodules in the action.

i think i will remove the c/c++ test action from this pr, and add it as a follow up pr rebased on top of this work once it's reviewed and merged in. hopefully that will resolve it.

@colinmcardell
Copy link
Contributor Author

ha! ok well that was silly of me. forgot to actually commit the added submodules. works. 🎉

@colinmcardell colinmcardell force-pushed the feature/cpp-unit-testing branch from 670d824 to f36029c Compare November 20, 2025 02:48
@Dewb
Copy link
Member

Dewb commented Nov 20, 2025

Excellent! I’m not going to have time to give this a close review but the overall shape looks good and I think getting it in sooner is better.

If I were in your shoes, I would squash this into one commit for the test framework, and additional commits for each fix you made with the help of the tests; that will be easier for future viewers of the history to spot material changes to the clock system etc. But squashing into one commit or rebasing all the commits onto main are also defensible choices.

Never mind the above, this looks like the rebase and merge will leave a neat history, great stuff!

@tlubke
Copy link
Collaborator

tlubke commented Nov 24, 2025

What an incredible contribution!

I've reviewed this a bit and generally approve -- though echoing @Dewb 's original comment I would like to see @65c139a in particular split up into one fix internal clock sleep commit and one add comprehensive clock tests commit if it's not too much trouble to separate with the rebase in there.

There are some potential changes in behavior regarding the clock fixes, so I think we should take our time reviewing in that case. But overall the test additions look nice and are well documented.

@colinmcardell
Copy link
Contributor Author

Happy to take some time to split up the changes and the tests.

Reorganizing the commits might make it a bit easier to observe the failing tests and review the changes that allow them to pass.

I will try if possible to stack commits so that they are:

  1. non-production code changing additions and tests
  2. minor changes with tests
  3. clock - tests showcasing issues
  4. clock - patch producing passing tests
  5. jack client - tests showcasing issues
  6. jack client - patch producing passing tests

This should sort the work nicely.

… mapping

- implements automatic test discovery in tests/wscript.
- discovers test_*.cpp files and builds one binary per directory.
- mirrors test paths to system sources (tests/matron/test_foo.cpp → matron/src/foo.c).
- handles snake_case and CamelCase variants.
- includes test runner self-tests for validation of test runner integrity.

updates root wscript to register test command and enable waf_unit_test feature. sets C++14 standard which is required by trompeloeil and safe to change to.
provides common test entry point for all test binaries.
formats test files (.c, .h, .cpp, .hpp) alongside system sources.
validates raised cosine window: array size, value range [0,1], and endpoints.

adds out-of-class definition for raisedCosShortLen to crone/src/Window.cpp required by tests.
- rise/decay/negative/zero behaviors, getPos range and monotonicity
- includes taper_vu_table_stub test helper which provides simple linear lookup table
validates IEC-60268 perceptual taper: output range [0,1], saturation above unity, monotonic non-decreasing behavior across amplitude range, and precise anchor points at 0.1/0.5/0.9 against actual lookup table interpolation.
@colinmcardell colinmcardell force-pushed the feature/cpp-unit-testing branch 2 times, most recently from d4b5482 to 31ceb09 Compare November 30, 2025 04:59
@artfwo
Copy link
Member

artfwo commented Nov 30, 2025

not gonna block this, but these changes make the clock code difficult to read (and modify). i wonder if that's possible to fix the original integer overflow problem in jack instead, for instance?

as to testability modifications, they reduce readability too, although to a lesser extent. i would like to raise a concern on what's the value of these tests being executed on every change (after the code has already been tested)?

@colinmcardell colinmcardell force-pushed the feature/cpp-unit-testing branch 2 times, most recently from 774dad7 to e949621 Compare November 30, 2025 13:28
@colinmcardell
Copy link
Contributor Author

colinmcardell commented Nov 30, 2025

Hi @tlubke, thanks for the review and feedback.

What an incredible contribution!

Thanks! I'm really excited for this addition.

split up into one fix internal clock sleep commit and one add comprehensive clock tests commit if it's not too much trouble to separate with the rebase in there.

I've rearranged the commits and pushed.

There are some potential changes in behavior regarding the clock fixes, so I think we should take our time reviewing in that case.

Agreed. I am happy to report that I have been able to run scripts like firstlight, and awake for over a week without locking up. I haven't run into a script that is failing, where it was previously functional. I will continue testing.

@colinmcardell
Copy link
Contributor Author

Hi @artfwo, thanks for having a look and for the feedback! I spent some time reworking the changes to clock_internal.c to address the readability concerns. I tried to make the inline comments more concise and clear and moved the bulk of the test seams to the bottom of the file, which helps a lot.

The jack client integer overflow is addressed with e949621.

Regarding your concern about the value of these tests: I feel strongly that they improve the codebase. I've made sure to include caching in the GitHub action to attempt to reduce the latency of building. The tests themselves are extremely fast. They deterministically exercise and showcase aspects of the code that previously required manual testing. They run on each future change and contribution to ensure there are no regressions to expected behavior.

Introducing seams into existing code does make it a bit more challenging to read, but seams are light touch and allow control and testability without requiring major refactor/rewrite.

Happy to discuss more. Again, thanks for having a look.

@artfwo
Copy link
Member

artfwo commented Nov 30, 2025

The jack client integer overflow is addressed with e949621.

right, but what i meant was - if bit depth of jack_nframes_t is fixed in jack itself, would that make the whole dancing with wrapping numbers redundant?

@colinmcardell
Copy link
Contributor Author

Thanks for the added clarity here! I will definitely have another look when I'm back at my desk. In the meantime if you have a moment, I would be curious to know if you have more thoughts on what the solution might look like here practically based on your note. Pseudo code would be fine.

More soon.

@artfwo
Copy link
Member

artfwo commented Nov 30, 2025

ave another look when I'm back at my desk. In the meantime if you have a moment, I would be curious to know if you have more thoughts on what the solution might look like here practically based on your note. Pseudo code would be fine.

well, my idea is that there won't be a need for any (pseudo) code, if jack returns 64-bit frame counter.

@artfwo
Copy link
Member

artfwo commented Nov 30, 2025

another option could be using jack function jack_get_time in jack_client_get_current_time instead of jack_frame_time that seems to return 64-bit microseconds, have you tried using that?

@colinmcardell
Copy link
Contributor Author

@artfwo Thanks again for your response.

if bit depth of jack_nframes_t is fixed in jack itself, would that make the whole dancing with wrapping numbers redundant?
well, my idea is that there won't be a need for any (pseudo) code, if jack returns 64-bit frame counter.

If jack were to receive an update to its API to operate in 64-bit frames, the wrap logic would basically become a no-op where g_last_total_frames would track a 64-bit value directly, and continue to function. Changing the jack audio server api would be non-trivial and I would suggest it be considered out of scope for this.

another option could be using jack function jack_get_time in jack_client_get_current_time instead of jack_frame_time that seems to return 64-bit microseconds, have you tried using that?

I hadn't tried this. The documentation for jack_get_time indicates that it's not linear.

Digging a little bit into the implementation details of jack_frame_time here, it looks like jack_frame_time is doing work to ensure it's linear, similar to jack_last_frame_time.

The 32-bit frames that jack produces are not a bug, and it turns out the recommended solution is in the JACK audio documentation:

Because of the types used, all the returned values except period_usecs are unsigned. In computations mapping between frames and microseconds signed differences are required. The easiest way is to compute those separately and assign them to the appropriate signed variables, int32_t for frames and int64_t for usecs. See the implementation of jack_frames_to_time() and jack_time_to_frames() for an example.

This function is where jack_frames_to_time is applying the signed delta/difference approach to converting frames to time.

After marinating with the code for a while and reviewing these docs, I am convinced that the changes to the norns jack_client can be simplified (I kind of got laser focused on this high/low bit masking approach). I think the better approach would be a signed delta time accumulator. I will take another pass at it, simplifying the implementation and reducing the cognitive load a bit.

@artfwo
Copy link
Member

artfwo commented Dec 1, 2025

This function is where jack_frames_to_time is applying the signed delta/difference approach to converting frames to time.

i haven't investigated the code thoroughly, but doesn't that mean that it already does all the work necessary to get the monotonic driver time in seconds (basically what we need for the clock)?

is non-linearity actually something we should worry about, given that softcut and supercollider may have non-linear clocks too?

@colinmcardell
Copy link
Contributor Author

i haven't investigated the code thoroughly, but doesn't that mean that it already does all the work necessary to get the monotonic driver time in seconds (basically what we need for the clock)?

No. I did dig into this a bit and using the jack_frames_to_time function would convert the audio frame count value back into wall clock time. This would essentially be the same as calling jack_get_time/GetMicroSeconds() with extra steps and potential rounding errors.

We need to do the work of maintaining the last frame and stay in frames as a unit for the wrap logic. This keeps the norns internal clock locked to the audio clock. I wouldn't recommend changing this as it would introduce drift/jitter/etc., relative to the audio output.

is non-linearity actually something we should worry about, given that softcut and supercollider may have non-linear clocks too?

Softcut and Supercollider operate linearly and are locked to the hardware sample clock. Using jack_frame_time keeps the norns internal clock locked to the same hardware reference.

@artfwo
Copy link
Member

artfwo commented Dec 1, 2025

oh well, that's a bummer. i've raised a jack issue about the size of jack_nframes_t, and would suggest planning any workarounds depending on whether or not it's fixable in jack, but again, don't wanna block whatever you're doing and gonna leave any decisions at the discretion of current maintainers.

@colinmcardell colinmcardell force-pushed the feature/cpp-unit-testing branch 2 times, most recently from cf37aca to 40384e2 Compare December 1, 2025 22:56
@colinmcardell
Copy link
Contributor Author

I've pushed the update to the jack_client, as mentioned. I've replaced the previous explicit bit-masking and "half-range" carry logic with the signed delta accumulator approach (as recommended in the jack docs).

I updated the test scaffolding slightly to properly seed the internal state for the wrap scenarios, and everything continues to pass. 🎉

@colinmcardell
Copy link
Contributor Author

Thanks @artfwo for the discussion and for taking a look. It was good for me to comb through the details on the jack_client change again and dig deeper into the jack docs.

@colinmcardell colinmcardell force-pushed the feature/cpp-unit-testing branch from 40384e2 to 05f67e8 Compare December 3, 2025 02:11
@tehn tehn merged commit 1ed88b5 into monome:main Jan 2, 2026
5 checks passed
@tehn
Copy link
Member

tehn commented Jan 3, 2026

@colinmcardell github is complaining that the cpp test isn't passing, any ideas?

https://github.com/monome/norns/actions/runs/20679127201/job/59370833961

@colinmcardell
Copy link
Contributor Author

@colinmcardell github is complaining that the cpp test isn't passing, any ideas?

https://github.com/monome/norns/actions/runs/20679127201/job/59370833961

I will have a look next time I'm at my desk. Thanks.

@colinmcardell
Copy link
Contributor Author

I've had a look, the test was passing fine locally, but it seems to be flaky on the more constrained Github CI. I've patched up a flaky test. I will open a PR with the fix soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

C/C++ unit testing infrastructure

5 participants