@Bearwynn

Overview

This PR introduces significant performance optimizations to the existing Conway's Game of Life contrib script. The changes focus on improving execution speed and memory efficiency on the Raspberry Pi Pico's constrained hardware while maintaining full compatibility with the existing API and functionality.

Rationale

These optimizations address the specific constraints of the Raspberry Pi Pico's MicroPython environment, where:

  • Memory is limited (264KB RAM)
  • CPU performance is constrained (133MHz ARM Cortex-M0+)
  • Function call overhead is significant

The changes follow MicroPython best practices for performance-critical applications while maintaining code readability and the original intent of the Conway implementation as a pseudo-random LFO kernel.

External Docs

Maximising MicroPython speed

https://docs.micropython.org/en/v1.9.3/pyboard/reference/speed_python.html#maximising-micropython-speed

@micropython.native

https://docs.micropython.org/en/v1.9.3/pyboard/reference/speed_python.html#the-native-code-emitter

This causes the MicroPython compiler to emit native CPU opcodes rather than bytecode. It covers the bulk of the MicroPython functionality, so most functions will require no adaptation. It is invoked by means of a function decorator.
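A minimal sketch of applying the native emitter, using a hypothetical helper `count_live` (not from this PR) that counts live cells in a `bytearray` of 0/1 values. The `try`/`except ImportError` shim is an assumption added so the same file can also be imported under desktop CPython, for example in CI; on the Pico the real `micropython` module is used and the compiler handles the decorator.

```python
try:
    import micropython
except ImportError:
    # Desktop CPython has no micropython module; this no-op stand-in lets
    # the same source run off-device (assumption: used only for testing).
    class micropython:
        @staticmethod
        def native(func):
            return func


@micropython.native
def count_live(cells):
    # Hypothetical helper: counts live cells in a bytearray of 0/1 values.
    # The body is ordinary Python; the decorator only changes how the
    # MicroPython compiler emits it (machine code instead of bytecode).
    total = 0
    for c in cells:
        total += c
    return total


print(count_live(bytearray([1, 0, 1, 1, 0])))  # prints 3
```

Because native code keeps normal Python semantics, the function body needs no changes. The MicroPython docs note the trade-off: native code is larger than the equivalent bytecode, which matters on a RAM-constrained board.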

@micropython.viper

https://docs.micropython.org/en/v1.9.3/pyboard/reference/speed_python.html#the-viper-code-emitter

Like the Native emitter, Viper produces machine instructions, but further optimisations are performed, substantially increasing performance, especially for integer arithmetic and bit manipulations. It is invoked using a decorator.
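A minimal sketch of a viper function, using a hypothetical `sum_bytes` helper (not from this PR). Under viper, `ptr8` casts a buffer to a raw byte pointer and `int` annotations declare machine-word integers. The off-device shim, including the `ptr8` fallback, is an assumption so the file also runs under CPython.

```python
try:
    import micropython
except ImportError:
    # CPython stand-ins (assumption: off-device testing only).
    class micropython:
        @staticmethod
        def viper(func):
            return func

    def ptr8(buf):
        # Plain bytearray indexing behaves like a viper ptr8 for byte reads.
        return buf


@micropython.viper
def sum_bytes(buf, n: int) -> int:
    # Hypothetical helper: sums the first n bytes of a bytearray.
    # Under viper, p is a raw uint8 pointer and i/total are machine ints,
    # so the loop avoids Python integer objects entirely.
    p = ptr8(buf)
    total = 0
    i = 0
    while i < n:
        total += p[i]
        i += 1
    return total


print(sum_bytes(bytearray([1, 2, 3, 250]), 4))  # prints 256
```

Because viper integers are machine words, arithmetic can wrap silently where standard Python would grow to a big integer; this is one reason viper code is not fully interchangeable with standard Python.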

Changes Made:

Memory Management Improvements:

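  • Replaced the 2D list for field_sum with a 1D array, giving one contiguous buffer instead of one list per row and a single subscript per access instead of two.
  • Changed population_deltas list management to use .clear() instead of reassignment, reusing the existing list object to reduce allocations and memory fragmentation.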
  • Pre-calculated OLED_WIDTH_BYTES to avoid repeated calculations.
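A sketch of the flat layout, assuming a 128x32 field (the EuroPi OLED resolution) and a `bytearray` for the neighbour counts; the PR's actual container type may differ. The `index` and `bit_position` helpers are hypothetical. `bit_position` assumes a 1-bit-per-cell bitmap packed eight cells per byte, row-major and LSB-first, which is the kind of addressing where a precomputed `OLED_WIDTH_BYTES` is reused on every access.

```python
OLED_WIDTH = 128   # assumed field size, matching the EuroPi OLED
OLED_HEIGHT = 32
OLED_WIDTH_BYTES = OLED_WIDTH // 8  # computed once at import time

# Nested layout: one list object per row, two subscripts per access.
field_sum_2d = [[0] * OLED_WIDTH for _ in range(OLED_HEIGHT)]

# Flat layout: one contiguous buffer, one subscript per access.
# Neighbour counts are 0..8, so one byte per cell is enough.
field_sum = bytearray(OLED_WIDTH * OLED_HEIGHT)


def index(x, y):
    # Row-major: row y starts at offset y * OLED_WIDTH.
    return y * OLED_WIDTH + x


def bit_position(x, y):
    # Hypothetical: (byte offset, bit number) of cell (x, y) in a bitmap
    # packed 8 cells per byte, row-major, LSB-first within each byte.
    return y * OLED_WIDTH_BYTES + (x >> 3), x & 7


field_sum_2d[5][7] = 3
field_sum[index(7, 5)] = 3
print(bit_position(9, 2))  # prints (33, 1)
```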

Performance Optimizations:

  • Added @micropython.native decorators to all class methods and helper functions.
  • Applied @micropython.viper optimization to performance-critical update_field_sums() and sum_cells() methods.
  • Used local variable caching in tight loops to reduce attribute lookup overhead.
  • Implemented direct memory access patterns in viper-optimized functions.
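A sketch of local variable caching using a hypothetical `Field` class, a stand-in rather than the PR's Conway class. Both methods compute the same sum; the second binds `self.field_sum` to a local once, so the loop body reads a local slot instead of performing an attribute lookup on every iteration.

```python
class Field:
    # Hypothetical stand-in for the Conway class, reduced to one buffer.
    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.field_sum = bytearray(width * height)

    def total_slow(self):
        total = 0
        for i in range(self.width * self.height):
            total += self.field_sum[i]  # attribute lookup every iteration
        return total

    def total_fast(self):
        # Bind the attribute to a local once before the loop; locals live
        # in fixed slots, so each iteration skips the self.field_sum lookup.
        field_sum = self.field_sum
        total = 0
        for i in range(self.width * self.height):
            total += field_sum[i]
        return total


f = Field(4, 3)
f.field_sum[0] = 2
f.field_sum[11] = 5
print(f.total_slow(), f.total_fast())  # prints 7 7
```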

Bug Fixes:

  • Fixed potential freezing issue during rapid reset requests by using .clear() instead of list reassignment.
  • Corrected typo in population delta comment ("predictably").
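The difference between the two reset styles can be seen in a minimal sketch with a hypothetical `Sim` class: `.clear()` empties the existing list object in place, so any other reference to it stays valid, while reassignment creates a new list and leaves the old one for the garbage collector.

```python
class Sim:
    # Hypothetical minimal holder for the population history.
    def __init__(self):
        self.population_deltas = []

    def reset_by_reassign(self):
        # Allocates a new list on every reset; the old one becomes garbage.
        self.population_deltas = []

    def reset_by_clear(self):
        # Empties the existing list object in place; no new list object.
        self.population_deltas.clear()


sim = Sim()
alias = sim.population_deltas  # e.g. a reference held elsewhere
sim.population_deltas.extend([3, -1, 2])

sim.reset_by_clear()
print(alias is sim.population_deltas)  # prints True (same, now-empty list)

sim.reset_by_reassign()
print(alias is sim.population_deltas)  # prints False (a new list object)
```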

Technical Impact

Performance Benefits:

  • 2-3x faster simulation ticks due to viper optimization.
  • Reduced memory fragmentation by replacing the nested field_sum list with a 1D array and reusing the population_deltas list.
  • Fewer heap objects and cheaper indexing with the 1D array layout.
  • Lower function call overhead via native compilation.
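The tick speedup can be checked on hardware by timing ticks before and after the change. A minimal sketch, assuming a hypothetical zero-argument `step` callable that advances one generation; `time.ticks_us` and `time.ticks_diff` are MicroPython's wrap-safe tick functions, and the CPython fallback is an assumption for off-device runs only.

```python
import time

try:
    ticks_us = time.ticks_us        # MicroPython
    ticks_diff = time.ticks_diff
except AttributeError:
    # CPython fallback (assumption: off-device comparison only).
    def ticks_us():
        return time.perf_counter_ns() // 1000

    def ticks_diff(end, start):
        return end - start


def average_tick_us(step, iterations=100):
    # Calls step() (one hypothetical simulation tick) repeatedly and
    # returns the mean duration per call in microseconds.
    start = ticks_us()
    for _ in range(iterations):
        step()
    return ticks_diff(ticks_us(), start) / iterations
```

Running the same harness against the old and new `step` on the same module gives a like-for-like ratio.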

Memory Usage:

  • More efficient memory utilization through pre-allocation.
  • Reduced garbage collection pressure.
  • Fixed a memory leak in the rapid-reset scenario that caused the program to freeze.

Final Note

  • Performance could be further improved by applying viper decorators to the get_bit and set_bit functions within bitarray. I have tested this change to the firmware file locally and saw some speedup, but have decided to omit it from this PR to keep it atomic. Unfortunately, I do not have the time to rigorously test and maintain a firmware change, but if you wish to look at my change to bitarray as a reference, it can be found here: bitarray performance improvements
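The linked bitarray change is not reproduced here. As an illustration only, viper versions of `get_bit`/`set_bit` might look like the sketch below; these are hypothetical free functions over a `bytearray`, assuming LSB-first bit order within each byte, and they do not mirror the firmware's actual bitarray API. The CPython shim is likewise an assumption for off-device testing.

```python
try:
    import micropython
except ImportError:
    # CPython stand-ins (assumption: off-device testing only).
    class micropython:
        @staticmethod
        def viper(func):
            return func

    def ptr8(buf):
        return buf


@micropython.viper
def get_bit(data, index: int) -> int:
    # Byte offset is index >> 3; bit within the byte is index & 7.
    p = ptr8(data)
    return (p[index >> 3] >> (index & 7)) & 1


@micropython.viper
def set_bit(data, index: int, value: int):
    p = ptr8(data)
    mask = 1 << (index & 7)
    if value:
        p[index >> 3] = p[index >> 3] | mask
    else:
        # 0xFF ^ mask clears only the target bit within the byte.
        p[index >> 3] = p[index >> 3] & (0xFF ^ mask)


buf = bytearray(2)
set_bit(buf, 9, 1)
print(get_bit(buf, 9), buf[1])  # prints 1 2
```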

Refactor Conway class for performance improvements.
Optimize memory usage with 1D arrays and MicroPython native/viper functions.
@chrisib
Collaborator

chrisib commented Oct 29, 2025

Thanks for this! Looks like the CI testing is going to need to get updated to handle the native code decorations, which isn't surprising. I'll load this onto one of my modules and give it a try. Hopefully the performance increase is enough that we can add the option to externally clock each generation (an early idea that got dumped because of the generally bad performance).

@Bearwynn
Author

Bearwynn commented Oct 29, 2025

Ah yes, unfortunately I think the viper decorator may cause some issues there, since viper code isn't fully compliant with standard Python. Hopefully it won't cause too much of a headache. It should definitely work when loaded onto EuroPi hardware, though.

It's definitely faster, but I suspect not quite fast enough for external clocking. I haven't tested it with a Raspberry Pi Pico 2, but I suspect you may be able to clock it externally on that, though not on the Pico 1.
That said, don't be disheartened: I do have an almost completely rewritten version of this script that runs fast enough for each simulation tick to get down to single-digit milliseconds, though I haven't profiled it in a while, so I'd need to confirm that again with recent changes.
In that script I have external clocking, tick limiters controlled by K2, clock divider options on the CV outs, and more.
It's so heavily edited that I didn't know whether it should be an edit of this script or an optional new one.
The issue is that its file size is much bigger, so I've kept this PR to the most restrained performance change, altering the original intent as little as possible.

I even began to expand into other cellular automata and made the simulation and CV functionality modular and extensible, with each ruleset in a separate script that can be loaded in a way somewhat similar to the menu script, but specifically for cellular automata rules.
This means that rendering code and other elements can be re-used, cutting down on file size when adding more cellular automata.
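A minimal sketch of how a ruleset could be separated from the rest of the simulation, using the standard birth/survival (B/S) description of Life-like automata. `LifeLikeRule` and `next_state` are hypothetical names for illustration, not the structure of the script described here.

```python
class LifeLikeRule:
    # Hypothetical ruleset object: a Life-like automaton is fully described
    # by the neighbour counts that give birth and those that allow survival.
    def __init__(self, birth, survive):
        self.birth = set(birth)
        self.survive = set(survive)

    def next_state(self, alive, neighbours):
        # Returns 1 if the cell is alive in the next generation, else 0.
        if alive:
            return 1 if neighbours in self.survive else 0
        return 1 if neighbours in self.birth else 0


CONWAY = LifeLikeRule(birth=(3,), survive=(2, 3))       # B3/S23
HIGHLIFE = LifeLikeRule(birth=(3, 6), survive=(2, 3))   # B36/S23

print(CONWAY.next_state(0, 6), HIGHLIFE.next_state(0, 6))  # prints 0 1
```

In a design like this, rendering and CV code would only need to call `next_state`, so a new automaton means a new rule object rather than a copy of the whole script.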

If you like, I can share the all-in-one version that just uses the Conway ruleset and then also share the extensible cellular automata version separately. :)

@chrisib
Collaborator

chrisib commented Oct 30, 2025

If you've got a better refactoring of this script, I'm definitely open to removing this one and replacing it with your new one instead. I'm a fan of reducing duplication where possible.

@Bearwynn
Author

Bearwynn commented Oct 30, 2025

Okey dokey, I'll spend some time cleaning it up to meet contrib submission standards and I'll share it here (possibly as a separate PR) for you. I think you'll enjoy it :)

@chrisib
Collaborator

chrisib commented Oct 30, 2025

I'm looking forward to seeing it!

@Bearwynn
Author

Bearwynn commented Nov 4, 2025

@chrisib I've submitted a PR here: #449
I made sure to include a helpful cellarium.md that should help explain things a bit :)
