Experimental cppyy #1769

Draft

Legend101Zz wants to merge 30 commits into brian-team:master from Legend101Zz:experimental-cppyy

Conversation

@Legend101Zz
Contributor

Problem

The current Brian2 code generation pipeline suffers from a fundamental performance bottleneck. The issue is not tied to the specific tools we use, but rather to the Ahead-of-Time (AOT) compilation paradigm itself.

Regardless of whether we use Cython (our current approach) or manual C-extensions, the workflow remains slow and cumbersome:

  • Generate large C++ source files on disk
  • Invoke an external compiler (e.g., g++, clang) with significant overhead
  • Wait for compilation to complete (often 15–40 seconds, which disrupts interactivity)
  • Dynamically load the compiled result through a complex process

In other words, the bottleneck lies in the file-based, external-compiler, AOT workflow.


Proposed Solution: JIT Compilation with cppyy

This PR introduces cppyy as a new runtime code generation target, shifting from AOT to Just-in-Time (JIT) compilation.

With cppyy, C++ code is compiled in-memory using the Cling C++ interpreter, which eliminates:

  • File I/O overhead
  • External compiler process spawning
  • Long compilation waiting times
  • Complex dynamic loading procedures
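
For readers unfamiliar with cppyy, the sketch below (not Brian2 code; the function and variable names are purely illustrative) shows the basic workflow the new backend builds on: a C++ string is JIT-compiled in-memory by Cling and then called directly on a NumPy buffer, with no files written and no external compiler process.

import cppyy
import numpy as np

# Cling JIT-compiles this C++ string in-memory; nothing touches the filesystem.
cppyy.cppdef("""
void scale_inplace(double* values, int n, double factor) {
    for (int i = 0; i < n; ++i)
        values[i] *= factor;
}
""")

v = np.arange(5, dtype=np.float64)
cppyy.gbl.scale_inplace(v, len(v), 2.0)  # NumPy buffer is passed as double*
print(v)  # [0. 2. 4. 6. 8.]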

Current Status

  • End-to-end JIT compilation pipeline
  • Basic neuron group simulations
  • State updates, thresholds, and resets
  • Template system for different operations
  • Integration with the device layer

Next Steps

Fixes for dynamic arrays, the spike queue, and synapses

@Legend101Zz
Contributor Author

@mstimberg this is the initial PR. I'll keep working on this branch and will get synapses working here as well, cheers :)

@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@Legend101Zz
Contributor Author

@mstimberg I added an introspection module for the things we were discussing yesterday: viewing the generated cppyy code and code objects, and even changing them on the fly. I have attached a sample Jupyter notebook for reference which you can use to play around with it :)

Attaching a few screenshots of how it looks:

[Two screenshots of the introspector output]

Review comment on the following lines of the diff:

self.namespace[name] = value

# ── Dynamic arrays: store BOTH the data view AND the capsule ──
# The data view (_ptr_array_*) gives C++ direct pointer access
Contributor Author

So I realised something while coding this: the way dynamic arrays are set up for runtime mode will have to change for cppyy. RuntimeDevice currently creates its dynamic arrays through Cython wrappers, and those Cython wrappers own the underlying DynamicArray1D<T>*. If we want to go fully cppyy-native, or properly support multiple backends, we'll need to change how RuntimeDevice stores dynamic arrays for the cppyy codegen backend, most likely by replacing the Cython wrappers with cppyy-managed objects, which is what would be best ...
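
To make this concrete, here is a rough, hypothetical sketch (class and method names are illustrative, not the actual Brian2 code) of what a cppyy-managed dynamic array could look like, with the C++ object owned on the cppyy side instead of by a Cython wrapper:

import cppyy

# Hypothetical stand-in for DynamicArray1D<double>; the real class lives in
# Brian2's C++ support code. This only illustrates cppyy-side ownership.
cppyy.cppdef("""
#include <vector>
struct DynArr1DDouble {
    std::vector<double> buf;
    void resize(size_t n)         { buf.resize(n); }
    size_t size() const           { return buf.size(); }
    double* data()                { return buf.data(); }
    void set(size_t i, double v)  { buf[i] = v; }
    double get(size_t i) const    { return buf[i]; }
};
""")

arr = cppyy.gbl.DynArr1DDouble()   # owned via cppyy, no Cython wrapper involved
arr.resize(4)
arr.set(0, -0.065)
print(arr.size(), arr.get(0))      # data() could back a zero-copy NumPy view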

@Legend101Zz
Contributor Author

Here is a snippet I used to test things:

import time

import numpy as np

from brian2 import *

prefs.codegen.target = 'cppyy'
prefs.codegen.runtime.cppyy.enable_introspection = True # Enable introspection

# prefs.codegen.target = 'cython'
# prefs.codegen.runtime.cython.cache_dir = 'cythontmp/'
# prefs.codegen.runtime.cython.delete_source_files = False



# Hodgkin-Huxley neuron model

num_neurons = 100
duration = 500*ms

# Parameters
area = 20000*umetre**2
Cm = 1*ufarad*cm**-2 * area
gl = 5e-5*siemens*cm**-2 * area
El = -65*mV
EK = -90*mV
ENa = 50*mV
g_na = 100*msiemens*cm**-2 * area
g_kd = 30*msiemens*cm**-2 * area
VT = -63*mV

eqs = Equations('''
dv/dt = (gl*(El-v) - g_na*(m*m*m)*h*(v-ENa) - g_kd*(n*n*n*n)*(v-EK) + I)/Cm : volt
dm/dt = 0.32*(mV**-1)*4*mV/exprel((13.*mV-v+VT)/(4*mV))/ms*(1-m)-0.28*(mV**-1)*5*mV/exprel((v-VT-40.*mV)/(5*mV))/ms*m : 1
dn/dt = 0.032*(mV**-1)*5*mV/exprel((15.*mV-v+VT)/(5*mV))/ms*(1.-n)-.5*exp((10.*mV-v+VT)/(40.*mV))/ms*n : 1
dh/dt = 0.128*exp((17.*mV-v+VT)/(18.*mV))/ms*(1.-h)-4./(1+exp((40.*mV-v+VT)/(5.*mV)))/ms*h : 1
I : amp
''')

group = NeuronGroup(num_neurons, eqs,
                    threshold='v > -40*mV',
                    refractory='v > -40*mV',
                    method='exponential_euler')
group.v = El
group.I = '0.7*nA * i / num_neurons'


# SpikeMonitor: records spike times and indices (dynamic arrays)
spike_mon = SpikeMonitor(group)

# StateMonitor: records v for a few neurons every timestep (2D dynamic array)
state_mon = StateMonitor(group, 'v', record=[0, 25, 50, 75, 99])


print(f"Running {num_neurons} HH neurons for {duration}...")
t_start = time.perf_counter()
run(duration)
t_elapsed = time.perf_counter() - t_start
print(f"Done in {t_elapsed:.2f}s")

print(f"\nTotal spikes: {spike_mon.num_spikes}")
print(f"StateMonitor recorded {state_mon.t.shape[0]} timesteps "
      f"for {len(state_mon.record)} neurons")

# --- Now use the introspector ---
from brian2.codegen.runtime.cppyy_rt.introspector import get_introspector

intro = get_introspector()

# ---- 1. List all compiled code objects ----
print("=" * 60)
print("LIST OBJECTS")
print("=" * 60)
print(intro.list_objects())

# ---- 2. Inspect the state updater ----
print("\n" + "=" * 60)
print("INSPECT STATE UPDATER")
print("=" * 60)
# Using glob pattern — "stateupdater*" matches the full name
print(intro.inspect("*stateupdater*"))

# ---- 3. View just the params ----
print("\n" + "=" * 60)
print("PARAMS")
print("=" * 60)
print(intro.params("*stateupdater*"))

# ---- 4. View the namespace ----
print("\n" + "=" * 60)
print("NAMESPACE")
print("=" * 60)
print(intro.namespace("*stateupdater*"))

# ---- 5. View C++ globals ----
print("\n" + "=" * 60)
print("C++ GLOBALS")
print("=" * 60)
print(intro.cpp_globals())

# ---- 6. Evaluate a C++ expression ----
print("\n" + "=" * 60)
print("EVAL C++")
print("=" * 60)
print(f"M_PI = {intro.eval_cpp('M_PI')}")
print(f"sizeof(double) = {intro.eval_cpp('sizeof(double)', 'size_t')}")
print(f"_brian_mod(7, 3) = {intro.eval_cpp('_brian_mod(7, 3)', 'int32_t')}")

Commits

- Rewrite ratemonitor.cpp to use capsule-based resize pattern
  (was using nonexistent .push_back() on DynamicArray)
- Add _brian_cppyy_seed/_brian_cppyy_seed_random to support code
  and wire into RuntimeDevice.seed() for reproducible simulations
- Add parameter count logging in run_block() for debugging
- Add subgroup filtering to ratemonitor (matching Cython behavior)
Add CppyyDynamicArray1D/2D as drop-in replacements for Cython wrappers.
dynamicarray.py now tries Cython first, falls back to cppyy if Cython
extensions aren't compiled. Same API: .data, .resize(), .get_capsule().
PyCapsule names are identical so templates work with either backend.
- cppyy-backed SpikeQueue as drop-in Cython replacement
- Synapse templates: synapses, push_spikes, create_array, create_generator
- Capsule-based parameter passing for queue and dynamic arrays
- Python-side synapse bookkeeping after cppyy code object runs
- Fallback chain in spikequeue.py: Cython → cppyy
…xtraction, consolidated helpers

- synapses_create_generator: 1024-element buffer for pre/post arrays (O(n/1024) resizes vs O(n))
- spikemonitor: extract capsules once before spike loop, cache data pointers
- statemonitor: extract 2D capsules once before per-neuron loop
- ratemonitor: use get_array_name() instead of hardcoded _dynamic_array_ prefix
- synapses/synapses_push_spikes: move _extract_spike_queue to global support code in cppyy_rt.py
- test-cppyy-audit.py: 16-test subprocess-isolated suite (all passing)
Rewrites docs/cppyy-backend.md with full architecture visualization:
- End-to-end flow, three naming worlds, parameter sync invariant
- Template architecture, zero-copy data bridge, synapse lifecycle
- DynamicArray/SpikeQueue backends, monitor data flow
- Guard code, global support code, compilation lifecycle
- Updated limitations and next steps
…er protocol

- Port spikegenerator.cpp and spatialstateupdate.cpp from Cython templates
- Use bare N (not {{ N }}) for Constant variables in templates
- Fix cppyy int64_t buffer protocol on LP64 platforms: map int64_t→long
  in _cppyy_c_data_type() since cppyy rejects int64_t* (long long*)
  but accepts long* for numpy int64 arrays
- Add SpikeGeneratorGroup tests (basic + periodic) to test suite
- All 18 tests pass
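
A hedged sketch of that dtype workaround, assuming a mapping helper like the _cppyy_c_data_type() named above; entries other than int64 are illustrative:

import numpy as np

def _cppyy_c_data_type(dtype):
    # Illustrative mapping from NumPy dtypes to C type names used in the
    # generated signatures; only a few entries are shown here.
    mapping = {
        np.dtype(np.float64): "double",
        np.dtype(np.float32): "float",
        np.dtype(np.int32):   "int32_t",
        # On LP64, int64_t is "long long" to cppyy, which rejects NumPy
        # int64 buffers; plain "long" (also 64-bit there) is accepted.
        np.dtype(np.int64):   "long",
    }
    return mapping[np.dtype(dtype)]

print(_cppyy_c_data_type(np.int64))  # -> long
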
- Remove cppyy_dynamicarray.py and cppyy_spikequeue.py: DynamicArray and
  SpikeQueue are compiled from Cython at install time, no runtime fallback needed.
  Revert dynamicarray.py and spikequeue.py to Cython-only with hard ImportError.

- Fix 12 standalone test failures (NotImplementedError before run()): replaced
  self.variables["_source_offset"].get_value() with int(getattr(self.source, "start", 0))
  in both _add_synapses_from_arrays and _add_synapses_generator. CPPStandaloneDevice
  rejects get_value() before run(); the offset values are Python-time constants.
  getattr(..., 0) also handles Synapses-as-source (no .start attribute).

- Fix test_synapses_state_monitor (Python-side size desync): the new Cython synapse
  creation templates update C++ m_size directly but Python-side .size was only synced
  for cppyy code objects. Call _resize() unconditionally for all backends. Keep
  _update_synapse_numbers() cppyy-only — Cython templates already update N_outgoing/
  N_incoming in C++; calling it again doubles the counts.

- Fix SyntaxWarning in introspector.py: invalid escape sequence \d -> \\d in docstring.
_resize() and get_value() on _synaptic_pre cannot be called during
connect() under CPPStandaloneDevice — the C++ code is only scheduled,
not executed, so synapse counts are not yet known. Guard both blocks
in _add_synapses_from_arrays and _add_synapses_generator with
isinstance(get_device(), RuntimeDevice) so standalone tests pass
while the Cython/cppyy runtime fixes from the previous commit are
preserved.
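
A minimal sketch of the guard pattern described above (surrounding bookkeeping omitted; only the isinstance check is the point):

from brian2.devices.device import RuntimeDevice, get_device

if isinstance(get_device(), RuntimeDevice):
    # Runtime targets (numpy/cython/cppyy): the arrays exist now, so the
    # Python-side bookkeeping (_resize(), get_value(), ...) is safe here.
    pass
else:
    # CPPStandaloneDevice: the C++ code is only scheduled at this point,
    # so get_value() would raise NotImplementedError before run().
    pass
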
…one failures

len(self) calls get_value() which raises NotImplementedError on CPPStandaloneDevice
before run(). Move old_num_synapses capture inside the RuntimeDevice guard so it is
only evaluated on runtime (numpy/cython/cppyy) devices.
- Add group_get_indices.cpp template: loops over N neurons, evaluates the
  condition expression, and collects matching indices into a pre-allocated
  output buffer (_return_values_buf) with a count in _return_values_n.

- CppyyCodeGenerator.determine_keywords(): detect group_get_indices by
  checking that both _cond and _indices are AuxiliaryVariables (unique to
  the IndexWrapper.__getitem__ path), then append the two output-buffer
  params to function_params so the C++ signature includes them.

- CppyyCodeObject.variables_to_namespace(): inject _return_values_buf and
  _return_values_n numpy arrays when template_name == 'group_get_indices'.

- CppyyCodeObject._build_param_mapping(): mirror the two extra entries so
  the Python call-site args match the C++ signature.

- CppyyCodeObject.run_block(): after compiled_func(*args), if this is a
  group_get_indices codeobj return the sliced result array.

- conftest.py: add cppyy implementation of fake_randn so tests using the
  fake_randn_randn_fixture work under the cppyy target.

- tests/__init__.py: auto-detect cppyy alongside numpy/cython so calling
  brian2.test() without explicit targets also runs the cppyy suite.

- run_test_suite.py: detect cppyy availability and add it to in_parallel
  so CI standalone:false jobs also exercise the cppyy target.
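
To make the group_get_indices pattern concrete, here is a standalone, hypothetical sketch (not the actual template output) of the condition loop writing matching indices into a pre-allocated buffer with a separate hit count, mirroring the _return_values_buf / _return_values_n description above:

import cppyy
import numpy as np

cppyy.cppdef("""
void _example_get_indices(const double* v, int N, double thr,
                          int32_t* return_buf, int32_t* return_n) {
    int32_t count = 0;
    for (int idx = 0; idx < N; ++idx) {
        if (v[idx] > thr)               // stands in for the user condition
            return_buf[count++] = idx;
    }
    return_n[0] = count;                // caller slices the buffer to this
}
""")

v = np.array([0.1, 0.9, 0.4, 0.8])
buf = np.zeros(v.size, dtype=np.int32)
n = np.zeros(1, dtype=np.int32)
cppyy.gbl._example_get_indices(v, len(v), 0.5, buf, n)
print(buf[:n[0]])  # -> [1 3]
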
initialise_queue() calls get_value() on eventspace, _delays and
synapse_sources, which raises NotImplementedError under CPPStandaloneDevice
before run(). The before_run() override that calls it was added for cppyy
(C++ before_code blocks can't invoke Python), but the guard was missing.

Under standalone mode the queue is set up in the generated C++ code, so
Python must not try to initialise it during before_run().
- Add cppyy>=3.1 as optional dependency (pip install .[cppyy])
- Install cppyy on all non-standalone runners (Linux, macOS, Windows)
- Add ilammy/msvc-dev-cmd step on Windows so Cling can find cl.exe at JIT time
- Add DYLD_LIBRARY_PATH for macOS runners to resolve cppyy's hardcoded
  MacPorts zstd path against Homebrew locations (arm64 + Intel)
- Soft-fail the install step so CI is not broken if cppyy is unavailable
CPyCppyy has no pre-built wheel for Python 3.14+ on Windows. Building
from source fails: the pre-built cppyy_backend-1.15.3 .lib is missing
Cppyy::GetNumBasesLongestBranch which CPyCppyy 1.13.0 requires at link
time. Re-enable once cppyy publishes compatible Windows wheels.
…names

When Brian2 GC's a TimedArray (e.g. at test teardown), its Python name
becomes available for reuse. A subsequent test can create a new TimedArray
with the same name but different K/N parameters, generating a different
C++ function body under the same symbol (e.g. `_timedarray`). The previous
#ifndef guard was keyed on the body content-hash, so two bodies with the
same symbol but different hashes would both try to define the same C++
symbol in Cling — causing a "redefinition" error.

Fix strategy:
- cppyy_generator: wrap each user-function support code piece in a guard
  keyed by the C++ *symbol name* (not body hash) so Cling only compiles
  the first occurrence of any given name. Fix _extract_primary_cpp_symbol
  to only inspect the first declaration line (not function body lines).
- cppyy_rt: add _rename_conflicting_user_functions() that detects when a
  function name is reused with a different body (different content hash)
  and renames both the function and its _namespace_*_values global in the
  code string. This prevents both the Cling redefinition error and the
  cppyy "buffer too large for value" error from reassigning a double*
  global to an array of a different size.
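
An illustrative sketch of the symbol-keyed guard wrapping (the guard macro naming is an assumption, not the actual template text):

def wrap_user_function(symbol_name, cpp_body):
    # Keyed on the C++ symbol name, so Cling only compiles the first body
    # seen for a given name; later occurrences are skipped entirely.
    guard = f"_BRIAN_USERFUNC_{symbol_name}_DEFINED"
    return (f"#ifndef {guard}\n"
            f"#define {guard}\n"
            f"{cpp_body}\n"
            f"#endif  // {guard}\n")

print(wrap_user_function("_timedarray",
                         "double _timedarray(double t) { return 0.0; }"))
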
…apses_create_generator

When result_index_condition=True and if_expression is set (e.g. S.connect("i==j")),
both create_cond and update sections independently declare `const int32_t _post_idx =
_raw_post_idx;` in the same C++ scope. Cling rejects the second declaration as a
redefinition.

Fix: wrap the create_cond code section in a braced scope `{}` with the condition
result captured to `_create_cond_result`. The update section then declares _post_idx
first in the outer scope, which is also available for the buffer-filling loop.

This fixes ~14 test_subgroup.py and test_synapses.py failures (test_synaptic_propagation,
test_synapse_creation_generator_*, test_spike_monitor, test_no_reference_*, etc.).
The cppyy group_variable_set.cpp and group_variable_set_conditional.cpp
templates were missing the {# ALLOWS_SCALAR_WRITE #} directive that Cython
equivalents have. Without it, the code generator raises "Writing to scalar
variable X not allowed in this context" when setting shared variables like
G.E_L = "expression", S.delay = 1*ms, etc.

Fixes test_scalar_variable, test_delay_specification, test_delays_pathways,
test_scalar_parameter_access, and related tests.
…ator; use mutable _uiter_size for fixed-size sample
… timedarray/binomial, fix introspector SyntaxWarning
… GSL skipping

Three bugs caused CI failures for the cppyy runtime target:

1. `static std::mt19937 _brian_cppyy_rng` had internal linkage, so each
   new Cling translation unit (compiled per network.run() call) got a fresh
   default-seeded copy — all runs produced identical random values.
   Fix: remove `static` to give external linkage; one shared instance across
   all TUs. Also move `_dist_rand` to file scope (no static).

2. `seed()` checked `hasattr(cppyy.gbl, "_brian_cppyy_seed")` before the
   support code was compiled, so pre-run seed() calls were silent no-ops.
   Fix: call `_ensure_support_code()` eagerly inside `seed()`.

3. `get/set_random_state()` ignored C++ RNG state entirely, so
   `restore(restore_random_state=True)` could not reproduce identical runs.
   Fix: expose `_brian_cppyy_get/set_rng_state()` C++ functions (using
   std::ostringstream/istringstream) and integrate into get/set_random_state().

Additionally, `std::normal_distribution` has an internal cache that cannot
be serialized. Replace with a custom Marsaglia polar method using explicit
`_brian_randn_has_spare` / `_brian_randn_spare` file-scope variables that
round-trip cleanly through the state string.
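
For reference, a minimal Python sketch of the Marsaglia polar method with an explicit spare cache; the C++ replacement has the same structure, so the cached value can round-trip through the serialized state string:

import math
import random

_has_spare = False
_spare = 0.0

def randn():
    """One standard-normal sample via the Marsaglia polar method."""
    global _has_spare, _spare
    if _has_spare:                 # the spare and the flag are exactly the
        _has_spare = False         # extra state that must be serialized
        return _spare
    while True:
        u = 2.0 * random.random() - 1.0
        v = 2.0 * random.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:
            break
    factor = math.sqrt(-2.0 * math.log(s) / s)
    _spare = v * factor
    _has_spare = True
    return u * factor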

GSL tests were also failing because `skip_if_not_implemented` only skipped
for the numpy target, not cppyy. Fix: check `effective in ("numpy", "cppyy")`.