Experimental cppyy#1769
Conversation
@mstimberg This is the initial PR; I'll keep working on this branch itself and finally get synapses working on it too, cheers :)
@mstimberg I added introspection to do the things we were discussing yesterday, i.e. to view the cppyy code and code objects and even change them on the fly. I have attached a Jupyter notebook sample for reference which you can use and play around with :) Attaching a few screenshots of how it looks:
self.namespace[name] = value

# ── Dynamic arrays: store BOTH the data view AND the capsule ──
# The data view (_ptr_array_*) gives C++ direct pointer access
So I realised something while coding this: the way we set up dynamic arrays for runtime mode will have to change for cppyy, because RuntimeDevice currently creates its dynamic arrays through Cython wrappers, and those Cython wrappers own the underlying DynamicArray1D<T>*. If we want to go fully cppyy-native, or support multiple backends, we'll need to change how RuntimeDevice stores dynamic arrays in the cppyy codegen backend, e.g. by replacing the Cython wrappers with cppyy-managed objects, as that is what would be best ...
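For context, the wrapper API being discussed can be sketched backend-agnostically as a small numpy-backed growable array. The class name `GrowableArray1D` and all details here are illustrative, not the actual Brian2 wrapper; the real wrappers additionally expose the raw pointer to C++:

```python
import numpy as np

class GrowableArray1D:
    """Illustrative 1-D dynamic array: numpy buffer + amortized growth.

    Sketches the .data / .resize() API mentioned above; not Brian2's code.
    """

    def __init__(self, n, dtype=np.float64):
        self._buf = np.zeros(max(n, 1), dtype=dtype)
        self._n = n

    @property
    def data(self):
        # View of the logically-used part; C++ would receive _buf's pointer.
        return self._buf[:self._n]

    def resize(self, new_n):
        if new_n > self._buf.shape[0]:
            # Grow geometrically so repeated resizes stay O(1) amortized.
            grown = np.zeros(max(new_n, 2 * self._buf.shape[0]),
                             dtype=self._buf.dtype)
            grown[:self._n] = self._buf[:self._n]
            self._buf = grown
        self._n = new_n

arr = GrowableArray1D(3)
arr.data[:] = [1.0, 2.0, 3.0]
arr.resize(5)
assert arr.data.shape == (5,) and arr.data[0] == 1.0
```

Whichever backend owns the buffer (Cython or cppyy), the Python-facing surface stays this small, which is what makes a drop-in replacement feasible.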
Here is a snippet I used to test things:

import time
import numpy as np
from brian2 import *
prefs.codegen.target = 'cppyy'
prefs.codegen.runtime.cppyy.enable_introspection = True # Enable introspection
# prefs.codegen.target = 'cython'
# prefs.codegen.runtime.cython.cache_dir = 'cythontmp/'
# prefs.codegen.runtime.cython.delete_source_files = False
# Hodgkin-Huxley neuron model
num_neurons = 100
duration = 500*ms
# Parameters
area = 20000*umetre**2
Cm = 1*ufarad*cm**-2 * area
gl = 5e-5*siemens*cm**-2 * area
El = -65*mV
EK = -90*mV
ENa = 50*mV
g_na = 100*msiemens*cm**-2 * area
g_kd = 30*msiemens*cm**-2 * area
VT = -63*mV
eqs = Equations('''
dv/dt = (gl*(El-v) - g_na*(m*m*m)*h*(v-ENa) - g_kd*(n*n*n*n)*(v-EK) + I)/Cm : volt
dm/dt = 0.32*(mV**-1)*4*mV/exprel((13.*mV-v+VT)/(4*mV))/ms*(1-m)-0.28*(mV**-1)*5*mV/exprel((v-VT-40.*mV)/(5*mV))/ms*m : 1
dn/dt = 0.032*(mV**-1)*5*mV/exprel((15.*mV-v+VT)/(5*mV))/ms*(1.-n)-.5*exp((10.*mV-v+VT)/(40.*mV))/ms*n : 1
dh/dt = 0.128*exp((17.*mV-v+VT)/(18.*mV))/ms*(1.-h)-4./(1+exp((40.*mV-v+VT)/(5.*mV)))/ms*h : 1
I : amp
''')
group = NeuronGroup(num_neurons, eqs,
                    threshold='v > -40*mV',
                    refractory='v > -40*mV',
                    method='exponential_euler')
group.v = El
group.I = '0.7*nA * i / num_neurons'
# SpikeMonitor: records spike times and indices (dynamic arrays)
spike_mon = SpikeMonitor(group)
# StateMonitor: records v for a few neurons every timestep (2D dynamic array)
state_mon = StateMonitor(group, 'v', record=[0, 25, 50, 75, 99])
print(f"Running {num_neurons} HH neurons for {duration}...")
t_start = time.perf_counter()
run(duration)
t_elapsed = time.perf_counter() - t_start
print(f"Done in {t_elapsed:.2f}s")
print(f"\nTotal spikes: {spike_mon.num_spikes}")
print(f"StateMonitor recorded {state_mon.t.shape[0]} timesteps "
      f"for {len(state_mon.record)} neurons")
# --- Now use the introspector ---
from brian2.codegen.runtime.cppyy_rt.introspector import get_introspector
intro = get_introspector()
# ---- 1. List all compiled code objects ----
print("=" * 60)
print("LIST OBJECTS")
print("=" * 60)
print(intro.list_objects())
# ---- 2. Inspect the state updater ----
print("\n" + "=" * 60)
print("INSPECT STATE UPDATER")
print("=" * 60)
# Using glob pattern — "stateupdater*" matches the full name
print(intro.inspect("*stateupdater*"))
# ---- 3. View just the params ----
print("\n" + "=" * 60)
print("PARAMS")
print("=" * 60)
print(intro.params("*stateupdater*"))
# ---- 4. View the namespace ----
print("\n" + "=" * 60)
print("NAMESPACE")
print("=" * 60)
print(intro.namespace("*stateupdater*"))
# ---- 5. View C++ globals ----
print("\n" + "=" * 60)
print("C++ GLOBALS")
print("=" * 60)
print(intro.cpp_globals())
# ---- 6. Evaluate a C++ expression ----
print("\n" + "=" * 60)
print("EVAL C++")
print("=" * 60)
print(f"M_PI = {intro.eval_cpp('M_PI')}")
print(f"sizeof(double) = {intro.eval_cpp('sizeof(double)', 'size_t')}")
print(f"_brian_mod(7, 3) = {intro.eval_cpp('_brian_mod(7, 3)', 'int32_t')}")
- Rewrite ratemonitor.cpp to use capsule-based resize pattern (was using nonexistent .push_back() on DynamicArray)
- Add _brian_cppyy_seed/_brian_cppyy_seed_random to support code and wire into RuntimeDevice.seed() for reproducible simulations
- Add parameter count logging in run_block() for debugging
- Add subgroup filtering to ratemonitor (matching Cython behavior)
Add CppyyDynamicArray1D/2D as drop-in replacements for Cython wrappers. dynamicarray.py now tries Cython first, falls back to cppyy if Cython extensions aren't compiled. Same API: .data, .resize(), .get_capsule(). PyCapsule names are identical so templates work with either backend.
- cppyy-backed SpikeQueue as drop-in Cython replacement
- Synapse templates: synapses, push_spikes, create_array, create_generator
- Capsule-based parameter passing for queue and dynamic arrays
- Python-side synapse bookkeeping after cppyy code object runs
- Fallback chain in spikequeue.py: Cython → cppyy
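The Cython → cppyy fallback chain boils down to an ordered-import pattern; a generic sketch (the helper name and candidate list are illustrative, not Brian2's actual module paths):

```python
import importlib

def load_backend(candidates):
    """Return the first importable attribute from an ordered candidate list.

    `candidates` is a list of (module_name, attribute_name) pairs, tried
    in priority order; illustrative of the spikequeue.py fallback chain.
    """
    for module_name, attr in candidates:
        try:
            return getattr(importlib.import_module(module_name), attr)
        except ImportError:
            continue
    raise ImportError(f"no backend available: {candidates!r}")

# Cython extension first, cppyy-backed class second (placeholder names);
# the stdlib entry below just stands in for the working fallback.
OrderedDict = load_backend([
    ("no_such_cython_extension", "SpikeQueue"),
    ("collections", "OrderedDict"),
])
```

Since both implementations expose the same API, callers never need to know which backend the chain actually resolved to.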
…xtraction, consolidated helpers
- synapses_create_generator: 1024-element buffer for pre/post arrays (O(n/1024) resizes vs O(n))
- spikemonitor: extract capsules once before spike loop, cache data pointers
- statemonitor: extract 2D capsules once before per-neuron loop
- ratemonitor: use get_array_name() instead of hardcoded _dynamic_array_ prefix
- synapses/synapses_push_spikes: move _extract_spike_queue to global support code in cppyy_rt.py
- test-cppyy-audit.py: 16-test subprocess-isolated suite (all passing)
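The 1024-element buffering trick can be sketched in Python: accumulate into a fixed-size local buffer and only resize the destination when the buffer fills, so n appended elements cost ⌈n/1024⌉ resizes instead of n. This is an illustration of the idea, not the template's actual code:

```python
import numpy as np

BUF_SIZE = 1024  # same chunk size as in the commit message

def buffered_extend(dest, values, buf_size=BUF_SIZE):
    """Append `values` to the 1-D array `dest`, growing it one chunk at a
    time instead of once per element (illustrative sketch)."""
    buf = np.empty(buf_size, dtype=dest.dtype)
    fill = 0
    for v in values:
        buf[fill] = v
        fill += 1
        if fill == buf_size:
            dest = np.concatenate([dest, buf])  # one resize per 1024 items
            fill = 0
    if fill:
        dest = np.concatenate([dest, buf[:fill]])  # flush the remainder
    return dest

dest = np.array([], dtype=np.int64)
dest = buffered_extend(dest, range(3000))
assert dest.shape == (3000,) and dest[-1] == 2999
```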
Rewrites docs/cppyy-backend.md with full architecture visualization:
- End-to-end flow, three naming worlds, parameter sync invariant
- Template architecture, zero-copy data bridge, synapse lifecycle
- DynamicArray/SpikeQueue backends, monitor data flow
- Guard code, global support code, compilation lifecycle
- Updated limitations and next steps
…er protocol
- Port spikegenerator.cpp and spatialstateupdate.cpp from Cython templates
- Use bare N (not {{ N }}) for Constant variables in templates
- Fix cppyy int64_t buffer protocol on LP64 platforms: map int64_t→long
in _cppyy_c_data_type() since cppyy rejects int64_t* (long long*)
but accepts long* for numpy int64 arrays
- Add SpikeGeneratorGroup tests (basic + periodic) to test suite
- All 18 tests pass
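The LP64 mapping described above can be sketched as a dtype-to-C-type lookup. The function name echoes `_cppyy_c_data_type()` from the commit message, but the table below is an illustrative guess at the relevant cases, not the actual implementation:

```python
import numpy as np

def c_data_type_for(dtype):
    """Map a numpy dtype to a C type name for cppyy function signatures.

    On LP64 platforms `long` is 64-bit, and (per the fix above) cppyy's
    buffer protocol accepts `long*` where it rejects `int64_t*` for
    numpy int64 arrays -- hence the special case.
    """
    mapping = {
        np.dtype(np.float64): "double",
        np.dtype(np.float32): "float",
        np.dtype(np.int32): "int32_t",
        np.dtype(np.int64): "long",  # NOT "int64_t": buffer-protocol quirk
        np.dtype(np.bool_): "bool",
    }
    return mapping[np.dtype(dtype)]

assert c_data_type_for(np.int64) == "long"
```

Note this mapping is only correct on LP64 platforms (Linux/macOS); on LLP64 Windows, `long` is 32-bit and a different spelling would be needed.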
- Remove cppyy_dynamicarray.py and cppyy_spikequeue.py: DynamicArray and SpikeQueue are compiled from Cython at install time, no runtime fallback needed. Revert dynamicarray.py and spikequeue.py to Cython-only with a hard ImportError.
- Fix 12 standalone test failures (NotImplementedError before run()): replaced self.variables["_source_offset"].get_value() with int(getattr(self.source, "start", 0)) in both _add_synapses_from_arrays and _add_synapses_generator. CPPStandaloneDevice rejects get_value() before run(); the offset values are Python-time constants. getattr(..., 0) also handles Synapses-as-source (no .start attribute).
- Fix test_synapses_state_monitor (Python-side size desync): the new Cython synapse creation templates update C++ m_size directly, but the Python-side .size was only synced for cppyy code objects. Call _resize() unconditionally for all backends. Keep _update_synapse_numbers() cppyy-only: Cython templates already update N_outgoing/N_incoming in C++; calling it again doubles the counts.
- Fix SyntaxWarning in introspector.py: invalid escape sequence \d -> \\d in docstring.
_resize() and get_value() on _synaptic_pre cannot be called during connect() under CPPStandaloneDevice — the C++ code is only scheduled, not executed, so synapse counts are not yet known. Guard both blocks in _add_synapses_from_arrays and _add_synapses_generator with isinstance(get_device(), RuntimeDevice) so standalone tests pass while the Cython/cppyy runtime fixes from the previous commit are preserved.
…one failures len(self) calls get_value() which raises NotImplementedError on CPPStandaloneDevice before run(). Move old_num_synapses capture inside the RuntimeDevice guard so it is only evaluated on runtime (numpy/cython/cppyy) devices.
- Add group_get_indices.cpp template: loops over N neurons, evaluates the condition expression, and collects matching indices into a pre-allocated output buffer (_return_values_buf) with a count in _return_values_n.
- CppyyCodeGenerator.determine_keywords(): detect group_get_indices by checking that both _cond and _indices are AuxiliaryVariables (unique to the IndexWrapper.__getitem__ path), then append the two output-buffer params to function_params so the C++ signature includes them.
- CppyyCodeObject.variables_to_namespace(): inject _return_values_buf and _return_values_n numpy arrays when template_name == 'group_get_indices'.
- CppyyCodeObject._build_param_mapping(): mirror the two extra entries so the Python call-site args match the C++ signature.
- CppyyCodeObject.run_block(): after compiled_func(*args), if this is a group_get_indices codeobj, return the sliced result array.
- conftest.py: add a cppyy implementation of fake_randn so tests using the fake_randn_randn_fixture work under the cppyy target.
- tests/__init__.py: auto-detect cppyy alongside numpy/cython so calling brian2.test() without explicit targets also runs the cppyy suite.
- run_test_suite.py: detect cppyy availability and add it to in_parallel so CI standalone:false jobs also exercise the cppyy target.
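The template's core logic — loop over N, evaluate the condition, collect hits into a pre-allocated buffer with a separate count, then slice — can be sketched in Python. The buffer names follow the commit message; the state variable and condition are made up for illustration:

```python
import numpy as np

N = 10
v = np.linspace(-70e-3, -40e-3, N)  # illustrative per-neuron state variable

# Pre-allocated output buffer plus a one-element count, as in the template.
_return_values_buf = np.empty(N, dtype=np.int32)
_return_values_n = np.zeros(1, dtype=np.int32)

count = 0
for _idx in range(N):           # loop over all N neurons
    if v[_idx] > -50e-3:        # evaluate the condition expression
        _return_values_buf[count] = _idx
        count += 1
_return_values_n[0] = count

# run_block() then returns the sliced result array:
result = _return_values_buf[:_return_values_n[0]]
assert np.array_equal(result, np.nonzero(v > -50e-3)[0])
```

Pre-allocating to size N is safe because at most N indices can match, so the C++ side never needs to resize mid-loop.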
initialise_queue() calls get_value() on eventspace, _delays and synapse_sources, which raises NotImplementedError under CPPStandaloneDevice before run(). The before_run() override that calls it was added for cppyy (C++ before_code blocks can't invoke Python), but the guard was missing. Under standalone mode the queue is set up in the generated C++ code, so Python must not try to initialise it during before_run().
- Add cppyy>=3.1 as optional dependency (pip install .[cppyy])
- Install cppyy on all non-standalone runners (Linux, macOS, Windows)
- Add ilammy/msvc-dev-cmd step on Windows so Cling can find cl.exe at JIT time
- Add DYLD_LIBRARY_PATH for macOS runners to resolve cppyy's hardcoded MacPorts zstd path against Homebrew locations (arm64 + Intel)
- Soft-fail the install step so CI is not broken if cppyy is unavailable
CPyCppyy has no pre-built wheel for Python 3.14+ on Windows. Building from source fails: the pre-built cppyy_backend-1.15.3 .lib is missing Cppyy::GetNumBasesLongestBranch which CPyCppyy 1.13.0 requires at link time. Re-enable once cppyy publishes compatible Windows wheels.
…names

When Brian2 GC's a TimedArray (e.g. at test teardown), its Python name becomes available for reuse. A subsequent test can create a new TimedArray with the same name but different K/N parameters, generating a different C++ function body under the same symbol (e.g. `_timedarray`). The previous #ifndef guard was keyed on the body content-hash, so two bodies with the same symbol but different hashes would both try to define the same C++ symbol in Cling, causing a "redefinition" error.

Fix strategy:
- cppyy_generator: wrap each user-function support code piece in a guard keyed by the C++ *symbol name* (not body hash) so Cling only compiles the first occurrence of any given name. Fix _extract_primary_cpp_symbol to only inspect the first declaration line (not function body lines).
- cppyy_rt: add _rename_conflicting_user_functions() that detects when a function name is reused with a different body (different content hash) and renames both the function and its _namespace_*_values global in the code string. This prevents both the Cling redefinition error and the cppyy "buffer too large for value" error from reassigning a double* global to an array of a different size.
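The "first declaration line only" extraction can be sketched with a regex. The function name echoes `_extract_primary_cpp_symbol` from the commit message, but the regex and line-selection logic here are an illustrative guess, not the actual code:

```python
import re

def extract_primary_cpp_symbol(code):
    """Return the function name from the FIRST declaration line of a C++
    snippet, ignoring identifiers inside the function body (illustrative)."""
    first_decl = next(
        line for line in code.splitlines()
        if line.strip() and not line.strip().startswith("//")
    )
    # First identifier followed by '(' on the declaration line.
    m = re.search(r"([A-Za-z_]\w*)\s*\(", first_decl)
    return m.group(1) if m else None

code = """
double _timedarray(const double t)
{
    return helper(t) * 2.0;   // helper() must not be picked up
}
"""
assert extract_primary_cpp_symbol(code) == "_timedarray"
```

Scanning only the declaration line is what prevents a call inside the body (like `helper` above) from being mistaken for the symbol the guard should be keyed on.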
…apses_create_generator
When result_index_condition=True and if_expression is set (e.g. S.connect("i==j")),
both create_cond and update sections independently declare `const int32_t _post_idx =
_raw_post_idx;` in the same C++ scope. Cling rejects the second declaration as a
redefinition.
Fix: wrap the create_cond code section in a braced scope `{}` with the condition
result captured to `_create_cond_result`. The update section then declares _post_idx
first in the outer scope, which is also available for the buffer-filling loop.
This fixes ~14 test_subgroup.py and test_synapses.py failures (test_synaptic_propagation,
test_synapse_creation_generator_*, test_spike_monitor, test_no_reference_*, etc.).
The cppyy group_variable_set.cpp and group_variable_set_conditional.cpp
templates were missing the {# ALLOWS_SCALAR_WRITE #} directive that Cython
equivalents have. Without it, the code generator raises "Writing to scalar
variable X not allowed in this context" when setting shared variables like
G.E_L = "expression", S.delay = 1*ms, etc.
Fixes test_scalar_variable, test_delay_specification, test_delays_pathways,
test_scalar_parameter_access, and related tests.
…ator to support Synapses-as-target
…ator; use mutable _uiter_size for fixed-size sample
… timedarray/binomial, fix introspector SyntaxWarning
… GSL skipping
Three bugs caused CI failures for the cppyy runtime target:
1. `static std::mt19937 _brian_cppyy_rng` had internal linkage, so each
new Cling translation unit (compiled per network.run() call) got a fresh
default-seeded copy — all runs produced identical random values.
Fix: remove `static` to give external linkage; one shared instance across
all TUs. Also move `_dist_rand` to file scope (no static).
2. `seed()` checked `hasattr(cppyy.gbl, "_brian_cppyy_seed")` before the
support code was compiled, so pre-run seed() calls were silent no-ops.
Fix: call `_ensure_support_code()` eagerly inside `seed()`.
3. `get/set_random_state()` ignored C++ RNG state entirely, so
`restore(restore_random_state=True)` could not reproduce identical runs.
Fix: expose `_brian_cppyy_get/set_rng_state()` C++ functions (using
std::ostringstream/istringstream) and integrate into get/set_random_state().
Additionally, `std::normal_distribution` has an internal cache that cannot
be serialized. Replace with a custom Marsaglia polar method using explicit
`_brian_randn_has_spare` / `_brian_randn_spare` file-scope variables that
round-trip cleanly through the state string.
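The serializable replacement can be sketched in Python: Marsaglia's polar method with the cached second deviate held in explicit variables that save and restore alongside the uniform RNG's state. The attribute names mirror `_brian_randn_has_spare` / `_brian_randn_spare` from above; everything else is an illustrative stand-in for the C++ code:

```python
import math
import random

class PolarNormal:
    """Marsaglia polar normal generator with fully serializable state."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)  # stands in for the shared mt19937
        self._has_spare = False          # mirrors _brian_randn_has_spare
        self._spare = 0.0                # mirrors _brian_randn_spare

    def randn(self):
        if self._has_spare:
            self._has_spare = False
            return self._spare
        # Rejection-sample a point in the unit disk (excluding the origin).
        while True:
            u = 2.0 * self._rng.random() - 1.0
            v = 2.0 * self._rng.random() - 1.0
            s = u * u + v * v
            if 0.0 < s < 1.0:
                break
        factor = math.sqrt(-2.0 * math.log(s) / s)
        self._spare = v * factor  # cache the second deviate explicitly
        self._has_spare = True
        return u * factor

    def get_state(self):
        return (self._rng.getstate(), self._has_spare, self._spare)

    def set_state(self, state):
        self._rng.setstate(state[0])
        self._has_spare, self._spare = state[1], state[2]

gen = PolarNormal(seed=42)
saved = gen.get_state()
first_run = [gen.randn() for _ in range(5)]
gen.set_state(saved)
assert [gen.randn() for _ in range(5)] == first_run  # exact reproduction
```

Because the spare value lives in plain variables rather than hidden inside a distribution object, the whole generator state round-trips through get/set exactly, which is precisely what std::normal_distribution's internal cache prevented.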
GSL tests were also failing because `skip_if_not_implemented` only skipped
for the numpy target, not cppyy. Fix: check `effective in ("numpy", "cppyy")`.


Problem
The current Brian2 code generation pipeline suffers from a fundamental performance bottleneck. The issue is not tied to the specific tools we use, but rather to the Ahead-of-Time (AOT) compilation paradigm itself.
Regardless of whether we use Cython (our current approach) or manual C-extensions, the workflow remains slow and cumbersome.
In other words, the bottleneck lies in the file-based, external-compiler, AOT workflow.
Proposed Solution: JIT Compilation with cppyy

This PR introduces cppyy as a new runtime code generation target, shifting from AOT to Just-in-Time (JIT) compilation. With cppyy, C++ code is compiled in-memory using the Cling C++ interpreter, which eliminates the file-based, external-compiler steps of the AOT workflow.

Current Status
Next Steps
Fix for dynamic arrays and spikequeue and synapses