
Fix #444: Add pre-allocation option for MCMC strategy to prevent OOM#891

Open
eladerez wants to merge 2 commits into nerfstudio-project:main from eladerez:fix/mcmc-memory-preallocation

Conversation


@eladerez eladerez commented Mar 14, 2026

The problem:
MCMC training crashes with OOM errors at ~18M Gaussians due to memory fragmentation from repeated `torch.cat` operations during the densification stage.

Solution:
Added an optional `preallocate` flag to `MCMCStrategy` that:
- Pre-allocates parameter buffers to `cap_max` size at initialization
- Writes new Gaussians into pre-allocated slots instead of concatenating
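The core idea can be sketched as follows. This is a minimal illustration with made-up sizes, not the actual gsplat code:

```python
import torch

cap_max = 1000   # maximum number of Gaussians (strategy.cap_max)
n_active = 200   # Gaussians currently in use

# Baseline: growing by concatenation reallocates the whole tensor on every
# densification step, which fragments GPU memory over thousands of steps.
means = torch.zeros(n_active, 3)
new_means = torch.randn(50, 3)
means = torch.cat([means, new_means])          # fresh allocation every call

# Pre-allocated: allocate cap_max rows once, then write new Gaussians into
# the next free slots and bump the active count.
buffer = torch.zeros(cap_max, 3)
buffer[:n_active] = means[:n_active]
n_new = new_means.shape[0]
buffer[n_active:n_active + n_new] = new_means  # in-place write, no realloc
n_active += n_new
```

The buffer's physical size never changes; only `n_active` moves, which is why the peak allocation stays flat instead of spiking on each `torch.cat`.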

Changes:
- `gsplat/strategy/mcmc.py`: added `preallocate` field and `n_active` state tracking
- `gsplat/strategy/ops.py`: modified `sample_add()` to support buffer writes
- `examples/simple_trainer.py`: added buffer management and checkpoint support

Performance:
- Memory: 6.4% reduction at 1M Gaussians (1.216 GB vs 1.299 GB baseline)
- Trade-off: ~24% slower rendering (opt-in feature, disabled by default)
- Benefit: prevents OOM at the 18M+ Gaussian scale

Testing:
All strategy tests passing (2/2)

How to run:
python examples/simple_trainer.py mcmc --strategy.preallocate

eladerez and others added 2 commits March 14, 2026 22:26
…gy to prevent OOM

- Add optional `preallocate` flag to MCMCStrategy (default: False)
- Pre-allocate buffers to cap_max size to avoid torch.cat memory spikes
- Track active Gaussians separately from buffer size with n_active
- Modify sample_add() to write into pre-allocated buffer instead of concat
- Add checkpoint support for n_active state
- Memory reduction: 6.4% at 1M Gaussians (1.216GB vs 1.299GB baseline)
- Prevents OOM crashes at 18M+ Gaussians while maintaining compatibility

This is an opt-in feature that eliminates memory fragmentation from
repeated torch.cat operations during MCMC densification. Users can enable
it with --strategy.preallocate flag when using MCMC strategy.
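
The checkpoint support mentioned above amounts to persisting `n_active` alongside the full-sized buffers. A minimal sketch, assuming a dict-based checkpoint; the key names (`splats`, `n_active`) are illustrative, not the actual simple_trainer.py layout:

```python
import torch

cap_max, n_active = 1000, 250
means = torch.nn.Parameter(torch.zeros(cap_max, 3))

# Save: the parameter tensor is cap_max-sized, so the active count must be
# stored explicitly -- tensor length no longer tells you how many Gaussians
# are live.
ckpt = {
    "splats": {"means": means.detach()},  # full pre-allocated buffer
    "n_active": n_active,                 # how many rows are valid
}

# Restore: copy the buffer back and use n_active, never the tensor length,
# to separate live Gaussians from unused slots.
restored = torch.nn.Parameter(ckpt["splats"]["means"].clone())
restored_active = ckpt["n_active"]
live = restored[:restored_active]
```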
…lice via param_groups

The original preallocate fix allocated cap_max-sized parameter tensors
upfront but let each optimizer track the full buffer. This caused
optimizer.step() to process all cap_max elements every iteration — a
20x+ overhead when cap_max=1_000_000 and n_initial≈50k.

Fix:
- Add grow_active_params() in ops.py: creates a narrow Parameter view
  into splats[:n_active] and migrates Adam momentum tensors (zero-pad
  new rows) when the active count grows.
- Wire each optimizer to track the active-slice Parameter at init time
  (simple_trainer.py), so optimizer.step() only touches n_active rows.
- Grow the active slice (and optimizer state) inside _add_new_gs after
  sample_add writes new Gaussians into the pre-allocated buffer.
- Use active_params directly in _relocate_gs and noise injection,
  removing the per-step temporary Parameter allocation.
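
The active-slice migration can be sketched as follows. `grow_active` is a hypothetical stand-in for the `grow_active_params()` described above; the state keys (`exp_avg`, `exp_avg_sq`) are the standard `torch.optim.Adam` ones:

```python
import torch

def grow_active(buffer, optimizer, old_param, n_active):
    """Replace old_param (a slice of buffer) with a wider slice and
    zero-pad the Adam momentum tensors for the newly activated rows."""
    # Parameter(...) wraps the slice without copying, so it shares storage
    # with the pre-allocated buffer.
    new_param = torch.nn.Parameter(buffer.data[:n_active])
    for group in optimizer.param_groups:
        group["params"] = [new_param if p is old_param else p
                           for p in group["params"]]
    state = optimizer.state.pop(old_param, None)
    if state:  # migrate exp_avg / exp_avg_sq, zero-padding the new rows
        for key in ("exp_avg", "exp_avg_sq"):
            old = state[key]
            pad = new_param.shape[0] - old.shape[0]
            state[key] = torch.cat([old, old.new_zeros(pad, *old.shape[1:])])
        optimizer.state[new_param] = state
    return new_param

buffer = torch.zeros(1000, 3)
param = torch.nn.Parameter(buffer.data[:200])
opt = torch.optim.Adam([param], lr=1e-2)
param.sum().backward()
opt.step()                               # populates Adam state for 200 rows
param = grow_active(buffer, opt, param, 300)
```

After the call, `optimizer.step()` touches only the 300 active rows, while the momentum history of the original 200 rows is preserved.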

Timing: cap_max=100_000 vs cap_max=500 with identical n_active=200
shows a 1.01x ratio (vs ~200x before), confirmed by the new
test_mcmc_preallocate_time_independent_of_cap_max test.
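
The timing check can be sketched like this. It is a simplified stand-alone version, not the actual test, and absolute times are machine-dependent:

```python
import time
import torch

def step_time(cap_max, n_active, iters=50):
    """Time optimizer steps when only the active slice is registered."""
    buffer = torch.zeros(cap_max, 3)
    param = torch.nn.Parameter(buffer.data[:n_active])  # active slice only
    opt = torch.optim.Adam([param], lr=1e-2)
    t0 = time.perf_counter()
    for _ in range(iters):
        opt.zero_grad()
        param.sum().backward()
        opt.step()
    return time.perf_counter() - t0

# With identical n_active, per-step cost should not scale with cap_max.
t_small = step_time(cap_max=500, n_active=200)
t_large = step_time(cap_max=100_000, n_active=200)
ratio = t_large / t_small
```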

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@eladerez (Author) commented:

I've submitted PR #891 that addresses this with pre-allocated buffers. Would appreciate feedback.
