This repository provides reference implementations for the scheduled runtime refinement regime described in:
R. de Beer (2026). Scheduled Runtime Refinement for Neural Networks: Periodic Pruning, Replay, and Distillation as an Operating Regime. Zenodo. DOI: 10.5281/zenodo.18363662
Runtime refinement treats neural network compression and consolidation as a recurring operational process rather than a one-off optimization. The approach integrates the following stages (a minimal sketch of how they compose follows the list):
- **Importance Estimation**: Identify high-value vs. low-contribution parameters
- **Downscaling & Pruning**: Remove or reduce low-utility weights
- **Targeted Replay**: Reinforce critical pathways to prevent forgetting
- **Distillation**: Consolidate representations (optional)
- **Caching**: Offload rarely-used knowledge (optional)
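As a rough illustration of how these stages compose into a single cycle, here is a minimal, self-contained PyTorch sketch. It is not the repository's `RuntimeRefinement` API: importance is approximated by weight magnitude, pruning uses a one-shot global threshold, replay is a few gradient steps on a replay loader, and distillation and caching are omitted.

```python
import torch
import torch.nn.functional as F

def sketch_refinement_cycle(model, replay_loader, prune_ratio=0.15,
                            replay_steps=30, lr=1e-3, device='cuda'):
    """Illustrative refinement cycle: importance -> prune -> replay (not the repo API)."""
    # 1. Importance estimation: score weight tensors by magnitude (simplest proxy)
    importance = {n: p.detach().abs() for n, p in model.named_parameters() if p.dim() > 1}

    # 2. Downscaling & pruning: zero out the globally lowest-importance fraction
    scores = torch.cat([v.flatten() for v in importance.values()])
    threshold = torch.quantile(scores, prune_ratio)
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in importance:
                p.mul_((importance[n] > threshold).float())

    # 3. Targeted replay: a few optimization steps to recover from pruning
    #    (a persistent mask would be needed to keep pruned weights at zero; omitted here)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for step, (x, y) in enumerate(replay_loader):
        if step >= replay_steps:
            break
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x.to(device)), y.to(device))
        loss.backward()
        optimizer.step()
```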
**Key Results:** Progressive refinement achieves 38.5% sparsity (1.63× compression) while maintaining 70.5% test accuracy on CIFAR-10, demonstrating that periodic consolidation can reduce model size without catastrophic performance loss.
The easiest way to try runtime refinement is through our interactive Colab notebook. No installation required; it runs entirely in your browser with free GPU access!
The notebook demonstrates the complete runtime refinement cycle on a SimpleCNN model (582K parameters) trained on CIFAR-10:
- Trains a convolutional neural network for 20 epochs
- Applies refinement cycles every 5 epochs (3 cycles total)
- Progressively prunes 15% of weights per cycle (15% → 28% → 39% final sparsity)
- Maintains accuracy around 70% despite compression
- Generates visualizations showing accuracy, sparsity, and compression over time
- Takes ~10 minutes to complete on a free T4 GPU
1. **Enable GPU:**
   - Click `Runtime` → `Change runtime type`
   - Select `T4 GPU` from the Hardware accelerator dropdown
   - Click `Save`
2. **Run the notebook:**
   - Click `Runtime` → `Run all` (or press `Ctrl+F9`)
   - Wait ~10 minutes for completion
3. **Download results:**
   - Click the folder icon in the left sidebar
   - Find `runtime_refinement_results.png`
   - Right-click → Download
Edit the configuration section at the top of the notebook to experiment with different settings:
```python
class Config:
    epochs = 20              # Try 50 for longer training
    refinement_interval = 5  # Try 10 for less frequent refinement
    prune_ratio = 0.15       # Try 0.2 for more aggressive pruning
    subset_size = 10000      # Set to 0 to use full dataset (50K images)
```

The notebook produces a 4-panel visualization (a plotting sketch for reproducing it from your own logged metrics follows the list):
- Top-left: Training and test accuracy over epochs (should stay ~70%)
- Top-right: Sparsity progression (0% → 15% → 28% → 39%)
- Bottom-left: Training loss (should decrease steadily)
- Bottom-right: Accuracy vs. sparsity trade-off curve
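If you want to recreate a similar figure from your own run, here is a minimal plotting sketch, assuming you have per-epoch lists of accuracy, sparsity, and loss (the function name and arguments are illustrative, not part of the notebook's API):

```python
import matplotlib.pyplot as plt

def plot_refinement_summary(epochs, train_acc, test_acc, sparsity, train_loss,
                            out_path="runtime_refinement_results.png"):
    """Recreate a 4-panel summary figure from per-epoch metric lists (illustrative sketch)."""
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))

    # Top-left: training and test accuracy over epochs
    axes[0, 0].plot(epochs, train_acc, label="train")
    axes[0, 0].plot(epochs, test_acc, label="test")
    axes[0, 0].set(title="Accuracy", xlabel="epoch")
    axes[0, 0].legend()

    # Top-right: sparsity progression (stepwise at refinement points)
    axes[0, 1].plot(epochs, sparsity)
    axes[0, 1].set(title="Sparsity", xlabel="epoch")

    # Bottom-left: training loss
    axes[1, 0].plot(epochs, train_loss)
    axes[1, 0].set(title="Training loss", xlabel="epoch")

    # Bottom-right: accuracy vs. sparsity trade-off
    axes[1, 1].scatter(sparsity, test_acc)
    axes[1, 1].set(title="Accuracy vs. sparsity", xlabel="sparsity", ylabel="test accuracy")

    fig.tight_layout()
    fig.savefig(out_path)
```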
**Expected Output:**

```
Final Performance:
  Test Accuracy: 70.5%
  Sparsity: 38.5%

Model Compression:
  Original parameters: 582,346
  Active parameters: 357,868
  Compression ratio: 1.63×
```
**Problem:** "No GPU detected"
- **Solution:** Runtime → Change runtime type → Select GPU → Save

**Problem:** "Session crashed" or out of memory
- **Solution:** Reduce `batch_size` from 128 to 64 in the configuration

**Problem:** Training is very slow
- **Solution:** Make sure the GPU is enabled (you should see "Tesla T4" in the output)

**Problem:** Can't find the results plot
- **Solution:** Look for `runtime_refinement_results.png` in the Files panel (folder icon)
```
runtime_refinement/
├── src/                    # C++ proof-of-concept (logistic regression)
│   ├── refinement.cpp
│   └── ...
├── python/                 # Python/PyTorch implementations
│   ├── cifar10_demo.py     # SimpleCNN on CIFAR-10 (standalone script)
│   └── requirements.txt
├── notebooks/              # Jupyter/Colab notebooks
│   └── runtime_refinement_demo.ipynb   # Interactive Colab demo
├── results/                # Example outputs
│   └── cifar10_results.png
├── CMakeLists.txt          # C++ build configuration
├── LICENSE                 # MIT License
└── README.md
```
If you prefer to run locally instead of using Colab:
- Python 3.7+
- PyTorch 1.12+
- CUDA-capable GPU (optional but recommended)
```bash
# Clone the repository
git clone https://github.com/infinityabundance/runtime_refinement.git
cd runtime_refinement

# Install dependencies
cd python
pip install -r requirements.txt

# Or install manually
pip install torch torchvision matplotlib numpy
```

```bash
cd python
python cifar10_demo.py --epochs 20 --refinement-interval 5

# See all CLI options
python cifar10_demo.py --help
```

Key options:

- `--epochs`: Total training epochs (default: 20)
- `--refinement-interval`: Epochs between refinement cycles (default: 5)
- `--prune-ratio`: Fraction of weights to prune per cycle (default: 0.15)
- `--subset-size`: Training samples to use (default: 10000; set to 0 for the full dataset)
- `--save-model`: Save the final model checkpoint
- `--output-dir`: Directory for results (default: `results/`)
```bash
# Quick test (default settings)
python cifar10_demo.py

# Longer training with full dataset
python cifar10_demo.py --epochs 50 --subset-size 0

# More aggressive pruning
python cifar10_demo.py --prune-ratio 0.2 --refinement-interval 10

# Save the trained model
python cifar10_demo.py --save-model --output-dir my_results/
```
Python API usage:

```python
from runtime_refinement import RuntimeRefinement

# Initialize refinement system
model = SimpleCNN()
refiner = RuntimeRefinement(model, device='cuda')

# Training loop with periodic refinement
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)

    # Execute refinement cycle every N epochs
    if epoch % refinement_interval == 0:
        metrics = refiner.refinement_cycle(
            replay_loader,
            prune_ratio=0.15
        )
        print(f"Sparsity: {metrics['sparsity']:.1%}")
```

A minimal C++ reference implementation is also provided for the core concepts:
```bash
mkdir build && cd build
cmake ..
make
./runtime_refinement_demo
```

This demonstrates the refinement cycle on a toy logistic regression model.
The implementation is designed to be modular. You can:
Replace `SimpleCNN` with ResNet, Transformers, or custom models:

```python
from torchvision.models import resnet18

model = resnet18(num_classes=10)
refiner = RuntimeRefinement(model, device='cuda')
```
Extend `estimate_importance()` with gradient-based or activation-based signals:

```python
def estimate_importance(self):
    importance = {}
    for name, param in self.model.named_parameters():
        # Custom importance metric: mean absolute gradient
        importance[name] = param.grad.abs().mean() if param.grad is not None else 0
    return importance
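```

The example above is gradient-based. As a sketch of the activation-based alternative mentioned here, the hypothetical helper below scores each parameterized leaf module by its mean absolute activation over a few batches. It uses standard PyTorch forward hooks and is not part of the repository API:

```python
import torch

def estimate_importance_from_activations(self, data_loader, num_batches=5):
    """Illustrative activation-based scoring: mean |activation| per parameterized leaf module."""
    scores, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            # Accumulate mean absolute activation (assumes the module returns a tensor)
            scores[name] = scores.get(name, 0.0) + output.detach().abs().mean().item()
        return hook

    # Register hooks on leaf modules that own trainable parameters
    for name, module in self.model.named_modules():
        is_leaf = len(list(module.children())) == 0
        has_params = any(p.requires_grad for p in module.parameters(recurse=False))
        if is_leaf and has_params:
            handles.append(module.register_forward_hook(make_hook(name)))

    device = next(self.model.parameters()).device
    self.model.eval()
    with torch.no_grad():
        for i, (x, _) in enumerate(data_loader):
            if i >= num_batches:
                break
            self.model(x.to(device))

    for h in handles:
        h.remove()
    return scores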
```

Implement structured pruning, layer-wise pruning, or custom criteria:

```python
# Structured pruning example
def prune_structured(self, prune_ratio=0.2):
    # Prune entire filters instead of individual weights
    # Implementation here...
    ...
```
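To go beyond the stub above, here is a minimal sketch of filter-level structured pruning for `Conv2d` layers, scoring each output filter by its L2 norm and zeroing the weakest ones. The helper name and scoring rule are illustrative assumptions, not the repository's implementation:

```python
import torch
import torch.nn as nn

def prune_conv_filters(model, prune_ratio=0.2):
    """Zero out the lowest-norm output filters of every Conv2d layer (illustrative sketch)."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Conv2d):
                # L2 norm of each output filter, shape: (out_channels,)
                filter_norms = module.weight.view(module.weight.size(0), -1).norm(p=2, dim=1)
                num_prune = int(prune_ratio * filter_norms.numel())
                if num_prune == 0:
                    continue
                # Indices of the weakest filters
                prune_idx = torch.argsort(filter_norms)[:num_prune]
                module.weight[prune_idx] = 0.0
                if module.bias is not None:
                    module.bias[prune_idx] = 0.0
    return model
```

Zeroing filters keeps tensor shapes unchanged (masked structured pruning); physically removing filters would additionally require adjusting the input channels of downstream layers.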
Add generative replay, prioritized sampling, or uncertainty-based selection:

```python
# Priority-based replay buffer
replay_buffer = PrioritizedReplayBuffer(capacity=1000)
replay_buffer.add(examples, priorities=uncertainty_scores)
```
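`PrioritizedReplayBuffer` is used above for illustration and is not shipped with this repository; a minimal sketch of what such a buffer could look like (fixed capacity, eviction of the lowest-priority item, priority-proportional sampling) is:

```python
import random

class PrioritizedReplayBuffer:
    """Minimal illustrative buffer: keeps up to `capacity` items, samples proportionally to priority."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.items, self.priorities = [], []

    def add(self, examples, priorities):
        for example, priority in zip(examples, priorities):
            if len(self.items) >= self.capacity:
                # Evict the lowest-priority item to make room
                evict = min(range(len(self.priorities)), key=self.priorities.__getitem__)
                self.items.pop(evict)
                self.priorities.pop(evict)
            self.items.append(example)
            self.priorities.append(float(priority))

    def sample(self, k):
        # Priority-proportional sampling (with replacement)
        return random.choices(self.items, weights=self.priorities, k=k)
```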
Key hyperparameters and their typical ranges (a sketch of the `--downscale-factor` step follows the table):

| Parameter | Default | Range | Description |
|---|---|---|---|
| `--epochs` | 20 | 10-100 | Total training epochs |
| `--refinement-interval` | 5 | 3-10 | Epochs between refinement cycles |
| `--prune-ratio` | 0.15 | 0.10-0.30 | Fraction of weights to prune per cycle |
| `--replay-buffer-size` | 1000 | 500-5000 | Number of examples in replay buffer |
| `--replay-steps` | 30 | 10-100 | Optimization steps during replay |
| `--downscale-factor` | 0.98 | 0.95-0.99 | Global downscaling multiplier |
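None of the examples above show the downscaling step controlled by `--downscale-factor`. A minimal sketch, assuming the factor is applied globally to all weight matrices once per refinement cycle (the helper name is illustrative), is:

```python
import torch

def downscale_weights(model, factor=0.98):
    """Multiply all weight tensors by a global factor (< 1) to gently shrink weight mass."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.dim() > 1:  # skip biases and norm parameters
                param.mul_(factor)
    return model
```

In the regime described here, downscaling gradually reduces low-utility weight mass between pruning events, complementing magnitude-based pruning.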
If you use this code in your research, please cite:
```bibtex
@misc{debeer2026runtime,
  title={Scheduled Runtime Refinement for Neural Networks:
         Periodic Pruning, Replay, and Distillation as an Operating Regime},
  author={de Beer, Riaan},
  year={2026},
  publisher={Zenodo},
  doi={10.5281/zenodo.18363662},
  url={https://doi.org/10.5281/zenodo.18363662}
}
```

Requirements:

- Python 3.7+
- PyTorch 1.12+
- torchvision 0.13+
- matplotlib 3.5+
- numpy 1.21+
- C++17 compiler
- CMake 3.10+
MIT License - see LICENSE for details.
Contributions are welcome! Please feel free to submit a Pull Request. Areas of particular interest:
- Additional pruning strategies (structured, layer-wise, etc.)
- Different importance estimation methods
- Integration with other frameworks (TensorFlow, JAX)
- Benchmarks on other datasets/architectures
- Deployment optimizations
**ImportError: No module named 'torch'**

```bash
pip install torch torchvision
```

**CUDA out of memory**

```bash
# Reduce batch size
python cifar10_demo.py --batch-size 64
```

**Training is slow**

```bash
# Make sure GPU is being used
python -c "import torch; print(torch.cuda.is_available())"
# Should print: True
```

**Results look different from paper**

- Make sure you're using the same random seed (`--seed 42`)
- Check that GPU is enabled
- Verify configuration matches paper settings
The figure below shows a small-scale empirical demonstration of the scheduled runtime refinement regime described in the accompanying paper. The experiment is designed as a proof of concept, not as a benchmark.
- A simple convolutional network (~582k parameters) is trained on a 10k/10k CIFAR-10 split for 20 epochs.
- A refinement cycle is executed every 5 epochs, consisting of:
  - magnitude-based pruning, and
  - lightweight replay from recent high-loss examples (a selection sketch follows this list).
- Refinement is performed during training to simulate scheduled runtime consolidation.
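Below is a minimal sketch of how the "recent high-loss examples" used for replay might be selected, assuming a per-example loss function such as `torch.nn.CrossEntropyLoss(reduction='none')`; the helper is illustrative, not the repository's implementation:

```python
import torch

def select_high_loss_examples(model, data_loader, loss_fn, k=256, device='cuda'):
    """Return the k examples from the loader with the highest current loss (illustrative sketch)."""
    model.eval()
    losses, inputs, targets = [], [], []
    with torch.no_grad():
        for x, y in data_loader:
            x, y = x.to(device), y.to(device)
            # Per-example loss so individual examples can be ranked
            per_example = loss_fn(model(x), y)
            losses.append(per_example.cpu())
            inputs.append(x.cpu())   # kept on CPU; fine for small subsets used in this demo
            targets.append(y.cpu())
    losses = torch.cat(losses)
    topk = torch.topk(losses, min(k, losses.numel())).indices
    return torch.cat(inputs)[topk], torch.cat(targets)[topk]
```

The returned tensors can be wrapped in a `torch.utils.data.TensorDataset` to build a replay loader.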
- **Training and test accuracy (top-left):** Test accuracy may dip briefly immediately after pruning events, but consistently recovers and improves following replay, indicating that scheduled pruning does not permanently degrade performance.
- **Sparsity evolution (top-right):** Global sparsity increases in a stepwise manner at refinement points (0% → 15.0% → 27.7% → 38.5%), illustrating controlled, scheduled compression rather than one-shot pruning.
- **Training loss (bottom-left):** Loss decreases smoothly throughout training, indicating stable optimization despite repeated pruning and consolidation.
- **Accuracy vs. sparsity (bottom-right):** Later, sparser model states achieve comparable or higher test accuracy than earlier dense states, illustrating increased performance density (effective accuracy per active parameter).
| Refinement Cycle | Epoch | Sparsity (%) | Test Accuracy (%) |
|---|---|---|---|
| Dense baseline | 5 | 0.0 | 59.24 |
| Cycle 1 | 10 | 15.0 | 67.29 |
| Cycle 2 | 15 | 27.7 | 70.35 |
| Cycle 3 | 20 | 38.5 | 70.91 |
- Final test accuracy: 70.9%
- Final sparsity: 38.5%
- Active parameters: 357,868 / 582,346
- Compression ratio: 1.63×
- Hardware: NVIDIA Tesla T4
- Total runtime: ~1.8 minutes
This experiment demonstrates that scheduled pruning combined with targeted replay can remove a substantial fraction of parameters while maintaining accuracy under a bounded refinement overhead.
Experimental context:
- Dataset: CIFAR-10 (10k training / 10k test samples)
- Model: SimpleCNN (~582k parameters)
- Refinement schedule: every 5 epochs
- Final result: 70.9% test accuracy at 38.5% sparsity (1.63× compression)
- Hardware: NVIDIA Tesla T4 GPU
- Total runtime: ~1.8 minutes
This experiment is intended as a proof of concept, not a benchmark. It demonstrates that scheduled pruning combined with lightweight replay can reduce active parameter mass while maintaining accuracy under bounded overhead.
Q: Can this work with pre-trained models?
A: Yes! Load your pre-trained model and apply refinement during fine-tuning or deployment.
Q: Does this work with transformers/LLMs?
A: The concept applies, but you may need to adjust pruning ratios and replay strategies for very large models.
Q: How much speedup can I expect?
A: Speedup depends on hardware support for sparse operations. On GPUs with sparse acceleration, expect 1.3-2× speedup at 40% sparsity.
Q: Will this hurt accuracy?
A: Small accuracy drops (1-3%) are typical. The key is finding the right `prune_ratio` for your use case.
Q: Can I use this during training or only after?
A: Both! You can integrate refinement into training (as shown) or apply it to already-trained models.
- Author: Riaan de Beer
- Paper: [Zenodo](https://doi.org/10.5281/zenodo.18363662)
- Issues: GitHub Issues
- ORCID: https://orcid.org/0009-0006-1155-027X
This work builds on research in neural network pruning, continual learning, and knowledge distillation. See the paper for detailed references.
This repository accompanies the paper Scheduled Runtime Refinement for Neural Networks and provides a concrete, executable demonstration of the proposed ideas.
Some of the broader conceptual motivation for runtime consolidation, replay, and compression is explored at book length in:
Riaan de Beer,
Sleep for AI: Compression, Consolidation, and the Relentless Acceleration of Intelligence
ISBN: 979-8244482706
Amazon: https://www.amazon.com/dp/B0GHSTTTDR
The book provides a high-level, systems-oriented perspective, while this repository focuses on a minimal, reproducible implementation.
- Added empirical validation (Section 6.8)
- Added Google Colab notebook with extensive documentation
- Added standalone Python implementation with CLI
- Improved documentation and examples
- Initial public release
- C++ proof-of-concept implementation
- Paper published on Zenodo
⭐ If you find this useful, please star the repository!
