This repository provides reference implementations for the scheduled runtime refinement regime described in:
R. de Beer (2026). Scheduled Runtime Refinement for Neural Networks: Periodic Pruning, Replay, and Distillation as an Operating Regime. Zenodo. DOI: 10.5281/zenodo.18363662
Runtime refinement treats neural network compression and consolidation as a recurring operational process rather than a one-off optimization. The approach integrates the following stages (a minimal sketch of how they compose follows the list):
- **Importance Estimation**: Identify high-value vs. low-contribution parameters
- **Downscaling & Pruning**: Remove or reduce low-utility weights
- **Targeted Replay**: Reinforce critical pathways to prevent forgetting
- **Distillation**: Consolidate representations (optional)
- **Caching**: Offload rarely-used knowledge (optional)
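As a rough illustration of how these stages compose into a single cycle, here is a minimal, self-contained PyTorch sketch. It is not the repository's `RuntimeRefinement` API: importance is approximated by weight magnitude, pruning uses a one-shot global threshold, replay is a few gradient steps on a replay loader, and distillation and caching are omitted.

```python
import torch
import torch.nn.functional as F

def sketch_refinement_cycle(model, replay_loader, prune_ratio=0.15,
                            replay_steps=30, lr=1e-3, device='cuda'):
    """Illustrative refinement cycle: importance -> prune -> replay (not the repo API)."""
    # 1. Importance estimation: score weight tensors by magnitude (simplest proxy)
    importance = {n: p.detach().abs() for n, p in model.named_parameters() if p.dim() > 1}

    # 2. Downscaling & pruning: zero out the globally lowest-importance fraction
    scores = torch.cat([v.flatten() for v in importance.values()])
    threshold = torch.quantile(scores, prune_ratio)
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in importance:
                p.mul_((importance[n] > threshold).float())

    # 3. Targeted replay: a few optimization steps to recover from pruning
    #    (a persistent mask would be needed to keep pruned weights at zero; omitted here)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for step, (x, y) in enumerate(replay_loader):
        if step >= replay_steps:
            break
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x.to(device)), y.to(device))
        loss.backward()
        optimizer.step()
```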
**Key Results:** Progressive refinement achieves 38.5% sparsity (1.63× compression) while maintaining 70.5% test accuracy on CIFAR-10, demonstrating that periodic consolidation can reduce model size without catastrophic performance loss.
The easiest way to try runtime refinement is through our interactive Colab notebook. No installation required; it runs entirely in your browser with free GPU access!
The notebook demonstrates the complete runtime refinement cycle on a SimpleCNN model (582K parameters) trained on CIFAR-10:
- Trains a convolutional neural network for 20 epochs
- Applies refinement cycles every 5 epochs (3 cycles total)
- Progressively prunes 15% of weights per cycle (15% → 28% → 39% final sparsity)
- Maintains accuracy around 70% despite compression
- Generates visualizations showing accuracy, sparsity, and compression over time
- Takes ~10 minutes to complete on a free T4 GPU
1. **Enable GPU:**
   - Click `Runtime` → `Change runtime type`
   - Select `T4 GPU` from the Hardware accelerator dropdown
   - Click `Save`
2. **Run the notebook:**
   - Click `Runtime` → `Run all` (or press `Ctrl+F9`)
   - Wait ~10 minutes for completion
3. **Download results:**
   - Click the folder icon in the left sidebar
   - Find `runtime_refinement_results.png`
   - Right-click → Download
Edit the configuration section at the top of the notebook to experiment with different settings:
```python
class Config:
    epochs = 20              # Try 50 for longer training
    refinement_interval = 5  # Try 10 for less frequent refinement
    prune_ratio = 0.15       # Try 0.2 for more aggressive pruning
    subset_size = 10000      # Set to 0 to use full dataset (50K images)
```

The notebook produces a 4-panel visualization (a plotting sketch for reproducing it from your own logged metrics follows the list):
- Top-left: Training and test accuracy over epochs (should stay ~70%)
- Top-right: Sparsity progression (0% → 15% → 28% → 39%)
- Bottom-left: Training loss (should decrease steadily)
- Bottom-right: Accuracy vs. sparsity trade-off curve
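If you want to recreate a similar figure from your own run, here is a minimal plotting sketch, assuming you have per-epoch lists of accuracy, sparsity, and loss (the function name and arguments are illustrative, not part of the notebook's API):

```python
import matplotlib.pyplot as plt

def plot_refinement_summary(epochs, train_acc, test_acc, sparsity, train_loss,
                            out_path="runtime_refinement_results.png"):
    """Recreate a 4-panel summary figure from per-epoch metric lists (illustrative sketch)."""
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))

    # Top-left: training and test accuracy over epochs
    axes[0, 0].plot(epochs, train_acc, label="train")
    axes[0, 0].plot(epochs, test_acc, label="test")
    axes[0, 0].set(title="Accuracy", xlabel="epoch")
    axes[0, 0].legend()

    # Top-right: sparsity progression (stepwise at refinement points)
    axes[0, 1].plot(epochs, sparsity)
    axes[0, 1].set(title="Sparsity", xlabel="epoch")

    # Bottom-left: training loss
    axes[1, 0].plot(epochs, train_loss)
    axes[1, 0].set(title="Training loss", xlabel="epoch")

    # Bottom-right: accuracy vs. sparsity trade-off
    axes[1, 1].scatter(sparsity, test_acc)
    axes[1, 1].set(title="Accuracy vs. sparsity", xlabel="sparsity", ylabel="test accuracy")

    fig.tight_layout()
    fig.savefig(out_path)
```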
**Expected Output:**

```
Final Performance:
  Test Accuracy: 70.5%
  Sparsity: 38.5%

Model Compression:
  Original parameters: 582,346
  Active parameters: 357,868
  Compression ratio: 1.63×
```
**Problem:** "No GPU detected"
- **Solution:** Runtime → Change runtime type → Select GPU → Save

**Problem:** "Session crashed" or out of memory
- **Solution:** Reduce `batch_size` from 128 to 64 in the configuration

**Problem:** Training is very slow
- **Solution:** Make sure the GPU is enabled (you should see "Tesla T4" in the output)

**Problem:** Can't find the results plot
- **Solution:** Look for `runtime_refinement_results.png` in the Files panel (folder icon)
```
runtime_refinement/
├── src/                    # C++ proof-of-concept (logistic regression)
│   ├── refinement.cpp
│   └── ...
├── python/                 # Python/PyTorch implementations
│   ├── cifar10_demo.py     # SimpleCNN on CIFAR-10 (standalone script)
│   └── requirements.txt
├── notebooks/              # Jupyter/Colab notebooks
│   └── runtime_refinement_demo.ipynb   # Interactive Colab demo
├── results/                # Example outputs
│   └── cifar10_results.png
├── CMakeLists.txt          # C++ build configuration
├── LICENSE                 # MIT License
└── README.md
```
If you prefer to run locally instead of using Colab:
- Python 3.7+
- PyTorch 1.12+
- CUDA-capable GPU (optional but recommended)
```bash
# Clone the repository
git clone https://github.com/infinityabundance/runtime_refinement.git
cd runtime_refinement

# Install dependencies
cd python
pip install -r requirements.txt

# Or install manually
pip install torch torchvision matplotlib numpy
```

```bash
cd python
python cifar10_demo.py --epochs 20 --refinement-interval 5

# See all CLI options
python cifar10_demo.py --help
```

Key options:

- `--epochs`: Total training epochs (default: 20)
- `--refinement-interval`: Epochs between refinement cycles (default: 5)
- `--prune-ratio`: Fraction of weights to prune per cycle (default: 0.15)
- `--subset-size`: Training samples to use (default: 10000; set to 0 for the full dataset)
- `--save-model`: Save the final model checkpoint
- `--output-dir`: Directory for results (default: `results/`)
```bash
# Quick test (default settings)
python cifar10_demo.py

# Longer training with full dataset
python cifar10_demo.py --epochs 50 --subset-size 0

# More aggressive pruning
python cifar10_demo.py --prune-ratio 0.2 --refinement-interval 10

# Save the trained model
python cifar10_demo.py --save-model --output-dir my_results/
```
Python API usage:

```python
from runtime_refinement import RuntimeRefinement

# Initialize refinement system
model = SimpleCNN()
refiner = RuntimeRefinement(model, device='cuda')

# Training loop with periodic refinement
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)

    # Execute refinement cycle every N epochs
    if epoch % refinement_interval == 0:
        metrics = refiner.refinement_cycle(
            replay_loader,
            prune_ratio=0.15
        )
        print(f"Sparsity: {metrics['sparsity']:.1%}")
```

A minimal C++ reference implementation is also provided for the core concepts:
```bash
mkdir build && cd build
cmake ..
make
./runtime_refinement_demo
```

This demonstrates the refinement cycle on a toy logistic regression model.
The implementation is designed to be modular. You can:
Replace `SimpleCNN` with ResNet, Transformers, or custom models:

```python
from torchvision.models import resnet18

model = resnet18(num_classes=10)
refiner = RuntimeRefinement(model, device='cuda')
```
Extend `estimate_importance()` with gradient-based or activation-based signals:

```python
def estimate_importance(self):
    importance = {}
    for name, param in self.model.named_parameters():
        # Custom importance metric: mean absolute gradient
        importance[name] = param.grad.abs().mean() if param.grad is not None else 0
    return importance
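```

The example above is gradient-based. As a sketch of the activation-based alternative mentioned here, the hypothetical helper below scores each parameterized leaf module by its mean absolute activation over a few batches. It uses standard PyTorch forward hooks and is not part of the repository API:

```python
import torch

def estimate_importance_from_activations(self, data_loader, num_batches=5):
    """Illustrative activation-based scoring: mean |activation| per parameterized leaf module."""
    scores, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            # Accumulate mean absolute activation (assumes the module returns a tensor)
            scores[name] = scores.get(name, 0.0) + output.detach().abs().mean().item()
        return hook

    # Register hooks on leaf modules that own trainable parameters
    for name, module in self.model.named_modules():
        is_leaf = len(list(module.children())) == 0
        has_params = any(p.requires_grad for p in module.parameters(recurse=False))
        if is_leaf and has_params:
            handles.append(module.register_forward_hook(make_hook(name)))

    device = next(self.model.parameters()).device
    self.model.eval()
    with torch.no_grad():
        for i, (x, _) in enumerate(data_loader):
            if i >= num_batches:
                break
            self.model(x.to(device))

    for h in handles:
        h.remove()
    return scores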
```

Implement structured pruning, layer-wise pruning, or custom criteria:

```python
# Structured pruning example
def prune_structured(self, prune_ratio=0.2):
    # Prune entire filters instead of individual weights
    # Implementation here...
    ...
```
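To go beyond the stub above, here is a minimal sketch of filter-level structured pruning for `Conv2d` layers, scoring each output filter by its L2 norm and zeroing the weakest ones. The helper name and scoring rule are illustrative assumptions, not the repository's implementation:

```python
import torch
import torch.nn as nn

def prune_conv_filters(model, prune_ratio=0.2):
    """Zero out the lowest-norm output filters of every Conv2d layer (illustrative sketch)."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Conv2d):
                # L2 norm of each output filter, shape: (out_channels,)
                filter_norms = module.weight.view(module.weight.size(0), -1).norm(p=2, dim=1)
                num_prune = int(prune_ratio * filter_norms.numel())
                if num_prune == 0:
                    continue
                # Indices of the weakest filters
                prune_idx = torch.argsort(filter_norms)[:num_prune]
                module.weight[prune_idx] = 0.0
                if module.bias is not None:
                    module.bias[prune_idx] = 0.0
    return model
```

Zeroing filters keeps tensor shapes unchanged (masked structured pruning); physically removing filters would additionally require adjusting the input channels of downstream layers.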
Add generative replay, prioritized sampling, or uncertainty-based selection:

```python
# Priority-based replay buffer
replay_buffer = PrioritizedReplayBuffer(capacity=1000)
replay_buffer.add(examples, priorities=uncertainty_scores)
```
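`PrioritizedReplayBuffer` is used above for illustration and is not shipped with this repository; a minimal sketch of what such a buffer could look like (fixed capacity, eviction of the lowest-priority item, priority-proportional sampling) is:

```python
import random

class PrioritizedReplayBuffer:
    """Minimal illustrative buffer: keeps up to `capacity` items, samples proportionally to priority."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.items, self.priorities = [], []

    def add(self, examples, priorities):
        for example, priority in zip(examples, priorities):
            if len(self.items) >= self.capacity:
                # Evict the lowest-priority item to make room
                evict = min(range(len(self.priorities)), key=self.priorities.__getitem__)
                self.items.pop(evict)
                self.priorities.pop(evict)
            self.items.append(example)
            self.priorities.append(float(priority))

    def sample(self, k):
        # Priority-proportional sampling (with replacement)
        return random.choices(self.items, weights=self.priorities, k=k)
```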
Key hyperparameters and their typical ranges (a sketch of the `--downscale-factor` step follows the table):

| Parameter | Default | Range | Description |
|---|---|---|---|
| `--epochs` | 20 | 10-100 | Total training epochs |
| `--refinement-interval` | 5 | 3-10 | Epochs between refinement cycles |
| `--prune-ratio` | 0.15 | 0.10-0.30 | Fraction of weights to prune per cycle |
| `--replay-buffer-size` | 1000 | 500-5000 | Number of examples in replay buffer |
| `--replay-steps` | 30 | 10-100 | Optimization steps during replay |
| `--downscale-factor` | 0.98 | 0.95-0.99 | Global downscaling multiplier |
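None of the examples above show the downscaling step controlled by `--downscale-factor`. A minimal sketch, assuming the factor is applied globally to all weight matrices once per refinement cycle (the helper name is illustrative), is:

```python
import torch

def downscale_weights(model, factor=0.98):
    """Multiply all weight tensors by a global factor (< 1) to gently shrink weight mass."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.dim() > 1:  # skip biases and norm parameters
                param.mul_(factor)
    return model
```

In the regime described here, downscaling gradually reduces low-utility weight mass between pruning events, complementing magnitude-based pruning.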
If you use this code in your research, please cite:
```bibtex
@misc{debeer2026runtime,
  title={Scheduled Runtime Refinement for Neural Networks:
         Periodic Pruning, Replay, and Distillation as an Operating Regime},
  author={de Beer, Riaan},
  year={2026},
  publisher={Zenodo},
  doi={10.5281/zenodo.18363662},
  url={https://doi.org/10.5281/zenodo.18363662}
}
```

Requirements:

- Python 3.7+
- PyTorch 1.12+
- torchvision 0.13+
- matplotlib 3.5+
- numpy 1.21+
- C++17 compiler
- CMake 3.10+
MIT License - see LICENSE for details.
Contributions are welcome! Please feel free to submit a Pull Request. Areas of particular interest:
- Additional pruning strategies (structured, layer-wise, etc.)
- Different importance estimation methods
- Integration with other frameworks (TensorFlow, JAX)
- Benchmarks on other datasets/architectures
- Deployment optimizations
**ImportError: No module named 'torch'**

```bash
pip install torch torchvision
```

**CUDA out of memory**

```bash
# Reduce batch size
python cifar10_demo.py --batch-size 64
```

**Training is slow**

```bash
# Make sure GPU is being used
python -c "import torch; print(torch.cuda.is_available())"
# Should print: True
```

**Results look different from paper**

- Make sure you're using the same random seed (`--seed 42`)
- Check that GPU is enabled
- Verify configuration matches paper settings
The figure below shows a small-scale empirical demonstration of the scheduled runtime refinement regime described in the accompanying paper. The experiment is designed as a proof of concept, not as a benchmark.
- A simple convolutional network (~582k parameters) is trained on a 10k/10k CIFAR-10 split for 20 epochs.
- A refinement cycle is executed every 5 epochs, consisting of:
  - magnitude-based pruning, and
  - lightweight replay from recent high-loss examples (a selection sketch follows this list).
- Refinement is performed during training to simulate scheduled runtime consolidation.
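Below is a minimal sketch of how the "recent high-loss examples" used for replay might be selected, assuming a per-example loss function such as `torch.nn.CrossEntropyLoss(reduction='none')`; the helper is illustrative, not the repository's implementation:

```python
import torch

def select_high_loss_examples(model, data_loader, loss_fn, k=256, device='cuda'):
    """Return the k examples from the loader with the highest current loss (illustrative sketch)."""
    model.eval()
    losses, inputs, targets = [], [], []
    with torch.no_grad():
        for x, y in data_loader:
            x, y = x.to(device), y.to(device)
            # Per-example loss so individual examples can be ranked
            per_example = loss_fn(model(x), y)
            losses.append(per_example.cpu())
            inputs.append(x.cpu())   # kept on CPU; fine for small subsets used in this demo
            targets.append(y.cpu())
    losses = torch.cat(losses)
    topk = torch.topk(losses, min(k, losses.numel())).indices
    return torch.cat(inputs)[topk], torch.cat(targets)[topk]
```

The returned tensors can be wrapped in a `torch.utils.data.TensorDataset` to build a replay loader.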
- **Training and test accuracy (top-left):** Test accuracy may dip briefly immediately after pruning events, but consistently recovers and improves following replay, indicating that scheduled pruning does not permanently degrade performance.
- **Sparsity evolution (top-right):** Global sparsity increases in a stepwise manner at refinement points (0% → 15.0% → 27.7% → 38.5%), illustrating controlled, scheduled compression rather than one-shot pruning.
- **Training loss (bottom-left):** Loss decreases smoothly throughout training, indicating stable optimization despite repeated pruning and consolidation.
- **Accuracy vs. sparsity (bottom-right):** Later, sparser model states achieve comparable or higher test accuracy than earlier dense states, illustrating increased performance density (effective accuracy per active parameter).
| Refinement Cycle | Epoch | Sparsity (%) | Test Accuracy (%) |
|---|---|---|---|
| Dense baseline | 5 | 0.0 | 59.24 |
| Cycle 1 | 10 | 15.0 | 67.29 |
| Cycle 2 | 15 | 27.7 | 70.35 |
| Cycle 3 | 20 | 38.5 | 70.91 |
- Final test accuracy: 70.9%
- Final sparsity: 38.5%
- Active parameters: 357,868 / 582,346
- Compression ratio: 1.63×
- Hardware: NVIDIA Tesla T4
- Total runtime: ~1.8 minutes
This experiment demonstrates that scheduled pruning combined with targeted replay can remove a substantial fraction of parameters while maintaining accuracy under a bounded refinement overhead.
Experimental context:
- Dataset: CIFAR-10 (10k training / 10k test samples)
- Model: SimpleCNN (~582k parameters)
- Refinement schedule: every 5 epochs
- Final result: 70.9% test accuracy at 38.5% sparsity (1.63× compression)
- Hardware: NVIDIA Tesla T4 GPU
- Total runtime: ~1.8 minutes
This experiment is intended as a proof of concept, not a benchmark. It demonstrates that scheduled pruning combined with lightweight replay can reduce active parameter mass while maintaining accuracy under bounded overhead.
Q: Can this work with pre-trained models?
A: Yes! Load your pre-trained model and apply refinement during fine-tuning or deployment.
Q: Does this work with transformers/LLMs?
A: The concept applies, but you may need to adjust pruning ratios and replay strategies for very large models.
Q: How much speedup can I expect?
A: Speedup depends on hardware support for sparse operations. On GPUs with sparse acceleration, expect 1.3-2× speedup at 40% sparsity.
Q: Will this hurt accuracy?
A: Small accuracy drops (1-3%) are typical. The key is finding the right `prune_ratio` for your use case.
Q: Can I use this during training or only after?
A: Both! You can integrate refinement into training (as shown) or apply it to already-trained models.
- Author: Riaan de Beer
- Paper: [Zenodo](https://doi.org/10.5281/zenodo.18363662)
- Issues: GitHub Issues
- ORCID: https://orcid.org/0009-0006-1155-027X
This work builds on research in neural network pruning, continual learning, and knowledge distillation. See the paper for detailed references.
This repository accompanies the paper Scheduled Runtime Refinement for Neural Networks and provides a concrete, executable demonstration of the proposed ideas.
Some of the broader conceptual motivation for runtime consolidation, replay, and compression is explored at book length in:
Riaan de Beer,
Sleep for AI: Compression, Consolidation, and the Relentless Acceleration of Intelligence
ISBN: 979-8244482706
Amazon: https://www.amazon.com/dp/B0GHSTTTDR
The book provides a high-level, systems-oriented perspective, while this repository focuses on a minimal, reproducible implementation.
- Added empirical validation (Section 6.8)
- Added Google Colab notebook with extensive documentation
- Added standalone Python implementation with CLI
- Improved documentation and examples
- Initial public release
- C++ proof-of-concept implementation
- Paper published on Zenodo
⭐ If you find this useful, please star the repository!
