Runtime Refinement logo

Runtime Refinement for Neural Networks

DOI License: MIT Open In Colab

This repository provides reference implementations for the scheduled runtime refinement regime described in:

R. de Beer (2026). Scheduled Runtime Refinement for Neural Networks: Periodic Pruning, Replay, and Distillation as an Operating Regime. Zenodo. DOI: 10.5281/zenodo.18363662

🔎 Overview

Runtime refinement treats neural network compression and consolidation as a recurring operational process rather than a one-off optimization. The approach integrates five stages (a code sketch follows the list):

  1. Importance Estimation - Identify high-value vs. low-contribution parameters
  2. Downscaling & Pruning - Remove or reduce low-utility weights
  3. Targeted Replay - Reinforce critical pathways to prevent forgetting
  4. Distillation - Consolidate representations (optional)
  5. Caching - Offload rarely-used knowledge (optional)
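
To make the cycle concrete, here is a minimal, illustrative sketch of one refinement pass in PyTorch. It is not the repository's implementation (the actual API is RuntimeRefinement.refinement_cycle, shown under Usage Example below): importance here is plain weight magnitude, and a production version would also track a pruning mask so that replay does not regrow pruned weights and repeated cycles compound correctly.

import torch

def sketch_refinement_cycle(model, replay_loader, optimizer, loss_fn,
                            prune_ratio=0.15, replay_steps=30):
    # 1. Importance estimation: score each weight tensor (here, absolute magnitude)
    importance = {n: p.detach().abs() for n, p in model.named_parameters() if p.dim() > 1}
    # 2. Downscaling & pruning: zero the lowest-scoring fraction of each weight tensor
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in importance:
                threshold = torch.quantile(importance[name].flatten(), prune_ratio)
                param[importance[name] <= threshold] = 0.0
    # 3. Targeted replay: a few gradient steps on retained examples to recover accuracy
    model.train()
    for _, (inputs, targets) in zip(range(replay_steps), replay_loader):
        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()
        optimizer.step()
    # 4./5. Distillation and caching are optional follow-on steps (omitted here)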

Key Results: Progressive refinement achieves 38.5% sparsity (1.63× compression) while maintaining 70.5% test accuracy on CIFAR-10, demonstrating that periodic consolidation can reduce model size without catastrophic performance loss.


🚀 Quick Start with Google Colab (Recommended)

The easiest way to try runtime refinement is through our interactive Colab notebook. No installation required: it runs entirely in your browser with free GPU access!

What the Colab Notebook Does

The notebook demonstrates the complete runtime refinement cycle on a SimpleCNN model (582K parameters) trained on CIFAR-10:

  • Trains a convolutional neural network for 20 epochs
  • Applies refinement cycles every 5 epochs (3 cycles total)
  • Progressively prunes 15% of weights per cycle (15% → 28% → 39% final sparsity; see the quick check after this list)
  • Maintains accuracy around 70% despite compression
  • Generates visualizations showing accuracy, sparsity, and compression over time
  • Takes ~10 minutes to complete on a free T4 GPU
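
The cumulative sparsity figures are consistent with each cycle pruning 15% of the weights that are still active, rather than 15% of the original model each time. A quick check (the small remaining gap to the reported 27.7% and 38.5% presumably comes from which tensors are pruned and from rounding of per-tensor prune counts):

# Sparsity compounds when each cycle prunes 15% of the currently active weights
prune_ratio, sparsity = 0.15, 0.0
for cycle in range(1, 4):
    sparsity = 1 - (1 - sparsity) * (1 - prune_ratio)
    print(f"after cycle {cycle}: {sparsity * 100:.2f}% sparse")
# after cycle 1: 15.00%, after cycle 2: 27.75%, after cycle 3: 38.59%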

📥 How to Run the Colab Notebook

  1. Click the badge: Open In Colab

  2. Enable GPU:

    • Click Runtime → Change runtime type
    • Select T4 GPU from the Hardware accelerator dropdown
    • Click Save
  3. Run the notebook:

    • Click Runtime → Run all (or press Ctrl+F9)
    • Wait ~10 minutes for completion
  4. Download results:

    • Click the folder icon (📁) in the left sidebar
    • Find runtime_refinement_results.png
    • Right-click → Download

📒 Customizing the Notebook

Edit the configuration section at the top of the notebook to experiment with different settings:

class Config:
    epochs = 20                # Try 50 for longer training
    refinement_interval = 5    # Try 10 for less frequent refinement
    prune_ratio = 0.15         # Try 0.2 for more aggressive pruning
    subset_size = 10000        # Set to 0 to use full dataset (50K images)

🗳️ What You'll See

The notebook produces a 4-panel visualization:

  • Top-left: Training and test accuracy over epochs (should stay ~70%)
  • Top-right: Sparsity progression (0% → 15% → 28% → 39%)
  • Bottom-left: Training loss (should decrease steadily)
  • Bottom-right: Accuracy vs. sparsity trade-off curve

Expected Output:

Final Performance:
  Test Accuracy: 70.5%
  Sparsity: 38.5%

Model Compression:
  Original parameters: 582,346
  Active parameters: 357,868
  Compression ratio: 1.63×
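
For reference, the compression ratio is simply total parameters divided by active parameters, which at sparsity s equals 1 / (1 - s):

total, active = 582_346, 357_868
print(f"compression ratio: {total / active:.2f}x")     # 1.63x
print(f"sparsity:          {1 - active / total:.1%}")  # 38.5%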

Troubleshooting Colab

Problem: "No GPU detected"

  • Solution: Runtime → Change runtime type → Select GPU → Save

Problem: "Session crashed" or out of memory

  • Solution: Reduce batch_size from 128 to 64 in the configuration

Problem: Training is very slow

  • Solution: Make sure GPU is enabled (should see "Tesla T4" in output)

Problem: Can't find the results plot

  • Solution: Look for runtime_refinement_results.png in the Files panel (📁 icon)

📁 Repository Structure

runtime_refinement/
├── src/                    # C++ proof-of-concept (logistic regression)
│   ├── refinement.cpp
│   └── ...
├── python/                 # Python/PyTorch implementations
│   ├── cifar10_demo.py     # SimpleCNN on CIFAR-10 (standalone script)
│   └── requirements.txt
├── notebooks/              # Jupyter/Colab notebooks
│   └── runtime_refinement_demo.ipynb  # Interactive Colab demo
├── results/                # Example outputs
│   └── cifar10_results.png
├── CMakeLists.txt          # C++ build configuration
├── LICENSE                 # MIT License
└── README.md

📌 Local Installation (Python/PyTorch)

If you prefer to run locally instead of using Colab:

Prerequisites

  • Python 3.7+
  • PyTorch 1.12+
  • CUDA-capable GPU (optional but recommended)

Installation

# Clone the repository
git clone https://github.com/infinityabundance/runtime_refinement.git
cd runtime_refinement

# Install dependencies
cd python
pip install -r requirements.txt

# Or install manually
pip install torch torchvision matplotlib numpy

Run the Demo

cd python
python cifar10_demo.py --epochs 20 --refinement-interval 5

Command-Line Options

python cifar10_demo.py --help

Key options:

  • --epochs: Total training epochs (default: 20)
  • --refinement-interval: Epochs between refinement cycles (default: 5)
  • --prune-ratio: Fraction of weights to prune per cycle (default: 0.15)
  • --subset-size: Training samples to use (default: 10000, set to 0 for full dataset)
  • --save-model: Save final model checkpoint
  • --output-dir: Directory for results (default: results/)

Example Commands

# Quick test (default settings)
python cifar10_demo.py

# Longer training with full dataset
python cifar10_demo.py --epochs 50 --subset-size 0

# More aggressive pruning
python cifar10_demo.py --prune-ratio 0.2 --refinement-interval 10

# Save the trained model
python cifar10_demo.py --save-model --output-dir my_results/

👨‍💻 Usage Example

from runtime_refinement import RuntimeRefinement

# Initialize refinement system
model = SimpleCNN()
refiner = RuntimeRefinement(model, device='cuda')

# Training loop with periodic refinement
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)
    
    # Execute refinement cycle every N epochs
    if epoch % refinement_interval == 0:
        metrics = refiner.refinement_cycle(
            replay_loader,
            prune_ratio=0.15
        )
        print(f"Sparsity: {metrics['sparsity']:.1%}")

📟 C++ Implementation

A minimal C++ reference implementation is also provided for the core concepts:

mkdir build && cd build
cmake ..
make
./runtime_refinement_demo

This demonstrates the refinement cycle on a toy logistic regression model.


📊 Extending the Code

The implementation is designed to be modular. You can:

Try Different Architectures

Replace SimpleCNN with ResNet, Transformers, or custom models:

from torchvision.models import resnet18
model = resnet18(num_classes=10)
refiner = RuntimeRefinement(model, device='cuda')

Custom Importance Metrics

Extend estimate_importance() with gradient-based or activation-based signals:

def estimate_importance(self):
    importance = {}
    for name, param in self.model.named_parameters():
        # Custom metric: mean absolute gradient (0.0 for parameters with no gradient yet)
        grad = param.grad
        importance[name] = grad.detach().abs().mean().item() if grad is not None else 0.0
    return importance

Different Pruning Strategies

Implement structured pruning, layer-wise pruning, or custom criteria:

# Structured pruning example (sketch): prune entire conv filters instead of individual weights
def prune_structured(self, prune_ratio=0.2):
    for module in self.model.modules():
        if isinstance(module, torch.nn.Conv2d):
            filter_norms = module.weight.detach().flatten(1).norm(dim=1)  # one L2 norm per filter
            k = int(prune_ratio * filter_norms.numel())
            if k > 0:
                module.weight.data[filter_norms.argsort()[:k]] = 0.0      # zero the weakest filters

Advanced Replay

Add generative replay, prioritized sampling, or uncertainty-based selection:

# Priority-based replay buffer
replay_buffer = PrioritizedReplayBuffer(capacity=1000)
replay_buffer.add(examples, priorities=uncertainty_scores)
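
PrioritizedReplayBuffer above is illustrative rather than part of this repository's API; a minimal sketch of one way to implement it (sampling with replacement, with probability proportional to priority):

import random

class PrioritizedReplayBuffer:
    """Minimal sketch: store (example, priority) pairs and sample proportionally to priority."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.items, self.priorities = [], []

    def add(self, examples, priorities):
        for example, priority in zip(examples, priorities):
            if len(self.items) >= self.capacity:    # evict the oldest entry when full
                self.items.pop(0)
                self.priorities.pop(0)
            self.items.append(example)
            self.priorities.append(float(priority))

    def sample(self, k):
        # Weighted sampling with replacement (e.g., by uncertainty score)
        return random.choices(self.items, weights=self.priorities, k=k)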

🔧 Configuration

Key hyperparameters and their typical ranges:

Parameter               Default   Range       Description
--epochs                20        10-100      Total training epochs
--refinement-interval   5         3-10        Epochs between refinement cycles
--prune-ratio           0.15      0.10-0.30   Fraction of weights to prune per cycle
--replay-buffer-size    1000      500-5000    Number of examples in the replay buffer
--replay-steps          30        10-100      Optimization steps during replay
--downscale-factor      0.98      0.95-0.99   Global downscaling multiplier

📇 Citation

If you use this code in your research, please cite:

@misc{debeer2026runtime,
  title={Scheduled Runtime Refinement for Neural Networks: 
         Periodic Pruning, Replay, and Distillation as an Operating Regime},
  author={de Beer, Riaan},
  year={2026},
  publisher={Zenodo},
  doi={10.5281/zenodo.18363662},
  url={https://doi.org/10.5281/zenodo.18363662}
}

📜 Requirements

Python

  • Python 3.7+
  • PyTorch 1.12+
  • torchvision 0.13+
  • matplotlib 3.5+
  • numpy 1.21+

C++

  • C++17 compiler
  • CMake 3.10+

🪪 License

MIT License - see LICENSE for details.


🌱 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Areas for Contribution

  • Additional pruning strategies (structured, layer-wise, etc.)
  • Different importance estimation methods
  • Integration with other frameworks (TensorFlow, JAX)
  • Benchmarks on other datasets/architectures
  • Deployment optimizations

⚠️ Troubleshooting

Common Issues

ImportError: No module named 'torch'

pip install torch torchvision

CUDA out of memory

# Reduce batch size
python cifar10_demo.py --batch-size 64

Training is slow

# Make sure GPU is being used
python -c "import torch; print(torch.cuda.is_available())"
# Should print: True

Results look different from paper

  • Make sure you're using the same random seed (--seed 42; a seeding sketch follows this list)
  • Check that GPU is enabled
  • Verify configuration matches paper settings
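
How the demo script consumes --seed is defined in its own code, but reproducible PyTorch runs generally require seeding every random number generator involved; a minimal sketch of what that usually looks like:

import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)                           # Python's built-in RNG
    np.random.seed(seed)                        # NumPy (shuffling, subset selection)
    torch.manual_seed(seed)                     # PyTorch CPU and CUDA RNGs
    torch.backends.cudnn.deterministic = True   # prefer deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False      # disable autotuning for reproducibility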

🎬 Empirical Demonstration (CIFAR-10)

The figure below shows a small-scale empirical demonstration of the scheduled runtime refinement regime described in the accompanying paper. The experiment is designed as a proof of concept, not as a benchmark.

Runtime refinement results

What this experiment demonstrates

  • A simple convolutional network (≈582k parameters) is trained on a 10k/10k CIFAR-10 split for 20 epochs.
  • A refinement cycle is executed every 5 epochs, consisting of:
    • magnitude-based pruning, and
    • lightweight replay from recent high-loss examples.
  • Refinement is performed during training to simulate scheduled runtime consolidation.

Interpretation of the figure

  • Training and test accuracy (top-left):
    Test accuracy may dip briefly immediately after pruning events, but consistently recovers and improves following replay, indicating that scheduled pruning does not permanently degrade performance.

  • Sparsity evolution (top-right):
    Global sparsity increases in a stepwise manner at refinement points (0% → 15.0% → 27.7% → 38.5%), illustrating controlled, scheduled compression rather than one-shot pruning.

  • Training loss (bottom-left):
    Loss decreases smoothly throughout training, indicating stable optimization despite repeated pruning and consolidation.

  • Accuracy vs. sparsity (bottom-right):
    Later, sparser model states achieve comparable or higher test accuracy than earlier dense states, illustrating increased performance density (effective accuracy per active parameter; a worked example follows the table below).

Refinement Progression

Refinement Cycle    Epoch   Sparsity (%)   Test Accuracy (%)
Dense baseline      5       0.0            59.24
Cycle 1             10      15.0           67.29
Cycle 2             15      27.7           70.35
Cycle 3             20      38.5           70.91
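
To make the performance-density reading above concrete, compare accuracy per million active parameters for the dense baseline and the final refined state (active counts from the Final outcome list below; note that the baseline is an earlier checkpoint, so part of the gap also reflects additional training):

# Accuracy per million active parameters: dense baseline (epoch 5) vs. final state (epoch 20)
dense_acc, dense_active = 59.24, 582_346
final_acc, final_active = 70.91, 357_868
print(f"dense baseline: {dense_acc / (dense_active / 1e6):.1f} points per M active params")
print(f"after cycle 3:  {final_acc / (final_active / 1e6):.1f} points per M active params")
# roughly 101.7 vs. 198.1: accuracy per active parameter nearly doubles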

Final outcome

  • Final test accuracy: 70.9%
  • Final sparsity: 38.5%
  • Active parameters: 357,868 / 582,346
  • Compression ratio: 1.63×
  • Hardware: NVIDIA Tesla T4
  • Total runtime: ~1.8 minutes

This experiment is intended as a proof of concept rather than a benchmark: it shows that scheduled pruning combined with targeted, lightweight replay can remove a substantial fraction of active parameters while maintaining accuracy under a bounded refinement overhead.


❓ FAQ

Q: Can this work with pre-trained models?
A: Yes! Load your pre-trained model and apply refinement during fine-tuning or deployment.

Q: Does this work with transformers/LLMs?
A: The concept applies, but you may need to adjust pruning ratios and replay strategies for very large models.

Q: How much speedup can I expect?
A: Speedup depends on hardware support for sparse operations. On GPUs with sparse acceleration, expect 1.3-2× speedup at 40% sparsity.

Q: Will this hurt accuracy?
A: Small accuracy drops (1-3%) are typical. The key is finding the right prune_ratio for your use case.

Q: Can I use this during training or only after?
A: Both! You can integrate refinement into training (as shown) or apply it to already-trained models.


✉️ Contact


📝 Acknowledgments

This work builds on research in neural network pruning, continual learning, and knowledge distillation. See the paper for detailed references.


💼 Related Work and Background

This repository accompanies the paper Scheduled Runtime Refinement for Neural Networks and provides a concrete, executable demonstration of the proposed ideas.

Some of the broader conceptual motivation for runtime consolidation, replay, and compression is explored at book length in:

Riaan de Beer,
Sleep for AI: Compression, Consolidation, and the Relentless Acceleration of Intelligence
ISBN: 979-8244482706
Amazon: https://www.amazon.com/dp/B0GHSTTTDR

The book provides a high-level, systems-oriented perspective, while this repository focuses on a minimal, reproducible implementation.


🗂️ Changelog

Version 1.2 (January 2026)

  • Added empirical validation (Section 6.8)
  • Added Google Colab notebook with extensive documentation
  • Added standalone Python implementation with CLI
  • Improved documentation and examples

Version 1.1 (January 2026)

  • Initial public release
  • C++ proof-of-concept implementation
  • Paper published on Zenodo

⭐ If you find this useful, please star the repository!