
SCBI: Stochastic Covariance-Based Initialization


Official implementation of the paper: "Stochastic Covariance-Based Initialization (SCBI): A Scalable Warm-Start Strategy for High-Dimensional Linear Models"

A novel neural network weight initialization method that achieves 87× faster convergence on regression tasks and 33% lower initial loss on classification tasks. SCBI is a GPU-accelerated method that calculates optimal starting weights for linear layers from covariance statistics. It effectively "solves" linear regression tasks before the first epoch of training, replacing random initialization with a statistically grounded "warm start."


🚀 Key Results

SCBI significantly outperforms standard Random Initialization (He/Xavier) by approximating the closed-form solution via stochastic bagging.

| Dataset | Task | Improvement (Epoch 0) |
| --- | --- | --- |
| Synthetic High-Dim | Binary Classification | 61.6% reduction in loss |
| California Housing | Regression | 90.8% reduction in MSE |
| Forest Cover Type | Multi-Class Classification | 26.2% reduction in loss |

Figure 1: Convergence comparison on synthetic high-dimensional data.

Figure 2: California Housing regression, SCBI vs. random initialization.

Figure 3: Multi-class classification, SCBI vs. random initialization.


📊 Results Summary

| Task | Metric | Standard Init | SCBI Init | Improvement |
| --- | --- | --- | --- | --- |
| Regression | Initial MSE | 26,000 | 300 | 87× |
| Regression | Final MSE | 22,000 | ~0 | >1000× |
| Classification | Initial Loss | 1.18 | 0.79 | 33% |
| Classification | Final Loss | 1.14 | 0.77 | 32% |

Figures 4-6: Performance results.


📖 What is SCBI?

SCBI (Stochastic Covariance-Based Initialization) is a data-driven initialization method that computes optimal linear weights by solving the Normal Equation on stochastic subsets of your training data.

Key Innovations

  1. Universal Formulation: Works for regression, binary, multi-class, and multi-label classification
  2. Stochastic Bagging: Prevents overfitting by averaging solutions across random data subsets
  3. Ridge Regularization: Ensures numerical stability even with correlated features
  4. Fast Approximation: Reduced-complexity variant for high-dimensional problems (D > 10,000)

Mathematical Foundation

For each stochastic subset $k$, SCBI solves the ridge-regularized normal equation:

$$\theta_k = \left( \tilde{X}_k^T \tilde{X}_k + \lambda I \right)^{-1} \tilde{X}_k^T Y_k$$

The final weights are obtained by ensemble averaging the $\theta_k$ across all subsets.
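
In code, the per-subset solve is a single regularized linear solve. The sketch below is a minimal PyTorch illustration of the formula above (not the package's actual implementation); `ridge_alpha` plays the role of λ:

```python
import torch

def ridge_solve(X, y, ridge_alpha=1.0):
    """Solve (X^T X + lambda*I)^{-1} X^T y for one data subset."""
    d = X.shape[1]
    gram = X.T @ X + ridge_alpha * torch.eye(d, device=X.device)  # [D, D]
    corr = X.T @ y                                                # [D] or [D, C]
    # Solving the linear system is cheaper and more stable than forming the inverse
    return torch.linalg.solve(gram, corr)
```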


🛠️ Installation

Dependencies are minimal. You only need PyTorch and standard data libraries.

pip install torch numpy scikit-learn matplotlib

Option 1: Direct Download

# Download scbi.py (use the raw URL so wget fetches the file itself, not the HTML page)
wget https://raw.githubusercontent.com/fares3010/SCBI/main/scbi.py

Use in your project

from scbi import SCBIInitializer, scbi_init

🎯 Quick Start

Regression Example

import torch
import torch.nn as nn
from scbi import scbi_init

# Your data
X_train = torch.randn(1000, 50)  # [N, D]
y_train = torch.randn(1000)      # [N]

# Compute SCBI weights
weights, bias = scbi_init(X_train, y_train, n_samples=10)

# Use in your model (nn.Linear stores weight as [out_features, in_features], hence the transpose)
model = nn.Linear(50, 1)
with torch.no_grad():
    model.weight.data = weights.T
    model.bias.data = bias

Classification Example

import torch
import torch.nn as nn
from scbi import SCBIInitializer

# Your data (one-hot encoded targets)
X_train = torch.randn(1000, 50)
y_onehot = torch.zeros(1000, 10)
y_onehot.scatter_(1, torch.randint(0, 10, (1000, 1)), 1)

# Initialize with SCBI
model = nn.Linear(50, 10)
initializer = SCBIInitializer(n_samples=15, ridge_alpha=1.5)
initializer.initialize_layer(model, X_train, y_onehot)

High-Dimensional Data (D > 10,000)

from scbi import fast_damping_init

X_train = torch.randn(500, 15000)  # High-dimensional
y_train = torch.randn(500)

# Use fast approximation (O(N×D²) instead of O(N×D³))
weights, bias = fast_damping_init(X_train, y_train)

🔧 API Reference

SCBIInitializer

Main class for SCBI initialization.

SCBIInitializer(
    n_samples=10,        # Number of stochastic subsets
    sample_ratio=0.5,    # Fraction of data per subset
    ridge_alpha=1.0,     # Regularization strength
    verbose=True         # Print progress
)

Methods:

- compute_weights(X_data, y_data) → returns (weights, bias); see the usage sketch below
- initialize_layer(layer, X_data, y_data) → initializes an nn.Linear layer in-place

scbi_init() - Convenience Function

weights, bias = scbi_init(
    X_data,              # Input features [N, D]
    y_data,              # Targets [N] or [N, Output_Dim]
    n_samples=10,
    sample_ratio=0.5,
    ridge_alpha=1.0,
    verbose=True
)

FastDampingInitializer

Fast approximation for very high-dimensional data.

FastDampingInitializer(eps=1e-8, verbose=True)

Methods:

- compute_weights(X, y) → returns (weights, bias)
- initialize_layer(layer, X, y) → initializes an nn.Linear layer

fast_damping_init() - Convenience Function

weights, bias = fast_damping_init(X_data, y_data, verbose=True)

📊 Hyperparameter Guide

n_samples (Number of Subsets)

- Default: 10
- Range: 5-20
- Higher: more stable, slower
- Lower: faster, more variance

sample_ratio (Data Fraction per Subset)

- Default: 0.5 (50% of data)
- Range: 0.3-1.0
- Higher: less stochastic, may overfit
- Lower: more stochastic, more robust

ridge_alpha (Regularization Strength)

- Default: 1.0
- Range: 0.5-5.0
- Higher: more regularization (for noisy/high-dim data)
- Lower: less regularization (for clean/low-dim data)

🧪 Complete Example

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from scbi import SCBIInitializer

# 1. Generate synthetic data
X, y = make_regression(n_samples=2000, n_features=50, noise=10.0)

# 2. Standardize features (important!)
scaler = StandardScaler()
X = scaler.fit_transform(X)

# 3. Convert to tensors
X_train = torch.tensor(X[:1600], dtype=torch.float32)
y_train = torch.tensor(y[:1600], dtype=torch.float32)

# 4. Create model with SCBI initialization
model = nn.Linear(50, 1)
initializer = SCBIInitializer(n_samples=10, ridge_alpha=1.0)
initializer.initialize_layer(model, X_train, y_train)

# 5. Train normally
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

for epoch in range(5):
    optimizer.zero_grad()
    pred = model(X_train)
    loss = criterion(pred, y_train.unsqueeze(1))
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}: Loss = {loss.item():.4f}")

1. For Regression (Continuous Target)

import torch
import torch.nn as nn
from scbi import compute_scbi_weights  # assuming the algorithm is saved in scbi.py

# 1. Define your model (X_train and y_train must already be tensors)
input_dim = X_train.shape[1]
model = nn.Linear(input_dim, 1)

# 2. Calculate SCBI weights (GPU accelerated)
print("Calculating warm start...")
w_init, b_init = compute_scbi_weights(X_train, y_train, n_samples=10, ridge_alpha=1.0)

# 3. Assign the weights to the model
with torch.no_grad():
    model.weight.data = w_init.T
    model.bias.data = b_init

# 4. Train as normal (Adam/SGD)...
# The loss should start near its minimum rather than at a random-initialization value.

2. For Classification (Multi-Class)

For classification, ensure your target y is one-hot encoded before passing it to SCBI; a conversion snippet follows the example below.

from scbi import compute_scbi_classification

# ... Load Data ...

# Calculate Weights for 7 Classes
w_init, b_init = compute_scbi_classification(X_train, y_one_hot, n_samples=10)

with torch.no_grad():
    model.weight.data = w_init.T
    model.bias.data = b_init.squeeze()
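
If your labels are integer class indices, torch.nn.functional.one_hot produces the required target matrix. This is shown for the 7-class case; y_labels is a stand-in for your own label tensor:

```python
import torch
import torch.nn.functional as F

y_labels = torch.randint(0, 7, (1000,))                 # integer class indices
y_one_hot = F.one_hot(y_labels, num_classes=7).float()  # [N, 7], float for the solver
```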

📖 Methodology

Standard initialization strategies (Xavier, He) are semantically blind: they initialize weights based on architecture dimensions, ignoring data statistics.

SCBI leverages the Normal Equation approximation:

$$\theta_{SCBI} = \frac{1}{K} \sum_{k=1}^{K} \left[ \left( \tilde{X}_k^T \tilde{X}_k + \lambda I \right)^{-1} \tilde{X}_k^T Y_k \right]$$

By computing this on random subsets (bagging) using GPU matrix operations, we obtain a robust estimator of the global minimum without solving the normal equation exactly over the full dataset. (The inverted matrix is $D \times D$, so each subset solve costs $O(D^3)$ in the feature dimension, not $O(N^3)$ in the sample count.)
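
As a quick sanity check of this claim, the snippet below compares the average of several subset ridge solutions against the single full-data ridge solution. It is self-contained and does not use the SCBI package:

```python
import torch

torch.manual_seed(0)
N, D, K, lam = 2000, 20, 10, 1.0
X = torch.randn(N, D)
y = X @ torch.randn(D) + 0.1 * torch.randn(N)

def ridge(Xs, ys):
    return torch.linalg.solve(Xs.T @ Xs + lam * torch.eye(D), Xs.T @ ys)

# Average K half-data ridge solutions, then compare to the full-data solution
subsets = [torch.randperm(N)[: N // 2] for _ in range(K)]
theta_bagged = torch.stack([ridge(X[i], y[i]) for i in subsets]).mean(dim=0)
theta_full = ridge(X, y)
print(torch.dist(theta_bagged, theta_full))  # small: the bagged estimate tracks the global optimum
```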


🔬 When to Use SCBI

✅ SCBI Works Best For:

- Small to medium datasets (N < 100,000)
- First-layer initialization in deep networks
- Linear or weakly non-linear problems
- Tasks where training data is expensive
- Fast prototyping and experimentation

⚠️ Consider Alternatives For:

- Very large datasets (N > 1,000,000): the computational cost may outweigh the benefits
- Highly non-linear problems: SCBI is fundamentally linear
- Pre-trained models: transfer learning may be more effective

📈 Performance Benchmarks

Computational Complexity

| Method | Time Complexity | Space Complexity |
| --- | --- | --- |
| Xavier/He Init | O(D) | O(D) |
| SCBI | O(n_samples × N × D²) | O(D²) |
| Fast Damping | O(N × D²) | O(D²) |

Recommended Usage

  • D < 1,000: Use standard SCBI (n_samples=10-20)
  • 1,000 < D < 10,000: Use standard SCBI (n_samples=5-10)
  • D > 10,000: Use Fast Damping approximation

πŸ” How It Works

Step-by-Step Process

  1. Stochastic Sampling: randomly sample a sample_ratio fraction of the training data
  2. Augmentation: append a bias column to the feature matrix
  3. Covariance Matrix: compute X^T @ X
  4. Regularization: add the ridge penalty λI
  5. Correlation: compute X^T @ y
  6. Solve: solve the regularized system (X^T X + λI)^{-1} X^T y
  7. Repeat: do this for n_samples different subsets
  8. Average: take the ensemble average of all solutions
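
For reference, the eight steps fit in a short PyTorch function. This is a didactic re-implementation of the procedure as described above, not the package's source:

```python
import torch

def scbi_like_init(X, y, n_samples=10, sample_ratio=0.5, ridge_alpha=1.0):
    """Didactic re-implementation of the eight-step SCBI procedure."""
    if y.dim() == 1:
        y = y.unsqueeze(1)                                    # [N] -> [N, 1]
    N, D = X.shape
    k = int(N * sample_ratio)
    eye = torch.eye(D + 1, device=X.device)
    solutions = []
    for _ in range(n_samples):                                # 7. repeat per subset
        idx = torch.randperm(N, device=X.device)[:k]          # 1. stochastic sampling
        Xs = torch.cat([X[idx], torch.ones(k, 1, device=X.device)], dim=1)  # 2. bias column
        gram = Xs.T @ Xs + ridge_alpha * eye                  # 3.-4. X^T X + lambda*I
        corr = Xs.T @ y[idx]                                  # 5. X^T y
        solutions.append(torch.linalg.solve(gram, corr))      # 6. solve the system
    theta = torch.stack(solutions).mean(dim=0)                # 8. ensemble average
    return theta[:-1], theta[-1]                              # weights [D, C], bias [C]
```

The returned weights can be assigned to an nn.Linear layer exactly as in the Quick Start examples.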

Why Stochastic?

Averaging across multiple random subsets provides:

  • Robustness: Less sensitive to outliers
  • Regularization: Implicit regularization effect
  • Better generalization: Prevents overfitting to specific data patterns

📚 Citation

If you use SCBI in your research, please cite:

Fares, A. (2026). SCBI: Stochastic Covariance-Based Initialization (1.0.0). Zenodo. https://doi.org/10.5281/zenodo.18576203

📜 License

MIT License

This project is licensed under the MIT License - see the LICENSE file for details.
