
Implement mini-batch gradient descent for faster training #1

Open
winpat wants to merge 1 commit into main from claude/optimize-training-speed-yXixY

Conversation


winpat (Owner) commented on Jan 14, 2026

This commit introduces mini-batch training to significantly improve
training performance. Instead of updating weights after each sample,
gradients are accumulated over a batch and applied once per batch.
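
In symbols (a standard statement of mini-batch gradient descent, not code taken from this diff): with learning rate $\eta$ and batch size $B$, each batch applies the single averaged update

$$
w \leftarrow w - \frac{\eta}{B} \sum_{i=1}^{B} \nabla_w L(w;\, x_i, y_i)
$$

rather than $B$ separate per-sample updates.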

Changes:

- Added gradient accumulation to Linear layer (see the Linear sketch after this list)
  - New zeroGradients() method to reset gradients at batch start
  - Modified backward() to accumulate gradients instead of overwriting
  - Updated applyGradients() to average gradients by batch size
- Updated Network training loop (see the training loop sketch after this list)
  - Added batch_size parameter to train()
  - Process data in configurable mini-batches
  - Zero gradients at start of each batch
  - Accumulate gradients over batch samples
  - Apply averaged gradients once per batch
- Updated all train() calls to include batch_size parameter
  - main.zig: Uses batch_size=32 for Iris dataset
  - net.zig test: Uses batch_size=1 for backward compatibility
  - README.md: Updated example with batch_size=32
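
A minimal Zig sketch of the two pieces above. All names here (Linear, weight_grads, accumulateGradients, Network, layers, forward, backward) are assumptions for illustration; the actual definitions in net.zig may differ.

```zig
// Sketch only: field and method names are assumed, not the real net.zig code.
pub const Linear = struct {
    weights: []f64, // flattened [outputs x inputs]
    biases: []f64,
    weight_grads: []f64, // accumulated over the current mini-batch
    bias_grads: []f64,

    /// Reset accumulated gradients at the start of a mini-batch.
    pub fn zeroGradients(self: *Linear) void {
        @memset(self.weight_grads, 0);
        @memset(self.bias_grads, 0);
    }

    /// Add one sample's gradients into the accumulators (+= instead of =),
    /// mirroring what backward() now does per the description above.
    pub fn accumulateGradients(self: *Linear, w_grad: []const f64, b_grad: []const f64) void {
        for (self.weight_grads, w_grad) |*acc, g| acc.* += g;
        for (self.bias_grads, b_grad) |*acc, g| acc.* += g;
    }

    /// Apply the batch-averaged gradients once per batch.
    pub fn applyGradients(self: *Linear, learning_rate: f64, batch_size: usize) void {
        const scale = learning_rate / @as(f64, @floatFromInt(batch_size));
        for (self.weights, self.weight_grads) |*w, g| w.* -= scale * g;
        for (self.biases, self.bias_grads) |*b, g| b.* -= scale * g;
    }
};
```

And the reworked training loop, assuming Network keeps a slice of layers and the existing forward()/backward() helpers:

```zig
// Sketch of Network.train with mini-batching; types and helpers are assumed.
pub fn train(
    self: *Network,
    inputs: []const []const f64,
    targets: []const []const f64,
    epochs: usize,
    learning_rate: f64,
    batch_size: usize,
) void {
    for (0..epochs) |_| {
        var start: usize = 0;
        while (start < inputs.len) : (start += batch_size) {
            const end = @min(start + batch_size, inputs.len);

            // 1. Zero the accumulated gradients in every layer.
            for (self.layers) |*layer| layer.zeroGradients();

            // 2. Accumulate gradients over the samples in this batch.
            for (inputs[start..end], targets[start..end]) |x, y| {
                const prediction = self.forward(x);
                self.backward(prediction, y); // += into each layer's gradient buffers
            }

            // 3. Apply the averaged gradients once per batch.
            for (self.layers) |*layer| {
                layer.applyGradients(learning_rate, end - start);
            }
        }
    }
}
```

Note that the last batch may be smaller than batch_size, which is why the average uses `end - start` rather than `batch_size`.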

Benefits:

- Faster training through reduced weight-update overhead
- More stable gradient estimates from batch averaging
- Better convergence properties
- Configurable batch size for different datasets

The default batch_size of 32 provides a good balance between
speed and gradient stability for most datasets.
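
Assuming the train() signature sketched above (parameter names, order, and the surrounding variable names are guesses), the Iris call in main.zig would look roughly like:

```zig
// Hypothetical call site; the actual arguments in main.zig may differ.
net.train(iris_inputs, iris_targets, 100, 0.01, 32); // 32 = batch_size
```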

