Official implementation of the paper "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".
Topics: loss gradient-clipping adaptive-gradient-clipping loss-spike adaptive-clipping zclip stable-llm-pretraining enable-high-learning-rate traning-stability pre-training-stability stable-training llm-stable-training gradient-norm-clipping
Updated May 12, 2025 · Python
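The description refers to adaptive spike mitigation through gradient-norm clipping. As a rough illustration only, here is one z-score-based way such clipping can work: keep exponential moving averages (EMAs) of the gradient norm's mean and variance, treat a norm whose z-score exceeds a threshold as a spike, and pull it back toward the running mean. This is not the repository's code. The class name `ZScoreNormClipper`, the parameters `alpha`, `z_thresh` and `warmup_steps`, their defaults, the warm-up scheme and the reciprocal clipping formula are all assumptions made for this sketch.

```python
import math


class ZScoreNormClipper:
    """Hypothetical sketch of z-score-based adaptive gradient-norm clipping.

    Assumption: statistics are EMAs of the gradient norm's mean and variance,
    initialised from a short warm-up window of raw norms.
    """

    def __init__(self, alpha=0.97, z_thresh=2.5, warmup_steps=25):
        if warmup_steps < 1:
            raise ValueError("warmup_steps must be at least 1")
        self.alpha = alpha                # EMA decay (assumed value)
        self.z_thresh = z_thresh          # z-score above which a norm is a spike
        self.warmup_steps = warmup_steps  # raw norms used to seed the statistics
        self._warmup = []
        self.mean = None
        self.var = None

    def step(self, grad_norm):
        """Return the norm the gradients should be rescaled to."""
        # Warm-up: pass norms through unchanged while collecting them.
        if self.mean is None:
            self._warmup.append(grad_norm)
            if len(self._warmup) == self.warmup_steps:
                n = len(self._warmup)
                self.mean = sum(self._warmup) / n
                self.var = sum((x - self.mean) ** 2 for x in self._warmup) / n
            return grad_norm

        std = math.sqrt(self.var)
        z = (grad_norm - self.mean) / std if std > 0 else 0.0

        clipped = grad_norm
        if z > self.z_thresh:
            # Spike: clip to mean + (z_thresh^2 / z) * std. Because
            # z_thresh^2 / z < z_thresh < z, the result is always below the
            # incoming norm, and larger spikes are pulled closer to the mean.
            clipped = self.mean + (self.z_thresh ** 2 / z) * std

        # Update the EMA statistics with the clipped norm so that a single
        # spike does not inflate the mean and variance for later steps.
        self.mean = self.alpha * self.mean + (1 - self.alpha) * clipped
        self.var = self.alpha * self.var + (1 - self.alpha) * (clipped - self.mean) ** 2
        return clipped


if __name__ == "__main__":
    clipper = ZScoreNormClipper(warmup_steps=5)
    for norm in [1.0, 1.1, 0.9, 1.0, 1.0, 1.0, 10.0]:
        print(norm, "->", clipper.step(norm))
```

In a training loop, one would compute the total gradient norm each step, pass it to `step()`, and, when the returned value is smaller, scale every gradient by `clipped / grad_norm` (in PyTorch, for instance, by calling `torch.nn.utils.clip_grad_norm_` with `max_norm=clipped`). Updating the statistics with the clipped rather than the raw norm is a design choice that keeps one outlier from loosening the threshold for subsequent steps.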