Thanks for the great work!
I'm having issues training Qwen 8B from scratch using SpecForge: I'm getting a low acceptance length, and I want to confirm my context-length setup.
My understanding is that the training context length should be longer than the generation context length to account for prompt + response. Right now, I regenerate data with a 2k context, train with a 4k context window, and evaluate benchmarks with max_new_tokens = 2k. Is this configuration correct, or should I change the training/eval context lengths?
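To make the question concrete, here is the length arithmetic I am assuming (the variable names below are mine for illustration, not actual SpecForge arguments):

```python
# Context-length budget I am assuming (hypothetical variable names,
# not actual SpecForge flags).
data_gen_max_len = 2048     # max length when regenerating training data
train_context_len = 4096    # context window during draft-model training
eval_max_new_tokens = 2048  # max_new_tokens at benchmark time

# Training sequences never exceed the data-generation length,
# so the training window has headroom over the data:
assert data_gen_max_len <= train_context_len

def eval_total_len(prompt_len: int) -> int:
    """Total sequence length the draft model sees at eval time."""
    return prompt_len + eval_max_new_tokens

# For a long prompt, the eval-time total can approach or exceed
# the lengths actually seen during training.
print(eval_total_len(512))  # 2560: still within the 4k training window
```

My worry is whether eval-time prompt + generation lengths that exceed the 2k data-generation length (even while staying under the 4k training window) could explain the low acceptance length.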
Also, in my initial runs KL loss gave better results than CE loss. Has anyone seen similar behavior, and are there recommended KL settings (e.g., temperature or logit scaling)?
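For reference, the KL variant I experimented with is essentially forward KL between temperature-scaled target and draft distributions. A minimal plain-Python sketch of what I mean (the function names and the softening behavior shown are my own illustration, not SpecForge code):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q, eps=1e-12):
    """Forward KL(p || q) between two probability vectors."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Example target (teacher) and draft logits for one token position.
teacher_logits = [2.0, 1.0, 0.1]
draft_logits = [1.5, 1.2, 0.3]

# Higher temperature softens both distributions, which shrinks the
# KL term; some distillation setups also rescale the loss by T**2
# to keep gradient magnitudes comparable across temperatures.
for T in (1.0, 2.0):
    loss = kl_div(softmax(teacher_logits, T), softmax(draft_logits, T))
    print(T, loss)
```

Mainly I'd like to know whether a temperature above 1 (with or without the T**2 rescaling) is recommended here, or whether the default settings are expected to work.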
Thanks!