Hi team, thanks for open sourcing DFlash!
I have a question about the dataset generation pipeline for training: what temperature and top_p do you use when regenerating responses from the target/reference models for annotation or synthetic data creation? Is there an officially recommended value?
Are there best practices or trade-offs to consider when choosing these sampling parameters, so that the generated data distribution stays well aligned with the supervised training objective?
(From the code and docs, I see temperature=0.0 for inference/benchmarking, but I could not find details for the training set generation stage. If this has already been discussed or documented, please point me to the relevant resource!)
Thanks in advance!