Data Utilization Efficiency in Alternate Training #3
I've been analyzing the alternate training implementation and noticed a potential data-utilization issue. In the current design, each training sample contains both task-specific data (prefix + text) and a parallel sentence pair (text_parallel_0/1). Because the two phases alternate, each phase reads only one half of every sample it sees, so only about 50% of each sample's content is used in any given phase. Could this lead to the training data being under-utilized? A sample looks like this:
{
"text_parallel_0": "Dydy Tom ddim yn hoffi astudio.",
"text_parallel_1": "Tom doesn't like studying.",
"prefix": "<system_prompt><user_input>",
"text": "<assistant_response>"
}
- SFT phase (only_train_language_modeling=True): uses prefix + text; ignores text_parallel_0/1
- Contrastive phase (only_train_contrastive=True): uses text_parallel_0/1; ignores prefix + text
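To make the concern concrete, here is a minimal Python sketch. The field-to-phase mapping follows the two bullets above. Everything else is an assumption for illustration only: the phase names `sft`/`contrastive`, the helper functions, and especially the schedule in `halves_seen` (samples visited in order, phase flipping on every step), which is hypothetical and may not match how the repository actually alternates.

```python
from typing import Dict, List, Tuple

# The example sample from this issue.
SAMPLE: Dict[str, str] = {
    "text_parallel_0": "Dydy Tom ddim yn hoffi astudio.",
    "text_parallel_1": "Tom doesn't like studying.",
    "prefix": "<system_prompt><user_input>",
    "text": "<assistant_response>",
}

# Fields each phase reads, as described in the issue. Phase names are hypothetical.
PHASE_FIELDS: Dict[str, Tuple[str, ...]] = {
    "sft": ("prefix", "text"),                              # only_train_language_modeling=True
    "contrastive": ("text_parallel_0", "text_parallel_1"),  # only_train_contrastive=True
}


def fields_for_phase(sample: Dict[str, str], phase: str) -> Dict[str, str]:
    """Return only the fields of `sample` that the given phase consumes."""
    return {key: sample[key] for key in PHASE_FIELDS[phase]}


def fraction_of_fields_used(sample: Dict[str, str], phase: str) -> float:
    """Share of a sample's fields that one phase actually reads."""
    return len(fields_for_phase(sample, phase)) / len(sample)


def halves_seen(num_samples: int, num_epochs: int) -> List[Dict[str, int]]:
    """Count how often each half of each sample is consumed.

    HYPOTHETICAL schedule: samples are visited in a fixed order every epoch,
    and the phase flips on every step (even steps SFT, odd steps contrastive).
    """
    counts = [{"sft": 0, "contrastive": 0} for _ in range(num_samples)]
    step = 0
    for _ in range(num_epochs):
        for idx in range(num_samples):
            phase = "sft" if step % 2 == 0 else "contrastive"
            counts[idx][phase] += 1
            step += 1
    return counts


if __name__ == "__main__":
    print(fields_for_phase(SAMPLE, "sft"))
    print(fraction_of_fields_used(SAMPLE, "contrastive"))  # 0.5
    # Even dataset size: each sample is always routed to the same phase.
    print(halves_seen(num_samples=4, num_epochs=2))
    # Odd dataset size: a sample's phase flips from one epoch to the next.
    print(halves_seen(num_samples=3, num_epochs=2))
```

Under this assumed schedule the 50% figure holds for every step, but whether a given sample ever contributes both halves depends on the schedule itself: with an even number of samples and no shuffling, each sample is permanently tied to one phase, while an odd count lets the phase flip across epochs. Per-epoch shuffling would typically break a fixed pairing like this.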