
Data Utilization Efficiency in Alternate Training #3

@Ryuge-Kisaki

I've been analyzing the alternate training implementation and noticed a potential data-utilization issue. Each training sample contains both task-specific data (prefix + text) and a parallel sentence pair (text_parallel_0/1). Because the two phases alternate, each phase reads only half of a sample's fields. Could this leave the training data underutilized?

{
    "text_parallel_0": "Dydy Tom ddim yn hoffi astudio.",
    "text_parallel_1": "Tom doesn't like studying.", 
    "prefix": "<system_prompt><user_input>",
    "text": "<assistant_response>"
}
  • SFT phase (only_train_language_modeling=True): uses prefix + text; ignores text_parallel_0/1

  • Contrastive phase (only_train_contrastive=True): uses text_parallel_0/1; ignores prefix + text
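To make the concern concrete, here is a minimal Python sketch that simulates one pass over the data with the phase switching after every batch. The flag names come from the bullets above. The Sample class, alternate_epoch, field_coverage, and the assumption that each batch feeds exactly one phase are hypothetical illustrations, not the repository's actual code.

```python
from dataclasses import dataclass, fields


@dataclass
class Sample:
    """Hypothetical record mirroring the JSON sample in the issue."""
    text_parallel_0: str
    text_parallel_1: str
    prefix: str
    text: str


# Which fields each phase reads, per the two phase descriptions in the issue.
PHASE_FIELDS = {
    "only_train_language_modeling": {"prefix", "text"},
    "only_train_contrastive": {"text_parallel_0", "text_parallel_1"},
}


def alternate_epoch(samples, batch_size):
    """Simulate one pass where the phase switches after every batch.

    Assumption (hypothetical): each batch is consumed by exactly one phase.
    Returns, per sample, the set of field names that were actually used.
    """
    phases = list(PHASE_FIELDS)
    used = [set() for _ in samples]
    for step, start in enumerate(range(0, len(samples), batch_size)):
        phase = phases[step % len(phases)]
        for i in range(start, min(start + batch_size, len(samples))):
            used[i] |= PHASE_FIELDS[phase]
    return used


def field_coverage(samples, used):
    """Fraction of all (sample, field) pairs consumed during the pass."""
    total = len(samples) * len(fields(Sample))
    return sum(len(u) for u in used) / total


samples = [Sample("cy", "en", "<prompt>", "<response>") for _ in range(8)]
used = alternate_epoch(samples, batch_size=2)
print(field_coverage(samples, used))  # 0.5
```

Under that assumption every sample lands in exactly one batch and therefore contributes only two of its four fields per pass, so coverage is 0.5 regardless of batch size. If the real training loop instead fed the same batch to both phases, coverage would be 1.0 and the concern would not apply.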
