Data Utilization Efficiency in Alternate Training #3

I've been analyzing the alternate training implementation and noticed a potential data utilization issue. In the current design, each training sample contains both task-specific data (`prefix`, `text`) and a parallel sentence pair (`text_parallel_0`, `text_parallel_1`). Because the two phases alternate, each phase reads only half (50%) of the fields in a sample and ignores the rest. Will this leave the training data underutilized?

```json
{
    "text_parallel_0": "Dydy Tom ddim yn hoffi astudio.",
    "text_parallel_1": "Tom doesn't like studying.",
    "prefix": "<system_prompt><user_input>",
    "text": "<assistant_response>"
}
```
- SFT Phase (`only_train_language_modeling=True`): uses `prefix` + `text`, ignores `text_parallel_0`/`text_parallel_1`

- Contrastive Phase (`only_train_contrastive=True`): uses `text_parallel_0`/`text_parallel_1`, ignores `prefix` + `text`
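
To make the question concrete, here is a minimal sketch of the field selection as I understand it from the two phases above. It is an illustration only, not code from the repository: `fields_used` is a hypothetical helper, and the only names taken from the actual setup are the two flags and the four sample keys.

```python
from typing import Dict

# Example sample, copied from the issue description.
SAMPLE: Dict[str, str] = {
    "text_parallel_0": "Dydy Tom ddim yn hoffi astudio.",
    "text_parallel_1": "Tom doesn't like studying.",
    "prefix": "<system_prompt><user_input>",
    "text": "<assistant_response>",
}


def fields_used(
    sample: Dict[str, str],
    only_train_language_modeling: bool = False,
    only_train_contrastive: bool = False,
) -> Dict[str, str]:
    """Hypothetical helper: return the fields a given phase would read."""
    if only_train_language_modeling and only_train_contrastive:
        raise ValueError("the two phase flags are mutually exclusive")
    if only_train_language_modeling:
        keys = ("prefix", "text")  # SFT phase
    elif only_train_contrastive:
        keys = ("text_parallel_0", "text_parallel_1")  # contrastive phase
    else:
        keys = tuple(sample)  # assumption: no flag set means every field is read
    return {k: sample[k] for k in keys}


sft_fields = fields_used(SAMPLE, only_train_language_modeling=True)
contrastive_fields = fields_used(SAMPLE, only_train_contrastive=True)

# Fraction of the sample's fields read within a single phase.
per_phase_fraction = len(sft_fields) / len(SAMPLE)

# Fields read across one full SFT + contrastive alternation.
covered = set(sft_fields) | set(contrastive_fields)

print(per_phase_fraction)
print(covered == set(SAMPLE))
```

Under these assumptions, a single phase touches two of the four fields, while one full alternation touches all four. So the question is whether the per-phase 50% matters in practice, given that every field is read once both phases have seen the sample.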
