Possible logic flaw in the evaluation step #20

@HanlardResearch

Description

I ran ./GPG/open-r1/train.sh.

Training itself works fine, but during evaluation (with --eval_strategy steps set) the tensor dimensions in the following line do not match:

per_token_loss = - per_token_logps * advantages.unsqueeze(1)

After debugging, I found that the problem comes from the following code in the function _generate_and_score_completions (referred to below as snippet 1):

if n_valid_samples < self.args.min_inverse_alpha * num_samples:
    logger.info(f"keep generating more examples: the {n_gen}-th mini-batch")
    n_gen += 1

else:
    # Reassemble the sample batch
    rewards = merge(identical_rewards, new_rewards)[:len(prompts)]
    print(
        f"[DEBUG][RANK {self.accelerator.process_index}] lin999 {mode} rewards.shape:{rewards.shape},len(prompts):{len(prompts)}")
    prompt_ids = merge_with_padding(identical_prompt_ids, new_prompt_ids, self.processing_class.pad_token_id, left_pad=True)[:len(prompts)]
    prompt_mask = merge_with_padding(identical_prompt_mask, new_prompt_mask, 0, left_pad=True)[:len(prompts)]
    completion_ids = merge_with_padding(identical_completion_ids, new_completion_ids, self.processing_class.pad_token_id, left_pad=False)[:len(prompts)]
    completion_mask = merge_with_padding(identical_completion_mask, new_completion_mask, 0, left_pad=False)[:len(prompts)]
    break

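In the first evaluate step the else branch is executed and everything runs normally. In the second evaluate step, however, the if branch is executed, so the shape of rewards differs from the first step.

For example, when I run on 4 GPUs, the hyperparameters are as follows: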

[INFO|trainer.py:2414] 2025-07-01 11:55:15,632 >> ***** Running training *****
[INFO|trainer.py:2417] 2025-07-01 11:55:15,633 >> Instantaneous batch size per device = 16
[INFO|trainer.py:2420] 2025-07-01 11:55:15,633 >> Total train batch size (w. parallel, distributed & accumulation) = 128
[INFO|trainer.py:2421] 2025-07-01 11:55:15,633 >> Gradient Accumulation steps = 2
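
As a quick sanity check, the logged values are consistent with a 4-GPU run. The sketch below only redoes the arithmetic; the variable names are illustrative and do not come from the training script.

```python
# Values copied from the trainer log and the issue text.
per_device_batch = 16   # "Instantaneous batch size per device"
num_gpus = 4            # the run uses 4 GPUs
grad_accum_steps = 2    # "Gradient Accumulation steps"

# Samples processed across all GPUs in one forward/backward pass.
samples_per_forward = per_device_batch * num_gpus

# Samples contributing to one optimizer update.
total_train_batch = samples_per_forward * grad_accum_steps

print(samples_per_forward, total_train_batch)
```

The second number matches the "Total train batch size" of 128 reported in the log.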

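With this setup, after the first evaluate step runs snippet 1, rewards has shape [16], but after the second evaluate step runs snippet 1, rewards has shape [64].
The computation per_token_loss = - per_token_logps * advantages.unsqueeze(1) then fails with a dimension mismatch, because the first dimension of per_token_logps is always 16.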
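
The failure can be illustrated from shapes alone. The following is a minimal sketch in plain Python that mimics the standard NumPy/PyTorch broadcasting rule instead of calling PyTorch; `broadcast_shape`, `per_token_loss_shape`, and `SEQ_LEN` are hypothetical names introduced for illustration, and it assumes advantages has the same leading dimension as rewards. Note that 64 equals 16 × 4 (per-device batch size times number of GPUs), which may hint that rewards ends up holding samples for all ranks; that is only a guess and is not confirmed here.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or raise ValueError.

    Dimensions are compared from the right; two sizes are compatible
    when they are equal or when one of them is 1.
    """
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {tuple(a)} and {tuple(b)} do not broadcast")
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


def per_token_loss_shape(logps_shape, advantages_shape):
    # advantages.unsqueeze(1) turns shape (B,) into (B, 1).
    unsqueezed = (advantages_shape[0], 1)
    return broadcast_shape(logps_shape, unsqueezed)


SEQ_LEN = 8  # hypothetical completion length, chosen only for the demo

# First evaluate step: both leading dimensions are 16, so shapes align.
first = per_token_loss_shape((16, SEQ_LEN), (16,))
print("first eval step:", first)

# Second evaluate step: 64 advantages against 16 rows of log-probs.
try:
    second = per_token_loss_shape((16, SEQ_LEN), (64,))
except ValueError as err:
    second = None
    print("second eval step:", err)
```

A check such as comparing rewards.shape[0] with len(prompts) right after snippet 1 would surface the mismatch earlier than the loss computation does.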
