I am running ./GPG/open-r1/train.sh.
Training runs fine, but during evaluation (with --eval_strategy steps set), the tensor dimensions in the following line fail to align:
per_token_loss = - per_token_logps * advantages.unsqueeze(1)
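For context, here is a minimal sketch (my own, not from the repo) of the shapes this line expects: per_token_logps is [batch, seq_len], advantages is [batch], and advantages.unsqueeze(1) becomes [batch, 1], which broadcasts across the sequence dimension. Broadcasting only works when the batch dimensions match.

import torch

per_token_logps = torch.randn(16, 512)  # [batch=16, seq_len=512]
advantages = torch.randn(16)            # [batch=16]
per_token_loss = - per_token_logps * advantages.unsqueeze(1)  # [16, 1] broadcasts -> [16, 512]
print(per_token_loss.shape)             # torch.Size([16, 512])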
After debugging, I found the problem comes from the following block inside _generate_and_score_completions (call it Snippet-1):
if n_valid_samples < self.args.min_inverse_alpha * num_samples:
    logger.info(f"keep generating more examples: the {n_gen}-th mini-batch")
    n_gen += 1
else:
    # Reassemble the sample batch
    rewards = merge(identical_rewards, new_rewards)[:len(prompts)]
    print(f"[DEBUG][RANK {self.accelerator.process_index}] lin999 {mode} rewards.shape:{rewards.shape}, len(prompts):{len(prompts)}")
    prompt_ids = merge_with_padding(identical_prompt_ids, new_prompt_ids, self.processing_class.pad_token_id, left_pad=True)[:len(prompts)]
    prompt_mask = merge_with_padding(identical_prompt_mask, new_prompt_mask, 0, left_pad=True)[:len(prompts)]
    completion_ids = merge_with_padding(identical_completion_ids, new_completion_ids, self.processing_class.pad_token_id, left_pad=False)[:len(prompts)]
    completion_mask = merge_with_padding(identical_completion_mask, new_completion_mask, 0, left_pad=False)[:len(prompts)]
    break
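For reference, my reading of merge_with_padding, inferred only from its call signature here (a hypothetical sketch, not the repo's actual implementation): it pads two [batch, seq] tensors to a common sequence length (on the left for prompts, on the right for completions) and concatenates them along the batch dimension.

import torch
import torch.nn.functional as F

def merge_with_padding(a, b, pad_value, left_pad):
    # Hypothetical sketch: pad both [batch, seq] tensors to the longer
    # sequence length, then stack the two batches.
    target_len = max(a.size(1), b.size(1))
    def pad(t):
        extra = target_len - t.size(1)
        # F.pad pads the last dimension with (left, right) amounts.
        return F.pad(t, (extra, 0) if left_pad else (0, extra), value=pad_value)
    return torch.cat([pad(a), pad(b)], dim=0)

Note that the output batch size is len(a) + len(b), which is presumably why Snippet-1 slices with [:len(prompts)] afterwards.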
At the first evaluate step, the else branch is taken and everything runs normally. At the second evaluate step, however, the if branch is taken, and the dimension of rewards ends up different from the first time.
For example, when I train on 4 GPUs, the hyperparameters are as follows:
[INFO|trainer.py:2414] 2025-07-01 11:55:15,632 >> ***** Running training *****
[INFO|trainer.py:2417] 2025-07-01 11:55:15,633 >> Instantaneous batch size per device = 16
[INFO|trainer.py:2420] 2025-07-01 11:55:15,633 >> Total train batch size (w. parallel, distributed & accumulation) = 128
[INFO|trainer.py:2421] 2025-07-01 11:55:15,633 >> Gradient Accumulation steps = 2
After Snippet-1 finishes at the first evaluate step, rewards has shape [16], but after the second evaluate step it has shape [64] (which happens to be 16 × 4, the per-device batch size times the number of GPUs).
This causes a dimension-mismatch error when computing per_token_loss = - per_token_logps * advantages.unsqueeze(1), because the first dimension of per_token_logps is always 16.
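The failure is easy to reproduce in isolation. A minimal sketch with the shapes from the logs above (the exact error text may differ across PyTorch versions):

import torch

per_token_logps = torch.randn(16, 512)  # first dim stays 16 on each device
advantages = torch.randn(64)            # 64 = 16 per-device batch x 4 GPUs
# RuntimeError: broadcasting [16, 512] with [64, 1] fails at dimension 0
# ("The size of tensor a (16) must match the size of tensor b (64) ...")
per_token_loss = - per_token_logps * advantages.unsqueeze(1)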