How to stop text generation after obtaining the answer #16

@BennyTMT

I tried "GRPO_From_Scratch" and learned a lot. Thanks!

A small issue: during training/inference, Qwen1.5 keeps generating text even after it has reached the answer, for example:

. . . <answer>66</answer>Human: In a classroom there are 30 students who all need individual attention from the teacher due to special needs. The school has two types of chairs available - standard . . . 

I saw this during training/inference on math tasks, where it usually had no impact. For some tasks, though, the extra text might affect the reward calculation.
Has anyone considered how to prevent this?
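
One possible workaround, sketched here rather than taken from the repository: cut each completion right after the first closing `</answer>` tag before it is passed to the reward function. The helper name `truncate_after_answer` and the constant `STOP_TAG` are hypothetical, and the sketch assumes completions are available as plain decoded strings.

```python
# Hypothetical helper, not part of GRPO_From_Scratch.
# Assumes the answer is always closed with the tag shown in the example output.
STOP_TAG = "</answer>"


def truncate_after_answer(completion: str, stop_tag: str = STOP_TAG) -> str:
    """Return the completion cut off right after the first stop_tag.

    If the tag never appears, the completion is returned unchanged.
    """
    idx = completion.find(stop_tag)
    if idx == -1:
        return completion
    return completion[: idx + len(stop_tag)]


raw = "<answer>66</answer>Human: In a classroom there are 30 students"
print(truncate_after_answer(raw))
```

This only cleans up the text after generation, so the model still spends tokens on the trailing content. If generation goes through Hugging Face `transformers`, stopping at generation time (for instance with a custom `StoppingCriteria`) may be an option, but whether that fits this codebase is an assumption.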
