How to stop text generation after obtaining the answer

I tried "GRPO_From_Scratch" and learned a lot. Thanks!

A small issue: During training/inference, even after Qwen1.5 has reached the answer, it continues generating text.
```
. . . <answer>66</answer>Human: In a classroom there are 30 students who all need individual attention from the teacher due to special needs. The school has two types of chairs available - standard . . . 
```
I saw this during training/inference on math tasks, and it usually had no impact. But for some tasks, the extra text might affect the reward calculation.
Has anyone considered how to prevent this?
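
One workaround I can think of is to cut the completion at the first `</answer>` before it is passed to the reward function. This is only a minimal sketch: the helper name `truncate_after_answer` is my own and not part of the notebook, and it assumes the completion is already decoded to a plain string.

```python
def truncate_after_answer(text: str, stop: str = "</answer>") -> str:
    """Return `text` up to and including the first occurrence of `stop`.

    Hypothetical helper (not from the notebook): meant to be applied to each
    decoded completion before the reward is computed.
    """
    idx = text.find(stop)
    if idx == -1:
        # No closing tag found: leave the completion unchanged.
        return text
    return text[: idx + len(stop)]


completion = "<answer>66</answer>Human: In a classroom there are 30 students"
print(truncate_after_answer(completion))
```

This only cleans up the text after the fact. The model still spends tokens on the extra text, so stopping at decode time (for example with a stop string or a stopping criterion, if the generation library supports one) would also save compute.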

How to stop text generation after obtaining the answer #16

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

How to stop text generation after obtaining the answer #16

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions