I have tried GPT-4o-mini in the LLM agent, but I get a sub-optimal decision, as in the following screenshot. The decision provided by the embedded RL (phase 2) is more reasonable than the one provided by the LLM agent (phase 1). Is this problem caused by the API version? Or is it caused by incomplete logic in the chain-of-thought reasoning or the prompt engineering? Which file contains the prompt, so that I can make some adjustments or modifications?
