I have tried GPT-4o-mini in the LLM agent, but I get a sub-optimal decision, as in the following screenshot. The decision provided by the embedded RL (phase 2) is more reasonable than the one provided by the LLM agent (phase 1). Is this problem caused by the API version? Or is it caused by incomplete logic in the chain-of-thought reasoning or the prompt engineering? Which file contains the prompt, so that I can make some adjustments or modifications?
