While examining the trajectories, I noticed that the action fields contain a thinking component. Could you clarify whether this affects task evaluation? For example, if the correct answer appears in the thinking component but not in the action output itself, does the task still count as successfully completed, or is evaluation based solely on the action output?