fix(responses): drop reasoning_effort in fallback branch #5609
asimurka wants to merge 1 commit into ogx-ai:main
Conversation
```diff
 )
-completion_result = await self.inference_api.openai_chat_completion(params)
+completion_result = await self.inference_api.openai_chat_completion(
+    params.model_copy(deep=True, update={"reasoning_effort": None})
+)
```
nit: line 563 does not use a deep copy. Harmless here in this context, but it is inconsistent.
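The distinction behind this nit can be shown with plain dicts (illustrative only; the PR's code uses pydantic's `model_copy`): a shallow copy shares nested mutable objects with the original, while a deep copy is fully isolated.

```python
# Shallow vs. deep copy: shallow copies share nested mutable objects,
# so mutating through one copy is visible in the original.
import copy

original = {"model": "m", "metadata": {"trace": "abc"}}
shallow = copy.copy(original)
deep = copy.deepcopy(original)

shallow["metadata"]["trace"] = "mutated"
print(original["metadata"]["trace"])  # "mutated" -- the shallow copy shared the dict
print(deep["metadata"]["trace"])      # "abc"     -- the deep copy is isolated
```

Whether `deep=True` matters in practice depends on whether the params object holds nested mutable state; the inconsistency is the point of the nit, not a live bug.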
```diff
@@ -567,7 +567,9 @@ async def create_response(self) -> AsyncIterator[OpenAIResponseObjectStream]:
```
nit: pre-existing, but `logger.critical` on line 566 is used for a successful fallback. `logger.warning` fits better since the system recovers automatically. Not blocking for this PR.
Force-pushed from 2af46b8 to 6d81a10
@skamenan7 Addressed your comments. However, tests are not passing. I suspect the issue may be related to request hashing (my fix modifies the request), because it results in …
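A sketch of why rewriting the request could trip up record/replay tests that key recordings on a request hash (hypothetical `request_key` helper; the real CI harness may hash differently): the fallback request no longer matches the recorded one, so the lookup misses.

```python
# If recordings are keyed by a hash of the serialized request body,
# clearing reasoning_effort produces a different key than the recording.
import hashlib
import json

def request_key(body: dict) -> str:
    # Deterministic hash of the request body (hypothetical scheme).
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

recorded = {"model": "m", "messages": [], "reasoning_effort": "low"}
replayed = {"model": "m", "messages": [], "reasoning_effort": None}

print(request_key(recorded) == request_key(replayed))  # False -- cache miss
```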
I dug into the CI failure a bit more. The Ollama … For this PR, I think the main thing is to keep the fallback change targeted. Dropping …
FYI, I created this bug: #5636
Force-pushed from 3a98995 to f0bd8e6

Force-pushed from f0bd8e6 to 15ab2f6
@skamenan7 Thanks for the help. I managed to find a fix with a temporary workaround for …
What does this PR do?
When the Responses path tries `openai_chat_completions_with_reasoning` and the provider rejects reasoning, it falls back to `openai_chat_completion`. The fallback reused the same `params`, so `reasoning_effort` was still sent and the plain chat call failed with the same "unsupported reasoning" style error instead of succeeding.

This change clears `reasoning_effort` on that fallback only, so the non-reasoning chat completion is a valid plain request.

Closes #5607
Test Plan
Request:
Response:
```json
{
  "background": false,
  "created_at": 1776950444,
  "completed_at": 1776950445,
  "error": null,
  "frequency_penalty": 0.0,
  "id": "resp_f892dba8-4f0a-452a-aa2f-90f5d73d97d7",
  "incomplete_details": null,
  "model": "openai/gpt-4o-mini",
  "object": "response",
  "output": [
    {
      "content": [
        {
          "text": "Hello!",
          "type": "output_text",
          "annotations": [],
          "logprobs": []
        }
      ],
      "role": "assistant",
      "type": "message",
      "id": "msg_e30172d5-5007-4298-8b0e-28aeac2def73",
      "status": "completed"
    }
  ],
  "parallel_tool_calls": true,
  "previous_response_id": null,
  "prompt_cache_key": null,
  "prompt": null,
  "status": "completed",
  "temperature": 1.0,
  "text": { "format": { "type": "text" } },
  "top_p": 1.0,
  "top_logprobs": 0,
  "tools": [],
  "tool_choice": "auto",
  "truncation": "disabled",
  "usage": {
    "input_tokens": 14,
    "output_tokens": 2,
    "total_tokens": 16,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  },
  "instructions": null,
  "max_tool_calls": null,
  "reasoning": { "effort": "low", "summary": null },
  "max_output_tokens": null,
  "safety_identifier": null,
  "service_tier": "default",
  "metadata": null,
  "presence_penalty": 0.0,
  "store": true
}
```