fix(responses): drop reasoning_effort in fallback branch #5609

Open
asimurka wants to merge 1 commit into ogx-ai:main from asimurka:fix_reasoning_effort_drop_on_fallback

Conversation

@asimurka
Contributor

What does this PR do?

When the Responses path tries openai_chat_completions_with_reasoning and the provider rejects reasoning, it falls back to openai_chat_completion. The fallback reused the same params, so reasoning_effort was still sent, and the plain chat call failed with the same “unsupported reasoning”-style error instead of succeeding.

This change clears reasoning_effort on that fallback only, so the non-reasoning chat completion is a valid plain request.

Closes #5607
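A minimal, self-contained sketch of the behavior this PR fixes (the API class, method names, and params shape are stand-ins modeled on the discussion, not the actual implementation; `dataclasses.replace` stands in for Pydantic's `model_copy`):

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Optional

@dataclass
class ChatParams:
    model: str
    reasoning_effort: Optional[str] = None

class ReasoningUnsupportedError(Exception):
    pass

class FakeInferenceAPI:
    """Stand-in provider that rejects any request carrying reasoning_effort."""

    async def openai_chat_completions_with_reasoning(self, params: ChatParams):
        raise ReasoningUnsupportedError("unsupported reasoning")

    async def openai_chat_completion(self, params: ChatParams):
        if params.reasoning_effort is not None:
            # Pre-fix failure mode: the plain call still saw the field and failed too.
            raise ReasoningUnsupportedError("unsupported reasoning")
        return {"model": params.model, "text": "Hello!"}

async def chat_with_fallback(api: FakeInferenceAPI, params: ChatParams):
    try:
        return await api.openai_chat_completions_with_reasoning(params)
    except ReasoningUnsupportedError:
        # The fix: clear reasoning_effort on this fallback only,
        # so the non-reasoning chat completion is a valid plain request.
        plain = replace(params, reasoning_effort=None)
        return await api.openai_chat_completion(plain)

result = asyncio.run(
    chat_with_fallback(FakeInferenceAPI(), ChatParams("openai/gpt-4o-mini", "low"))
)
print(result["text"])  # Hello!
```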

Test Plan

Request:

curl -sS -X POST "http://localhost:8321/v1/responses" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Say hello in one short sentence.",
    "model": "openai/gpt-4o-mini",
    "stream": false,
    "reasoning": {
      "effort": "low"
    }
  }'

Response:

{
  "background": false,
  "created_at": 1776950444,
  "completed_at": 1776950445,
  "error": null,
  "frequency_penalty": 0.0,
  "id": "resp_f892dba8-4f0a-452a-aa2f-90f5d73d97d7",
  "incomplete_details": null,
  "model": "openai/gpt-4o-mini",
  "object": "response",
  "output": [
    {
      "content": [
        {
          "text": "Hello!",
          "type": "output_text",
          "annotations": [],
          "logprobs": []
        }
      ],
      "role": "assistant",
      "type": "message",
      "id": "msg_e30172d5-5007-4298-8b0e-28aeac2def73",
      "status": "completed"
    }
  ],
  "parallel_tool_calls": true,
  "previous_response_id": null,
  "prompt_cache_key": null,
  "prompt": null,
  "status": "completed",
  "temperature": 1.0,
  "text": {
    "format": {
      "type": "text"
    }
  },
  "top_p": 1.0,
  "top_logprobs": 0,
  "tools": [],
  "tool_choice": "auto",
  "truncation": "disabled",
  "usage": {
    "input_tokens": 14,
    "output_tokens": 2,
    "total_tokens": 16,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  },
  "instructions": null,
  "max_tool_calls": null,
  "reasoning": {
    "effort": "low",
    "summary": null
  },
  "max_output_tokens": null,
  "safety_identifier": null,
  "service_tier": "default",
  "metadata": null,
  "presence_penalty": 0.0,
  "store": true
}

@meta-cla Bot added the CLA Signed label (managed by the Meta Open Source bot) on Apr 23, 2026
Contributor

@skamenan7 skamenan7 left a comment


LGTM, thanks.

Suggested change:

-completion_result = await self.inference_api.openai_chat_completion(params)
+completion_result = await self.inference_api.openai_chat_completion(
+    params.model_copy(deep=True, update={"reasoning_effort": None})
+)
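The suggested `model_copy(deep=True, update=...)` returns a new params object with `reasoning_effort` overridden while the original is left untouched. The same copy-and-override pattern, sketched with a stdlib dataclass rather than Pydantic:

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Params:
    model: str
    reasoning_effort: Optional[str] = None

original = Params(model="openai/gpt-4o-mini", reasoning_effort="low")

# Mirrors params.model_copy(deep=True, update={"reasoning_effort": None}):
# a fresh object with one field overridden; the original is unchanged.
fallback = replace(original, reasoning_effort=None)
```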
Contributor


nit: line 563 does not use a deep copy; though harmless in this context, it is inconsistent.

@@ -567,7 +567,9 @@ async def create_response(self) -> AsyncIterator[OpenAIResponseObjectStream]:
Contributor


nit: pre-existing, but logger.critical on line 566 is used for a successful fallback. logger.warning fits better since the system recovers automatically. Not blocking for this PR.

@asimurka asimurka force-pushed the fix_reasoning_effort_drop_on_fallback branch from 2af46b8 to 6d81a10 Compare April 24, 2026 06:21
@asimurka
Contributor Author

asimurka commented Apr 24, 2026

@skamenan7 addressed your comments. However, tests are not passing. I suspect that the issue may be related to request hashing (my fix modifies the request) because it results in RuntimeError (testing/api_recorder.py:1194). Any chance someone more experienced with your test suite can take a look, please?

@skamenan7
Contributor

> @skamenan7 addressed your comments. However, tests are not passing. I suspect that the issue may be related to request hashing (my fix modifies the request) because it results in RuntimeError (testing/api_recorder.py:1194). Any chance someone more experienced with your test suite can take a look, please?

I dug into the CI failure a bit more. The Ollama AttributeError: 'dict' object has no attribute 'content' looks like an existing provider issue that this PR exposed, not something you introduced. I’ll track that separately.

For this PR, I think the main thing is to keep the fallback change targeted. Dropping reasoning_effort fixes the gpt-4o-mini fallback case, but doing it for every fallback also changes replay requests for reasoning-capable models like o4-mini. Could you make the stripping conditional, then update only the recordings for request bodies that are intentionally changed?
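One shape the conditional stripping could take (the capability check and model-name prefixes here are illustrative assumptions; a real check would consult provider/model metadata):

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass
class Params:
    model: str
    reasoning_effort: Optional[str] = None

# Hypothetical allow-list of reasoning-capable model prefixes.
REASONING_CAPABLE_PREFIXES = ("o1", "o3", "o4")

def supports_reasoning(model: str) -> bool:
    name = model.split("/")[-1]          # "openai/o4-mini" -> "o4-mini"
    return name.startswith(REASONING_CAPABLE_PREFIXES)

def params_for_fallback(params: Params) -> Params:
    """Strip reasoning_effort only when the model cannot use it."""
    if supports_reasoning(params.model):
        return params                    # request unchanged; replay recordings still match
    return replace(params, reasoning_effort=None)
```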

@skamenan7
Contributor

FYI, I created this bug: #5636

@asimurka asimurka force-pushed the fix_reasoning_effort_drop_on_fallback branch 3 times, most recently from 3a98995 to f0bd8e6 Compare April 28, 2026 08:32
@asimurka asimurka force-pushed the fix_reasoning_effort_drop_on_fallback branch from f0bd8e6 to 15ab2f6 Compare April 28, 2026 08:39
@asimurka
Contributor Author

@skamenan7 Thanks for the help. I managed to find a fix with a temporary workaround for Ollama. WDYT?


Labels

CLA Signed This label is managed by the Meta Open Source bot.


Development

Successfully merging this pull request may close these issues.

Fallback inference does not drop reasoning_effort attribute

2 participants