refactor: clean up reasoning implementation #5429
Open
robinnarsinghranabhat wants to merge 4 commits into ogx-ai:main from
Conversation
…ntations

Consolidate duplicated reasoning logic from the vLLM, Ollama, and Bedrock providers into a shared reasoning.py utility. Remove the dead _prepare_reasoning_params stubs, simplify return types to streaming-only, eliminate type ignores in streaming.py, and add a multi-turn reasoning + tool-call integration test.

Signed-off-by: robinnarsinghranabhat <robinnarsingha123@gmail.com>
0556918 to 8d69201
Contributor
This pull request has merge conflicts that must be resolved before it can be merged. @robinnarsinghranabhat please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Collaborator
@robinnarsinghranabhat what is the status of this one?
This comment was marked as spam.
Contributor
Author
@cdoern I might have to rebase, but I'm just awaiting review. It's mainly an effort to clean up the code and get rid of the type-ignores introduced in the reasoning PR.
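One common pattern behind such type-ignores is sequential reassignment, where a checker pins a variable to the first, narrower type it sees; handling the general case first removes the ignore. A minimal sketch with hypothetical message classes (not the real llama-stack types):

```python
# Hypothetical stand-ins for the real message param classes.
class BaseMessage:
    def __init__(self, content: str) -> None:
        self.content = content


class AssistantMessage(BaseMessage):
    def __init__(self, content: str, reasoning: str) -> None:
        super().__init__(content)
        self.reasoning = reasoning


def build_message(with_reasoning: bool) -> BaseMessage:
    # General case first: mypy infers `message: BaseMessage` from this
    # assignment, so the subtype reassignment below needs no `# type: ignore`.
    # In the reversed order, `message` would be pinned to AssistantMessage
    # and the later BaseMessage assignment flagged as incompatible.
    message = BaseMessage("plain reply")
    if with_reasoning:
        message = AssistantMessage("reply", reasoning="chain of thought")
    return message
```

Reordering like this is purely a static-typing fix; the runtime behavior is unchanged.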
Summary
Follow-up cleanups to #5206 (reasoning output in responses API) and #5407 (type fixes), addressing items from mattf's review.
Code changes
- **Extracted shared reasoning util** (`src/llama_stack/providers/utils/inference/reasoning.py`): all three providers (vLLM, Ollama, Bedrock) had identical message-mapping and chunk-wrapping code. Pulled it into a shared `reasoning.py` util with no defaults -- each provider explicitly passes its field names.
- **Simplified return type:** `openai_chat_completions_with_reasoning` is streaming-only across all providers. Removed `OpenAIChatCompletionWithReasoning` from the return type in the API protocol, the router, and all provider implementations.
- **Removed dead code:** deleted the no-op `_prepare_reasoning_params` stubs from the vLLM and Bedrock providers. Inlined Ollama's version (a 2-line default for `reasoning_effort`).
- **Removed type ignores:** eliminated the `# type: ignore` on the `_separate_tool_calls` assignment (reordered the if/else to assign the base type first) and on `tool_call.index` usage in streaming chunk processing.
- **Test changes:** added `test_reasoning_basic_streaming` and `test_reasoning_multi_turn_with_tool_call` to the `ollama-reasoning` test suite.

Addressed from review
- `openai_chat_completions_with_reasoning` is streaming only; removed the non-streaming return type
- removed `_prepare_reasoning_params` from vLLM
- removed `_prepare_reasoning_params` from Bedrock
- inlined `_prepare_reasoning_params` for Ollama
- moved the `message = OpenAIAssistantMessageParam` case first to resolve a type ignore
- `completion_result` was confused by storing both CC and CC-with-reasoning

TODO (needs design discussion)
- `_get_preceding_reasoning` only checks `index - 1`, based on what I observed from official OpenAI response output. Is there a missing edge case?
- `critical` log
- provider-specific reasoning field names (e.g. `"thinking"`): the shared util now supports custom field names, so this is ready to implement.
- `n != 1` simplification
- `model_copy()` calls
- `_separate_tool_calls`

Test plan
- `analysis` channel
- `uv run pre-commit run mypy-full --hook-stage manual --all-files  # passes`
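As an illustration of the consolidation described under Code changes, a shared chunk-wrapping helper with no defaults, where each provider passes its field names explicitly, might look like the sketch below. The function name, the dict-based chunk shape, and the `"thinking"` / `"reasoning_content"` field names are assumptions for illustration, not the actual util's API.

```python
from typing import Any


def wrap_reasoning_chunk(
    chunk: dict[str, Any], *, source_field: str, target_field: str
) -> dict[str, Any]:
    # No defaults: each provider must name its own reasoning field explicitly,
    # mirroring the "no defaults -- each provider passes its field names"
    # design of the shared reasoning.py util described above.
    wrapped = dict(chunk)
    if source_field in wrapped:
        wrapped[target_field] = wrapped.pop(source_field)
    return wrapped


# Each provider call site passes its field names explicitly; "thinking" as
# the Ollama-side field name is an assumption here.
ollama_chunk = {"content": "", "thinking": "step 1: recall the tool output"}
common = wrap_reasoning_chunk(
    ollama_chunk, source_field="thinking", target_field="reasoning_content"
)
```

Keeping the field names as required keyword arguments means a new provider cannot silently inherit another provider's field mapping.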