Some models have a "thinking" mode that can be toggled on/off (e.g. Qwen).
Some support high/medium/low reasoning effort (e.g. gpt-oss).
OpenAI/Anthropic conversation types have a 'thinking' content type; check if/where this is supported as an input to the template and render it correctly.
Do we need to parse model output to construct the typed 'thinking' outputs that the proprietary APIs return? Do other servers (e.g. llama.cpp) do this? If so, the same logic as #454 applies. If we did not request thinking but the model outputs thinking, what do we do? Render it as text output?
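A minimal sketch of the output-parsing question above: split raw completion text into typed 'thinking' and 'text' content parts. This assumes the Qwen-style `<think>...</think>` delimiter and an OpenAI-flavored part shape (`{"type": "thinking", ...}`); other models and APIs use different markers and schemas, so both would need adapting. Unmatched trailing text (including thinking we never requested) falls back to a plain text part.

```python
import re

# Assumption: Qwen-style <think>...</think> blocks delimit reasoning.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def parse_output(raw: str) -> list[dict]:
    """Split raw model output into typed content parts.

    Returns a list of {'type': 'thinking', 'thinking': ...} and
    {'type': 'text', 'text': ...} dicts, in document order.
    """
    parts: list[dict] = []
    pos = 0
    for m in THINK_RE.finditer(raw):
        before = raw[pos:m.start()].strip()
        if before:
            parts.append({"type": "text", "text": before})
        parts.append({"type": "thinking", "thinking": m.group(1).strip()})
        pos = m.end()
    tail = raw[pos:].strip()
    if tail:
        # Plain text, or thinking emitted without a recognized delimiter:
        # render it as ordinary text output (the fallback discussed above).
        parts.append({"type": "text", "text": tail})
    return parts
```

For example, `parse_output("<think>plan steps</think>Here is the answer.")` yields a thinking part followed by a text part, while output with no delimiters comes back as a single text part.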