feat: implement structured outputs for post-processing providers #706
cjpais merged 7 commits into cjpais:main
Conversation
feat: implement structured outputs for Cerebras, OpenRouter, OpenAI, and Apple Intelligence
- Add structured output support with JSON schema in llm_client.rs
- Update actions.rs to use system prompt + user content approach
- Remove legacy ${output} variable substitution for supported providers
- Update Apple Intelligence Swift code to accept system prompts
- All providers now use consistent structured output format
- Remove duplicate check_apple_intelligence_availability function
It can be tested easily by setting the provider to Cerebras, OpenRouter, OpenAI, or Apple Intelligence. Most of these providers now offer models that support structured output, so simply running one of them should be a sufficient test. The reason for adding this: with cheap LLMs, post-processing failed to return the correct format 10-20% of the time. It's much better now.
Thanks, this is good. I will take a look and merge when I get a chance.
nice work on this! the architectural shift to system prompt + user content is solid and the separation is actually better for prompt injection resistance too. went through the code and have some feedback below~
settings migration for existing users
supports_structured_output is added with #[serde(default)] which means existing users who upgrade will have all providers deserialized with false. ensure_post_process_defaults() checks if a provider exists by ID but does NOT sync new fields onto existing entries. so the feature will be dead on arrival for anyone who isn't a fresh install. think we need a migration step that updates the field on existing providers from the defaults
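to make that concrete, here is a rough sketch of the migration being suggested (the provider struct and the function here are hypothetical stand-ins for the PR's actual types, not its code):

```rust
// Sketch only: a minimal provider shape standing in for the PR's actual type.
struct PostProcessProvider {
    id: String,
    supports_structured_output: bool,
    // ...other fields elided
}

// Sync new fields from the shipped defaults onto existing entries, so users
// who upgrade (and deserialized `false` via #[serde(default)]) get the feature.
fn sync_structured_output_defaults(
    existing: &mut Vec<PostProcessProvider>,
    defaults: Vec<PostProcessProvider>,
) {
    for default in defaults {
        // Single lookup per default provider.
        match existing.iter().position(|p| p.id == default.id) {
            Some(i) => {
                // Copy the shipped value onto the existing entry.
                existing[i].supports_structured_output = default.supports_structured_output;
            }
            // Provider not present yet: add it, as the existing defaulting does.
            None => existing.push(default),
        }
    }
}
```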
per-provider vs per-model structured output support
marking OpenRouter and Groq as supports_structured_output: true at the provider level is risky. OpenRouter's docs say only select models support json_schema. Groq's docs limit it to specific models (llama 4 variants, kimi k2). users who pick an unsupported model will hit API errors. could either make this per-model, or (simpler) add a fallback to legacy mode when the structured output call fails
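if the per-model route seems worth it, a sketch of what the gate could look like (provider IDs and model-name patterns are illustrative, not a verified list, and would need ongoing maintenance):

```rust
// Illustrative gate only: which models accept json_schema is per-provider
// and changes over time.
fn model_supports_structured_output(provider_id: &str, model: &str) -> bool {
    match provider_id {
        // Groq's docs limit structured output to specific models
        // (llama 4 variants, kimi k2) at the time of this review.
        "groq" => model.contains("llama-4") || model.contains("kimi"),
        // Per-model on OpenRouter; without model metadata, assume
        // unsupported and rely on the legacy fallback instead.
        "openrouter" => false,
        // Assumed broadly supported for these providers' chat models.
        "openai" | "cerebras" => true,
        _ => false,
    }
}
```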
fallback on API error
when the structured output API call fails (~line 180 in actions.rs), it returns None and the user gets raw unprocessed text. would be more resilient to fall back to the legacy ${output} substitution path so the user still gets post-processing even if structured outputs aren't supported for their specific model.
ordering note
PR #704 also modifies actions.rs with new template variables (${current_app}, ${time_local}, etc). those will need to work in both the legacy path and the structured output system prompt path. whichever merges second will need to account for this. happy to help coordinate
```rust
/// Build a system prompt from the user's prompt template.
/// Replaces `${output}` with a reference to the user message so the prompt
/// still reads sensibly when the transcription is sent separately.
fn build_system_prompt(prompt_template: &str) -> String {
```
replacing ${output} with "the user's message" makes the system prompt read a bit oddly, especially with the default template which ends with Transcript:\nthe user's message. since the transcription arrives as a separate user message, the system prompt just needs the instructions. would .replace("${output}", "").trim() be cleaner here? then the system prompt is purely instructional and the LLM sees the transcription as the user message naturally.
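a sketch of that version (the follow-up commits below end up taking this approach):

```rust
/// Sketch of the suggested cleanup: drop the placeholder entirely so the
/// system prompt carries only the instructions; the transcription arrives
/// as the separate user message.
fn build_system_prompt(prompt_template: &str) -> String {
    prompt_template.replace("${output}", "").trim().to_string()
}
```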
```rust
base_url: "https://openrouter.ai/api/v1".to_string(),
allow_base_url_edit: false,
models_endpoint: Some("/models".to_string()),
supports_structured_output: true,
```
not all OpenRouter models support json_schema response format. the docs note it's per-model, and some providers behind OpenRouter may fall back to json_object mode or reject the parameter entirely. same concern applies to Groq (line 437) — only specific models support structured output per their docs.
```rust
match crate::llm_client::send_chat_completion_with_schema(
    &provider,
    api_key,
    &model,
```
this returns None on API failure, which means the user gets raw unprocessed text. since the most likely failure mode is the model not supporting response_format, falling back to the legacy ${output} substitution path would give users post-processing even when structured outputs aren't available for their model. something like:
```rust
Err(e) => {
    warn!("Structured output failed for '{}': {}. Falling back to legacy mode.", provider.id, e);
    // Fall through to legacy path below
}
```
i'm gonna send examples of this if possible and steps to repro
no need... i get the point, so i will apply the fallback mechanism
- Add settings migration to sync supports_structured_output field for existing providers
- Fix fallback behavior: structured output failures now fall through to legacy mode
- Clone api_key to prevent ownership issues in fallback path
- Clean up build_system_prompt(): remove placeholder entirely (instead of replacing with 'the user's message', which reads awkwardly)
- Add warn import from log crate
- Optimize settings migration: use single match instead of double iteration
- Add TRANSCRIPTION_FIELD constant to replace magic strings
- Keep Apple Intelligence behavior unchanged (no API fallback for privacy)

Addresses code review feedback on PR cjpais#706:
1. More efficient provider lookup in ensure_post_process_defaults()
2. Eliminates hardcoded 'transcription' string in JSON parsing
3. Maintains privacy-first approach for Apple Intelligence
I suspect this will cause some issues, but since it's alpha we will deal with it when it happens.
Resolve merge conflicts with main's race condition fix (cjpais#824) and structured outputs (cjpais#706), keeping both FinishGuard/TranscriptionCoordinator and the new streaming VAD modes.
…ais#706)

* feat: implement structured outputs for Cerebras, OpenRouter, OpenAI, and Apple Intelligence
* wip changes
* fix(structured-outputs): address PR cjpais#706 review comments
* refactor(structured-outputs): apply best practice improvements
* fix groq output

Co-authored-by: CJ Pais <cj@cjpais.com>
Summary
This PR implements structured outputs support for Cerebras, OpenRouter, OpenAI, and Apple Intelligence providers in the post-processing pipeline.
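For the OpenAI-compatible providers (Cerebras, OpenRouter, OpenAI), structured output comes down to sending an OpenAI-style `response_format` block carrying a JSON schema; Apple Intelligence goes through the on-device Swift path instead. A minimal sketch of the request body, with illustrative model and schema names (not the PR's exact code):

```rust
use serde_json::{json, Value};

// Sketch of the request body the structured output path sends to the
// OpenAI-compatible providers. The schema mirrors the single-field
// CleanedTranscript shape described under Notes below.
fn build_request_body(model: &str, system_prompt: &str, transcription: &str) -> Value {
    json!({
        "model": model,
        "messages": [
            // Instructions go in the system prompt; the raw transcription
            // travels as the user message.
            { "role": "system", "content": system_prompt },
            { "role": "user", "content": transcription }
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "cleaned_transcript",
                "strict": true,
                "schema": {
                    "type": "object",
                    "properties": { "transcription": { "type": "string" } },
                    "required": ["transcription"],
                    "additionalProperties": false
                }
            }
        }
    })
}
```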
Changes
Core Implementation
Technical Details
- `send_chat_completion_with_schema()` in `llm_client.rs` for structured output support
- `actions.rs` updated to detect supported providers and use structured outputs

Benefits
Testing
Notes
Structured responses are parsed into a `CleanedTranscript` struct.
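A plausible shape for that struct, together with the `TRANSCRIPTION_FIELD` constant the refactor commit mentions (a sketch, not the PR's exact code):

```rust
use serde::Deserialize;

/// Single source of truth for the field name, so the JSON schema sent to the
/// provider and the parser below can't drift apart (this stands in for the
/// constant that replaced the hardcoded "transcription" string).
const TRANSCRIPTION_FIELD: &str = "transcription";

/// Shape the model is constrained to return via the JSON schema.
#[derive(Deserialize)]
struct CleanedTranscript {
    transcription: String,
}
```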