feat: implement structured outputs for post-processing providers#706

Merged
cjpais merged 7 commits into cjpais:main from Schreezer:fix/structured-outputs
Feb 17, 2026

Conversation

@Schreezer
Contributor

Summary

This PR implements structured outputs support for Cerebras, OpenRouter, OpenAI, and Apple Intelligence providers in the post-processing pipeline.

Changes

Core Implementation

  • Structured Outputs: Added JSON schema support to enforce consistent response format
  • System Prompt Architecture: Changed from `${output}` variable substitution to a system prompt + user content approach
  • Provider Support:
    • Cerebras ✅
    • OpenRouter ✅
    • OpenAI ✅
    • Apple Intelligence ✅

Technical Details

  • Added send_chat_completion_with_schema() in llm_client.rs for structured output support
  • Updated actions.rs to detect supported providers and use structured outputs
  • Modified Apple Intelligence Swift code to accept separate system prompts
  • All providers now use consistent format: system instructions + user transcription
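
As a rough illustration of what `send_chat_completion_with_schema()` might send, here is a sketch of an OpenAI-compatible request body with a `response_format: json_schema` block. The field names follow the OpenAI convention and the schema/helper names are assumptions; the real request construction lives in llm_client.rs and may differ.

```rust
/// Sketch (assumed names) of the JSON body for a structured-output chat
/// completion: system instructions + user transcription, plus a strict
/// JSON schema constraining the reply to {"transcription": "..."}.
fn structured_output_body(model: &str, system_prompt: &str, transcription: &str) -> String {
    // NOTE: real code would use serde_json for proper escaping; this sketch
    // only escapes backslashes, quotes, and newlines.
    let esc = |s: &str| s.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n");
    format!(
        "{{\"model\":\"{model}\",\"messages\":[{{\"role\":\"system\",\"content\":\"{sys}\"}},{{\"role\":\"user\",\"content\":\"{user}\"}}],\"response_format\":{{\"type\":\"json_schema\",\"json_schema\":{{\"name\":\"cleaned_transcript\",\"strict\":true,\"schema\":{{\"type\":\"object\",\"properties\":{{\"transcription\":{{\"type\":\"string\"}}}},\"required\":[\"transcription\"],\"additionalProperties\":false}}}}}}}}",
        model = esc(model),
        sys = esc(system_prompt),
        user = esc(transcription)
    )
}
```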

Benefits

  • Type Safety: Guaranteed JSON schema adherence
  • Consistent Format: No more parsing errors or hallucinated fields
  • Better Architecture: Clean separation between instructions and content
  • Fallback Support: Graceful degradation if JSON parsing fails

Testing

  • All 23 existing tests pass ✅
  • No compiler warnings ✅
  • Code compiles successfully on macOS with Apple Intelligence support

Notes

  • Legacy providers (Anthropic, Groq, Custom) continue using `${output}` variable substitution
  • Apple Intelligence uses native Swift structured outputs via CleanedTranscript struct

…and Apple Intelligence

- Add structured output support with JSON schema in llm_client.rs
- Update actions.rs to use system prompt + user content approach
- Remove legacy ${output} variable substitution for supported providers
- Update Apple Intelligence Swift code to accept system prompts
- All providers now use consistent structured output format
- Remove duplicate check_apple_intelligence_availability function
@Schreezer
Contributor Author

It can be easily tested by setting the provider to Cerebras, OpenRouter, OpenAI, or Apple Intelligence. Most of these providers now offer models that support structured output, so simply running the model with one of them should be a sufficient test. The motivation for this change: with cheap LLMs, post-processing failed to return the correct format 10-20% of the time. It's much better now.

@cjpais
Owner

cjpais commented Feb 4, 2026

Thanks this is good, I will take a look and merge when I get a chance

Contributor

@VirenMohindra left a comment


nice work on this! the architectural shift to system prompt + user content is solid and the separation is actually better for prompt injection resistance too. went through the code and have some feedback below~

settings migration for existing users

supports_structured_output is added with #[serde(default)] which means existing users who upgrade will have all providers deserialized with false. ensure_post_process_defaults() checks if a provider exists by ID but does NOT sync new fields onto existing entries. so the feature will be dead on arrival for anyone who isn't a fresh install. think we need a migration step that updates the field on existing providers from the defaults
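
The migration step this comment asks for could look roughly like the following: copy the newly added flag from the shipped defaults onto providers that already exist in the user's settings. The struct shape and function name are assumptions; the real lookup lives in `ensure_post_process_defaults()`.

```rust
/// Minimal sketch of a provider entry as stored in settings (assumed shape).
#[derive(Clone)]
struct PostProcessProvider {
    id: String,
    supports_structured_output: bool,
}

/// Sync the new field onto existing entries: anything deserialized with
/// #[serde(default)] came back as `false`, so re-copy the flag from the
/// matching default definition.
fn sync_structured_output_flag(
    existing: &mut Vec<PostProcessProvider>,
    defaults: &[PostProcessProvider],
) {
    for provider in existing.iter_mut() {
        if let Some(def) = defaults.iter().find(|d| d.id == provider.id) {
            provider.supports_structured_output = def.supports_structured_output;
        }
    }
}
```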

per-provider vs per-model structured output support

marking OpenRouter and Groq as supports_structured_output: true at the provider level is risky. OpenRouter's docs say only select models support json_schema. Groq's docs limit it to specific models (llama 4 variants, kimi k2). users who pick an unsupported model will hit API errors. could either make this per-model, or (simpler) add a fallback to legacy mode when the structured output call fails

fallback on API error

when the structured output API call fails (~line 180 in actions.rs), it returns None and the user gets raw unprocessed text. would be more resilient to fall back to the legacy ${output} substitution path so the user still gets post-processing even if structured outputs aren't supported for their specific model.

ordering note

PR #704 also modifies actions.rs with new template variables (${current_app}, ${time_local}, etc). those will need to work in both the legacy path and the structured output system prompt path. whichever merges second will need to account for this. happy to help coordinate

/// Build a system prompt from the user's prompt template.
/// Replaces `${output}` with a reference to the user message so the prompt
/// still reads sensibly when the transcription is sent separately.
fn build_system_prompt(prompt_template: &str) -> String {
Contributor


replacing ${output} with "the user's message" makes the system prompt read a bit oddly, especially with the default template which ends with Transcript:\nthe user's message. since the transcription arrives as a separate user message, the system prompt just needs the instructions. would .replace("${output}", "").trim() be cleaner here? then the system prompt is purely instructional and the LLM sees the transcription as the user message naturally.
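
the suggestion boils down to a one-liner; a sketch under the assumption that the function keeps its current signature:

```rust
/// Strip the `${output}` placeholder entirely so the system prompt carries
/// only instructions; the transcription arrives as the user message.
fn build_system_prompt(prompt_template: &str) -> String {
    prompt_template.replace("${output}", "").trim().to_string()
}
```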

base_url: "https://openrouter.ai/api/v1".to_string(),
allow_base_url_edit: false,
models_endpoint: Some("/models".to_string()),
supports_structured_output: true,
Contributor


not all OpenRouter models support json_schema response format. the docs note it's per-model, and some providers behind OpenRouter may fall back to json_object mode or reject the parameter entirely. same concern applies to Groq (line 437) — only specific models support structured output per their docs.

match crate::llm_client::send_chat_completion_with_schema(
&provider,
api_key,
&model,
Contributor


this returns None on API failure, which means the user gets raw unprocessed text. since the most likely failure mode is the model not supporting response_format, falling back to the legacy ${output} substitution path would give users post-processing even when structured outputs aren't available for their model. something like:

Err(e) => {
    warn!("Structured output failed for '{}': {}. Falling back to legacy mode.", provider.id, e);
    // Fall through to legacy path below
}
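
filled out, the fallback the snippet sketches might look like this. the helper names are hypothetical stand-ins for the real llm_client calls in actions.rs:

```rust
/// Sketch of the fallback flow: try the structured-output call first and,
/// on any error, fall back to the legacy `${output}`-substitution path
/// instead of returning raw unprocessed text.
fn post_process_with_fallback<F, G>(structured: F, legacy: G, raw: &str) -> String
where
    F: Fn(&str) -> Result<String, String>,
    G: Fn(&str) -> String,
{
    match structured(raw) {
        Ok(clean) => clean,
        Err(e) => {
            // In the real code this would be log::warn!
            eprintln!("Structured output failed: {e}. Falling back to legacy mode.");
            legacy(raw)
        }
    }
}
```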

Contributor


im gonna send examples of this if possible and steps to repro

Contributor Author


no need... i get the point, so i will apply the fallback mechanism

Schreezer and others added 4 commits February 8, 2026 14:55
- Add settings migration to sync supports_structured_output field for existing providers
- Fix fallback behavior: structured output failures now fall through to legacy mode
- Clone api_key to prevent ownership issues in fallback path
- Clean up build_system_prompt(): remove ${output} placeholder entirely
  (instead of replacing with 'the user's message', which reads awkwardly)
- Add warn import from log crate
- Optimize settings migration: use single match instead of double iteration
- Add TRANSCRIPTION_FIELD constant to replace magic strings
- Keep Apple Intelligence behavior unchanged (no API fallback for privacy)

Addresses code review feedback on PR cjpais#706:
1. More efficient provider lookup in ensure_post_process_defaults()
2. Eliminates hardcoded 'transcription' string in JSON parsing
3. Maintains privacy-first approach for Apple Intelligence
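
The "magic string" cleanup this commit describes could be sketched as follows; the constant name comes from the commit message, while the extraction helper is purely hypothetical (real code would use serde_json rather than string scanning):

```rust
/// Shared constant for the JSON field name, used both in the schema and
/// when reading the model's reply (per the commit message).
const TRANSCRIPTION_FIELD: &str = "transcription";

/// Hypothetical helper: pull the field's value out of a reply like
/// {"transcription":"..."}. Does not handle escaped quotes; a real
/// implementation would parse the JSON properly.
fn extract_transcription(reply: &str) -> Option<String> {
    let key = format!("\"{}\":\"", TRANSCRIPTION_FIELD);
    let start = reply.find(&key)? + key.len();
    let rest = &reply[start..];
    let end = rest.find('"')?;
    Some(rest[..end].to_string())
}
```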
@cjpais
Owner

cjpais commented Feb 17, 2026

I suspect this will cause some issues, but since it's alpha we will deal with it when it happens.

cjpais merged commit 0cb8ab2 into cjpais:main on Feb 17, 2026
4 checks passed
mceachen added a commit to mceachen/Handy that referenced this pull request Feb 20, 2026
Resolve merge conflicts with main's race condition fix (cjpais#824) and
structured outputs (cjpais#706), keeping both FinishGuard/TranscriptionCoordinator
and the new streaming VAD modes.
RohanMuppa pushed a commit to RohanMuppa/Handy that referenced this pull request Feb 21, 2026
…ais#706)

* feat: implement structured outputs for Cerebras, OpenRouter, OpenAI, and Apple Intelligence

- Add structured output support with JSON schema in llm_client.rs
- Update actions.rs to use system prompt + user content approach
- Remove legacy ${output} variable substitution for supported providers
- Update Apple Intelligence Swift code to accept system prompts
- All providers now use consistent structured output format
- Remove duplicate check_apple_intelligence_availability function

* wip changes

* fix(structured-outputs): address PR cjpais#706 review comments

- Add settings migration to sync supports_structured_output field for existing providers
- Fix fallback behavior: structured output failures now fall through to legacy mode
- Clone api_key to prevent ownership issues in fallback path
- Clean up build_system_prompt(): remove ${output} placeholder entirely
  (instead of replacing with 'the user's message', which reads awkwardly)
- Add warn import from log crate

* refactor(structured-outputs): apply best practice improvements

- Optimize settings migration: use single match instead of double iteration
- Add TRANSCRIPTION_FIELD constant to replace magic strings
- Keep Apple Intelligence behavior unchanged (no API fallback for privacy)

Addresses code review feedback on PR cjpais#706:
1. More efficient provider lookup in ensure_post_process_defaults()
2. Eliminates hardcoded 'transcription' string in JSON parsing
3. Maintains privacy-first approach for Apple Intelligence

* fix groq output

---------

Co-authored-by: CJ Pais <cj@cjpais.com>