Discussion: Integration Strategy for OpenAI Structured Outputs
Context
OpenAI's Structured Outputs feature provides native JSON Schema validation at the model level. This guarantees that model output adheres to a specified schema without requiring post-generation validation.
Key features:
- Schema adherence guaranteed by the model (not post-validation)
- Explicit refusals via a `refusal` field when the model cannot comply
- Requires `strict: true` and specific constraints (`additionalProperties: false`, all fields required)
- Only available on `gpt-4o-mini`, `gpt-4o-2024-08-06`, and later
- Works with OpenAI-compatible APIs that implement the feature
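For context, a Structured Outputs request constrains the model through the `response_format` field of the Chat Completions API. A minimal sketch of the request body (the invoice schema is illustrative, not a real Ordis schema):

```typescript
// Sketch of a Chat Completions request body using Structured Outputs.
// The invoice schema here is illustrative.
const requestBody = {
  model: "gpt-4o-2024-08-06",
  messages: [{ role: "user", content: "Extract the invoice fields." }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "invoice_extraction",
      strict: true, // model output is constrained to schema-valid JSON
      schema: {
        type: "object",
        properties: {
          invoice_number: { type: "string" },
          notes: { type: ["string", "null"] },
        },
        required: ["invoice_number", "notes"], // strict mode: every field listed
        additionalProperties: false,           // strict mode: no extra keys
      },
    },
  },
};
```

Note the two strict-mode constraints in the schema itself: `required` must list every property, and `additionalProperties` must be `false`.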
Alignment with Ordis Principles
Positive alignment:
- ✅ Schema-first approach (schema defines correctness before model execution)
- ✅ Deterministic output (valid schema match or explicit failure)
- ✅ No silent corrections (model must follow schema exactly)
- ✅ Clear error handling (refusal field for programmatic detection)
Potential concerns:
- ⚠️ Provider-specific feature (not available across all OpenAI-compatible APIs)
- ⚠️ Model-specific constraints (only certain snapshots support it)
- ⚠️ Schema restrictions (all fields required, `additionalProperties: false`, etc.)
Discussion Points
1. Cross-Provider Consistency
Current approach: Ordis validates all LLM output using its own validation layer, ensuring consistent behavior regardless of provider (OpenAI, Ollama, LM Studio, OpenRouter, etc.)
Question: Should Ordis maintain this universal validation approach, or detect and use provider-native validation when available?
Options:
- A. Consistency-first: Always use Ordis validation, never rely on provider features
- B. Hybrid: Detect Structured Outputs support and use it as an optimization, but keep validation as safety net
- C. Opt-in: Allow users to explicitly enable provider-native features via schema flag
2. Schema Translation
If we support Structured Outputs, we need to handle schema conversion:
Current Ordis schemas:

```json
{
  "fields": [
    { "name": "invoice_number", "type": "string", "required": true },
    { "name": "notes", "type": "string", "required": false }
  ]
}
```

OpenAI Structured Outputs requires:

```json
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "notes": { "type": ["string", "null"] }
  },
  "required": ["invoice_number", "notes"],
  "additionalProperties": false
}
```

Question: Should Ordis automatically convert between formats, or require users to provide OpenAI-compatible schemas when they want to use the feature?
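If automatic conversion is on the table, the mapping is largely mechanical: optional Ordis fields become nullable, every property is listed as required, and extra keys are forbidden. A hypothetical sketch (the `OrdisField` and `toStrictJsonSchema` names are illustrative, not part of Ordis's API):

```typescript
// Hypothetical converter from an Ordis-style field list to the strict
// JSON Schema shape Structured Outputs requires.
interface OrdisField {
  name: string;
  type: string; // e.g. "string", "number", "boolean"
  required: boolean;
}

function toStrictJsonSchema(fields: OrdisField[]) {
  const properties: Record<string, any> = {};
  for (const f of fields) {
    // Strict mode lists every property under "required", so Ordis
    // optionality has to be expressed as a nullable type instead.
    properties[f.name] = f.required
      ? { type: f.type }
      : { type: [f.type, "null"] };
  }
  return {
    type: "object",
    properties,
    required: fields.map((f) => f.name), // strict mode: all fields required
    additionalProperties: false,
  };
}

const strictSchema = toStrictJsonSchema([
  { name: "invoice_number", type: "string", required: true },
  { name: "notes", type: "string", required: false },
]);
```

The awkward part of the mapping is visible here: "optional" and "nullable" are not the same thing, so round-tripping a converted schema back to the Ordis format would need a convention for distinguishing them.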
3. Error Handling Consistency
With Structured Outputs:
- Model returns an explicit `refusal` field when it cannot comply with the schema
- Guarantees no schema violations (model won't hallucinate invalid structure)
With Ordis validation:
- Validation errors caught post-generation
- More detailed error messages about what violated the schema
- Consistent error format across all providers
Question: How do we unify these different error modes in the public API?
4. Feature Detection
Implementation considerations:
- How do we detect if a provider supports Structured Outputs?
- What happens when a user's schema uses Ordis-specific features not in OpenAI's JSON Schema subset?
- Should we fail fast or gracefully degrade to validation-based approach?
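A graceful-degradation sketch for the detection question, assuming a simple model allow-list (the helper names and list are assumptions for illustration; real detection would likely be configuration-driven):

```typescript
// Hypothetical feature detection with fallback to Ordis's own validation.
// The allow-list mirrors the models named in this issue.
const STRUCTURED_OUTPUT_MODELS = ["gpt-4o-mini", "gpt-4o-2024-08-06"];

function supportsStructuredOutputs(provider: string, model: string): boolean {
  if (provider !== "openai") return false; // other providers: assume no support
  return STRUCTURED_OUTPUT_MODELS.some(
    (m) => model === m || model.startsWith(m + "-")
  );
}

// Prefer the native path when available; otherwise degrade to the
// validation-based approach that works everywhere.
function pickValidationPath(provider: string, model: string): "native" | "ordis" {
  return supportsStructuredOutputs(provider, model) ? "native" : "ordis";
}
```

An allow-list is brittle (new snapshots appear regularly), which is itself an argument for either failing fast on unknown models or treating native support as an opt-in rather than auto-detected.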
5. Performance vs. Portability Trade-off
Structured Outputs advantages:
- Potentially faster (no retry loops on validation failures)
- Lower token usage (model doesn't waste tokens on invalid output)
- Simpler prompting (less need for explicit JSON formatting instructions)
Validation-based advantages:
- Works everywhere (any OpenAI-compatible API)
- More flexible schema support (Ordis can add custom validation rules)
- Consistent behavior regardless of provider or model
Question: Is the performance benefit worth the complexity of supporting both paths?
Recommendations
Short-term
Keep current validation-based approach for maximum compatibility and consistency across providers. This maintains Ordis's "boring pipelines" principle.
Long-term considerations
- Document compatibility: Add docs explaining how Ordis schemas map to OpenAI Structured Outputs format
- Optional optimization: Consider adding an opt-in flag like `useNativeValidation: true` for users who know they're using compatible providers
- Schema converter: Build a utility to convert Ordis schemas to strict JSON Schema format for users who want to use native features directly
- Provider abstraction: Keep validation as the "universal adapter" that works everywhere, with provider-specific optimizations as optional enhancements
Related
- Issue #54 (Add evidence spans to extraction output): evidence spans feature would need schema format consideration
- Confidence scoring (validation errors could inform confidence metrics)
- Token budget management (native validation could reduce retries)
Questions for Contributors
- Do you primarily use OpenAI's API, or other providers (Ollama, LM Studio, etc.)?
- Have you experienced validation failures that would have been prevented by native schema enforcement?
- Would you prefer consistent behavior across providers, or maximum performance with provider-specific features?
- Are there Ordis schema features you use that aren't in JSON Schema standard (custom validators, field transforms, etc.)?
Proposed Next Steps
- Gather community feedback on consistency vs. optimization trade-off
- Analyze schema compatibility (what % of Ordis schemas would work with Structured Outputs)
- Prototype hybrid approach to measure complexity cost
- Document schema translation guidelines for users who want to use native features directly