Schemic

The Problem

OpenAI's structured output API allows you to specify a JSON schema for the model's response, ensuring predictable output. However, generating a compatible schema from standard Pydantic models presents several challenges:

Strict Requirements: OpenAI imposes specific constraints not typical in standard JSON Schema usage:
- additionalProperties must often be False.
- required fields must be explicitly listed for all properties within an object (with some exceptions for dictionary-like objects).
- default values are disallowed.
- Certain types or keywords like format: date-time or oneOf are forbidden.
- $ref objects cannot have sibling keywords.
- anyOf choices requires unique first keys.
Manual Conversion: Developers often resort to manually creating separate, simplified schemas or writing complex conversion logic to bridge the gap between their application's Pydantic models and OpenAI's requirements.
Workflow Complexity: This leads to duplicated model definitions, increased maintenance overhead, and potential inconsistencies.

Schemic's Solution

Schemic simplifies using Pydantic models with OpenAI's structured output by providing a SchemicModel base class.

Inherit: Define your data structures by inheriting from SchemicModel instead of Pydantic's BaseModel.
Generate: Use the YourModel.schemic_schema() class method. This method automatically:
- Generates the JSON schema from your Pydantic model.
- Recursively modifies the schema to comply with OpenAI's strict requirements (handling additionalProperties, required, default, disallowed keywords, etc.).
- Wraps the schema in the necessary format ({"type": "json_schema", "json_schema": {...}}) for the OpenAI API.
Use: Pass the generated schema directly to the response_format parameter in your OpenAI API call.
Parse: Instantiate your SchemicModel directly from the OpenAI response dictionary (YourModel(**response_dict)), leveraging Pydantic's built-in parsing.

No more manual schema adjustments or duplicate models. Define once, generate a compatible schema, and parse the response easily.

Installation

pip install schemic

Quick Example

# filepath: example.py
import datetime
import json
from typing import List, Optional
from pydantic import Field
from schemic import SchemicModel # Import SchemicModel
import openai
import os
from dotenv import load_dotenv

# Load environment variables (replace with your API key management)
load_dotenv()
client = openai.Client(api_key=os.getenv("OPENAI_API_KEY"))

# 1. Define your models using SchemicModel
class Task(SchemicModel):
    description: str = Field(..., description="Description of the task")
    due_date: Optional[datetime.date] = Field(None, description="Optional due date")
    completed: bool = Field(False, description="Whether the task is completed")

class TodoList(SchemicModel):
    title: str = Field(..., description="Title of the todo list")
    items: List[Task] = Field(..., description="List of tasks")
    created_at: datetime.datetime = Field(
        default_factory=datetime.datetime.now,
        description="Timestamp of creation" # Note: default/default_factory will be handled
    )

# 2. Generate the OpenAI-compatible schema
openai_schema = TodoList.schemic_schema()

# Optional: See the generated schema
# print(json.dumps(openai_schema, indent=2))
# Note how 'default' is removed from 'completed', 'format' might be removed from dates,
# 'additionalProperties' and 'required' are adjusted, and the whole thing is wrapped.

# 3. Use the schema with the OpenAI API
def create_todo_list(prompt: str):
    response = client.chat.completions.create( # Use create for newer versions
        model="gpt-4o", # Or your preferred model supporting structured output
        messages=[
            {"role": "system", "content": "You are an assistant that creates todo lists."},
            {"role": "user", "content": prompt}
        ],
        response_format=openai_schema, # Pass the generated schema
        temperature=0.5,
    )

    # 4. Parse the response directly into your SchemicModel
    response_content = response.choices[0].message.content
    if response_content:
        try:
            content_dict = json.loads(response_content)
            todo_list = TodoList(**content_dict) # Direct parsing!
            return todo_list
        except json.JSONDecodeError:
            print("Error: Failed to decode JSON response")
            print(response_content)
            return None
        except Exception as e: # Catch Pydantic validation errors etc.
             print(f"Error parsing response into model: {e}")
             print(response_content)
             return None
    else:
        print("Error: Empty response content")
        return None


# Example usage
prompt = "Create a todo list for a weekend trip: pack clothes, book hotel, charge phone."
my_list = create_todo_list(prompt)

if my_list:
    print(f"\n--- Todo List: {my_list.title} ---")
    for item in my_list.items:
        print(f"- {item.description} (Completed: {item.completed})")
    # Note: created_at will have the default value assigned by Pydantic during parsing
    print(f"Created At: {my_list.created_at}")

Key Features

SchemicModel: Inherit from this class instead of pydantic.BaseModel.
YourModel.schemic_schema(): Class method that generates the OpenAI-compatible JSON schema.
- Handles additionalProperties, required, default, oneOf -> anyOf, $ref cleanup, format removal, and wrapping automatically.
- Accepts an optional delete_props_with_defaults=True argument to remove fields with defaults entirely, instead of just removing the default keyword.
Standard Pydantic Parsing: Use YourModel(**response_dict) to parse the OpenAI response, benefiting from Pydantic's validation and default value handling on the receiving end.

Why Use Schemic?

Single Source of Truth: Define your data models once using Pydantic and use them for both your application logic and OpenAI interaction.
Automatic Compliance: Automatically handles the nuances and strict requirements of OpenAI's JSON schema format.
Simplified Workflow: Eliminates the need for manual schema conversion, duplicate models, or complex post-processing logic.
Leverage Pydantic: Continue to benefit from Pydantic's robust validation and data parsing features when receiving responses.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. Test your changes with python -m unittest discover tests

Known Requirements of the OpenAI API:

This is the running list of openais specific requirements I'm aware of:

Additional properties must be set on all nested objects, and if not having contents must be set to False

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'AISectionResponse': In context=(), 'additionalProperties' is required to be supplied and to be false.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}

The “required” keyword must be set on all nested properties (unless they are objects)

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'AISectionResponse': In context=(), 'required' is required to be supplied and to be an array including every key in properties. Missing 'id'.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}

defaults are not allowed ever

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'AISectionResponse': In context=('properties', 'type'), 'default' is not permitted.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}

datetimes and variables other than basic data types are never allowed

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'MathProblem': In context=('properties', 'created_at'), 'format' is not permitted.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}

oneOf is not permitted to be in the schema (handle this by converting it to anyOf)

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'PetOwner': In context=('properties', 'pet'), 'oneOf' is not permitted.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}

anyOf must not share identical first keys (including in the required keys array, as well as in the properties dictionary)

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema: Objects provided via 'anyOf' must not share identical first keys. Consider adding a discriminator key or rearranging the properties to ensure the first key is unique.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

REMOVED
you have to wrap the schema in a specific format before sending the model to the ai

{
    "type": "json_schema", // must remain the same
    "json_schema": { // must remain the same
        "strict": True, // must remain the same
        "name": "schemic_model", // this is the name of the model
        "schema": {} // this is the actual schema
    }
}

some models have heightened requirements compared to others... Shouldn't matter however, and schemic should always just produce the highest level of strictness possible. For example, o1 model will throw an error like seen with requirement #6, but 4o will not.
required array doesn't seem to support objects with certain properties. For example, if the object doesn't have a "properties" field (or no "required" field) (dictionaries for example only have additionalProperties).

This makes sense given that you can't require something with an undefined key. So if the object is a dictionary, it doesn't need to be required, but if it is an object, it does need to be required. For example it will support "nested", but not "determined_dir" in the following example:

schema = {
    "type": "json_schema",
    "json_schema": {
        "strict": True,
        "name": "AISectionResponse",
        "schema": { # Added everything before this line (AKA requirement #8)
            "$defs": {
                "NestedObjectR1": {
                    "properties": {
                        "nested_prop": {
                            "title": "Nested Prop",
                            "type": "string"
                        }
                    },
                    "required": [
                        "nested_prop"
                    ],
                    "additionalProperties": False, # Added this line (AKA requirement #1)
                    "title": "NestedObjectR1",
                    "type": "object"
                },
                "NoteSegmentButton": {
                    "properties": {
                        "type": { # This must also be the first one in the dictionary, as it is unique (AKA requirement #6)
                            "enum": ["button"], # Changed this line (AKA requirement #3) --> Removed SegmentType and replaced with enum values
                            "title": "Type", # Changed this line (AKA requirement #3)
                            "type": "string" # Changed this line (AKA requirement #3)
                        },
                        "id": {
                            "title": "Id",
                            "type": "string"
                        },
                        "isDetermined": {
                            "title": "Isdetermined",
                            "type": "boolean"
                        },
                        "placeholder": {
                            "description": "The placeholder text for the button. This is what will be displayed on the button itself.",
                            "title": "Placeholder",
                            "type": "string"
                        },
                        "action": {
                            "description": "The action to take when the button is clicked. This could be a function name, a URL, etc.",
                            "title": "Action",
                            "type": "string"
                        }
                    },
                    "additionalProperties": False, # Added this line (AKA requirement #1)
                    "required": [
                        "type", # Added this line (AKA requirement #2). This must also be the first one in the list, as it is unique (AKA requirement #6)
                        "id", # Added this line (AKA requirement #2)
                        "isDetermined",
                        "placeholder",
                        "action"
                    ],
                    "title": "NoteSegmentButton",
                    "type": "object"
                },
                "NoteSegmentDropdown": {
                    "properties": {
                        "type": { # This must also be the first one in the dictionary, as it is unique (AKA requirement #6)
                            "enum": ["dropdown"], # Changed this line (AKA requirement #3) --> Removed SegmentType and replaced with enum values
                            "title": "Type", # Changed this line (AKA requirement #3)
                            "type": "string" # Changed this line (AKA requirement #3)
                        },
                        "id": {
                            "title": "Id",
                            "type": "string"
                        },
                        "isDetermined": {
                            "title": "Isdetermined",
                            "type": "boolean"
                        },
                        "options": {
                            "items": {
                                "type": "string"
                            },
                            "title": "Options",
                            "type": "array"
                        },
                        "selected": {
                            "items": {
                                "type": "integer"
                            },
                            "title": "Selected",
                            "type": "array"
                        }
                    },
                    "additionalProperties": False, # Added this line (AKA requirement #1)
                    "required": [
                        "type", # Added this line (AKA requirement #2) This must also be the first one in the list, as it is unique (AKA requirement #6)
                        "id", # Added this line (AKA requirement #2)
                        "isDetermined",
                        "options",
                        "selected"
                    ],
                    "title": "NoteSegmentDropdown",
                    "type": "object"
                },
                "NoteSegmentInput": {
                    "properties": {
                        "type": { # This must also be the first one in the dictionary, as it is unique (AKA requirement #6)
                            "enum": ["input"], # Changed this line (AKA requirement #3) --> Removed SegmentType and replaced with enum values
                            "title": "Type", # Changed this line (AKA requirement #3)
                            "type": "string" # Changed this line (AKA requirement #3)
                        },
                        "id": {
                            "title": "Id",
                            "type": "string"
                        },
                        "isDetermined": {
                            "title": "Isdetermined",
                            "type": "boolean"
                        },
                        "value": {
                            "title": "Value",
                            "type": "string"
                        }
                    },
                    "additionalProperties": False, # Added this line (AKA requirement #1)
                    "required": [
                        "type", # Added this line (AKA requirement #2) This must also be the first one in the list, as it is unique (AKA requirement #6)
                        "id", # Added this line (AKA requirement #2)
                        "isDetermined",
                        "value"
                    ],
                    "title": "NoteSegmentInput",
                    "type": "object"
                },
                # SegmentType removed as no longer needed since defaults are not supported by openai, and and we are instead using enum values in the code
            },
            "properties": {
                "top_level_prop": {
                    "title": "Top Level Prop",
                    "type": "string"
                },
                "nested": {
                    "$ref": "#/$defs/NestedObjectR1"
                }, # normal object
                "determined_dir": {
                    "additionalProperties": { # This doesn't need to be false because it has a value... if there was no value we would have to set additionalProperties to false
                        "anyOf": [
                            {
                                "$ref": "#/$defs/NoteSegmentDropdown"
                            },
                            {
                                "$ref": "#/$defs/NoteSegmentInput"
                            },
                            {
                                "$ref": "#/$defs/NoteSegmentButton"
                            }
                        ]
                    },
                    "description": "A dictionary where keys are the placeholder IDs (e.g., 'SOMEUUID1') from the template's content_str, and values are the determined content for that placeholder.",
                    "title": "Determined Dir",
                    "type": "object"
                }, # not a normal object, but a dictionary with a specific structure, with no properties or required fields
                "determined_str": {
                    "description": "The string representing the text for the note section, with '{SOMEUUID1}' representing data from the determined_dir.",
                    "title": "Determined Str",
                    "type": "string"
                }
            },
            "additionalProperties": False, # Added this line (AKA requirement #1)
            "required": [
                "top_level_prop",
                "nested", # THIS IS SUPPORTED
                "determined_dir", # THIS IS NOT SUPPORTED, AND THE LINE MUST BE COMMENTED OUT, DESPITE "nested" BEING SUPPORTED, AND THEY ARE BOTH OBJECTS
                "determined_str",
            ],
            "title": "AISectionResponse",
            "type": "object"
        }
    }
}

Example Error:

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'AISectionResponse': In context=(), 'required' is required to be supplied and to be an array including every key in properties. Extra required key 'determined_dir' supplied.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}

$refs do not support additional keys:

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'PetOwner': context=('properties', 'pet_color'), $ref cannot have keywords {'description'}.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'PetOwner': context=('properties', 'pet_color'), $ref cannot have keywords {'type'}.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.vscode		.vscode
examples		examples
schemic		schemic
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Schemic

The Problem

Schemic's Solution

Installation

Quick Example

Key Features

Why Use Schemic?

Contributing

Known Requirements of the OpenAI API:

License

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

ReedGraff/schemic

Folders and files

Latest commit

History

Repository files navigation

Schemic

The Problem

Schemic's Solution

Installation

Quick Example

Key Features

Why Use Schemic?

Contributing

Known Requirements of the OpenAI API:

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages