Python library for creating and managing OpenAI Structured Outputs batch API calls.
Makes it easy to extract structured data from large datasets using OpenAI's batch API.
- π― Structured Outputs Only: Built specifically for OpenAI's Structured Outputs API in batch mode
- π§ Schema Fix: Automatically handles the
additionalProperties: falserequirement - a common gotcha when working with Structured Outputs - π Simple batch creation and management
- π° Built-in cost tracking and estimation
- π Progress monitoring and status checking
- π Automatic file handling (input, output, errors)
- π‘οΈ Input validation and error handling
- π Pydantic model support for structured outputs
Structured Outputs is an OpenAI API feature that allows you to extract structured data from text using JSON Schema. This library makes it easy to process large amounts of text in batch mode while automatically handling the schema requirements.
pip install openai-so-batchgit clone https://github.com/ollieglass/openai-so-batch.git
cd openai-so-batch
pip install -e .from pydantic import BaseModel
from openai_so_batch import Batch, Costs
# Define your response model (this will be converted to JSON Schema)
class CalendarEvent(BaseModel):
name: str
date: str
participants: list[str]
# Create a batch
batch = Batch(
input_file="batch-input.jl",
output_file="batch-output.jl",
error_file="batch-errors.jl",
job_name="calendar-extract",
)
# Add tasks to the batch
examples = [
"Alice and Bob are going to a science fair on Friday.",
"Jane booked a meeting with Max and Omar next Tuesday at 2 pm.",
]
for i, sentence in enumerate(examples, 1):
batch.add_task(
id=i,
model="gpt-4o-mini",
system_prompt="Extract the event information.",
user_prompt=sentence,
response_model=CalendarEvent # The library automatically handles schema conversion
)
# Upload the batch
batch.upload()
print(f"Batch ID: {batch.batch_id}")
# Check status and download results
status = batch.get_status()
print(f"Status: {status}")
if status == "completed":
batch.download()from openai_so_batch import Costs
# Calculate costs for a model
costs = Costs(model="gpt-4o-mini")
# Estimate input costs
input_cost = costs.input_cost("batch-input.jl")
print(f"Input cost: ${input_cost:.4f}")
# Calculate actual output costs
output_cost = costs.output_cost("batch-output.jl")
print(f"Output cost: ${output_cost:.4f}")# Retrieve an existing batch by ID
batch = Batch(
input_file=None,
output_file="batch-output.jl",
error_file="batch-errors.jl",
job_name="calendar-extract",
batch_id="batch_6890b93c276c819091452db39758b32a"
)
status = batch.get_status()
print(f"Status: {status}")
if status == "completed":
batch.download()The main class for managing Structured Outputs batch operations.
Batch(
input_file: str,
output_file: str,
error_file: str,
job_name: str,
batch_id: Optional[str] = None
)Parameters:
input_file: Path to the input JSONL fileoutput_file: Path where output will be savederror_file: Path where errors will be savedjob_name: Name identifier for the batch jobbatch_id: Optional batch ID for retrieving existing batches
add_task(id, model, system_prompt, user_prompt, response_model): Add a Structured Outputs task to the batch. Theresponse_modelshould be a Pydantic model that will be converted to JSON Schema withadditionalProperties: falseautomatically applied.upload(): Upload the batch to OpenAIget_status(): Get the current status of the batchdownload(): Download results and errors
Utility class for cost tracking and estimation.
Costs(model: str)Parameters:
model: OpenAI model name (e.g., "gpt-4o-mini", "gpt-4o", "o3")
input_cost(filename): Calculate input token costsoutput_cost(filename): Calculate output token costsinput_tokens(filename): Count input tokensoutput_tokens(filename): Count output tokens
The library supports the following OpenAI models with cost tracking:
gpt-4o-minigpt-4oo3o3-minio4-mini
Make sure you have your OpenAI API key set in your environment:
export OPENAI_API_KEY="your-api-key-here"This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
If you encounter any issues or have questions, please open an issue on GitHub.
- Fixed description on Pypi
- Initial release
- Structured Outputs batch processing functionality
- Automatic handling of
additionalProperties: falseschema requirement - Cost tracking and estimation
- Pydantic model support for structured outputs