OpenAI Batch Processing Library

Python library for creating and managing OpenAI Structured Outputs batch API calls.

Makes it easy to extract structured data from large datasets using OpenAI's batch API.

Key Features

  • 🎯 Structured Outputs Only: Built specifically for OpenAI's Structured Outputs API in batch mode
  • πŸ”§ Schema Fix: Automatically handles the additionalProperties: false requirement - a common gotcha when working with Structured Outputs
  • πŸš€ Simple batch creation and management
  • πŸ’° Built-in cost tracking and estimation
  • πŸ“Š Progress monitoring and status checking
  • πŸ”„ Automatic file handling (input, output, errors)
  • πŸ›‘οΈ Input validation and error handling
  • πŸ“ Pydantic model support for structured outputs

What is Structured Outputs?

Structured Outputs is an OpenAI API feature that guarantees model responses conform to a JSON Schema you supply, making it reliable to extract structured data from text. This library makes it easy to process large volumes of text in batch mode while automatically handling the schema requirements.
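For context on what the batch input file contains: each line of the JSONL file is one chat-completions request. The sketch below builds such a line by hand, following the shape of OpenAI's Batch API request format (field names per OpenAI's docs; the library writes these lines for you):

```python
import json

# One line of a Structured Outputs batch input file (JSONL).
request = {
    "custom_id": "1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "Extract the event information."},
            {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
        ],
        # Structured Outputs: constrain the response to a strict JSON Schema.
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "CalendarEvent",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "date": {"type": "string"},
                        "participants": {"type": "array", "items": {"type": "string"}},
                    },
                    "required": ["name", "date", "participants"],
                    "additionalProperties": False,
                },
            },
        },
    },
}
line = json.dumps(request)
```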

Installation

From PyPI (recommended)

pip install openai-so-batch

From source

git clone https://github.com/ollieglass/openai-so-batch.git
cd openai-so-batch
pip install -e .

Quick Start

Basic Structured Outputs Usage

from pydantic import BaseModel
from openai_so_batch import Batch, Costs

# Define your response model (this will be converted to JSON Schema)
class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

# Create a batch
batch = Batch(
    input_file="batch-input.jl",
    output_file="batch-output.jl",
    error_file="batch-errors.jl",
    job_name="calendar-extract",
)

# Add tasks to the batch
examples = [
    "Alice and Bob are going to a science fair on Friday.",
    "Jane booked a meeting with Max and Omar next Tuesday at 2 pm.",
]

for i, sentence in enumerate(examples, 1):
    batch.add_task(
        id=i,
        model="gpt-4o-mini",
        system_prompt="Extract the event information.",
        user_prompt=sentence,
        response_model=CalendarEvent  # The library automatically handles schema conversion
    )

# Upload the batch
batch.upload()
print(f"Batch ID: {batch.batch_id}")

# Check status and download results
status = batch.get_status()
print(f"Status: {status}")

if status == "completed":
    batch.download()

Cost Tracking

from openai_so_batch import Costs

# Calculate costs for a model
costs = Costs(model="gpt-4o-mini")

# Estimate input costs
input_cost = costs.input_cost("batch-input.jl")
print(f"Input cost: ${input_cost:.4f}")

# Calculate actual output costs
output_cost = costs.output_cost("batch-output.jl")
print(f"Output cost: ${output_cost:.4f}")
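The underlying arithmetic is simple: tokens times the per-million-token rate. The rate below is illustrative only, not current pricing; check OpenAI's pricing page (and note that batch requests are billed at a discount to synchronous ones):

```python
def token_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars for a token count at a per-million-token rate."""
    return tokens * price_per_million / 1_000_000

# 250k tokens at an illustrative $0.15 per million tokens.
estimate = token_cost(250_000, 0.15)
print(f"${estimate:.4f}")  # → $0.0375
```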

Retrieving Existing Batches

# Retrieve an existing batch by ID
batch = Batch(
    input_file=None,
    output_file="batch-output.jl",
    error_file="batch-errors.jl",
    job_name="calendar-extract",
    batch_id="batch_6890b93c276c819091452db39758b32a"
)

status = batch.get_status()
print(f"Status: {status}")

if status == "completed":
    batch.download()

API Reference

Batch Class

The main class for managing Structured Outputs batch operations.

Constructor

Batch(
    input_file: str,
    output_file: str,
    error_file: str,
    job_name: str,
    batch_id: Optional[str] = None
)

Parameters:

  • input_file: Path to the input JSONL file
  • output_file: Path where output will be saved
  • error_file: Path where errors will be saved
  • job_name: Name identifier for the batch job
  • batch_id: Optional batch ID for retrieving existing batches

Methods

  • add_task(id, model, system_prompt, user_prompt, response_model): Add a Structured Outputs task to the batch. The response_model should be a Pydantic model that will be converted to JSON Schema with additionalProperties: false automatically applied.
  • upload(): Upload the batch to OpenAI
  • get_status(): Get the current status of the batch
  • download(): Download results and errors
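A common pattern is to poll get_status() until the batch finishes, then download. The sketch below shows that loop; FakeBatch is a stand-in for a real Batch instance so the example runs standalone, and the terminal status names are drawn from OpenAI's Batch API (verify against current docs):

```python
import time

class FakeBatch:
    """Stand-in for openai_so_batch.Batch so this sketch runs standalone."""
    def __init__(self):
        self._statuses = iter(["validating", "in_progress", "completed"])
    def get_status(self):
        return next(self._statuses)
    def download(self):
        return "downloaded"

def wait_for_batch(batch, poll_seconds: float = 0) -> str:
    # Poll until the batch reaches a terminal state, then fetch results.
    while True:
        status = batch.get_status()
        if status == "completed":
            return batch.download()
        if status in ("failed", "expired", "cancelled"):
            raise RuntimeError(f"batch ended with status {status!r}")
        time.sleep(poll_seconds)

result = wait_for_batch(FakeBatch())
```

In real use, pass a poll interval of minutes rather than seconds; batches can take up to 24 hours to complete.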

Costs Class

Utility class for cost tracking and estimation.

Constructor

Costs(model: str)

Parameters:

  • model: OpenAI model name (e.g., "gpt-4o-mini", "gpt-4o", "o3")

Methods

  • input_cost(filename): Calculate input token costs
  • output_cost(filename): Calculate output token costs
  • input_tokens(filename): Count input tokens
  • output_tokens(filename): Count output tokens

Supported Models

The library supports the following OpenAI models with cost tracking:

  • gpt-4o-mini
  • gpt-4o
  • o3
  • o3-mini
  • o4-mini

Environment Setup

Make sure you have your OpenAI API key set in your environment:

export OPENAI_API_KEY="your-api-key-here"

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

If you encounter any issues or have questions, please open an issue on GitHub.

Changelog

0.1.1

  • Fixed description on PyPI

0.1.0

  • Initial release
  • Structured Outputs batch processing functionality
  • Automatic handling of additionalProperties: false schema requirement
  • Cost tracking and estimation
  • Pydantic model support for structured outputs
