
Updates OpenAI to use Responses API #161

Merged
JasonTheAdams merged 23 commits into trunk from add/proper-openai-provider-implementation
Jan 16, 2026

@JasonTheAdams
Member

@JasonTheAdams JasonTheAdams commented Dec 30, 2025

Resolves #96

This PR migrates the OpenAI provider implementation from the legacy Chat Completions API (/v1/chat/completions) to the newer Responses API (/v1/responses) for text generation and Images API (/v1/images/generations) for image generation. The OpenAiTextGenerationModel and OpenAiImageGenerationModel classes have been completely rewritten to extend AbstractApiBasedModel directly, following the same implementation pattern used by the Google and Anthropic providers. The OpenAiCompatible helper classes remain available for other providers that use OpenAI-compatible APIs.
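To illustrate the shape of the migration, here is a minimal sketch in Python (not the PR's PHP code) that translates a Chat Completions-style request body into a Responses-style one. The helper name `chat_to_responses_payload` is hypothetical, and the field mapping (`messages` to `input`, system messages to `instructions`, `max_tokens` to `max_output_tokens`) reflects my reading of OpenAI's public API docs rather than anything confirmed in this PR.

```python
from typing import Any


def chat_to_responses_payload(chat: dict[str, Any]) -> dict[str, Any]:
    """Translate a Chat Completions-style body into a Responses-style body.

    Hypothetical helper for illustration only; it covers a small subset of fields.
    """
    payload: dict[str, Any] = {"model": chat["model"]}

    # System messages move to the top-level `instructions` field.
    system_parts = [m["content"] for m in chat["messages"] if m["role"] == "system"]
    if system_parts:
        payload["instructions"] = "\n".join(system_parts)

    # All remaining messages become `input` items.
    payload["input"] = [
        {"role": m["role"], "content": m["content"]}
        for m in chat["messages"]
        if m["role"] != "system"
    ]

    # `max_tokens` is named `max_output_tokens` in the Responses API.
    if "max_tokens" in chat:
        payload["max_output_tokens"] = chat["max_tokens"]

    return payload
```

The resulting body would be sent to /v1/responses instead of /v1/chat/completions.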

What's New

  • Built-in web search tool — GPT models can now use OpenAI's native web_search tool via the webSearch config option
  • Built-in code interpreter — Access the code_interpreter tool via customOptions for code execution capabilities
  • Continuing a previous chat — Use previous_response_id with a previous response id to continue the conversation
  • Responses API for image generation — gpt-image-* models now use the Responses API with the image_generation tool, providing a consistent API surface
  • Structured output support — JSON schema output via the text.format parameter
  • Function calling — Full support for custom function declarations with the new output format
  • Comprehensive test coverage — 39 new unit tests covering both text and image generation models
  • Document support — the text generation models can receive documents (e.g. PDFs) now
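As a rough sketch of how several of the options above might end up in a single /v1/responses request body, the Python snippet below builds a payload with the built-in tools and a JSON schema output format. The function `build_text_payload` and its parameters are hypothetical, the schema name "result" is a placeholder, and the tool object shapes are assumptions based on OpenAI's public docs, not the PR's PHP implementation.

```python
from typing import Any, Optional


def build_text_payload(
    model: str,
    prompt: str,
    web_search: bool = False,
    code_interpreter: bool = False,
    json_schema: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a Responses-style request body (hypothetical helper)."""
    payload: dict[str, Any] = {"model": model, "input": prompt}

    tools: list[dict[str, Any]] = []
    if web_search:
        # Assumed shape of the built-in web search tool.
        tools.append({"type": "web_search"})
    if code_interpreter:
        # Assumed shape: the tool runs in an automatically provisioned container.
        tools.append({"type": "code_interpreter", "container": {"type": "auto"}})
    if tools:
        payload["tools"] = tools

    if json_schema is not None:
        # Structured output is requested via the `text.format` parameter.
        payload["text"] = {
            "format": {
                "type": "json_schema",
                "name": "result",  # placeholder schema name
                "schema": json_schema,
                "strict": True,
            }
        }

    return payload
```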

The image generation tool means the output is multi-modal, so for now I'm not adding support for this. We're discussing a solution in #160 and then can circle back on adding support for this.

Bonus

I updated cli.php to support stdin and file references. This was useful for testing multi-modal input, such as having the model examine an image and describe it to me.

Testing

I tested out the text generation as well as image generation, web search tool, code interpretation, and making an accurate picture of my crazy dog jumping out of a helicopter:

image

@github-actions

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: JasonTheAdams <jason_the_adams@git.wordpress.org>
Co-authored-by: felixarntz <flixos90@git.wordpress.org>
Co-authored-by: raftaar1191 <raftaar1191@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@JasonTheAdams JasonTheAdams self-assigned this Dec 30, 2025
@JasonTheAdams JasonTheAdams added this to the 0.4.0 milestone Dec 30, 2025
@JasonTheAdams JasonTheAdams added the [Type] Enhancement A suggestion for improvement. label Dec 30, 2025
Member

@felixarntz felixarntz left a comment

@JasonTheAdams Thanks for working on this!

I haven't done a full review yet, but my early feedback focuses mostly on concerns with the image generation implementation. I'm not convinced that it needs to use the Responses API.

* Class for an OpenAI image generation model using the Responses API.
*
* @since 0.1.0
* This uses the Responses API with the built-in image_generation tool.
Member

I didn't expect to see the Responses API being used for image generation. Since you did it, and given your example, I assume it's possible, but is that the case for all image generation models? Have you tried the implementation with older models like dall-e-3? I just want to make sure they can also be used with the Responses API.

My original expectation was that we would only need to update the OpenAiTextGenerationModel to use the Responses API.

@JasonTheAdams
Member Author

JasonTheAdams commented Dec 31, 2025

@felixarntz It's completely reasonable to wonder why I'm using the Responses API and not the Images API. After much research, I came out the other side wondering if OpenAI could have made it any more convoluted. 😂 😭

I'm going to write this out so we're on the same page, though I'm sure you know a good bit of it.

So images can be generated using both the Images API and the Responses API. The Responses API is, as a whole, the more capable API, providing things like stateful message history, caching (to help with cost), multimodal output, and so forth. The only thing it can't currently do is use gpt-image-1.5 (only v1). But it has trade-offs:

  • Image generation is a tool of the base model. So you can't actually use gpt-image-1 directly; you have to use something like gpt-5 which then calls the image model. This means you effectively need to specify two models.
  • The image prompt is adjusted by the base model. So if you put "a king charles spaniel" the base model will attempt to rewrite that into a "better" image prompt. This may be useful, but it may also adjust it in a surprising way.
  • As mentioned, you can't use the gpt-image-1.5 model yet.
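To make the "two models" trade-off concrete, here is a minimal Python sketch contrasting the two request bodies. Both helper names are hypothetical, and the payload shapes are assumptions based on OpenAI's public docs: with the Responses API the `model` field names the host model and image generation is listed as a tool, while the Images API takes the image model and the prompt directly.

```python
from typing import Any


def responses_image_payload(host_model: str, prompt: str) -> dict[str, Any]:
    """Body for /v1/responses: the host model decides when to call the image tool.

    The host model may rewrite the prompt before the image model sees it.
    """
    return {
        "model": host_model,  # e.g. "gpt-5", not the image model itself
        "input": prompt,
        "tools": [{"type": "image_generation"}],
    }


def images_api_payload(image_model: str, prompt: str) -> dict[str, Any]:
    """Body for /v1/images/generations: the image model is addressed directly."""
    return {"model": image_model, "prompt": prompt}
```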

It's weird. I don't love my first pass, but it does work. I'm thinking of making a decision matrix so it conditionally uses the most "optimal" API. Something like:

flowchart TD
  A([Start: intent = generateImage]) --> B{Has message history}
  B -- Yes --> R[Use Responses API]
  B -- No --> C{Has previous_response_id}
  C -- Yes --> R
  C -- No --> D{Specified model is non-image}
  D -- Yes --> R
  D -- No --> E{Needs multimodal output}
  E -- Yes --> R
  E -- No --> F{Needs pre-render reasoning}
  F -- Yes --> R
  F -- No --> I[Use Images API]

I'd also consider adding a custom option so someone can specify the base and/or image model if the Responses API is used.
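The decision matrix above could be sketched roughly as follows. This is Python for illustration rather than the project's PHP, and the `ImageRequest` fields and the `choose_image_api` function are hypothetical names that simply mirror the flowchart's branches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRequest:
    """Hypothetical description of an image generation request."""

    has_message_history: bool = False
    previous_response_id: Optional[str] = None
    model_is_image_model: bool = True
    needs_multimodal_output: bool = False
    needs_pre_render_reasoning: bool = False


def choose_image_api(req: ImageRequest) -> str:
    """Return "responses" or "images", following the flowchart top to bottom."""
    if req.has_message_history:
        return "responses"
    if req.previous_response_id is not None:
        return "responses"
    if not req.model_is_image_model:
        return "responses"
    if req.needs_multimodal_output:
        return "responses"
    if req.needs_pre_render_reasoning:
        return "responses"
    # Only a plain, stateless request to an image model reaches the Images API.
    return "images"
```

A default `ImageRequest()` falls through every check and lands on the Images API.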

What do you think?

@felixarntz
Member

@JasonTheAdams Thanks for outlining this in so much detail! Your rationale makes a lot of sense to me.

Two follow up questions:

  1. "Has previous_response_id" - what does this mean? What is a "previous response ID"?
  2. "Specified model is non-image" - unless we intentionally make that possible, right now it wouldn't be. We wouldn't assign a non-image model the image generation capability, at least not now. And I'm not sure why that would make sense. If we want to enable someone to use the Responses API for image generation, I think that should rather be a conscious decision where you can specify the image generation model as usual, but then through some other more advanced option the "host" model for the Responses API.

For now, as far as this PR goes, I would suggest keeping things simple for image generation and sticking with only the Images API. While the Responses API is better in many ways, it's also a lot more complex to think through and properly implement for the image generation use-case. And while most branches of the decision tree you shared end up in the Responses API, I think in 95% of actual usage it'll end up in the one branch that goes to the Images API.

The most valuable benefit of the Responses API for image generation IMO is that it can deal with messages history.

But we can add support for that separately; let's open an issue. The purpose of this PR is primarily to migrate chat/completions usage to the Responses API, which is clearly the recommended API for that, so I'd say let's leave image generation (almost?) untouched.

@JasonTheAdams
Member Author

"Has previous_response_id" - what does this mean? What is a "previous response ID"?

The Responses API supports stateful prompting. You can connect it with the Conversations API or use this previous response ID.

In our system, the response ID would be sent back as "additional data" in the GenerativeAiResult, which could then be passed as a custom config option.
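As a rough illustration of that round trip, the Python sketch below pulls the ID out of one response body and feeds it into the next request. The helper names are hypothetical, and it assumes the Responses API returns the ID in a top-level `id` field and accepts it back as `previous_response_id`; how the PHP client exposes this through GenerativeAiResult and custom options is not shown here.

```python
from typing import Any, Optional


def extract_response_id(response_body: dict[str, Any]) -> Optional[str]:
    """Return the response ID from a decoded response body, if present."""
    return response_body.get("id")


def build_follow_up_payload(
    model: str, prompt: str, previous_response_id: Optional[str] = None
) -> dict[str, Any]:
    """Build a request body that continues an earlier response when an ID is given."""
    payload: dict[str, Any] = {"model": model, "input": prompt}
    if previous_response_id:
        # The server looks up the earlier turn, so prior messages are not resent.
        payload["previous_response_id"] = previous_response_id
    return payload
```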

"Specified model is non-image"

I see what you mean. I'd be fine with that for now. 👍

I like the idea of keeping this simple for images and opening an Issue afterwards. I'll update this to go that route!

@JasonTheAdams
Member Author

Back to you, @felixarntz! I switched the image generation portion to use the Images API. Here are some fun tests of "A happy Belgian Malinois dog playing in a sunny field":

Model: gpt-image-1
image

Model: dall-e-3
image

@JasonTheAdams
Member Author

I did add support for previous_response_id to the text generation because it was very simple to add and could certainly be useful for some folks.

Member

@felixarntz felixarntz left a comment

@JasonTheAdams The Responses API implementation looks solid; I have a few small points there.

For image generation, now I'm confused why you're not reusing the existing abstract OpenAI compatible implementation, which already relies on that same endpoint.

@JasonTheAdams JasonTheAdams force-pushed the add/proper-openai-provider-implementation branch from 1f41f2b to bf04bc6 Compare January 13, 2026 00:09
Member

@felixarntz felixarntz left a comment

@JasonTheAdams LGTM - awesome work! One small last nit-pick, but good to go.

@JasonTheAdams JasonTheAdams merged commit b2257c2 into trunk Jan 16, 2026
5 checks passed
@JasonTheAdams JasonTheAdams deleted the add/proper-openai-provider-implementation branch January 16, 2026 18:06

Labels

[Type] Enhancement A suggestion for improvement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use OpenAI Responses API instead of the Chat Completions API

2 participants