Updates OpenAI to use Responses API #161
felixarntz
left a comment
@JasonTheAdams Thanks for working on this!
I haven't done a full review yet, but my early feedback focuses mostly on concerns with the image generation implementation. I'm not convinced that it needs to use the Responses API.
 * Class for an OpenAI image generation model using the Responses API.
 *
 * @since 0.1.0
 * This uses the Responses API with the built-in image_generation tool.
I didn't expect to see the Responses API being used for image generation. Given you did that, I assume it's possible, also given your example, but is that the case for all image generation models? Have you tried the implementation with the older models like dall-e-3? Just want to make sure they can also be used with the Responses API.
My original expectation was that we would only need to update the OpenAiTextGenerationModel to use the Responses API.
src/ProviderImplementations/OpenAi/OpenAiImageGenerationModel.php
@felixarntz So you're completely reasonable to wonder why I'm using the Responses API and not the Images API. After much research, I came out the other side wondering if OpenAI could have made it any more convoluted. 😂 😭 I'm going to write this out so we're on the same page, though I'm sure you know a good bit of it. Images can be generated using both the Images API and the Responses API. The Responses API is, as a whole, the more capable API, providing things like message history, caching (to help with cost), stateful history, multimodal output, and so forth. The only thing it can't currently do is use
It's weird. I don't love my first pass, but it does work. I'm thinking of making a decision matrix so it conditionally uses the most "optimal" API. Something like:

flowchart TD
A([Start: intent = generateImage]) --> B{Has message history}
B -- Yes --> R[Use Responses API]
B -- No --> C{Has previous_response_id}
C -- Yes --> R
C -- No --> D{Specified model is non-image}
D -- Yes --> R
D -- No --> E{Needs multimodal output}
E -- Yes --> R
E -- No --> F{Needs pre-render reasoning}
F -- Yes --> R
F -- No --> I[Use Images API]
I'd also consider adding a custom option so someone can specify the base and/or image model if the Responses API is used. What do you think?
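
The decision matrix above reads as a first-match check, top to bottom. Here is a minimal Python sketch of that flowchart, offered only as an illustration and not as code from this PR (which is PHP). The `ImageRequest` fields and the `choose_image_api` name are hypothetical stand-ins for whatever the model class would actually inspect.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRequest:
    """Hypothetical container for the flowchart's inputs; not a class from this PR."""

    has_message_history: bool = False
    previous_response_id: Optional[str] = None
    model_is_image_model: bool = True
    needs_multimodal_output: bool = False
    needs_pre_render_reasoning: bool = False


def choose_image_api(request: ImageRequest) -> str:
    """Return "responses" or "images", checking conditions in flowchart order."""
    if request.has_message_history:
        return "responses"
    if request.previous_response_id is not None:
        return "responses"
    if not request.model_is_image_model:
        # A non-image (base) model was specified, so the Images API does not apply.
        return "responses"
    if request.needs_multimodal_output:
        return "responses"
    if request.needs_pre_render_reasoning:
        return "responses"
    # None of the conditions matched: the plain Images API is enough.
    return "images"
```

A plain prompt with an image model and no history falls through every check and lands on the Images API, which is the single branch of the flowchart that does not end in the Responses API.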
@JasonTheAdams Thanks for outlining this in so much detail! Your rationale makes a lot of sense to me. Two follow up questions:
For now, for this PR, I would suggest keeping things simple for image generation and sticking with only the Images API: While the Responses API is better in many ways, it's also a lot more complex to think through and properly implement for the image generation use-case, and while most branches of the decision tree you shared end up in the Responses API, I think in 95% of actual usage it'll end up in the one branch that goes to the Images API. The most valuable benefit of the Responses API for image generation IMO is that it can deal with message history. But we can add support for that separately, let's open an issue. The purpose of this PR is primarily to migrate
The Responses API supports stateful prompting. You can connect it with the Conversations API or use this previous response ID. In our system, the response ID would be sent back as "additional data" in the
I see what you mean. I'd be fine with that for now. 👍 I like the idea of keeping this simple for images and opening an Issue afterwards. I'll update this to go that route!

Back to you, @felixarntz! I switched the Image Generation portion to using the Images API. Here's some fun tests of "A happy Belgian Malinois dog playing in a sunny field":

I did add support for
felixarntz
left a comment
@JasonTheAdams The Responses API implementation looks solid, a few small points there.
For image generation, now I'm confused why you're not reusing the existing abstract OpenAI compatible implementation, which already relies on that same endpoint.
1f41f2b to bf04bc6
felixarntz
left a comment
@JasonTheAdams LGTM - awesome work! One small last nit-pick, but good to go.
Co-authored-by: Felix Arntz <flixos90@gmail.com>


Resolves #96
This PR migrates the OpenAI provider implementation from the legacy Chat Completions API (/v1/chat/completions) to the newer Responses API (/v1/responses) for text generation and Images API (/v1/images/generations) for image generation. The OpenAiTextGenerationModel and OpenAiImageGenerationModel classes have been completely rewritten to extend AbstractApiBasedModel directly, following the same implementation pattern used by the Google and Anthropic providers. The OpenAiCompatible helper classes remain available for other providers that use OpenAI-compatible APIs.
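
For a rough picture of what the two endpoints expect, here is a small Python sketch (illustrative only; the PR itself is PHP). The helper names and the `API_BASE` constant are hypothetical, and the body fields (`model`, `input`, `prompt`, `n`) are assumed from OpenAI's public API reference rather than copied from this PR's code.

```python
import json

API_BASE = "https://api.openai.com"  # assumed default base URL


def build_text_request(model: str, prompt: str) -> dict:
    """Describe a text generation call against the Responses API."""
    return {
        "method": "POST",
        "url": API_BASE + "/v1/responses",
        "body": {"model": model, "input": prompt},
    }


def build_image_request(model: str, prompt: str, n: int = 1) -> dict:
    """Describe an image generation call against the Images API."""
    return {
        "method": "POST",
        "url": API_BASE + "/v1/images/generations",
        "body": {"model": model, "prompt": prompt, "n": n},
    }


if __name__ == "__main__":
    # Print the request description instead of sending it anywhere.
    request = build_image_request(
        "dall-e-3", "A happy Belgian Malinois dog playing in a sunny field"
    )
    print(json.dumps(request, indent=2))
```

The sketch only builds request descriptions and never performs HTTP, so it can be read or run without an API key.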
What's New
previous_response_id with a previous response id to continue the conversation
The image generation tool means the output is multi-modal, so for now I'm not adding support for this. We're discussing a solution in #160 and then can circle back on adding support for this.
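
As a rough illustration of the `previous_response_id` option listed above, the Python sketch below shows how a follow-up request body might reference an earlier response. It is illustrative only: the helper name and the `example-model` placeholder are hypothetical, and the first response is a hand-written stand-in, not real API output.

```python
from typing import Optional


def build_responses_body(
    model: str, prompt: str, previous_response_id: Optional[str] = None
) -> dict:
    """Build a Responses API request body, optionally chained to an earlier response."""
    body = {"model": model, "input": prompt}
    if previous_response_id is not None:
        # Only include the field when continuing a conversation.
        body["previous_response_id"] = previous_response_id
    return body


# First turn: there is no previous response to reference.
first_body = build_responses_body("example-model", "Name a dog breed.")

# Stand-in for the parsed JSON of the first response; the id value is made up.
first_response = {"id": "resp_example_123"}

# Second turn: pass the earlier response id so the conversation can continue.
second_body = build_responses_body(
    "example-model", "Describe it in one sentence.", first_response["id"]
)
```

Leaving the field out entirely on the first turn, rather than sending an empty value, keeps the first request identical to a stateless one.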
Bonus
I updated the cli.php to support stdin and file references. This was useful for testing having it examine an image and describe it to me (multi-modal input).
Testing
I tested out the text generation as well as image generation, web search tool, code interpretation, and making an accurate picture of my crazy dog jumping out of a helicopter: