Conversation
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
There was a problem hiding this comment.
Pull request overview
This PR adds comprehensive embedding generation support to the PHP AI Client, enabling developers to generate vector embeddings using compatible AI models. The implementation extends the existing architecture with minimal disruption while maintaining consistency with the established patterns for text and image generation.
Key Changes:
- Introduces embedding-specific DTOs (Embedding, EmbeddingResult, EmbeddingOperation) with full serialization support
- Adds embedding generation interfaces for model implementations
- Implements OpenAI embedding model support with proper request/response handling
- Extends PromptBuilder and AiClient with embedding-specific methods
- Updates CLI with embedding capability support and multiple output formats
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/Results/DTO/EmbeddingResultTest.php | Comprehensive test coverage for EmbeddingResult including validation, transformations, and helpers |
| tests/unit/Providers/ProviderRegistryTest.php | Adds test for discovering embedding-capable models through registry |
| tests/unit/Providers/Models/DTO/ModelRequirementsTest.php | Ensures embeddings don't incorrectly require chat history capability |
| tests/unit/ProviderImplementations/OpenAi/OpenAiEmbeddingModelTest.php | Tests OpenAI embedding model with mocked HTTP responses |
| tests/unit/Operations/DTO/EmbeddingOperationTest.php | Tests embedding operation state management and serialization |
| tests/unit/Embeddings/DTO/EmbeddingTest.php | Tests embedding vector validation and normalization |
| tests/unit/Builders/PromptBuilderTest.php | Tests new embedding-specific builder methods |
| tests/unit/AiClientTest.php | Tests static helper methods for embedding generation |
| tests/traits/MockModelCreationTrait.php | Adds mock model creation helpers for embedding tests |
| tests/mocks/MockProvider.php | Registers mock embedding model for testing |
| src/Results/DTO/EmbeddingResult.php | Core result DTO with vector extraction helpers |
| src/Providers/Models/EmbeddingGeneration/Contracts/EmbeddingGenerationOperationModelInterface.php | Interface for async embedding operations |
| src/Providers/Models/EmbeddingGeneration/Contracts/EmbeddingGenerationModelInterface.php | Interface for synchronous embedding generation |
| src/Providers/Models/DTO/ModelRequirements.php | Excludes chat history requirement for embedding capability |
| src/Providers/Models/Contracts/WithEmbeddingOperationsInterface.php | Interface for retrieving embedding operations by ID |
| src/ProviderImplementations/OpenAi/OpenAiProvider.php | Adds embedding model instantiation in factory method |
| src/ProviderImplementations/OpenAi/OpenAiModelMetadataDirectory.php | Detects and categorizes embedding models from OpenAI API |
| src/ProviderImplementations/OpenAi/OpenAiEmbeddingModel.php | Full OpenAI embedding API implementation |
| src/Operations/DTO/EmbeddingOperation.php | Operation DTO for async embedding workflows |
| src/Embeddings/DTO/Embedding.php | Single embedding vector DTO with validation |
| src/Builders/PromptBuilder.php | Adds withEmbeddingInputs and generate methods for embeddings |
| src/AiClient.php | Adds static helper methods for embedding generation |
| cli.php | Extends CLI with embeddings capability and output formats |
| README.md | Documents embedding generation usage and CLI examples |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** @var list<CapabilityEnum> $modelCaps */ | ||
| $modelCaps = []; | ||
| /** @var list<SupportedOption> $modelOptions */ |
There was a problem hiding this comment.
These PHPStan type annotations for $modelCaps and $modelOptions are unnecessary. The variables are immediately assigned values from the closure's use clause parameters ($embeddingCapabilities, $ttsCapabilities, etc.), which already have their types defined. Since these annotations don't add value and clutter the code, consider removing them.
| /** @var list<CapabilityEnum> $modelCaps */ | |
| $modelCaps = []; | |
| /** @var list<SupportedOption> $modelOptions */ | |
| $modelCaps = []; |
felixarntz
left a comment
There was a problem hiding this comment.
@Jameswlepage This is coming along nicely - sorry for the delay in review.
I think there are a few things we can simplify, but most importantly let's not add full operation support yet, as we don't have that elsewhere and it requires a bit more thinking through holistically.
| * @param mixed ...$inputs The embedding inputs to add. | ||
| * @return self | ||
| */ | ||
| public function withEmbeddingInputs(...$inputs): self |
There was a problem hiding this comment.
There's no need for this. The input for embeddings is fully compatible with regular prompts. So you can use the same consistent API.
| /** | ||
| * @var list<Message> Embedding-specific input messages. | ||
| */ | ||
| protected array $embeddingInputs = []; |
There was a problem hiding this comment.
See other comment, we can simply use $messages for embeddings too. They won't hold prompts in that case, but input data.
| $builder = self::prompt(null, $registry); | ||
| if ($modelOrConfig instanceof ModelInterface) { | ||
| $builder->usingModel($modelOrConfig); | ||
| } elseif ($modelOrConfig instanceof ModelConfig) { | ||
| $builder->usingModelConfig($modelOrConfig); | ||
| } |
There was a problem hiding this comment.
Can you use the getConfiguredPromptBuilder method instead of duplicating this logic? You can pass $input as the first parameter to the prompt method - they're supposed to be interoperable so you can use the same consistent API.
| $builder->usingModelConfig($modelOrConfig); | ||
| } | ||
|
|
||
| $builder->withEmbeddingInputs($input); |
There was a problem hiding this comment.
See my comments on PromptBuilder, this won't be needed. Input data can be passed via prompt.
| * @param ProviderRegistry|null $registry Optional custom registry. | ||
| * @return EmbeddingOperation The embeddings operation. | ||
| */ | ||
| public static function generateEmbeddingsOperation( |
There was a problem hiding this comment.
Let's not implement this yet - we don't support long-running operations for anything yet, so doing that here would be preliminary.
| * @param mixed $input The embedding input to append. | ||
| * @return void | ||
| */ | ||
| private function appendEmbeddingInput($input): void |
There was a problem hiding this comment.
This is probably not needed? Or, why is it?
| * | ||
| * @since 0.2.0 | ||
| * | ||
| * @param list<float|int> $vector The embedding vector values. |
There was a problem hiding this comment.
Can these really be both? Or should they always be float?
| * | ||
| * @since 0.2.0 | ||
| */ | ||
| interface WithEmbeddingOperationsInterface |
There was a problem hiding this comment.
See above, we don't need this yet.
| * | ||
| * @extends AbstractDataTransferObject<EmbeddingResultArrayShape> | ||
| */ | ||
| class EmbeddingResult extends AbstractDataTransferObject implements ResultInterface |
There was a problem hiding this comment.
I realize this was named like that in ARCHITECTURE.md, but I think it may be better to call it EmbeddingsResult, in alignment with the other method naming and the fact that 95% of the time you'll be generating multiple embeddings.
| * | ||
| * @extends AbstractDataTransferObject<EmbeddingOperationArrayShape> | ||
| */ | ||
| class EmbeddingOperation extends AbstractDataTransferObject implements OperationInterface |
There was a problem hiding this comment.
See my other comment on EmbeddingResult, maybe we should call this EmbeddingsResult?
Fixes #130
Summary
Testing