Draft implementation of support for embeddings APIs #3252

dmontagu · 2025-10-24T21:15:59Z

Started this in collaboration with @DouweM, I'd like to ensure consensus on the API design before adding the remaining-providers/logfire-instrumentation/docs/tests.

This is inspired by the approach in haiku.rag, though we adapted it to be a bit closer to the Agent APIs are used (and how you can override model, settings, etc.).

Closes #58

DouweM · 2025-10-24T21:17:48Z

pydantic_ai_slim/pydantic_ai/embeddings/__init__.py

+from pydantic_ai.models.instrumented import InstrumentationSettings
+from pydantic_ai.providers import infer_provider
+
+KnownEmbeddingModelName = TypeAliasType(


Add a test like this one to verify this is up to date:

pydantic-ai/tests/models/test_model_names.py

Line 52 in efa1e26

def test_known_model_names(): # pragma: lax no cover

DouweM · 2025-10-24T21:18:14Z

pydantic_ai_slim/pydantic_ai/embeddings/__init__.py

+    if model_kind.startswith('gateway/'):
+        model_kind = provider_name.removeprefix('gateway/')
+
+    # TODO: extend the following list for other providers as appropriate


We'll have to check which of the OpenAI-compatible APIs also support embeddings

DouweM · 2025-10-24T21:19:01Z

pydantic_ai_slim/pydantic_ai/embeddings/__init__.py

+
+        return CohereEmbeddingModel(model_name, provider=provider)
+    else:
+        raise UserError(f'Unknown embeddings model: {model}')  # pragma: no cover


https://github.com/ggozad/haiku.rag/tree/main/src/haiku/rag/embeddings has Ollama, vLLM and VoyageAI, which would be worth adding as well

DouweM · 2025-10-24T21:19:08Z

pydantic_ai_slim/pydantic_ai/embeddings/__init__.py

+        raise UserError(f'Unknown embeddings model: {model}')  # pragma: no cover
+
+
+@dataclass


Suggested change

@dataclass

@dataclass(init=False)

DouweM · 2025-10-24T21:20:11Z

pydantic_ai_slim/pydantic_ai/embeddings/cohere.py

+
+        Args:
+            model_name: The name of the Cohere model to use. List of model names
+                available [here](https://docs.cohere.com/docs/models#command).


Update to https://docs.cohere.com/docs/cohere-embed

DouweM · 2025-10-24T21:21:22Z

pydantic_ai_slim/pydantic_ai/embeddings/embedding_model.py

I'd prefer to move this to __init__

DouweM · 2025-10-24T21:21:58Z

pydantic_ai_slim/pydantic_ai/embeddings/openai.py

+
+        Args:
+            model_name: The name of the OpenAI model to use. List of model names
+                available [here](https://docs.OpenAI.com/docs/models#command).


https://platform.openai.com/docs/guides/embeddings#embedding-models

DouweM · 2025-10-24T21:22:02Z

pydantic_ai_slim/pydantic_ai/embeddings/openai.py

+            provider: The provider to use for authentication and API access. Can be either the string
+                'OpenAI' or an instance of `Provider[AsyncClientV2]`. If not provided, a new provider will be
+                created using the other parameters.
+            profile: The model profile to use. Defaults to a profile picked by the provider based on the model name.


DouweM · 2025-10-24T21:22:29Z

pydantic_ai_slim/pydantic_ai/embeddings/openai.py

+        input_is_string = isinstance(documents, str)
+        if input_is_string:
+            documents = [documents]


Not sure how I feel about every model implementation needing to repeat this

DouweM · 2025-10-24T21:23:14Z

pydantic_ai_slim/pydantic_ai/embeddings/settings.py

+
+    Supported by:
+
+    * Cohere (See `cohere.EmbedInputType`)


Following the pattern in ModelSettings, we should move any options only supported by one model to {Cohere}EmbeddingSettings with a {cohere}_ prefix

github-actions · 2025-10-24T21:28:58Z

Docs Preview

commit:	`3dbad0d`
Preview URL:	https://fda6de3f-pydantic-ai-previews.pydantic.workers.dev

ggozad · 2025-10-29T07:45:24Z

Thanks for starting this and please do let me know if you need help :)
I went quickly through, looks like a great start!

One thing you might want to support from the start is having as part of the EmbeddingSettings is max_context_length and encoding.

Embedding models have a limit of how many tokens of input they can handle. Most providers will raise (openai.BadRequestError iirc for OpenAI, vLLM will return an ugly 500 omg) and then some will say nothing (looking at you Ollama) and just truncate the input so that it fits.

All this is well explained here

I would not necessarily truncate like in the cookbook and still just raise, but I would be grateful to have available from the model side the max_context_length and the encoding so that as a library I can quickly check if a chunk of text fits or not.
Even better if I could get the number of tokens used for some text by a given embedding model.

The only difficulty I see with this is that not all providers expose the tokenizers, for example Ollama does not. But still, would be nice to have it for the providers that do support it, as it's a crucial step when you are trying to chunk a document for embedding.

In haiku.rag, my focus is local models, and like I mentioned Ollama, the popular choice, does not expose a way to tokenize text. So I just do the dumb thing and guesstimate the tokens hoping they are not going to be all that different from some OpenAI model's encoder: I use tiktoken (which you would probably also want to use to support this) and gpt-4o as a "close" model and get an estimate. But I am sure we can do better that this here.

Edit: I am not suggesting that calling embed should calculate the tokens needed on every call. But I imagine that whoever used pydantic AI to embed, would need to also go through the process of chunking some large text, unless they only dealt with embedding queries or simple sentences. So it would be a missed opportunity to not have support for that.

Draft implementation of support for embeddings APIs

3dbad0d

DouweM requested changes Oct 24, 2025

View reviewed changes

		raise UserError(f'Unknown embeddings model: {model}') # pragma: no cover


		@dataclass

Draft implementation of support for embeddings APIs #3252

Are you sure you want to change the base?

Draft implementation of support for embeddings APIs #3252

Uh oh!

Conversation

dmontagu commented Oct 24, 2025 • edited by DouweM Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Oct 24, 2025

Docs Preview

Uh oh!

ggozad commented Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

dmontagu commented Oct 24, 2025 •

edited by DouweM

Loading

ggozad commented Oct 29, 2025 •

edited

Loading