23 commits

c642ac7 review multi modal input handling (dsfaccini, Dec 10, 2025)
754c0f0 enable url media for openai responses (dsfaccini, Dec 10, 2025)
2f5bcf6 Update documentation for BinaryContent.base64 usage (dsfaccini, Dec 10, 2025)
5d99ed2 Merge branch 'main' into review-multimodal (dsfaccini, Dec 10, 2025)
ff5d79a Merge branch 'main' into review-multimodal (dsfaccini, Dec 10, 2025)
887a79f Merge branch 'main' into review-multimodal (dsfaccini, Dec 10, 2025)
6bba553 Address review comments: force_download, type rename, refactoring (dsfaccini, Dec 11, 2025)
7389467 upstream cerebras merge stuff (dsfaccini, Dec 11, 2025)
8ad810d AsyncClient (dsfaccini, Dec 11, 2025)
da9ec78 base64 replacements (dsfaccini, Dec 11, 2025)
9a55ce2 add force download support for mistral (dsfaccini, Dec 11, 2025)
703b772 address review comments (dsfaccini, Dec 13, 2025)
35d8745 update docs (dsfaccini, Dec 13, 2025)
d9e8a01 allow explicit download disallowing (dsfaccini, Dec 14, 2025)
0bcd2e8 Merge remote-tracking branch 'origin/main' into review-multimodal (dsfaccini, Dec 14, 2025)
8a98f77 fix linting issue and update doc (dsfaccini, Dec 14, 2025)
45d35e5 fix tests (dsfaccini, Dec 14, 2025)
4a6234d undo tri-optional force download and remove experiments folder (dsfaccini, Dec 16, 2025)
13c5284 include suggestion (dsfaccini, Dec 16, 2025)
91b1f79 include suggestion (dsfaccini, Dec 16, 2025)
0688875 Merge remote-tracking branch 'upstream/main' into review-multimodal (dsfaccini, Dec 16, 2025)
eab1e14 requested changes (dsfaccini, Dec 16, 2025)
6e0b72d fix tests (dsfaccini, Dec 16, 2025)
31 changes: 22 additions & 9 deletions docs/input.md
@@ -104,21 +104,34 @@ print(result.output)

## User-side download vs. direct file URL

When you provide a URL using any of `ImageUrl`, `AudioUrl`, `VideoUrl` or `DocumentUrl`, Pydantic AI will typically send the URL directly to the model API so that the download happens on their side.
When using one of `ImageUrl`, `AudioUrl`, `VideoUrl` or `DocumentUrl`, Pydantic AI will default to sending the URL to the model provider, so the file is downloaded on their side.

Some model APIs do not support file URLs at all or for specific file types. In the following cases, Pydantic AI will download the file content and send it as part of the API request instead:
Comment (Collaborator):

We still need a sentence like this to explain what will happen if the table below has "Sends URL directly" as false, especially because the "false" is not explicitly stated but implied by something like "ImageUrl, AudioUrl, DocumentUrl | ImageUrl only". It's not currently clear to the user what happens for AudioUrl and DocumentUrl.

Comment (Collaborator):

Maybe it should be columns: "Send URL directly | Download URL and send bytes | Unsupported", listing each type explicitly for each provider.

Comment (Collaborator Author):

Leaving this open to check tomorrow.

Comment (Collaborator Author):

[Image attachment]

Support for file URLs varies depending on type and provider:

- [`OpenAIChatModel`][pydantic_ai.models.openai.OpenAIChatModel]: `AudioUrl` and `DocumentUrl`
- [`OpenAIResponsesModel`][pydantic_ai.models.openai.OpenAIResponsesModel]: All URLs
- [`AnthropicModel`][pydantic_ai.models.anthropic.AnthropicModel]: `DocumentUrl` with media type `text/plain`
- [`GoogleModel`][pydantic_ai.models.google.GoogleModel] using GLA (Gemini Developer API): All URLs except YouTube video URLs and files uploaded to the [Files API](https://ai.google.dev/gemini-api/docs/files).
- [`BedrockConverseModel`][pydantic_ai.models.bedrock.BedrockConverseModel]: All URLs except S3 URLs, specifically starting with `s3://`.
| Model | Send URL directly | Download and send bytes | Unsupported |
|-------|-------------------|-------------------------|-------------|
| [`OpenAIChatModel`][pydantic_ai.models.openai.OpenAIChatModel] | `ImageUrl` | `AudioUrl`, `DocumentUrl` | `VideoUrl` |
| [`OpenAIResponsesModel`][pydantic_ai.models.openai.OpenAIResponsesModel] | `ImageUrl`, `AudioUrl`, `DocumentUrl` | — | `VideoUrl` |
| [`AnthropicModel`][pydantic_ai.models.anthropic.AnthropicModel] | `ImageUrl`, `DocumentUrl` (PDF) | `DocumentUrl` (`text/plain`) | `AudioUrl`, `VideoUrl` |
| [`GoogleModel`][pydantic_ai.models.google.GoogleModel] (Vertex) | All URL types | — | — |
| [`GoogleModel`][pydantic_ai.models.google.GoogleModel] (GLA) | [YouTube](models/google.md#document-image-audio-and-video-input), [Files API](https://ai.google.dev/gemini-api/docs/files) | All other URLs | — |
| [`MistralModel`][pydantic_ai.models.mistral.MistralModel] | `ImageUrl`, `DocumentUrl` (PDF) | — | `AudioUrl`, `VideoUrl` |
| [`BedrockConverseModel`][pydantic_ai.models.bedrock.BedrockConverseModel] | S3 URLs (`s3://`) | `ImageUrl`, `DocumentUrl`, `VideoUrl` | `AudioUrl` |

If the model API supports file URLs but may not be able to download a file because of crawling or access restrictions, you can instruct Pydantic AI to download the file content and send that instead of the URL by enabling the `force_download` flag on the URL object. For example, [`GoogleModel`][pydantic_ai.models.google.GoogleModel] on Vertex AI limits YouTube video URLs to one URL per request.
A model API may be unable to download a file (e.g., because of crawling or access restrictions) even if it supports file URLs. For example, [`GoogleModel`][pydantic_ai.models.google.GoogleModel] on Vertex AI limits YouTube video URLs to one URL per request. In such cases, you can instruct Pydantic AI to download the file content locally and send that instead of the URL by setting `force_download` on the URL object:

```py {title="force_download.py" test="skip" lint="skip"}
from pydantic_ai import ImageUrl, AudioUrl, VideoUrl, DocumentUrl

ImageUrl(url='https://example.com/image.png', force_download=True)
AudioUrl(url='https://example.com/audio.mp3', force_download=True)
VideoUrl(url='https://example.com/video.mp4', force_download=True)
DocumentUrl(url='https://example.com/doc.pdf', force_download=True)
```

## Uploaded Files

Some model providers like Google's Gemini API support [uploading files](https://ai.google.dev/gemini-api/docs/files). You can upload a file to the model API using the client you can get from the provider and use the resulting URL as input:
Some model providers like Google's Gemini API support [uploading files](https://ai.google.dev/gemini-api/docs/files). You can upload a file using the provider's client and pass the resulting URL as input:

```py {title="file_upload.py" test="skip"}
from pydantic_ai import Agent, DocumentUrl
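The URL-handling table and the `force_download` paragraph in `docs/input.md` describe a per-model decision: send the URL, download it and send bytes, or reject it. The following is a minimal sketch of that decision. It assumes a hand-copied subset of the table (the `OpenAIChatModel` and `MistralModel` rows), uses hypothetical names (`Handling`, `URL_HANDLING`, `resolve_url_handling`) that are not part of the Pydantic AI API, and assumes `force_download` only turns "send URL" into "download" without making unsupported types supported.

```python
from enum import Enum


class Handling(Enum):
    SEND_URL = 'send URL directly'
    DOWNLOAD = 'download and send bytes'
    UNSUPPORTED = 'unsupported'


# Hypothetical lookup copied from two rows of the docs/input.md table; not a library object.
URL_HANDLING: dict[str, dict[str, Handling]] = {
    'OpenAIChatModel': {
        'ImageUrl': Handling.SEND_URL,
        'AudioUrl': Handling.DOWNLOAD,
        'DocumentUrl': Handling.DOWNLOAD,
        'VideoUrl': Handling.UNSUPPORTED,
    },
    'MistralModel': {
        'ImageUrl': Handling.SEND_URL,
        'DocumentUrl': Handling.SEND_URL,  # PDF only, per the table
        'AudioUrl': Handling.UNSUPPORTED,
        'VideoUrl': Handling.UNSUPPORTED,
    },
}


def resolve_url_handling(model: str, url_type: str, force_download: bool = False) -> Handling:
    """Look up the default handling; assume force_download only upgrades SEND_URL to DOWNLOAD."""
    handling = URL_HANDLING[model][url_type]
    if force_download and handling is Handling.SEND_URL:
        return Handling.DOWNLOAD
    return handling
```

Keeping the table as data makes the implicit "not in the direct column" cases explicit, which is the gap the review comments on the table point out.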
20 changes: 19 additions & 1 deletion docs/models/google.md
@@ -199,7 +199,25 @@ agent = Agent(model)

## Document, Image, Audio, and Video Input

`GoogleModel` supports multi-modal input, including documents, images, audio, and video. See the [input documentation](../input.md) for details and examples.
`GoogleModel` supports multi-modal input, including documents, images, audio, and video.

YouTube video URLs can be passed directly to Google models:

```py {title="youtube_input.py" test="skip" lint="skip"}
from pydantic_ai import Agent, VideoUrl
from pydantic_ai.models.google import GoogleModel

agent = Agent(GoogleModel('gemini-2.5-flash'))
result = agent.run_sync(
[
'What is this video about?',
VideoUrl(url='https://www.youtube.com/watch?v=dQw4w9WgXcQ'),
]
)
print(result.output)
```

See the [input documentation](../input.md) for more details and examples.

## Model settings

2 changes: 1 addition & 1 deletion pydantic_ai_slim/pydantic_ai/_mcp.py
@@ -91,7 +91,7 @@ def add_msg(
'user',
mcp_types.ImageContent(
type='image',
data=base64.b64encode(chunk.data).decode(),
data=chunk.base64,
mimeType=chunk.media_type,
),
)
18 changes: 13 additions & 5 deletions pydantic_ai_slim/pydantic_ai/messages.py
@@ -474,7 +474,10 @@ class BinaryContent:
"""Binary content, e.g. an audio or image file."""

data: bytes
"""The binary data."""
"""The binary file data.

Use `.base64` to get the base64-encoded string.
"""

_: KW_ONLY

@@ -574,7 +577,12 @@ def identifier(self) -> str:
@property
def data_uri(self) -> str:
"""Convert the `BinaryContent` to a data URI."""
return f'data:{self.media_type};base64,{base64.b64encode(self.data).decode()}'
return f'data:{self.media_type};base64,{self.base64}'

@property
def base64(self) -> str:
"""Return the binary data as a base64-encoded string. Default encoding is UTF-8."""
return base64.b64encode(self.data).decode()

@property
def is_audio(self) -> bool:
@@ -776,7 +784,7 @@ def otel_message_parts(self, settings: InstrumentationSettings) -> list[_otel_me
elif isinstance(part, BinaryContent):
converted_part = _otel_messages.BinaryDataPart(type='binary', media_type=part.media_type)
if settings.include_content and settings.include_binary_content:
converted_part['content'] = base64.b64encode(part.data).decode()
converted_part['content'] = part.base64
parts.append(converted_part)
elif isinstance(part, CachePoint):
# CachePoint is a marker, not actual content - skip it for otel
@@ -1396,7 +1404,7 @@ def new_event_body():
'kind': 'binary',
'media_type': part.content.media_type,
**(
{'binary_content': base64.b64encode(part.content.data).decode()}
{'binary_content': part.content.base64}
if settings.include_content and settings.include_binary_content
else {}
),
@@ -1430,7 +1438,7 @@ def otel_message_parts(self, settings: InstrumentationSettings) -> list[_otel_me
elif isinstance(part, FilePart):
converted_part = _otel_messages.BinaryDataPart(type='binary', media_type=part.content.media_type)
if settings.include_content and settings.include_binary_content:
converted_part['content'] = base64.b64encode(part.content.data).decode()
converted_part['content'] = part.content.base64
parts.append(converted_part)
elif isinstance(part, BaseToolCallPart):
call_part = _otel_messages.ToolCallPart(type='tool_call', id=part.tool_call_id, name=part.tool_name)
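The `messages.py` changes replace repeated `base64.b64encode(...).decode()` calls with a single `BinaryContent.base64` property, then reuse it in `data_uri` and in the OpenTelemetry parts, which include the payload only when both `include_content` and `include_binary_content` are set. The following is a minimal standalone sketch of that pattern. It uses hypothetical stand-ins (`BinaryContentSketch`, `binary_data_part`) rather than the real Pydantic AI classes. Because base64 output is pure ASCII, decoding it with the default UTF-8 codec is lossless.

```python
import base64
from dataclasses import dataclass


@dataclass
class BinaryContentSketch:
    """Hypothetical stand-in for BinaryContent with only the members touched in this diff."""

    data: bytes
    media_type: str

    @property
    def base64(self) -> str:
        # Inside a method, `base64` resolves to the module global, not to this property.
        return base64.b64encode(self.data).decode()

    @property
    def data_uri(self) -> str:
        return f'data:{self.media_type};base64,{self.base64}'


def binary_data_part(
    content: BinaryContentSketch, include_content: bool, include_binary_content: bool
) -> dict[str, str]:
    """Hypothetical mirror of the otel gating: attach the encoded payload only when both flags are set."""
    part = {'type': 'binary', 'media_type': content.media_type}
    if include_content and include_binary_content:
        part['content'] = content.base64
    return part
```

Naming a property after the `base64` module works here because class attributes are not in scope inside method bodies, so the name lookup falls through to the module global.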
4 changes: 2 additions & 2 deletions pydantic_ai_slim/pydantic_ai/models/__init__.py
@@ -9,7 +9,7 @@
import base64
import warnings
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator, Callable, Iterator
from collections.abc import AsyncIterator, Callable, Iterator, Sequence
from contextlib import asynccontextmanager, contextmanager
from dataclasses import dataclass, field, replace
from datetime import datetime
@@ -733,7 +733,7 @@ def base_url(self) -> str | None:

@staticmethod
def _get_instructions(
messages: list[ModelMessage], model_request_parameters: ModelRequestParameters | None = None
messages: Sequence[ModelMessage], model_request_parameters: ModelRequestParameters | None = None
) -> str | None:
"""Get instructions from the first ModelRequest found when iterating messages in reverse.

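Changing `messages: list[ModelMessage]` to `messages: Sequence[ModelMessage]` in `_get_instructions` (and in `_map_messages` in `bedrock.py`) widens what callers may pass. For static type checkers, `list` is invariant, while `Sequence` is read-only and covariant, so a tuple of messages or a list of a narrower message type is accepted. The following simplified sketch uses hypothetical stand-ins (`RequestSketch`, `ResponseSketch`), not the real `ModelRequest` and `ModelResponse`. It shows that the reverse scan described in the docstring needs only `Sequence` operations.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass
from typing import Union


@dataclass
class RequestSketch:
    """Hypothetical stand-in for ModelRequest."""

    instructions: str | None = None


@dataclass
class ResponseSketch:
    """Hypothetical stand-in for ModelResponse."""

    text: str = ''


MessageSketch = Union[RequestSketch, ResponseSketch]


def get_instructions(messages: Sequence[MessageSketch]) -> str | None:
    """Return the instructions of the first request found when iterating messages in reverse."""
    for message in reversed(messages):  # reversed() works on any Sequence, including tuples
        if isinstance(message, RequestSketch):
            return message.instructions
    return None
```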
60 changes: 40 additions & 20 deletions pydantic_ai_slim/pydantic_ai/models/anthropic.py
@@ -64,7 +64,6 @@
omit as OMIT,
)
from anthropic.types.beta import (
BetaBase64PDFBlockParam,
BetaBase64PDFSourceParam,
BetaCacheControlEphemeralParam,
BetaCitationsConfigParam,
@@ -98,6 +97,7 @@
BetaRawMessageStreamEvent,
BetaRedactedThinkingBlock,
BetaRedactedThinkingBlockParam,
BetaRequestDocumentBlockParam,
BetaRequestMCPServerToolConfigurationParam,
BetaRequestMCPServerURLDefinitionParam,
BetaServerToolUseBlock,
@@ -1035,6 +1035,31 @@ def _add_cache_control_to_last_param(
# Add cache_control to the last param
last_param['cache_control'] = self._build_cache_control(ttl)

@staticmethod
def _map_binary_data(data: bytes, media_type: str) -> BetaContentBlockParam:
# Anthropic SDK accepts file-like objects (IO[bytes]) and handles base64 encoding internally
if media_type.startswith('image/'):
return BetaImageBlockParam(
source={'data': io.BytesIO(data), 'media_type': media_type, 'type': 'base64'}, # type: ignore
type='image',
)
elif media_type == 'application/pdf':
return BetaRequestDocumentBlockParam(
source=BetaBase64PDFSourceParam(
Comment (Collaborator):

I'm looking at what other sources this supports, and there's also BetaFileDocumentSourceParam, which takes a file_id for the file upload API.

We're adding support for uploaded files in #2611, but that PR has been stale for a while, so it may be interesting for you to pick up.

data=io.BytesIO(data),
media_type='application/pdf',
type='base64',
),
type='document',
)
elif media_type == 'text/plain':
return BetaRequestDocumentBlockParam(
source=BetaPlainTextSourceParam(data=data.decode('utf-8'), media_type=media_type, type='text'),
type='document',
)
else:
raise RuntimeError(f'Unsupported binary content media type for Anthropic: {media_type}')

@staticmethod
async def _map_user_prompt(
part: UserPromptPart,
@@ -1050,30 +1075,25 @@ async def _map_user_prompt(
elif isinstance(item, CachePoint):
yield item
elif isinstance(item, BinaryContent):
if item.is_image:
yield BetaImageBlockParam(
source={'data': io.BytesIO(item.data), 'media_type': item.media_type, 'type': 'base64'}, # type: ignore
type='image',
)
elif item.media_type == 'application/pdf':
yield BetaBase64PDFBlockParam(
source=BetaBase64PDFSourceParam(
data=io.BytesIO(item.data),
media_type='application/pdf',
type='base64',
),
type='document',
)
else:
raise RuntimeError('Only images and PDFs are supported for binary content')
yield AnthropicModel._map_binary_data(item.data, item.media_type)
elif isinstance(item, ImageUrl):
yield BetaImageBlockParam(source={'type': 'url', 'url': item.url}, type='image')
if item.force_download:
downloaded = await download_item(item, data_format='bytes')
Comment (Collaborator):

We should also respect force_download for DocumentUrl + item.media_type == 'application/pdf' further down, right?

Comment (Collaborator):

Would it make sense to use _map_binary_content there, if we make it more generic so it can take the result of download_item? (Or take a FileUrl | BinaryContent and if it gets a FileUrl, do the download first)

yield AnthropicModel._map_binary_data(downloaded['data'], item.media_type)
else:
yield BetaImageBlockParam(source={'type': 'url', 'url': item.url}, type='image')
elif isinstance(item, DocumentUrl):
if item.media_type == 'application/pdf':
yield BetaBase64PDFBlockParam(source={'url': item.url, 'type': 'url'}, type='document')
if item.force_download:
downloaded = await download_item(item, data_format='bytes')
yield AnthropicModel._map_binary_data(downloaded['data'], item.media_type)
else:
yield BetaRequestDocumentBlockParam(
source={'url': item.url, 'type': 'url'}, type='document'
)
elif item.media_type == 'text/plain':
downloaded_item = await download_item(item, data_format='text')
yield BetaBase64PDFBlockParam(
yield BetaRequestDocumentBlockParam(
source=BetaPlainTextSourceParam(
data=downloaded_item['data'], media_type=item.media_type, type='text'
),
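In `anthropic.py`, `_map_binary_data` dispatches on media type (image, PDF, plain text). Both `force_download` branches for `ImageUrl` and PDF `DocumentUrl` route downloaded bytes through that same helper, while `text/plain` documents are always downloaded. The following is a minimal SDK-free sketch of that flow, built as plain dicts shaped like Anthropic Messages API content blocks. It makes several assumptions. The real code passes `io.BytesIO` and lets the Anthropic SDK do the base64 encoding, while this sketch encodes with `base64` directly. The downloader is injected. The names `map_binary_data`, `UrlSketch`, `map_url`, and `fake_download` are hypothetical.

```python
import asyncio
import base64
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Any


def map_binary_data(data: bytes, media_type: str) -> dict[str, Any]:
    """Hypothetical helper: map raw bytes to an Anthropic-style content block by media type."""
    encoded = base64.b64encode(data).decode()
    if media_type.startswith('image/'):
        return {'type': 'image', 'source': {'type': 'base64', 'media_type': media_type, 'data': encoded}}
    if media_type == 'application/pdf':
        return {'type': 'document', 'source': {'type': 'base64', 'media_type': media_type, 'data': encoded}}
    if media_type == 'text/plain':
        source = {'type': 'text', 'media_type': media_type, 'data': data.decode('utf-8')}
        return {'type': 'document', 'source': source}
    raise RuntimeError(f'Unsupported binary content media type for Anthropic: {media_type}')


@dataclass
class UrlSketch:
    """Hypothetical stand-in for ImageUrl / DocumentUrl."""

    url: str
    media_type: str
    force_download: bool = False


async def map_url(item: UrlSketch, download: Callable[[str], Awaitable[bytes]]) -> dict[str, Any]:
    """Send the URL, or download first when forced or when the type has no URL source (text/plain)."""
    if item.force_download or item.media_type == 'text/plain':
        return map_binary_data(await download(item.url), item.media_type)
    if item.media_type.startswith('image/'):
        return {'type': 'image', 'source': {'type': 'url', 'url': item.url}}
    if item.media_type == 'application/pdf':
        return {'type': 'document', 'source': {'type': 'url', 'url': item.url}}
    raise RuntimeError(f'Unsupported URL media type for Anthropic: {item.media_type}')


async def fake_download(url: str) -> bytes:
    """Hypothetical downloader for the demo; a real one would fetch `url` over HTTP."""
    return b'hello'


if __name__ == '__main__':
    pdf = UrlSketch('https://example.com/doc.pdf', 'application/pdf', force_download=True)
    print(asyncio.run(map_url(pdf, fake_download)))
```

Funnelling every downloaded payload through one bytes-to-block helper is the generalisation the review thread on the `ImageUrl` branch asks about, so new media types need handling in only one place.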
4 changes: 2 additions & 2 deletions pydantic_ai_slim/pydantic_ai/models/bedrock.py
@@ -2,7 +2,7 @@

import functools
import typing
from collections.abc import AsyncIterator, Iterable, Iterator, Mapping
from collections.abc import AsyncIterator, Iterable, Iterator, Mapping, Sequence
from contextlib import asynccontextmanager
from dataclasses import dataclass, field
from datetime import datetime
@@ -548,7 +548,7 @@ def _map_tool_config(

async def _map_messages( # noqa: C901
self,
messages: list[ModelMessage],
messages: Sequence[ModelMessage],
model_request_parameters: ModelRequestParameters,
model_settings: BedrockModelSettings | None,
) -> tuple[list[SystemContentBlockTypeDef], list[MessageUnionTypeDef]]:
4 changes: 1 addition & 3 deletions pydantic_ai_slim/pydantic_ai/models/gemini.py
@@ -1,6 +1,5 @@
from __future__ import annotations as _annotations

import base64
from collections.abc import AsyncIterator, Sequence
from contextlib import asynccontextmanager
from dataclasses import dataclass, field
@@ -377,9 +376,8 @@ async def _map_user_prompt(self, part: UserPromptPart) -> list[_GeminiPartUnion]
if isinstance(item, str):
content.append({'text': item})
elif isinstance(item, BinaryContent):
base64_encoded = base64.b64encode(item.data).decode('utf-8')
content.append(
_GeminiInlineDataPart(inline_data={'data': base64_encoded, 'mime_type': item.media_type})
_GeminiInlineDataPart(inline_data={'data': item.base64, 'mime_type': item.media_type})
)
elif isinstance(item, VideoUrl) and item.is_youtube:
file_data = _GeminiFileDataPart(file_data={'file_uri': item.url, 'mime_type': item.media_type})
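`_map_user_prompt` in `gemini.py` turns user content into Gemini API parts. Text becomes a `text` part. `BinaryContent` becomes an `inline_data` part carrying base64 data, now taken from `item.base64`. A YouTube `VideoUrl` becomes a `file_data` part that references the URL. The following is a minimal sketch of those three branches with hypothetical stand-in classes (`VideoUrlSketch`, `BinarySketch`, `map_user_content`). It assumes a simple prefix check for YouTube URLs and a `video/mp4` default media type; neither is taken from the library. It also rejects anything outside the branches shown in the diff.

```python
import base64
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class VideoUrlSketch:
    """Hypothetical stand-in for VideoUrl."""

    url: str
    media_type: str = 'video/mp4'  # assumed default for this sketch

    @property
    def is_youtube(self) -> bool:
        # Assumed prefix heuristic; the real VideoUrl.is_youtube check may differ.
        return self.url.startswith(('https://www.youtube.com/', 'https://youtube.com/', 'https://youtu.be/'))


@dataclass
class BinarySketch:
    """Hypothetical stand-in for BinaryContent."""

    data: bytes
    media_type: str


def map_user_content(items: list[Union[str, BinarySketch, VideoUrlSketch]]) -> list[dict[str, Any]]:
    """Map user content to Gemini-style part dicts, covering only the branches shown in the diff."""
    parts: list[dict[str, Any]] = []
    for item in items:
        if isinstance(item, str):
            parts.append({'text': item})
        elif isinstance(item, BinarySketch):
            encoded = base64.b64encode(item.data).decode()
            parts.append({'inline_data': {'data': encoded, 'mime_type': item.media_type}})
        elif isinstance(item, VideoUrlSketch) and item.is_youtube:
            parts.append({'file_data': {'file_uri': item.url, 'mime_type': item.media_type}})
        else:
            raise ValueError(f'Not covered by this sketch: {item!r}')
    return parts
```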