-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add force_download support for Anthropic and OAIR models and clarify proper BinaryContent base64 handling
#3694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Docs Preview
|
|
work's still ongoing to address the other points mentioned in the issue |
force_download support for Anthropic and OAIR models and clarify proper BinaryContent base64 handling
force_download support for Anthropic and OAIR models and clarify proper BinaryContent base64 handlingforce_download support for Anthropic and OAIR models and clarify proper BinaryContent base64 handling
| yield AnthropicModel._map_binary_content(item) | ||
| elif isinstance(item, ImageUrl): | ||
| if item.force_download: | ||
| downloaded = await download_item(item, data_format='bytes') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also respect force_download for DocumentUrl + item.media_type == 'application/pdf' further down, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to use _map_binary_content there, if we make it more generic so it can take the result of download_item? (Or take a FileUrl | BinaryContent and if it gets a FileUrl, do the download first)
| type='image', | ||
| ) | ||
| elif item.media_type == 'application/pdf': | ||
| return BetaBase64PDFBlockParam( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now an alias for BetaRequestDocumentBlockParam, so let's use the newer name
| ) | ||
| elif item.media_type == 'application/pdf': | ||
| return BetaBase64PDFBlockParam( | ||
| source=BetaBase64PDFSourceParam( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking at what other sources this supports, and there's also BetaFileDocumentSourceParam, which takes a file_id for the file upload API.
We're adding support for uploaded files in #2611, but that PR has been stale for a bit so may be interesting for you to pick up.
| ), | ||
| ] | ||
| ) | ||
| assert result.output == snapshot( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we test that the forced download is actually happening?
|
|
||
| document_url = DocumentUrl( | ||
| url='https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf', | ||
| force_download=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as up -- we're not testing that we're respecting this, although I suppose that fact that those lines have test coverage sort of does? But it'd be good if we could assert that explicitly
This commit addresses Douwe's review comments on PR #3694: 1. **Anthropic: Add force_download for DocumentUrl PDF** - DocumentUrl with PDF now respects force_download parameter - When force_download=True: downloads and sends as base64 - When force_download=False: sends URL directly 2. **Anthropic: Rename BetaBase64PDFBlockParam → BetaRequestDocumentBlockParam** - Updated to use newer type name (BetaBase64PDFBlockParam is now an alias) - Updated imports and all usages 3. **Anthropic: Refactor _map_binary_content to handle FileUrl** - Created new _map_binary_to_block helper method - Eliminates code duplication between BinaryContent, ImageUrl force_download, and DocumentUrl force_download - Supports image/*, application/pdf, and text/plain media types 4. **Messages: Use base64 property in otel_message_parts** - Replaced manual base64.b64encode calls with item.base64 property - 3 replacements in UserPromptPart.otel_message_parts and ModelResponse.otel_* methods 5. **Tests: Add force_download verification tests** - 4 new tests for Anthropic (ImageUrl + DocumentUrl) - 5 new tests for OpenAI (ImageUrl + DocumentUrl + AudioUrl) - Tests verify download_item is called when force_download=True All tests pass (306 tests in test_anthropic.py, test_openai.py, test_messages.py). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
||
| data: bytes | ||
| """The binary data.""" | ||
| """Arbitrary binary data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Arbitrary" may be a little too strong, as it is specifically a file matching the type in media_type 😄
I think if we make this The binary file data, we can remove the specific mention of base64 that's added in the next lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep one as it reduces the likelihood of users using it incorrectly
| raise RuntimeError(f'Unsupported binary content media type for Anthropic: {media_type}') | ||
|
|
||
| @staticmethod | ||
| def _map_binary_content(item: BinaryContent) -> BetaContentBlockParam: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this method anymore as it's just one line that can be inlined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if download_item could be refactored to return a BinaryContent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would've been cool if it had been like BinaryContent.download or .from_url but I don't think the replacement is worth the hassle now
|
changed the docs as follows https://29ed66a3-pydantic-ai-previews.pydantic.workers.dev/input
|

Closes #3569 - Review multi-modal input handling.
Changes
BinaryContent improvements:
base64property that returns raw base64-encoded stringdatafield docstring to clarify it should contain raw bytes, not base64-encoded stringsAnthropic provider:
force_downloadsupport forImageUrl- downloads and sends as base64 whenforce_download=Truetext/plainsupport forBinaryContent- decodes UTF-8 and sends as plain text document_map_binary_content()helper to reduce complexity of_map_user_prompt()Test fixes:
test_mcp_sampling.py: was storingbase64.b64encode(b'data')instead of rawb'data'test_mistral.py: was storing base64 ASCII characters as bytestest_anthropic.pyerror message match for audio contentOpenAI Responses API:
force_downloadsupport forAudioUrlandDocumentUrl- usesfile_urlto send URLs directly whenforce_download=False(default), downloads and sends asfile_datawhenforce_download=TrueAudioUrlandDocumentUrlhandling into single branchNew tests:
test_binary_content_base64- verifies new propertytest_image_url_input_force_download- verifies Anthropic force_downloadtest_text_document_as_binary_content_input- verifies Anthropic text/plain supporttest_document_url_input_response_api- verifies URL is sent directly (default behavior)test_document_url_input_force_download_response_api- verifies download when force_download=True