
feat: Add Image Input Support and Proxy Configuration #1

Open
simplaj wants to merge 7 commits into llmsresearch:main from simplaj:feat/image-input-proxy-support

Conversation

@simplaj
Collaborator

simplaj commented Feb 4, 2026

Summary
This PR introduces support for user-provided input images (e.g., sketches) to guide the diagram generation process, adds configuration support for local proxies (essential for regions without direct API access), and fixes a critical Unicode crash on Windows.

Key Changes

  1. New Feature: Image Input Support
    CLI: Added --image / -img option to paperbanana generate.
    Pipeline: Modified PlannerAgent to accept input images and include them in the multimodal prompt context. The Planner now considers both the text methodology and the user-provided sketch/chart when designing the diagram.
    Example Usage:
    ```bash
    paperbanana generate --input method.txt --caption "Overview" --image sketch.png
    ```
  2. Proxy & Custom Endpoint Support
    Configuration: Added support for GEMINI_BASE_URL environment variable.
    Implementation: Updated GeminiVLM and GoogleImagenGen providers to respect the custom base URL to support local proxies (e.g., Antigravity Tools, http://127.0.0.1:8045).
  3. Bug Fixes
    Windows Compat: Fixed a UnicodeDecodeError in ReferenceStore by explicitly forcing encoding="utf-8" when opening JSON files. This prevents crashes on Windows systems where the default encoding might be GBK.
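The encoding fix amounts to one keyword argument. A minimal sketch of the pattern (the `load_reference_store` helper name is illustrative, not the actual ReferenceStore method):

```python
import json

def load_reference_store(path: str) -> dict:
    # Force UTF-8 regardless of the platform default (e.g. GBK on
    # Chinese-locale Windows), which otherwise raises UnicodeDecodeError
    # on JSON files containing non-ASCII characters.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```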

@dippatel1994
Member

Thanks @simplaj for the PR. Did you test it? I think I made model selection configurable for command-line executions. Just want to confirm this before I accept and merge the PR.

@simplaj
Collaborator Author

simplaj commented Feb 4, 2026

> Thanks @simplaj for the PR. Did you test it? I think I made model selection configurable for command-line executions. Just want to confirm this before I accept and merge the PR.

Thanks for the review! I accidentally committed my local config changes. I have just pushed a commit to revert config.yaml and config.py to their defaults. I've verified that the new features work correctly with the default configuration and that model selection via CLI (--vlm-model etc.) works as intended.

@dippatel1994
Member

dippatel1994 commented Feb 5, 2026

Thanks for the update. This looks useful overall. I have two small requests before the merge:

  1. The VLM default model changes to gemini-3-pro-preview in GeminiVLM.__init__, but Settings and docs still use gemini-2.0-flash. Can we align these, either by reverting the default change or updating Settings and docs for consistency?
  2. For the new --image flag, can we validate the image paths in the CLI (similar to the input file check) and surface a clear error immediately, instead of a silent failure later in the pipeline?

With those tweaks, I am happy to approve.

@simplaj
Collaborator Author

simplaj commented Feb 5, 2026

Thanks for the feedback! I've pushed fixes:

1. VLM default model alignment

Reverted GeminiVLM.__init__ default from gemini-3-pro-preview back to gemini-2.0-flash to match Settings and documentation.

```diff
- def __init__(self, api_key: Optional[str] = None, model: str = "gemini-3-pro-preview"):
+ def __init__(self, api_key: Optional[str] = None, model: str = "gemini-2.0-flash"):
```

2. CLI image path validation

Added upfront validation for --image paths in the CLI. If any image file doesn't exist, it now shows a clear error and exits immediately (similar to the --input file check):

```python
from pathlib import Path

import typer
# `console` is the module-level rich Console already used by the CLI

# Validate image paths if provided
if image:
    for img_path in image:
        if not Path(img_path).exists():
            console.print(f"[red]Error: Image file not found: {img_path}[/red]")
            raise typer.Exit(1)
```

3. Documentation update

Updated README to clarify how to pass multiple images:

| `--image` | `-img` | Path to input image(s); repeat for multiple (e.g. `--image a.png --image b.png`) |

Let me know if there's anything else!

@dippatel1994
Member

Thank you for the update, and apologies for the follow-up. I have one concern.

The SDK documentation mentions that http_options is supported, but a custom base_url is currently only supported when vertexai=True. In API key mode, the base_url option may be ignored or could raise an error depending on the google-genai library version.

Could you update the proxy support to avoid using base_url for non-Vertex flows, or add a guard that shows a clear error message?

The documentation also recommends proxying via HTTPS_PROXY, SSL_CERT_FILE, or the proxy arguments in http_options. These approaches should be more broadly compatible.

Reference: Google API documentation, “Custom base url” section. It states: “Currently, only vertexai=True is supported.”

@simplaj
Collaborator Author

simplaj commented Feb 7, 2026

> Thank you for the update, and apologies for the follow-up. I have one concern.
>
> The SDK documentation mentions that http_options is supported, but a custom base_url is currently only supported when vertexai=True. In API key mode, the base_url option may be ignored or could raise an error depending on the google-genai library version.
>
> Could you update the proxy support to avoid using base_url for non-Vertex flows, or add a guard that shows a clear error message?
>
> The documentation also recommends proxying via HTTPS_PROXY, SSL_CERT_FILE, or the proxy arguments in http_options. These approaches should be more broadly compatible.
>
> Reference: Google API documentation, “Custom base url” section. It states: “Currently, only vertexai=True is supported.”

Sorry for the slow response! Catching up on some deadlines at the moment. I'll get back to this PR a bit later. Thanks for your great work again!

@dippatel1994
Member

Hey @simplaj, no problem. Take care! Looking forward to scaling this solution. Thanks for accepting the invitation. I will link this to an issue. We need to add more providers and perhaps add an eval to compare quality across models.

@simplaj
Collaborator Author

simplaj commented Feb 7, 2026

> Hey @simplaj, no problem. Take care! Looking forward to scaling this solution. Thanks for accepting the invitation. I will link this to an issue. We need to add more providers and perhaps add an eval to compare quality across models.

Thanks for understanding! Scaling and adding evals sounds like a great plan. Maybe we can dive into the details after I'm past the deadline.
