feat: Add Image Input Support and Proxy Configuration by simplaj · Pull Request #1 · llmsresearch/paperbanana

simplaj · 2026-02-04T18:03:11Z

Summary
This PR introduces support for user-provided input images (e.g., sketches) to guide the diagram generation process, adds configuration support for local proxies (essential for regions without direct API access), and fixes a critical Unicode crash on Windows.

Key Changes

New Feature: Image Input Support
CLI: Added --image / -img option to paperbanana generate.
Pipeline: Modified PlannerAgent to accept input images and include them in the multimodal prompt context. The Planner now considers both the text methodology and the user-provided sketch/chart when designing the diagram.
Example Usage:
```bash
paperbanana generate --input method.txt --caption "Overview" --image sketch.png
```
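A rough sketch of how a repeatable `--image` / `-img` flag collects several paths into one list. The real command is built with Typer; this stand-in uses the standard-library argparse so it runs without extra dependencies, and only the option names (`--input`, `--caption`, `--image`, `-img`) come from the PR. `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the Typer-based `paperbanana generate` command.
    parser = argparse.ArgumentParser(prog="paperbanana")
    sub = parser.add_subparsers(dest="command", required=True)
    gen = sub.add_parser("generate")
    gen.add_argument("--input", help="Methodology text file")
    gen.add_argument("--caption", help="Figure caption")
    # action="append" makes the flag repeatable: each occurrence adds one path.
    # default=None avoids sharing a mutable list default between parses.
    gen.add_argument("--image", "-img", action="append", default=None,
                     help="Input image(s); repeat for multiple")
    return parser


args = build_parser().parse_args(
    ["generate", "--input", "method.txt", "--caption", "Overview",
     "--image", "sketch.png", "-img", "chart.png"]
)
images = args.image or []  # None when the flag is never given
```

With Typer, the equivalent is typically an option annotated as a list, which likewise accepts the flag multiple times; the exact declaration in this repository may differ.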
Proxy & Custom Endpoint Support
Configuration: Added support for GEMINI_BASE_URL environment variable.
Implementation: Updated GeminiVLM and GoogleImagenGen providers to respect the custom base URL to support local proxies (e.g., Antigravity Tools, http://127.0.0.1:8045).
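A minimal sketch of reading the override, assuming an unset or blank GEMINI_BASE_URL should mean "use the SDK default endpoint". `resolve_base_url` is a hypothetical helper, not the PR's actual code, and the trailing-slash normalization is an illustrative choice.

```python
import os
from typing import Mapping, Optional


def resolve_base_url(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the custom endpoint from GEMINI_BASE_URL, or None for the SDK default."""
    env = os.environ if env is None else env
    # Treat unset, empty, or whitespace-only values as "no override".
    value = env.get("GEMINI_BASE_URL", "").strip()
    # Drop a trailing slash so "http://host:port/" and "http://host:port" match.
    return value.rstrip("/") or None
```

The resolved value would then be handed to the providers; note the review point below that the SDK documents a custom base_url only for vertexai=True.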
Bug Fixes
Windows Compat: Fixed a UnicodeDecodeError in ReferenceStore by explicitly forcing encoding="utf-8" when opening JSON files. This prevents crashes on Windows systems where the default encoding might be GBK.
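The fix boils down to passing an explicit encoding when opening the JSON file. A minimal sketch of the pattern; `load_json_utf8` is a hypothetical helper, since ReferenceStore's actual code is not shown here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def load_json_utf8(path: Path) -> Any:
    # Without an explicit encoding, open() falls back to the locale's preferred
    # encoding (for example GBK on Chinese-locale Windows), which can raise
    # UnicodeDecodeError when the file is UTF-8 with non-ASCII characters.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Round-trip a file containing non-ASCII text.
with tempfile.TemporaryDirectory() as tmp:
    ref_file = Path(tmp) / "references.json"
    ref_file.write_text(json.dumps({"caption": "方法概览"}, ensure_ascii=False),
                        encoding="utf-8")
    loaded = load_json_utf8(ref_file)
```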

dippatel1994 · 2026-02-04T18:30:48Z

Thanks @simplaj for the PR. Did you test it? I think I made model selection configurable for command-line executions. Just want to confirm this before I accept and merge the PR.

simplaj · 2026-02-04T18:59:36Z

Thanks for the review! I accidentally committed my local config changes, so I have just pushed a commit that reverts config.yaml and config.py to their defaults. I've verified that the new features work correctly with the default configuration and that model selection via the CLI (--vlm-model, etc.) works as intended.

dippatel1994 · 2026-02-05T00:24:15Z

Thanks for the update. This looks useful overall. I have two small requests before the merge:

The VLM default model is changed to gemini-3-pro-preview in GeminiVLM.__init__, but Settings and the docs still use gemini-2.0-flash. Can we align these, either by reverting the default change or by updating Settings and the docs for consistency?
For the new --image flag, can we validate the image paths in the CLI (similar to the input file check) and surface a clear error up front, instead of a warning that is easy to miss later in the pipeline?

With those tweaks, I am happy to approve.

simplaj · 2026-02-05T10:10:31Z

Thanks for the feedback! I've pushed fixes:

1. VLM default model alignment

Reverted GeminiVLM.__init__ default from gemini-3-pro-preview back to gemini-2.0-flash to match Settings and documentation.

```diff
- def __init__(self, api_key: Optional[str] = None, model: str = "gemini-3-pro-preview"):
+ def __init__(self, api_key: Optional[str] = None, model: str = "gemini-2.0-flash"):
```
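One way to keep the two defaults from drifting apart again is to define the model name once and reference it from both places. This is a simplified sketch: DEFAULT_VLM_MODEL is a hypothetical constant, and the project's real Settings class may be built differently (for example with pydantic).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical single source of truth for the default VLM model name.
DEFAULT_VLM_MODEL = "gemini-2.0-flash"


@dataclass
class Settings:
    # Simplified stand-in for the project's Settings.
    vlm_model: str = DEFAULT_VLM_MODEL


class GeminiVLM:
    def __init__(self, api_key: Optional[str] = None, model: str = DEFAULT_VLM_MODEL):
        # Both defaults read the same constant, so they cannot disagree.
        self.api_key = api_key
        self.model = model
```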

2. CLI image path validation

Added upfront validation for --image paths in the CLI. If any image file doesn't exist, it now shows a clear error and exits immediately (similar to the --input file check):

```python
# Validate image paths if provided
if image:
    for img_path in image:
        if not Path(img_path).exists():
            console.print(f"[red]Error: Image file not found: {img_path}[/red]")
            raise typer.Exit(1)
```
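A possible refinement of the check above, sketched under two assumptions that go beyond the PR: directories should be rejected too (`Path.is_file()` is stricter than `exists()`), and users benefit from seeing every bad path in one run rather than only the first. `find_invalid_images` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def find_invalid_images(paths: Iterable[str]) -> List[str]:
    """Return every path that is not an existing regular file."""
    # is_file() rejects directories as well as missing paths, so a folder
    # passed to --image by mistake is caught here instead of later on.
    return [p for p in paths if not Path(p).is_file()]


# Demo: one real file, one directory, one missing path.
with tempfile.TemporaryDirectory() as tmp:
    sketch = Path(tmp) / "sketch.png"
    sketch.write_bytes(b"")  # placeholder content; only existence matters here
    invalid = find_invalid_images([str(sketch), tmp, str(Path(tmp) / "missing.png")])
```

In the CLI, each entry of the returned list could be printed with `console.print` before a single `raise typer.Exit(1)`.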

3. Documentation update

Updated README to clarify how to pass multiple images:

```markdown
| `--image` | `-img` | Path to input image(s); repeat for multiple (e.g. `--image a.png --image b.png`) |
```

Let me know if there's anything else!

dippatel1994 · 2026-02-05T11:46:58Z

Thank you for the update, and apologies for the follow-up. I have one concern.

The SDK documentation mentions that http_options is supported, but a custom base_url is currently only supported when vertexai=True. In API key mode, the base_url option may be ignored or could raise an error depending on the google-genai library version.

Could you update the proxy support to avoid using base_url for non-Vertex flows, or add a guard that shows a clear error message?

The documentation also recommends proxying via HTTPS_PROXY, SSL_CERT_FILE, or the proxy arguments in http_options. These approaches should be more broadly compatible.

Reference
Google API documentation - see the “Custom base url” section. It states: “Currently, only vertexai=True is supported.”
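A minimal sketch of the guard suggested here, assuming the provider knows whether it runs in Vertex AI mode. `ProxyConfigError` and `build_http_options` are hypothetical names; the returned dict is assumed to be passed as http_options to the google-genai client, which should be checked against the installed SDK version.

```python
from typing import Optional


class ProxyConfigError(ValueError):
    """Raised when a custom endpoint is requested in an unsupported mode."""


def build_http_options(base_url: Optional[str], vertexai: bool) -> dict:
    # Per the cited docs, a custom base_url is only supported with vertexai=True,
    # so fail loudly in API-key mode instead of silently ignoring the setting.
    if base_url and not vertexai:
        raise ProxyConfigError(
            "GEMINI_BASE_URL is only supported with vertexai=True; in API-key "
            "mode, configure a proxy via HTTPS_PROXY (and SSL_CERT_FILE if needed)."
        )
    options = {}
    if base_url:
        options["base_url"] = base_url
    return options
```

Raising an error rather than logging a warning matches the "clear error message" option above; falling back to HTTPS_PROXY keeps API-key users on the route the docs describe as more broadly compatible.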

simplaj · 2026-02-07T21:12:51Z

Sorry for the slow response! I'm catching up on some deadlines at the moment, so I'll get back to this PR a bit later. Thanks again for your great work!

dippatel1994 · 2026-02-07T21:16:23Z

Hey @simplaj, no problem. Take care! Looking forward to scaling this solution. Thanks for accepting the invitation. I will link this to an issue. We need to add more providers and perhaps add an eval to compare quality across models.

simplaj · 2026-02-07T21:20:27Z

Thanks for understanding! Scaling and adding evals sounds like a great plan. Maybe we can dive into the details after I'm past the deadline.

simplaj added 3 commits February 5, 2026 01:40

- add base_url support (5ba64e3)
- add image input support (0edff4f)
- edit readme (4af4a1a)

simplaj added 2 commits February 5, 2026 02:54

- revert useless edit (fd70ea4)
- revert local setting (94eb591)

simplaj added 2 commits February 5, 2026 17:58

- fix: align VLM default model and add CLI image path validation (008f79f)
- docs: clarify multiple image usage in README (a0a5a95)

dippatel1994 assigned simplaj Feb 8, 2026

simplaj wants to merge 7 commits into llmsresearch:main from simplaj:feat/image-input-proxy-support

2 participants
