fix(llm): Gemini 2.5 compatibility — thinking parts and response parsing #98
Open
webframp wants to merge 4 commits into calesthio:master
Conversation
The custom .env parser did not handle quoted values, causing passwords and API keys containing special characters (|, ^, ", >, [, etc.) to include the quote characters as part of the value or parse incorrectly. This adds quote stripping for both single- and double-quoted values, matching the behavior of dotenv and other standard .env parsers.

Co-authored-by: Shelley <shelley@exe.dev>
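As an illustration, quote stripping along these lines (the `parseEnvLine` helper is a hypothetical name, not the project's actual function):

```js
// Minimal sketch of quote-aware .env line parsing.
function parseEnvLine(line) {
  const eq = line.indexOf('=');
  if (eq === -1) return null;
  const key = line.slice(0, eq).trim();
  let value = line.slice(eq + 1).trim();
  // Strip a matching pair of single or double quotes so PASSWORD="a|b^c"
  // yields a|b^c, mirroring dotenv's behavior.
  if (
    value.length >= 2 &&
    ((value.startsWith('"') && value.endsWith('"')) ||
      (value.startsWith("'") && value.endsWith("'")))
  ) {
    value = value.slice(1, -1);
  }
  return [key, value];
}
```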
Slash command registration was called before client.login(), so client.user.id was undefined and fell back to the string "me", causing a Discord API error: Invalid Form Body application_id[NUMBER_TYPE_COERCE]: Value "me" is not snowflake. This moves command registration into the ready event handler and attaches that handler before login() to avoid a race condition where the ready event fires before the listener is attached.

Co-authored-by: Shelley <shelley@exe.dev>
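The ordering looks roughly like this with discord.js v14 (a sketch, not the PR's exact code; `commands` stands in for the bot's slash command definitions):

```js
import { Client, GatewayIntentBits, REST, Routes } from 'discord.js';

const commands = []; // the bot's slash command JSON definitions go here

const client = new Client({ intents: [GatewayIntentBits.Guilds] });

// Attach the listener before login() so a 'ready' event that fires
// immediately cannot be missed.
client.once('ready', async () => {
  // client.user is populated by the time 'ready' fires, so the
  // application_id sent to Discord is a real snowflake, not "me".
  const rest = new REST().setToken(process.env.DISCORD_TOKEN);
  await rest.put(Routes.applicationCommands(client.user.id), { body: commands });
});

await client.login(process.env.DISCORD_TOKEN);
```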
When running behind a reverse proxy or on a remote host, the /status command in Telegram and Discord shows http://localhost:PORT, which is not reachable by users. Adds a PUBLIC_URL env var that, when set, replaces the hardcoded localhost URL in bot status responses. Falls back to localhost when unset, so existing setups are unaffected.

Example: PUBLIC_URL=https://my-crucix.example.com

Co-authored-by: Shelley <shelley@exe.dev>
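A sketch of the fallback (the `statusUrl` name and the PORT default are illustrative, not from the PR):

```js
// Prefer the operator-supplied public URL; otherwise keep the old
// localhost behavior so existing setups are unaffected.
const statusUrl =
  process.env.PUBLIC_URL ?? `http://localhost:${process.env.PORT ?? 3000}`;
```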
Three issues when using Gemini 2.5 Flash/Pro models:

1. Gemini 2.5 models return multi-part responses where the first part is a "thinking" part and the second is the actual content. The provider only read parts[0], getting thinking text instead of the response. Fixed by filtering out thought parts.

2. Thinking tokens consumed the maxOutputTokens budget, causing truncated JSON responses (cut mid-object). Added thinkingConfig with a 1024-token budget to keep reasoning concise, and bumped idea generation to 8192 output tokens.

3. The ideas response parser failed on Gemini output because it only handled code blocks at string boundaries. Rewrote it to extract code blocks from anywhere in the response and fall back to finding the JSON array if no code block is present.

Co-authored-by: Shelley <shelley@exe.dev>
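A minimal sketch of the more tolerant extraction described in point 3, assuming a hypothetical helper name `extractIdeasJson` (the actual function in lib/llm/ideas.mjs may differ):

```js
// Pull the ideas JSON out of a model response that may wrap it in a
// code fence anywhere, or return it bare.
function extractIdeasJson(text) {
  // Match a fenced block anywhere, not just at string boundaries.
  const fence = text.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (fence) return fence[1].trim();
  // Fall back to the outermost JSON array in the raw response.
  const start = text.indexOf('[');
  const end = text.lastIndexOf(']');
  if (start !== -1 && end > start) return text.slice(start, end + 1);
  return null;
}
```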
Problem
Gemini 2.5 Flash and Pro models fail to generate trade ideas due to three compounding issues:
1. Thinking parts in multi-part responses
Gemini 2.5 models return `parts` arrays where the first element is a "thinking" part (`thought: true`) and the actual response is in subsequent parts. The provider reads only `parts[0]`, so it gets raw reasoning instead of the JSON output.

2. Truncated output from thinking token budget
Thinking tokens consume the `maxOutputTokens` budget, leaving insufficient tokens for the actual response. This causes the JSON array to be cut mid-object (e.g., 693 chars instead of ~2000+), making it unparseable.

3. Brittle code block extraction
The ideas parser only handled code blocks at exact string boundaries (`startsWith` / regex anchored to `$`). Gemini responses may have trailing whitespace, extra text, or different formatting that breaks extraction.

Result: `[LLM Ideas] No valid ideas parsed from response` on every sweep with Gemini 2.5 Flash/Pro, despite the model returning valid ideas inside the response.

Fix
lib/llm/gemini.mjs

- Filter `thought` parts from the response; concatenate only non-thinking parts
- Add `thinkingConfig: { thinkingBudget: 1024 }` to keep reasoning concise and preserve the output token budget

lib/llm/ideas.mjs

- Extract fenced `json` code blocks anywhere in the response (not just at string boundaries)
- Fall back to parsing the JSON array (`[...]`) directly if no code block is found
- Bump `maxTokens` from 4096 to 8192 for idea generation
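A sketch of the gemini.mjs side of these changes, written against the Gemini REST response shape (the `responseText` name is hypothetical, and the project's actual field handling may differ):

```js
// `response` is the parsed JSON body of a generateContent call.
function responseText(response) {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  return parts
    .filter((part) => !part.thought) // drop thinking parts (thought: true)
    .map((part) => part.text ?? '')
    .join('');
}

// Request side: cap reasoning so it cannot exhaust the output budget.
const generationConfig = {
  maxOutputTokens: 8192,
  thinkingConfig: { thinkingBudget: 1024 },
};
```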
Testing

Before: `0 ideas (llm-failed)` on every sweep with `gemini-2.5-flash`

After: `5 ideas (llm)` consistently generated and parsed correctly