Kimi K2.5 via aggregators (synthetic.new, OpenRouter) gets no max_tokens/reasoning_effort → empty response #3

@liyoungc

Description

Kimi K2.5 via aggregators (synthetic.new, OpenRouter, Together…) silently truncates to empty response

Symptom

When model: hf:moonshotai/Kimi-K2.5 (or any Kimi slug) is configured to go through an aggregator base URL (e.g. https://api.synthetic.new/v1), long agentic prompts — especially cron jobs with multiple tool calls — consistently fail with:

⚠️  Response truncated (finish_reason='length') - model hit max output tokens
⚠️  Truncated tool call detected — retrying API call...
⚠️  Response truncated (finish_reason='length') - model hit max output tokens
⚠️  Truncated tool call response detected again — refusing to execute incomplete tool arguments.

Eventually the cron scheduler records last_status: error, last_error: "Agent completed but produced empty response (model error, timeout, or misconfiguration)". Direct LINE / Discord chat with the same model usually works because short prompts fit inside the small default output budget, but anything longer than ~30 tool-bearing turns dies.

Root cause

_is_kimi (run_agent.py:8309 in current main) is base-URL-only:

_is_kimi = (
    base_url_host_matches(self.base_url, "api.kimi.com")
    or base_url_host_matches(self.base_url, "moonshot.ai")
    or base_url_host_matches(self.base_url, "moonshot.cn")
)

It misses every aggregator that routes to Moonshot inference. The downstream effect is in agent/transports/chat_completions.py:240–280:

  • The Kimi-specific max_tokens=32000 default (line ~259) isn't applied → falls through to "send no max_tokens" → aggregator picks a small server default (synthetic.new behaves as if it's ~4K).
  • The Kimi-specific reasoning_effort=medium hint (line ~272) isn't sent → Kimi K2.5 runs with its built-in default thinking effort.
  • Kimi K2.5 has a thinking/reasoning mode that pre-spends output tokens on hidden reasoning before producing visible text. With a small max_tokens cap and no effort hint, it spends the entire budget on reasoning and emits zero visible tokens.
  • finish_reason: length with empty content → hermes retries the API call with continuation → same outcome → "Truncated tool call response detected again — refusing to execute" → empty response.
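The fallthrough described above can be sketched roughly as follows. This is a minimal illustration of the reported behaviour, not the actual chat_completions.py code; the function name and structure are assumptions, while the 32000/"medium" values come straight from the report:

```python
def build_request_params(is_kimi: bool, user_max_tokens=None) -> dict:
    """Sketch of the parameter-selection fallthrough described above."""
    params = {}
    if user_max_tokens is not None:
        params["max_tokens"] = user_max_tokens
    elif is_kimi:
        # Kimi-specific default: leave headroom for hidden reasoning tokens.
        params["max_tokens"] = 32000
    # else: no max_tokens is sent at all, so the aggregator's small
    # server-side default (~4K on synthetic.new, per the report) wins.
    if is_kimi:
        # Bounds how much of the budget thinking mode may consume.
        params["reasoning_effort"] = "medium"
    return params
```

With `is_kimi` wrongly False on an aggregator route, the request carries neither key, which is exactly the failure mode above.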

The interesting thing is that is_moonshot_model(self.model) already exists in agent/moonshot_schema.py:171 and correctly recognises model-name-based slugs (hf:moonshotai/Kimi-K2.5, nous/moonshotai/kimi-k2.5, …). It's even used immediately above the _is_kimi check in chat_completions.py to sanitize tools — but the runtime detection path doesn't reuse it.

Repro

  1. Set model.default: hf:moonshotai/Kimi-K2.5 and a synthetic.new (or OpenRouter, Together) credential in ~/.hermes/auth.json.
  2. Create any cron job with kind: cron, deliver: discord, prompt requiring 3–5 tool calls (e.g. multiple curl + a python3 script).
  3. Wait for the job to fire — every run ends in last_status: error, last_error: "Agent completed but produced empty response". Container logs show repeated Response truncated (finish_reason='length') warnings with no visible content between them.
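For reference, the failing setup was shaped roughly like this. Field names are reconstructed from the keys quoted above (model.default, kind, deliver); the exact hermes config schema may differ:

```yaml
# Hypothetical shape only — not the authoritative hermes schema.
model:
  default: hf:moonshotai/Kimi-K2.5   # routed via https://api.synthetic.new/v1
jobs:
  - kind: cron
    deliver: discord
    prompt: "Fetch several URLs with curl, then summarise them with a python3 script"
```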

The same model works fine when configured against https://api.moonshot.ai/v1 directly because base URL matches and the Kimi defaults kick in.

Proposed fix

--- a/run_agent.py
+++ b/run_agent.py
@@ -8307,11 +8307,23 @@ class Agent:
         _is_nous = "nousresearch" in self._base_url_lower
         _is_nvidia = "integrate.api.nvidia.com" in self._base_url_lower
+        # Detect Kimi models routed through aggregators (synthetic.new,
+        # OpenRouter, Together, ...).  Without this branch, those routes
+        # miss the Kimi-specific max_tokens=32000 default and the
+        # reasoning_effort=medium hint, leaving the model with whatever
+        # tiny output budget the aggregator defaults to and free-running
+        # thinking mode that swallows the entire budget — visible response
+        # ends up empty.
+        try:
+            from agent.moonshot_schema import is_moonshot_model as _is_moonshot
+        except Exception:  # pragma: no cover — optional helper
+            _is_moonshot = lambda _m: False  # noqa: E731
         _is_kimi = (
             base_url_host_matches(self.base_url, "api.kimi.com")
             or base_url_host_matches(self.base_url, "moonshot.ai")
             or base_url_host_matches(self.base_url, "moonshot.cn")
+            or _is_moonshot(self.model)
         )

This makes _is_kimi consistent with the model-name-based detection already used elsewhere (is_moonshot_model in agent/moonshot_schema.py:171). Both max_tokens=32000 and reasoning_effort=medium then route correctly regardless of which aggregator the user goes through.
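A minimal before/after sketch of the detection change. `host_matches` and `is_moonshot` below are simplified stand-ins for base_url_host_matches and agent/moonshot_schema.is_moonshot_model, written here only to illustrate the outcome:

```python
def host_matches(base_url: str, host: str) -> bool:
    # Simplified stand-in for base_url_host_matches.
    return host in base_url

def is_moonshot(model: str) -> bool:
    # Simplified stand-in for is_moonshot_model: match by model slug.
    slug = model.lower()
    return "moonshot" in slug or "kimi" in slug

def is_kimi(base_url: str, model: str, with_fix: bool) -> bool:
    by_host = (
        host_matches(base_url, "api.kimi.com")
        or host_matches(base_url, "moonshot.ai")
        or host_matches(base_url, "moonshot.cn")
    )
    # The proposed fix ORs in the model-name-based check.
    return by_host or (with_fix and is_moonshot(model))

BASE, MODEL = "https://api.synthetic.new/v1", "hf:moonshotai/Kimi-K2.5"
assert is_kimi(BASE, MODEL, with_fix=False) is False   # aggregator route missed
assert is_kimi(BASE, MODEL, with_fix=True) is True     # caught after the fix
assert is_kimi("https://api.moonshot.ai/v1", MODEL, with_fix=False) is True
```

The last assertion mirrors the observation above: the direct Moonshot endpoint always worked because the host check matched.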

Related

This bug compounded with two other cron robustness issues filed separately (null next_run_at skip + non-dict origin AttributeError) — taken together they explain a class of "cron quietly does nothing" reports on aggregated Kimi deployments.

Environment

  • hermes-agent: upstream/main as of 2026-05-02
  • Model: hf:moonshotai/Kimi-K2.5
  • Provider: synthetic.new (https://api.synthetic.new/v1)
  • Python 3.14, croniter installed
  • Encountered on a chococlaw VPS Docker deployment (Linux Debian)
