Kimi K2.5 via aggregators (synthetic.new, OpenRouter, Together…) silently truncates to empty response
Symptom
When model: hf:moonshotai/Kimi-K2.5 (or any Kimi slug) is configured to go through an aggregator base URL (e.g. https://api.synthetic.new/v1), long agentic prompts — especially cron jobs with multiple tool calls — consistently fail with:
```
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
⚠️ Truncated tool call detected — retrying API call...
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
⚠️ Truncated tool call response detected again — refusing to execute incomplete tool arguments.
```
Eventually the cron scheduler records last_status: error, last_error: "Agent completed but produced empty response (model error, timeout, or misconfiguration)". Direct LINE / Discord chat with the same model usually works because short prompts fit within the small default output budget, but anything longer than ~30 tool-bearing turns dies.
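The failure signature is easy to classify from the raw response payload. A minimal sketch (the helper name `looks_like_silent_truncation` is hypothetical, not a hermes function):

```python
# Hypothetical helper (not part of hermes): detect the "empty response"
# signature described above from a chat-completions response choice.

def looks_like_silent_truncation(choice: dict) -> bool:
    """True when the model hit its output cap without emitting any
    visible text or a complete tool call."""
    finish = choice.get("finish_reason")
    msg = choice.get("message") or {}
    content = (msg.get("content") or "").strip()
    tool_calls = msg.get("tool_calls") or []
    return finish == "length" and not content and not tool_calls

# The failing responses in the logs above look like the first case:
bad = {"finish_reason": "length", "message": {"content": "", "tool_calls": []}}
ok = {"finish_reason": "stop", "message": {"content": "Done."}}
print(looks_like_silent_truncation(bad))  # → True
print(looks_like_silent_truncation(ok))   # → False
```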
Root cause
_is_kimi (run_agent.py:8309 in current main) is base-URL-only:
```python
_is_kimi = (
    base_url_host_matches(self.base_url, "api.kimi.com")
    or base_url_host_matches(self.base_url, "moonshot.ai")
    or base_url_host_matches(self.base_url, "moonshot.cn")
)
```
It misses every aggregator that routes to Moonshot inference. The downstream effect is in agent/transports/chat_completions.py:240–280:
- The Kimi-specific max_tokens=32000 default (line ~259) isn't applied → falls through to "send no max_tokens" → aggregator picks a small server default (synthetic.new behaves as if it's ~4K).
- The Kimi-specific reasoning_effort=medium hint (line ~272) isn't sent → Kimi K2.5 runs with its built-in default thinking effort.
- Kimi K2.5 has a thinking/reasoning mode that pre-spends output tokens on hidden reasoning before producing visible text. With a small max_tokens cap and no effort hint, it spends the entire budget on reasoning and emits zero visible tokens.

The net result: finish_reason: length with empty content → hermes retries the API call with continuation → same outcome → "Truncated tool call response detected again — refusing to execute" → empty response.
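The budget arithmetic can be sketched directly. The token counts below are illustrative assumptions, not measured values:

```python
# Illustrative sketch of why a small output cap plus hidden reasoning
# yields zero visible tokens. The numbers are assumptions, not measurements.

def visible_budget(max_tokens: int, reasoning_tokens: int) -> int:
    """Tokens left for visible text after hidden reasoning is spent.
    Reasoning cannot exceed max_tokens: the model is cut off mid-thought."""
    return max(0, max_tokens - min(reasoning_tokens, max_tokens))

# Aggregator default (~4K) against a long agentic prompt that triggers
# several thousand tokens of hidden reasoning:
print(visible_budget(4096, 6000))   # 0 -> finish_reason='length', empty content
# The Kimi-specific default restores headroom:
print(visible_budget(32000, 6000))  # 26000 -> plenty left for tool calls
```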
Notably, is_moonshot_model(self.model) already exists in agent/moonshot_schema.py:171 and correctly recognises model-name-based slugs (hf:moonshotai/Kimi-K2.5, nous/moonshotai/kimi-k2.5, …). It's even used immediately above the _is_kimi check in chat_completions.py to sanitize tools. But the runtime detection path doesn't reuse it.
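For context, slug-based detection of this kind might look like the following. This is a simplified sketch, not the actual agent/moonshot_schema.py implementation; the real matching rules may differ:

```python
import re

# Simplified sketch of model-slug detection (NOT the real
# is_moonshot_model from agent/moonshot_schema.py). It strips a
# provider prefix like "hf:" and looks for Moonshot/Kimi identifiers
# in the remaining slug, case-insensitively.

_MOONSHOT_RE = re.compile(r"(^|/)moonshot(ai)?(/|$)|(^|/|-)kimi", re.IGNORECASE)

def is_moonshot_model_sketch(model: str) -> bool:
    slug = (model or "").split(":", 1)[-1]  # drop "hf:"-style prefixes
    return bool(_MOONSHOT_RE.search(slug))

for m in ("hf:moonshotai/Kimi-K2.5", "nous/moonshotai/kimi-k2.5", "gpt-4o"):
    print(m, is_moonshot_model_sketch(m))
```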
Repro
- Set model.default: hf:moonshotai/Kimi-K2.5 and a synthetic.new (or OpenRouter, Together) credential in ~/.hermes/auth.json.
- Create any cron job with kind: cron, deliver: discord, and a prompt requiring 3–5 tool calls (e.g. multiple curl + a python3 script).
- Wait for a fire — every fire ends in last_status: error, last_error: "Agent completed but produced empty response". Container logs show repeated Response truncated (finish_reason='length') warnings with no visible content between them.
The same model works fine when configured against https://api.moonshot.ai/v1 directly because base URL matches and the Kimi defaults kick in.
Proposed fix
```diff
--- a/run_agent.py
+++ b/run_agent.py
@@ -8307,11 +8307,21 @@ class Agent:
         _is_nous = "nousresearch" in self._base_url_lower
         _is_nvidia = "integrate.api.nvidia.com" in self._base_url_lower
+        # Detect Kimi models routed through aggregators (synthetic.new,
+        # OpenRouter, Together, ...). Without this branch, those routes
+        # miss the Kimi-specific max_tokens=32000 default and the
+        # reasoning_effort=medium hint, leaving the model with whatever
+        # tiny output budget the aggregator defaults to and a free-running
+        # thinking mode that swallows the entire budget — the visible
+        # response ends up empty.
+        try:
+            from agent.moonshot_schema import is_moonshot_model as _is_moonshot
+        except Exception:  # pragma: no cover — optional helper
+            _is_moonshot = lambda _m: False  # noqa: E731
         _is_kimi = (
             base_url_host_matches(self.base_url, "api.kimi.com")
             or base_url_host_matches(self.base_url, "moonshot.ai")
             or base_url_host_matches(self.base_url, "moonshot.cn")
+            or _is_moonshot(self.model)
         )
```
This makes _is_kimi consistent with the model-name-based detection already used elsewhere (is_moonshot_model in agent/moonshot_schema.py:171). Both max_tokens=32000 and reasoning_effort=medium then route correctly regardless of which aggregator the user goes through.
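The combined predicate can be exercised standalone. In the sketch below, `base_url_host_matches` and `is_moonshot_model` are stand-ins for the helpers named in the issue; their real signatures may differ:

```python
from urllib.parse import urlparse

# Stand-in for the base_url_host_matches helper named in the issue:
# exact host or subdomain match on the base URL's hostname.
def base_url_host_matches(base_url: str, host: str) -> bool:
    netloc = urlparse(base_url).hostname or ""
    return netloc == host or netloc.endswith("." + host)

# Stand-in for is_moonshot_model (the real matching rules may differ).
def is_moonshot_model(model: str) -> bool:
    m = model.lower()
    return "moonshot" in m or "kimi" in m

def is_kimi(base_url: str, model: str) -> bool:
    return (
        base_url_host_matches(base_url, "api.kimi.com")
        or base_url_host_matches(base_url, "moonshot.ai")
        or base_url_host_matches(base_url, "moonshot.cn")
        or is_moonshot_model(model)  # the added aggregator branch
    )

# Before the fix the first combination returned False; with the slug
# fallback all Kimi routes are recognised:
print(is_kimi("https://api.synthetic.new/v1", "hf:moonshotai/Kimi-K2.5"))  # → True
print(is_kimi("https://api.moonshot.ai/v1", "kimi-k2.5"))                  # → True
print(is_kimi("https://api.synthetic.new/v1", "gpt-4o"))                   # → False
```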
Related
This bug compounded with two other cron robustness issues filed separately (null next_run_at skip + non-dict origin AttributeError) — taken together they explain a class of "cron quietly does nothing" reports on aggregated Kimi deployments.
Environment
- hermes-agent: upstream/main as of 2026-05-02
- Model: hf:moonshotai/Kimi-K2.5
- Provider: synthetic.new (https://api.synthetic.new/v1)
- Python 3.14, croniter installed
- Encountered on a chococlaw VPS Docker deployment (Debian Linux)