Kimi K2.5 via aggregators (synthetic.new, OpenRouter) gets no max_tokens/reasoning_effort → empty response #3

@liyoungc

Description

Kimi K2.5 via aggregators (synthetic.new, OpenRouter, Together…) silently truncates to empty response

Symptom

When model: hf:moonshotai/Kimi-K2.5 (or any Kimi slug) is configured to go through an aggregator base URL (e.g. https://api.synthetic.new/v1), long agentic prompts — especially cron jobs with multiple tool calls — consistently fail with:

⚠️  Response truncated (finish_reason='length') - model hit max output tokens
⚠️  Truncated tool call detected — retrying API call...
⚠️  Response truncated (finish_reason='length') - model hit max output tokens
⚠️  Truncated tool call response detected again — refusing to execute incomplete tool arguments.

Eventually the cron scheduler records last_status: error, last_error: "Agent completed but produced empty response (model error, timeout, or misconfiguration)". Direct LINE / Discord chat with the same model usually works because short prompts fit inside the small default output budget, but anything longer than ~30 tool-bearing turns dies.

Root cause

_is_kimi (run_agent.py:8309 in current main) is base-URL-only:

_is_kimi = (
    base_url_host_matches(self.base_url, "api.kimi.com")
    or base_url_host_matches(self.base_url, "moonshot.ai")
    or base_url_host_matches(self.base_url, "moonshot.cn")
)

It misses every aggregator that routes to Moonshot inference. The downstream effect is in agent/transports/chat_completions.py:240–280:

  • The Kimi-specific max_tokens=32000 default (line ~259) isn't applied → falls through to "send no max_tokens" → aggregator picks a small server default (synthetic.new behaves as if it's ~4K).
  • The Kimi-specific reasoning_effort=medium hint (line ~272) isn't sent → Kimi K2.5 runs with its built-in default thinking effort.
  • Kimi K2.5 has a thinking/reasoning mode that pre-spends output tokens on hidden reasoning before producing visible text. With a small max_tokens cap and no effort hint, it spends the entire budget on reasoning and emits zero visible tokens.
  • finish_reason: length with empty content → hermes retries the API call with continuation → same outcome → "Truncated tool call response detected again — refusing to execute" → empty response.
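The fallthrough described above can be sketched roughly as follows. This is a minimal illustration of the reported behaviour, not the actual chat_completions.py code; the function name and structure are assumptions, while the 32000/"medium" values come straight from the report:

```python
def build_request_params(is_kimi: bool, user_max_tokens=None) -> dict:
    """Sketch of the parameter-selection fallthrough described above."""
    params = {}
    if user_max_tokens is not None:
        params["max_tokens"] = user_max_tokens
    elif is_kimi:
        # Kimi-specific default: leave headroom for hidden reasoning tokens.
        params["max_tokens"] = 32000
    # else: no max_tokens is sent at all, so the aggregator's small
    # server-side default (~4K on synthetic.new, per the report) wins.
    if is_kimi:
        # Bounds how much of the budget thinking mode may consume.
        params["reasoning_effort"] = "medium"
    return params
```

With `is_kimi` wrongly False on an aggregator route, the request carries neither key, which is exactly the failure mode above.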

The interesting thing is that is_moonshot_model(self.model) already exists in agent/moonshot_schema.py:171 and correctly recognises model-name-based slugs (hf:moonshotai/Kimi-K2.5, nous/moonshotai/kimi-k2.5, …). It's even used immediately above the _is_kimi check in chat_completions.py to sanitize tools — but the runtime detection path doesn't reuse it.

Repro

  1. Set model.default: hf:moonshotai/Kimi-K2.5 and a synthetic.new (or OpenRouter, Together) credential in ~/.hermes/auth.json.
  2. Create any cron job with kind: cron, deliver: discord, prompt requiring 3–5 tool calls (e.g. multiple curl + a python3 script).
  3. Wait for the job to fire — every run ends in last_status: error, last_error: "Agent completed but produced empty response". Container logs show repeated Response truncated (finish_reason='length') warnings with no visible content between them.
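For reference, the failing setup was shaped roughly like this. Field names are reconstructed from the keys quoted above (model.default, kind, deliver); the exact hermes config schema may differ:

```yaml
# Hypothetical shape only — not the authoritative hermes schema.
model:
  default: hf:moonshotai/Kimi-K2.5   # routed via https://api.synthetic.new/v1
jobs:
  - kind: cron
    deliver: discord
    prompt: "Fetch several URLs with curl, then summarise them with a python3 script"
```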

The same model works fine when configured against https://api.moonshot.ai/v1 directly because base URL matches and the Kimi defaults kick in.

Proposed fix

--- a/run_agent.py
+++ b/run_agent.py
@@ -8307,11 +8307,23 @@ class Agent:
         _is_nous = "nousresearch" in self._base_url_lower
         _is_nvidia = "integrate.api.nvidia.com" in self._base_url_lower
+        # Detect Kimi models routed through aggregators (synthetic.new,
+        # OpenRouter, Together, ...).  Without this branch, those routes
+        # miss the Kimi-specific max_tokens=32000 default and the
+        # reasoning_effort=medium hint, leaving the model with whatever
+        # tiny output budget the aggregator defaults to and free-running
+        # thinking mode that swallows the entire budget — visible response
+        # ends up empty.
+        try:
+            from agent.moonshot_schema import is_moonshot_model as _is_moonshot
+        except Exception:  # pragma: no cover — optional helper
+            _is_moonshot = lambda _m: False  # noqa: E731
         _is_kimi = (
             base_url_host_matches(self.base_url, "api.kimi.com")
             or base_url_host_matches(self.base_url, "moonshot.ai")
             or base_url_host_matches(self.base_url, "moonshot.cn")
+            or _is_moonshot(self.model)
         )

This makes _is_kimi consistent with the model-name-based detection already used elsewhere (is_moonshot_model in agent/moonshot_schema.py:171). Both max_tokens=32000 and reasoning_effort=medium then route correctly regardless of which aggregator the user goes through.
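A minimal before/after sketch of the detection change. `host_matches` and `is_moonshot` below are simplified stand-ins for base_url_host_matches and agent/moonshot_schema.is_moonshot_model, written here only to illustrate the outcome:

```python
def host_matches(base_url: str, host: str) -> bool:
    # Simplified stand-in for base_url_host_matches.
    return host in base_url

def is_moonshot(model: str) -> bool:
    # Simplified stand-in for is_moonshot_model: match by model slug.
    slug = model.lower()
    return "moonshot" in slug or "kimi" in slug

def is_kimi(base_url: str, model: str, with_fix: bool) -> bool:
    by_host = (
        host_matches(base_url, "api.kimi.com")
        or host_matches(base_url, "moonshot.ai")
        or host_matches(base_url, "moonshot.cn")
    )
    # The proposed fix ORs in the model-name-based check.
    return by_host or (with_fix and is_moonshot(model))

BASE, MODEL = "https://api.synthetic.new/v1", "hf:moonshotai/Kimi-K2.5"
assert is_kimi(BASE, MODEL, with_fix=False) is False   # aggregator route missed
assert is_kimi(BASE, MODEL, with_fix=True) is True     # caught after the fix
assert is_kimi("https://api.moonshot.ai/v1", MODEL, with_fix=False) is True
```

The last assertion mirrors the observation above: the direct Moonshot endpoint always worked because the host check matched.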

Related

This bug compounded with two other cron robustness issues filed separately (null next_run_at skip + non-dict origin AttributeError) — taken together they explain a class of "cron quietly does nothing" reports on aggregated Kimi deployments.

Environment

  • hermes-agent: upstream/main as of 2026-05-02
  • Model: hf:moonshotai/Kimi-K2.5
  • Provider: synthetic.new (https://api.synthetic.new/v1)
  • Python 3.14, croniter installed
  • Encountered on a chococlaw VPS Docker deployment (Linux Debian)
