Skip to content

fix(agents): auto-title crash on metadata=None#16

Merged
TheAlexPG merged 3 commits intomainfrom
fix/prod-notifications-and-autotitle
May 6, 2026
Merged

fix(agents): auto-title crash on metadata=None#16
TheAlexPG merged 3 commits intomainfrom
fix/prod-notifications-and-autotitle

Conversation

@TheAlexPG
Copy link
Copy Markdown
Owner

Summary

  • LLMClient._build_langfuse_metadata raised AttributeError whenever the caller passed metadata=None. The /agents/sessions/{id}/auto-title endpoint always does this (title generation isn't part of an agent invocation, must not show in Langfuse), so every fresh chat session silently fell back to a 60-char slice of the first user message instead of the LLM-generated 3-6 word title, with auto-title LLM call failed: 'NoneType' object has no attribute 'analytics_consent' in prod logs.
  • Make metadata: LLMCallMetadata | None in acompletion. Have _build_langfuse_metadata return None when call_meta is None (same path as analytics_consent="off").
  • Add test_langfuse_metadata_none_returns_none regression.

Notes on the notifications errors in the same prod log

The relation "notifications" does not exist errors stopped on prod between 09:07 and 10:54 UTC today — that's the deploy of #14 picking up the a1f8c9d2b3e4_repair_notifications_table migration (CREATE TABLE IF NOT EXISTS). No further action needed unless those errors come back.

Test plan

  • pytest tests/agents/test_llm.py -q — 19 passed
  • After merge: tail prod logs for auto-title LLM call failed — should be gone
  • After merge: open a new chat, verify the session gets a real 3-6 word title (not "first 60 chars of message")

TheAlexPG added 3 commits May 6, 2026 14:41
The /auto-title endpoint passes metadata=None because session-title
generation isn't part of an agent invocation and must not show up in
Langfuse. But LLMClient._build_langfuse_metadata accessed
call_meta.analytics_consent unconditionally, so every auto-title call
raised AttributeError.

The endpoint's outer try/except hid the failure — every new chat session
silently fell back to a 60-char slice of the first user message instead
of the LLM-generated 3-6 word title, and logs filled with
"auto-title LLM call failed: 'NoneType' object has no attribute
analytics_consent".

Make metadata optional in acompletion's signature and have
_build_langfuse_metadata return None when call_meta is None (same
behavior as analytics_consent="off"). Adds regression test.
When the frontend aborts a streaming endpoint (tab close, navigation,
network blip), uvicorn cancels the ASGI task with
asyncio.CancelledError. That hits get_db while it's running
session.commit() after the yield. Our `except Exception:` did NOT catch
CancelledError (it's a BaseException, not Exception), so:

  1. commit() bubbles the CancelledError out of the try.
  2. The async-with __aexit__ kicks in and tries to terminate the
     asyncpg connection.
  3. Connection terminate ALSO runs in the cancelled scope, raises
     CancelledError again — SQLAlchemy logs "Exception terminating
     connection ..." at ERROR level.
  4. Then the original CancelledError bubbles all the way up, uvicorn
     logs "Exception in ASGI application" with a multi-screen traceback.

Net effect: every SSE disconnect produces a wall of red in prod logs
even though the cancellation is correct and resources still get cleaned
up. Restructure get_db to catch BaseException, do a best-effort rollback
(suppressed because the scope may already be torn down), and re-raise.
Move commit() into the else branch so it only runs on normal completion.
Self-hosted Langfuse setups often use LANGFUSE_BASE_URL in their .env
(matching the public Langfuse SDK docs in some Docker examples). Our
code reads only LANGFUSE_HOST (LiteLLM convention), so a misnamed env
var silently disables tracing — `is_langfuse_configured()` returns
False, callbacks never register, and operators see no traces with no
loud error.

Use Pydantic AliasChoices so langfuse_host accepts either env name.
LANGFUSE_HOST wins when both are set (canonical first). Document the
alias in .env.example and README.

Adds two regression tests that instantiate a fresh Settings() with
each env-var combination.
@TheAlexPG TheAlexPG merged commit 42dd968 into main May 6, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant