feat: context-window guard for LLM document context #133
Conversation
- Add estimate_token_count() and truncate_context_to_token_limit() helpers in filehandling.py (use tiktoken with char-based fallback)
- Add Config.context_window_max_tokens = 100,000 in __init__.py
- In process_llm_async: proactively truncate full_pdf_context and write a non-error warning into assay.status_context when truncation occurs
- In generate_answer: handle openai.BadRequestError explicitly with a logger.error call so context-length failures are logged, not silently swallowed
- Add test_context_window.py with unit tests for the utilities and integration tests for the warning behaviour

Agent-Logs-Url: https://github.com/johannehouweling/ToxTempAssistant/sessions/8064d2c3-5696-4c13-9f7c-6f0d4adf28f1
Co-authored-by: johannehouweling <95692173+johannehouweling@users.noreply.github.com>
- Extract _TRUNCATION_SAFETY_MARGIN = 0.95 named constant in filehandling.py
- Log tiktoken fallback in estimate_token_count (ImportError vs generic)
- Use logger.exception (with traceback) for BadRequestError in generate_answer

Agent-Logs-Url: https://github.com/johannehouweling/ToxTempAssistant/sessions/8064d2c3-5696-4c13-9f7c-6f0d4adf28f1
Co-authored-by: johannehouweling <95692173+johannehouweling@users.noreply.github.com>
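A minimal sketch of what the two helpers described in these commits could look like, assuming the signatures named above; the exact code in filehandling.py may differ:

```python
import logging

logger = logging.getLogger(__name__)

# Use only 95% of the budget to absorb token-estimation error.
_TRUNCATION_SAFETY_MARGIN = 0.95


def estimate_token_count(text: str) -> int:
    """Estimate tokens with tiktoken (cl100k_base), falling back to chars/4."""
    try:
        import tiktoken
        encoding = tiktoken.get_encoding("cl100k_base")
        return len(encoding.encode(text))
    except ImportError:
        logger.warning("tiktoken not installed; using char-based token estimate")
    except Exception:
        logger.warning("tiktoken failed; using char-based token estimate", exc_info=True)
    # Rough heuristic: ~4 characters per token for English-like text.
    return len(text) // 4


def truncate_context_to_token_limit(text: str, max_tokens: int) -> tuple[str, bool]:
    """Proportionally trim text to fit max_tokens; return (text, was_truncated)."""
    estimated = estimate_token_count(text)
    if estimated <= max_tokens:
        return text, False
    # Keep a proportional slice of the text, scaled down by the safety margin.
    keep_ratio = (max_tokens / estimated) * _TRUNCATION_SAFETY_MARGIN
    truncated = text[: int(len(text) * keep_ratio)]
    return truncated + "\n[... context truncated ...]", True
```

The safety margin compensates for the char-based fallback overshooting: trimming to exactly max_tokens on an estimate could still leave the real token count over the limit.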
We now have a context-window flag on the .env model TAGS, so we have access to per-model context-window information.
@copilot can you update the PR to take into account the newest changes to the repo in #138? We now have context-window information per model accessible in the .env model TAGS. So instead of hard-coding the context-window size in __init__.py, we have to define a headroom token size that reserves space in the context window (for prompt, output, etc.), and the difference we can give to the PDF context.
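For concreteness, the arithmetic being proposed here, using the values the PR ends up adopting (10k-token headroom, 128k fallback, and a model tagged context-window:131072):

```python
# Model with a context-window tag: the PDF context gets window minus headroom.
budget = 131_072 - 10_000           # 121_072 tokens for full_pdf_context
# Model without a tag: fall back to the conservative default.
fallback_budget = 128_000 - 10_000  # 118_000 tokens
```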
…-window-upload

Co-authored-by: johannehouweling <95692173+johannehouweling@users.noreply.github.com>
Replace the blanket context_window_max_tokens with a per-model budget derived from ModelEntry.context_window (set via the TAGS context-window: key) minus a configurable headroom. Fall back to context_window_fallback_tokens when the active model has no context-window tag.

- __init__.py: replace context_window_max_tokens with context_window_headroom_tokens (10k) and context_window_fallback_tokens (128k)
- views.py: resolve the ModelEntry for the llm_model key and compute the budget as context_window - headroom; fall back to fallback - headroom with a warning
- Merge upstream changes from main (ModelEntry.context_window property, context-window KNOWN_TAG_KEYS entry, context_window in llm_signature)
- tests: update integration tests to patch the new config fields and add a test that exercises the per-model context-window resolution path

Agent-Logs-Url: https://github.com/johannehouweling/ToxTempAssistant/sessions/b1bcef97-dba0-4c5d-998a-8d322da78b35
Co-authored-by: johannehouweling <95692173+johannehouweling@users.noreply.github.com>
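A sketch of the resolution path this commit describes. `_resolve_context_budget` is a hypothetical helper name (the PR computes the budget inside process_llm_async), and the config and ModelEntry attribute names are taken from the commit message:

```python
import logging

logger = logging.getLogger(__name__)


def _resolve_context_budget(model_entry, config) -> int:
    """Token budget for PDF context: per-model window minus headroom."""
    headroom = config.context_window_headroom_tokens
    # ModelEntry.context_window is populated from the TAGS context-window: key
    # and is None when the active model carries no such tag.
    if model_entry is not None and model_entry.context_window:
        return model_entry.context_window - headroom
    logger.warning(
        "Active model has no context-window tag; falling back to %d tokens",
        config.context_window_fallback_tokens,
    )
    return config.context_window_fallback_tokens - headroom
```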
Done in commit
In `budget = model_entry.context_window - config.context_window_headroom_tokens`: if the model has no …
Uploading too many files silently produced empty answers: oversized context caused the LLM API to raise `BadRequestError`, which the generic exception handler caught and returned `""` from, with no user-visible indication that anything had gone wrong.

Changes
- `filehandling.py`: two new utilities:
  - `estimate_token_count(text)`: tiktoken (cl100k_base) with a `len // 4` char-based fallback
  - `truncate_context_to_token_limit(text, max_tokens) -> (str, bool)`: proportional trim with a `_TRUNCATION_SAFETY_MARGIN = 0.95` buffer; appends a visible `[... context truncated ...]` marker to the returned text
- `__init__.py`: two new config values replacing the blanket token limit:
  - `context_window_headroom_tokens = 10_000`: tokens reserved for system prompts, per-question instructions, and the model's output
  - `context_window_fallback_tokens = 128_000`: conservative fallback used when the active model has no `context-window` tag
- `views.py`:
  - `process_llm_async`: resolves the active model's `context_window` tag (set via `AZURE_E{n}_TAGS_{TAG}=...,context-window:131072,...`) and computes the PDF context budget as `model_context_window - headroom`; falls back to `fallback - headroom` with a log warning when the tag is absent; on truncation, writes a non-error warning to `assay.status_context` (visible in the UI) advising the user to upload fewer or shorter files
  - `generate_answer`: adds a dedicated `except BadRequestError` block using `logger.exception` so any API-level rejection is logged with a full traceback rather than silently swallowed
- `tests/test_context_window.py`: unit tests for both utilities; integration tests asserting that `process_llm_async` populates `status_context` when over-limit (fallback path and per-model path) and leaves it clean when the context fits
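For illustration, a sketch of the dedicated `BadRequestError` block described above; the `generate_answer` signature and the re-raise behaviour are assumptions, not the PR's exact code:

```python
import logging

from openai import BadRequestError

logger = logging.getLogger(__name__)


def generate_answer(client, model: str, messages: list[dict]) -> str:
    try:
        response = client.chat.completions.create(model=model, messages=messages)
        return response.choices[0].message.content
    except BadRequestError:
        # Context-length overflows surface here as API-level rejections.
        # logger.exception records the full traceback instead of letting a
        # generic handler swallow the error into an empty answer.
        logger.exception("LLM request rejected by the API (possible context overflow)")
        raise
```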