
feat: context-window guard for LLM document context #133

Open
Copilot wants to merge 6 commits into main from copilot/check-context-window-upload

Conversation

Contributor

Copilot AI commented Apr 25, 2026

Uploading too many files silently produced empty answers: oversized context caused the LLM API to raise BadRequestError, which the generic exception handler caught and returned "" — no user-visible indication anything went wrong.

Changes

  • filehandling.py — two new utilities:

    • estimate_token_count(text) — tiktoken (cl100k_base) with a len(text) // 4 character-based fallback
    • truncate_context_to_token_limit(text, max_tokens) -> (str, bool) — proportional trim with a _TRUNCATION_SAFETY_MARGIN = 0.95 buffer; appends a visible [... context truncated ...] marker to the returned text (both helpers are sketched after this list)
  • __init__.py — two new config values replacing a blanket token limit:

    • context_window_headroom_tokens = 10_000 — tokens reserved for system prompts, per-question instructions, and the model's output
    • context_window_fallback_tokens = 128_000 — conservative fallback used when the active model has no context-window tag
  • views.py:

    • process_llm_async — resolves the active model's context_window tag (set via AZURE_E{n}_TAGS_{TAG}=...,context-window:131072,...) and computes the PDF context budget as model_context_window - headroom; falls back to context_window_fallback_tokens - headroom with a log warning when the tag is absent; on truncation writes a non-error warning to assay.status_context (visible in the UI) advising the user to upload fewer or shorter files
    • generate_answer — adds a dedicated except BadRequestError block using logger.exception so any API-level rejection is logged with a full traceback rather than silently swallowed (see the sketch after the snippet below)
  • tests/test_context_window.py — unit tests for both utilities; integration tests asserting that process_llm_async populates status_context when over-limit (fallback path and per-model path) and leaves it clean when context fits
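
The two filehandling.py helpers are small enough to sketch. This is a minimal sketch under the assumptions stated in its comments, not the PR's verbatim code:

# filehandling.py — sketch of the two helpers described above
import logging

logger = logging.getLogger(__name__)

_TRUNCATION_SAFETY_MARGIN = 0.95
_TRUNCATION_MARKER = "\n[... context truncated ...]"


def estimate_token_count(text: str) -> int:
    """Estimate the token count of text, preferring tiktoken."""
    try:
        import tiktoken
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except ImportError:
        logger.warning("tiktoken not installed; using char-based estimate")
    except Exception:
        logger.exception("tiktoken failed; using char-based estimate")
    return len(text) // 4


def truncate_context_to_token_limit(text: str, max_tokens: int) -> tuple[str, bool]:
    """Trim text so its estimated token count fits within max_tokens."""
    tokens = estimate_token_count(text)
    if tokens <= max_tokens:
        return text, False
    # Proportional character cut, shrunk by the 5% safety margin because
    # the token estimate is approximate.
    keep_chars = int(len(text) * (max_tokens / tokens) * _TRUNCATION_SAFETY_MARGIN)
    return text[:keep_chars] + _TRUNCATION_MARKER, True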

# views.py — budget derived from per-model context-window tag
_context_budget = model_entry.context_window - config.context_window_headroom_tokens
full_pdf_context, was_truncated = truncate_context_to_token_limit(
    full_pdf_context, _context_budget
)
if was_truncated:
    add_status_context(assay, "Uploaded documents exceeded the available context-window budget …", is_error=False)
    assay.save()
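
Writing a non-error warning into assay.status_context keeps the run going on the truncated context while still surfacing the problem in the UI. The generate_answer change can be sketched as follows; the completion call and the names client, model, and messages are illustrative stand-ins for the view's actual objects:

# views.py — sketch of the dedicated BadRequestError handler in generate_answer
from openai import BadRequestError

try:
    response = client.chat.completions.create(model=model, messages=messages)
    answer = response.choices[0].message.content
except BadRequestError:
    # Log API-level rejections (e.g. oversized context) with a full
    # traceback instead of letting the generic handler swallow them.
    logger.exception("LLM API rejected the request")
    answer = ""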

Copilot AI linked an issue Apr 25, 2026 that may be closed by this pull request
Copilot AI and others added 2 commits April 25, 2026 17:13
- Add estimate_token_count() and truncate_context_to_token_limit() helpers
  in filehandling.py (use tiktoken with char-based fallback)
- Add Config.context_window_max_tokens = 100,000 in __init__.py
- In process_llm_async: proactively truncate full_pdf_context and write a
  non-error warning into assay.status_context when truncation occurs
- In generate_answer: handle openai.BadRequestError explicitly with a
  logger.error call so context-length failures are logged, not silently
  swallowed
- Add test_context_window.py with unit tests for the utilities and
  integration tests for the warning behaviour

Agent-Logs-Url: https://github.com/johannehouweling/ToxTempAssistant/sessions/8064d2c3-5696-4c13-9f7c-6f0d4adf28f1

Co-authored-by: johannehouweling <95692173+johannehouweling@users.noreply.github.com>
- Extract _TRUNCATION_SAFETY_MARGIN = 0.95 named constant in filehandling.py
- Log tiktoken fallback in estimate_token_count (ImportError vs generic)
- Use logger.exception (with traceback) for BadRequestError in generate_answer

Agent-Logs-Url: https://github.com/johannehouweling/ToxTempAssistant/sessions/8064d2c3-5696-4c13-9f7c-6f0d4adf28f1

Co-authored-by: johannehouweling <95692173+johannehouweling@users.noreply.github.com>
Copilot AI changed the title from [WIP] Investigate context window behavior on multiple PDF uploads to feat: context-window guard for LLM document context on Apr 25, 2026
Copilot AI requested a review from johannehouweling April 25, 2026 17:16
@johannehouweling
Owner

we now have a context-window flag on the .env model TAGS, so we have access to per-model context-window information.
Hence, instead of defining a blanket context-window threshold for the context PDFs, we should define a headroom for overhead like the prompt and instructions, and then leave context-window minus headroom for the PDF context files.

@johannehouweling
Owner

@copilot can you update the PR to take into account the newest changes to the repo in #138? We now have context-window information per model accessible in the .env model TAGS. So instead of hard-coding the context-window size in __init__.py, we have to define a headroom token size that reserves space in the context window (for prompt, output, etc.), and the difference we can give to the PDF context.

Copilot AI and others added 3 commits April 25, 2026 23:29
…-window-upload

Co-authored-by: johannehouweling <95692173+johannehouweling@users.noreply.github.com>
Replace blanket context_window_max_tokens with per-model budget derived
from ModelEntry.context_window (set via the TAGS context-window: key) minus
a configurable headroom. Fall back to context_window_fallback_tokens when
the active model has no context-window tag.

- __init__.py: replace context_window_max_tokens with
  context_window_headroom_tokens (10k) and context_window_fallback_tokens (128k)
- views.py: resolve ModelEntry for llm_model key, compute budget as
  context_window - headroom; fall back to fallback - headroom with a warning
- Merge upstream changes from main (ModelEntry.context_window property,
  context-window KNOWN_TAG_KEYS entry, context_window in llm_signature)
- tests: update integration tests to patch the new config fields and add a
  test that exercises the per-model context-window resolution path

Agent-Logs-Url: https://github.com/johannehouweling/ToxTempAssistant/sessions/b1bcef97-dba0-4c5d-998a-8d322da78b35

Co-authored-by: johannehouweling <95692173+johannehouweling@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 25, 2026

@copilot can you update the PR to take into account the newest changes to the repo in #138, we now have context-window information per model accessible in the .env model TAGS. so instead of hard-codin...

Done in commit 8e9d022. The blanket context_window_max_tokens has been replaced with two new config values:

  • context_window_headroom_tokens = 10_000 — tokens reserved for system prompts, per-question instructions, and the model's output
  • context_window_fallback_tokens = 128_000 — conservative fallback used when the active model has no context-window tag

In process_llm_async, the budget is now computed per-model:

budget = model_entry.context_window - config.context_window_headroom_tokens

If the model has no context-window tag (or the key can't be resolved), it falls back to context_window_fallback_tokens - headroom and logs a warning. The .env.dummy already shows context-window:131072 as an example:

AZURE_E1_TAGS_GPT4O=...,context-window:131072,...
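
With that tag, the PDF budget works out to 131072 - 10000 = 121072 tokens. The resolution logic, sketched here with a hypothetical get_model_entry lookup helper:

# views.py — sketch of the per-model budget resolution with fallback
model_entry = get_model_entry(llm_model)  # hypothetical lookup by model key
if model_entry is not None and model_entry.context_window:
    context_window = model_entry.context_window
else:
    logger.warning(
        "No context-window tag for model %s; assuming %d tokens",
        llm_model,
        config.context_window_fallback_tokens,
    )
    context_window = config.context_window_fallback_tokens
_context_budget = context_window - config.context_window_headroom_tokens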

@johannehouweling johannehouweling marked this pull request as ready for review April 25, 2026 23:35

Development

Successfully merging this pull request may close these issues.

Context window check
