Token Safeguards and Read Tool Modernization #186

Open
ericproulx wants to merge 1 commit into main from token_safeguard

Conversation

@ericproulx (Collaborator) commented Jan 27, 2026

Summary

This PR introduces token-based safeguards for file reading to prevent AI agents from accidentally consuming their entire context window with large files. By enforcing a strict token limit, we force agents to use pagination (reading in subsets) or search tools when dealing with large files, ensuring they maintain enough context for reasoning and other tasks.
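
For illustration, the fallback from the agent's side might look like the following once a large file is rejected. Only the parameter names offset and limit come from this PR; the call shape and line-based counting are assumptions.

```ruby
# Hypothetical Read tool arguments an agent could send after a large file is
# rejected. Only the parameter names (offset, limit) appear in this PR; the
# argument shape and line-based counting are illustrative assumptions.
{ file_path: "log/production.log", offset: 1,   limit: 500 } # first chunk
{ file_path: "log/production.log", offset: 501, limit: 500 } # next chunk
```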

Key Changes

🛡️ Token Safeguards

  • Context Preservation: Replaced read_line_limit (2000 lines) and line_character_limit (2000 chars/line) with read_max_tokens (default 25,000 tokens).
  • Proactive Validation: The Read tool now checks the token count before returning content. If the content exceeds the limit, it rejects the request with a helpful error (sketched after this list) guiding the agent to:
    • Use the offset and limit parameters to read the file in chunks
    • Use the Grep tool to search for specific content
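
A minimal sketch of what such a safeguard can look like. `count_tokens` and `ToolError` are hypothetical stand-ins; only `read_max_tokens` and the offset/limit/Grep guidance come from this PR.

```ruby
# Minimal sketch of a token safeguard; count_tokens and ToolError are
# hypothetical placeholders, not SwarmSDK's actual internals.
def token_safeguard(content)
  max_tokens = SwarmSDK.config.read_max_tokens # default: 25_000
  tokens = count_tokens(content)               # hypothetical tokenizer helper

  return content if tokens <= max_tokens

  raise ToolError, <<~MSG
    Content is ~#{tokens} tokens, exceeding the #{max_tokens}-token limit.
    Use the offset and limit parameters to read the file in chunks,
    or use the Grep tool to search for specific content.
  MSG
end
```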

📖 Read Tool Overhaul

  • Raw Content Output: Removed line numbers from Read tool output. Now returns raw file content, simplifying editing and copy-pasting.
  • Malicious File Warning: Added system reminder warning agents to check for malicious code.
  • Binary File Handling: Improved detection and clearer error messages for unsupported binary formats.
  • Consistent Binary Handling: ReadTracker now reads files the same way as the Read tool (UTF-8 first, falling back to binary); a sketch follows below.
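
A sketch of the shared read strategy, assuming it follows the usual Ruby pattern; the method name and placement are illustrative.

```ruby
# UTF-8 first, fall back to binary: read as UTF-8, and if the bytes are not
# valid UTF-8, re-read raw so binary detection can handle the file instead.
def read_file_content(path)
  content = File.read(path, encoding: "UTF-8")
  return content if content.valid_encoding?

  File.binread(path)
end
```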

🧠 Memory & Skills

  • MemoryRead: Removed line numbers from output (which previously used a separator between line numbers and content). Now returns raw content.
  • LoadSkill: Removed line numbers from output. Now returns raw skill content.
  • ScratchpadRead: Removed line numbers from output. Now returns raw content.

🔧 Tool Detection & Descriptions

  • Context Manager: Updated tool detection to recognize Read output by its system reminder instead of line numbers (which are now removed); see the illustration after this list.
  • Edit Tools: Updated Edit and MultiEdit descriptions to remove references to line number prefixes.
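
Illustration of detection by system reminder; the reminder wording, pattern, and constant name are placeholders, not the actual strings used by the ContextManager.

```ruby
# Recognize Read tool output by its system reminder rather than by
# line-number prefixes (which no longer exist). Pattern text is a placeholder.
READ_OUTPUT_REMINDER = /<system-reminder>.*malicious code/m

def read_tool_output?(tool_result_content)
  tool_result_content.match?(READ_OUTPUT_REMINDER)
end
```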

Configuration Updates

  • Added: SwarmSDK.config.read_max_tokens (default: 25,000 tokens; usage example below)
  • Removed: SwarmSDK.config.read_line_limit (was 2000 lines)
  • Removed: SwarmSDK.config.line_character_limit (was 2000 chars)
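
Adjusting the new setting might look like this; whether plain assignment is supported depends on how SwarmSDK.config is exposed, so treat it as illustrative.

```ruby
# Raise the ceiling for a workflow that routinely reads large files.
# Assignment syntax is assumed; only the setting name comes from this PR.
SwarmSDK.config.read_max_tokens = 50_000 # default is 25_000
```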

Documentation Updates

  • Updated configuration reference with new read_max_tokens parameter
  • Removed line number formatting from memory technical details

Test Coverage

  • Read tool tests updated for raw output and token safeguards
  • MemoryRead tests updated for raw output format
  • LoadSkill tests updated for raw output format
  • ScratchpadRead tests updated (implicit via integration)
  • Config tests updated for new parameter
  • ContextManager tests updated for new tool detection
  • ReadTracker functionality preserved with consistent file reading
  • Virtual entries and skills integration tests updated

ericproulx force-pushed the token_safeguard branch 3 times, most recently from 4423b67 to 5eb699e on January 27, 2026 at 14:11.

Refactor Read tool to use token counting instead of line/character limits for better context management. This provides more accurate control over content size sent to LLMs.

Changes:
- Replace read_line_limit and line_character_limit config with read_max_tokens (default 25,000)
- Add token_safeguard method to Read tool for enforcing token limits
- Update ReadTracker to consistently read file content matching Read tool behavior
- Improve tool detection in ContextManager using system reminder patterns
- Update documentation and all related tests