fix(web): ensure fresh URL fetch context reaches model #208

Open

punishell wants to merge 4 commits into sipeed:main from punishell:fix/web-fetch-fresh-url

Conversation

@punishell

Make web_fetch return the fetched payload to the LLM (not just metadata), fix HTML text extraction regex replacements, and add a prompt rule requiring fresh web_fetch calls for URL visit/check requests to prevent stale memory answers.

The bot was constantly hallucinating when asked, e.g., to visit a website.

Co-authored-by: Cursor <cursoragent@cursor.com>
@Leeaandrob
Collaborator

@Zepan Ensures a fresh URL fetch context reaches the model — fixes web_fetch tool breaking silently when context is reused.

Recommendation: Merge. +8/-6, small correctness fix for web tools.

@Leeaandrob (Collaborator) left a comment

PR #208 Deep Review — @punishell

Hey @punishell, thanks for diagnosing this. The root cause is spot-on: ForLLM was sending only metadata ("Fetched 5000 bytes from ...") but never the actual page content. The LLM literally could not see what was fetched, so it hallucinated the page content. This PR correctly fixes that.


Verification

  • Checked out branch locally: fix/web-fetch-fresh-url
  • go vet ./pkg/tools/... ./pkg/agent/... — clean
  • go test ./pkg/tools/... -run "Web|Fetch|Search" — all 10 tests passing
  • go test ./pkg/agent/... — all 10 tests passing
  • Traced the ForLLM content through the tool-result message path (agent/loop.go:551-558); confirmed ForLLM is what the model receives as the tool response
  • Verified ReplaceAllLiteralString("") vs ReplaceAllString("") produce identical results when replacement is empty string (no $ expansion possible)

Summary of Findings

HIGH (Should Fix)

  • H1: Existing test TestWebTool_WebFetch_Success now passes by coincidence, not by intent

MEDIUM

  • M1: The ReplaceAllLiteralString → ReplaceAllString change is a no-op (adds diff noise)

LOW

  • L1: System prompt rule is English-only

POSITIVE

  1. Critical bug fix: ForLLM was only sending metadata, never the actual content. web_fetch was fundamentally broken.
  2. System prompt defense-in-depth rule is a reasonable pattern to prevent hallucination
  3. Small, focused changeset — exactly what's needed, nothing more
  4. Content extraction and truncation logic is correctly preserved

Verdict: REQUEST CHANGES

What needs to change before merge:

  • Update TestWebTool_WebFetch_Success (web_test.go:39-42) to validate the new ForLLM format (see H1)
  • Remove the no-op ReplaceAllLiteralString → ReplaceAllString changes, or justify why they're needed (see M1)

Estimated effort: ~30 minutes. Happy to re-review.

Use URL-presence rules instead of English-only keywords, and inject a runtime system note when user messages contain URLs so responses are based on fresh web_fetch results in the same turn.
@Leeaandrob (Collaborator) left a comment

PR #208 Re-Review (Post Commit ed0a2e8)

Verdict: REQUEST_CHANGES

@punishell — Thanks for addressing the earlier feedback. The no-op ReplaceAllLiteralString changes are gone, and the English-only keyword approach was replaced with regex-based URL detection. However, the new commit introduces a critical false-positive problem with the URL regex that makes this worse than the original state for many common conversations.


What was fixed (from previous review)

  • No-op ReplaceAllLiteralString → ReplaceAllString changes removed; the diff is clean now.
  • English-only keyword list replaced — The new approach uses URL regex detection, which is language-agnostic.
  • Core bug fix intact: ForLLM: string(resultJSON) correctly sends actual content to the model.
  • All tests pass: go test -race ./pkg/tools/... ./pkg/agent/...; both packages PASS.
  • go vet is clean

New Findings

[HIGH] URL regex has severe false positives — matches filenames, JS libraries, and PicoClaw's own files

File: pkg/agent/loop.go:48

var userMessageURLRegex = regexp.MustCompile(`(?i)(https?://[^\s<>"')\]]+|\b[a-z0-9][a-z0-9.-]*\.[a-z]{2,}\b(?:/[^\s<>"')\]]*)?)`) 

The bare-domain pattern \b[a-z0-9][a-z0-9.-]*\.[a-z]{2,}\b matches ANY word.extension where extension is 2+ characters. Tested locally:

| Input | Match | Problem |
|---|---|---|
| config.json | ✅ MATCH | Every config discussion triggers it |
| MEMORY.md | ✅ MATCH | PicoClaw's own workspace file! |
| node.js | ✅ MATCH | Extremely common in tech conversations |
| vue.js | ✅ MATCH | Common JS library |
| three.js | ✅ MATCH | Common JS library |
| my-app.service | ✅ MATCH | Systemd service discussions |
| https://example.com | ✅ MATCH | Correct |
| github.com/sipeed | ✅ MATCH | Correct |

When a false positive fires, the system injects:

[System note: This message includes URL(s). Before answering, you must call web_fetch for each URL...]

The model then tries to web_fetch("config.json") or web_fetch("MEMORY.md"), which fails and wastes tool calls. This makes the user experience worse than without the feature for any tech conversation mentioning filenames.

Fix: Only match https?:// URLs (not bare domains). The bare domain pattern causes far more harm than good:

var userMessageURLRegex = regexp.MustCompile(`https?://[^\s<>"')\]]+`)

If bare domain support is desired, require at least a TLD from a known list (.com, .org, .io, .dev, etc.) rather than any 2+ char extension.
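The false positives are straightforward to reproduce. A standalone sketch comparing the PR's pattern with the suggested narrow one (the patterns are copied from this review, not from the repo's exact file):

```go
package main

import (
	"fmt"
	"regexp"
)

// broad is the PR's pattern with bare-domain matching; narrow is the suggested fix.
var (
	broad  = regexp.MustCompile(`(?i)(https?://[^\s<>"')\]]+|\b[a-z0-9][a-z0-9.-]*\.[a-z]{2,}\b(?:/[^\s<>"')\]]*)?)`)
	narrow = regexp.MustCompile(`https?://[^\s<>"')\]]+`)
)

func main() {
	for _, s := range []string{"edit config.json", "open MEMORY.md", "check https://example.com"} {
		fmt.Printf("%-30q broad=%-5v narrow=%v\n", s, broad.MatchString(s), narrow.MatchString(s))
	}
}
```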


[MEDIUM] No tests for enforceFreshWebFetchHint

File: pkg/agent/loop.go:281-289

The new function has zero test coverage. Given the regex issues above, tests are especially important to catch regressions. At minimum:

func TestEnforceFreshWebFetchHint(t *testing.T) {
    // Uses testify's assert; swap for plain t.Errorf checks if the repo avoids that dependency.
    // Should inject a note for real URLs
    assert.Contains(t, enforceFreshWebFetchHint("check https://example.com"), "[System note")
    // Should NOT inject a note for filenames
    assert.Equal(t, "edit config.json", enforceFreshWebFetchHint("edit config.json"))
    // Should NOT inject a note for plain text
    assert.Equal(t, "hello world", enforceFreshWebFetchHint("hello world"))
}

[LOW] System prompt rule #3 and the enforceFreshWebFetchHint are redundant

The PR adds URL enforcement in two places:

  1. System prompt rule #3 in context.go — tells the model generically to fetch URLs
  2. enforceFreshWebFetchHint in loop.go — detects URLs and injects a system note per-message

Both serve the same purpose. The per-message injection (#2) is more forceful and makes #1 somewhat redundant. Consider keeping only one approach to avoid double-prompting the model.
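For reference, the per-message injection can be sketched as follows. The function name comes from the PR; the regex and note text follow the narrow form suggested above and are assumptions about the final implementation, not the repo's exact code:

```go
package main

import (
	"fmt"
	"regexp"
)

// Narrow, explicit-scheme URL detection, per the fix suggested in this review.
var userMessageURLRegex = regexp.MustCompile(`https?://[^\s<>"')\]]+`)

const freshFetchNote = "[System note: This message includes URL(s). " +
	"Before answering, you must call web_fetch for each URL.]"

// enforceFreshWebFetchHint appends a system note only when the user
// message contains an explicit http/https URL.
func enforceFreshWebFetchHint(msg string) string {
	if userMessageURLRegex.MatchString(msg) {
		return msg + "\n" + freshFetchNote
	}
	return msg
}

func main() {
	fmt.Println(enforceFreshWebFetchHint("check https://example.com"))
	fmt.Println(enforceFreshWebFetchHint("edit config.json"))
}
```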


Summary

| Severity | Count | Details |
|---|---|---|
| CRITICAL | 0 | |
| HIGH | 1 | URL regex false positives (config.json, MEMORY.md, node.js, etc.) |
| MEDIUM | 1 | No tests for enforceFreshWebFetchHint |
| LOW | 1 | Redundant enforcement in system prompt + per-message injection |
| POSITIVE | 1 | Core bug fix (ForLLM now sends actual content) is correct and valuable |

The core fix in web.go (line 407) is valuable and correct. The problem is entirely in the new URL detection logic in loop.go. I'd recommend:

  1. Simplify the regex to only match https?:// URLs (drop bare domain matching)
  2. Add unit tests for enforceFreshWebFetchHint
  3. Consider removing the redundant system prompt rule #3 since the per-message injection is more reliable

punishell and others added 2 commits February 17, 2026 23:14
Co-authored-by: Leandro Barbosa <leandrobar93@gmail.com>
Add unit tests for enforceFreshWebFetchHint, narrow URL detection to explicit http/https links to avoid filename false positives, and remove the redundant global system-prompt rule in favor of runtime enforcement.
