Add SSRF-safe HTTP client with --unsafe-client flag #9
Conversation
- Use DataDog go-secure-sdk for SSRF protection (blocks private IPs, loopback, link-local, cloud metadata endpoints)
- Refactor rss.Fetcher and scraper.Scraper as structs holding *http.Client
- Scanner accepts Fetcher + Scraper as dependencies (no global state)
- Add --unsafe-client / BLOGWATCHER_UNSAFE_CLIENT flag for local dev
- ScanBlog now returns error (hard fail, not soft error string)
- E2e test verifies safe client blocks loopback requests
- Tests use plain http.Client (no SSRF check needed for httptest)
- Update AGENTS.md: ground claims in evidence, not guesses

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (2)
📝 Walkthrough

This PR improves SSRF protection testing by sanitizing the process environment to ensure consistent test behavior, and enhances feed discovery error handling by propagating errors instead of silently returning empty values.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/rss/rss.go (1)
82-108: ⚠️ Potential issue | 🟡 Minor

Errors are silently swallowed in DiscoverFeedURL.

Multiple error paths return `("", nil)` instead of propagating the error:

- Line 85: `http.NewRequestWithContext` failure
- Line 89: `f.client.Do` failure
- Line 97: Non-2xx HTTP status
- Line 102: `url.Parse` failure
- Line 107: `goquery.NewDocumentFromReader` failure

While "discovery not found" may be a valid soft-failure scenario, silently discarding errors (especially network/SSRF errors from `f.client.Do`) makes debugging difficult and violates the guideline to never ignore errors.

Consider returning errors for actual failures while returning `("", nil)` only for "not found" scenarios:

Proposed fix to distinguish errors from "not found"

```diff
 func (f *Fetcher) DiscoverFeedURL(ctx context.Context, blogURL string) (string, error) {
 	req, err := http.NewRequestWithContext(ctx, http.MethodGet, blogURL, nil)
 	if err != nil {
-		return "", nil
+		return "", fmt.Errorf("create request: %w", err)
 	}
 	response, err := f.client.Do(req)
 	if err != nil {
-		return "", nil
+		return "", fmt.Errorf("fetch blog URL: %w", err)
 	}
 	defer func() {
 		if err := response.Body.Close(); err != nil {
 			fmt.Fprintf(os.Stderr, "close: %v\n", err)
 		}
 	}()
 	if response.StatusCode < 200 || response.StatusCode >= 300 {
-		return "", nil
+		return "", nil // Not found is acceptable
 	}
 	base, err := url.Parse(blogURL)
 	if err != nil {
-		return "", nil
+		return "", fmt.Errorf("parse blog URL: %w", err)
 	}
 	doc, err := goquery.NewDocumentFromReader(response.Body)
 	if err != nil {
-		return "", nil
+		return "", fmt.Errorf("parse HTML: %w", err)
 	}
```

As per coding guidelines: "Never ignore errors in Go - handle all errors returned by functions."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/rss/rss.go` around lines 82 - 108, the DiscoverFeedURL function currently swallows all errors (e.g., failures from http.NewRequestWithContext, f.client.Do, non-2xx responses, url.Parse, and goquery.NewDocumentFromReader) by returning ("", nil); change it to propagate real failures as errors (return "", err or a wrapped error) for network/parse/http errors and reserve returning ("", nil) only for legitimate "feed not found" cases (e.g., when discovery markup is absent or HTTP 404/204 where you decide not to treat as an error). Update the error returns in DiscoverFeedURL, ensure f.client.Do errors are returned, non-2xx responses return a descriptive error (unless explicitly handled as not-found), and parsing errors from url.Parse and goquery.NewDocumentFromReader are returned so callers can distinguish real failures from "not found".

internal/scanner/scanner.go (1)
157-193: ⚠️ Potential issue | 🟠 Major

Worker early return on error can cause goroutine leak and deadlock.

When a worker encounters an error (lines 171-174), it sends the error and returns immediately without draining remaining jobs from the channel. Meanwhile, the main goroutine continues sending jobs (lines 181-183). If workers exit early, `jobs <- job{...}` will block forever since no worker is receiving.

Additionally, remaining workers that finish successfully will send `nil` to `errs`, but if one worker exited early, you may not receive the expected number of messages, or you may return an error while other workers are still running.

Proposed fix using errgroup for cleaner cancellation

```diff
+import "golang.org/x/sync/errgroup"
+
 func (s *Scanner) ScanAllBlogs(ctx context.Context, db *storage.Database, workers int) ([]ScanResult, error) {
 	blogs, err := db.ListBlogs(ctx)
 	if err != nil {
 		return nil, err
 	}
 	if workers <= 1 {
 		// ... sequential path unchanged ...
 	}
 	type job struct {
 		Index int
 		Blog  model.Blog
 	}
 	jobs := make(chan job)
 	results := make([]ScanResult, len(blogs))
-	errs := make(chan error, workers)
+
+	g, gctx := errgroup.WithContext(ctx)
 	for i := 0; i < workers; i++ {
-		go func() {
-			workerDB, err := storage.OpenDatabase(ctx, db.Path())
+		g.Go(func() error {
+			workerDB, err := storage.OpenDatabase(gctx, db.Path())
 			if err != nil {
-				errs <- err
-				return
+				return err
 			}
 			defer func() {
 				if err := workerDB.Close(); err != nil {
 					fmt.Fprintf(os.Stderr, "close: %v\n", err)
 				}
 			}()
 			for item := range jobs {
-				result, err := s.ScanBlog(ctx, workerDB, item.Blog)
+				result, err := s.ScanBlog(gctx, workerDB, item.Blog)
 				if err != nil {
-					errs <- fmt.Errorf("scan %s: %w", item.Blog.Name, err)
-					return
+					return fmt.Errorf("scan %s: %w", item.Blog.Name, err)
 				}
 				results[item.Index] = result
 			}
-			errs <- nil
-		}()
+			return nil
+		})
 	}
+	// Send jobs in a separate goroutine so we can close channel when context cancels
+	go func() {
+		defer close(jobs)
+		for index, blog := range blogs {
+			select {
+			case jobs <- job{Index: index, Blog: blog}:
+			case <-gctx.Done():
+				return
+			}
+		}
+	}()
-	for index, blog := range blogs {
-		jobs <- job{Index: index, Blog: blog}
-	}
-	close(jobs)
-
-	for i := 0; i < workers; i++ {
-		if err := <-errs; err != nil {
-			return nil, err
-		}
-	}
+	if err := g.Wait(); err != nil {
+		return nil, err
+	}
 	return results, nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scanner/scanner.go` around lines 157 - 193, The worker goroutine currently returns early on error while the main sender keeps pushing to jobs, causing deadlocks; replace the manual workers/errs pattern with an errgroup (use errgroup.WithContext(ctx)) and spawn workers via g.Go so that a single cancellation stops all workers and the sender; inside each worker (the anonymous goroutine that currently calls storage.OpenDatabase and s.ScanBlog) open the DB, defer Close, then loop receiving from jobs with select on ctx.Done() and the jobs channel (handle ok==false to exit), and return any scan error so the errgroup cancels the context; in the sender loop that enqueues job{Index, Blog} select on ctx.Done() as well to stop early and close(jobs) after enqueuing, then call g.Wait() to collect the first error and return results.
🧹 Nitpick comments (2)
AGENTS.md (1)
1-3: Excellent addition to document agent verification behavior.

The new "Before speaking" section establishes a clear principle: ground claims in evidence rather than speculation. The specific verification methods (go get, read source/docs, write tests) are concrete and actionable, which aligns well with the high-standards philosophy of this file. As per coding guidelines, this properly documents agent responsibilities and capabilities.

📝 Optional: Reduce repetitive sentence structure

The three successive sentences beginning with "If you're unsure" create a repetitive pattern. Consider rewording for variety:

```diff
-Never guess. Ground every claim in actual files, docs, or command output. If you're unsure about a dependency's size, run `go get` and check. If you're unsure about an API, read the source or docs. If you're unsure about behavior, write a test. State what you verified and how — don't speculate.
+Never guess. Ground every claim in actual files, docs, or command output. When uncertain about a dependency's size, run `go get` and check. For API questions, read the source or docs. To verify behavior, write a test. State what you verified and how — don't speculate.
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 1 - 3, the "Before speaking" section has three consecutive sentences all starting with "If you're unsure", which is repetitive; edit the paragraph under the "Before speaking" header to vary sentence openings and tighten flow while preserving the three verification actions (run `go get` and check, read source/docs, write a test) and the instruction to state what was verified and how — target the three sentences specifically for rewording so the meaning stays identical but reads less repetitive.

e2e/e2e_test.go (1)
423-424: Assert on the repo-owned error prefix instead of the SDK's exact wording.

The test currently asserts on `"is not authorized"` from go-secure-sdk, which can change with dependency updates. Since `internal/rss/rss.go` wraps fetch failures with `"failed to fetch feed:"` (line 48), asserting on that repo-owned prefix is more stable and equally validates that SSRF protection is blocking the request.

Diff

```diff
-require.Contains(t, string(out), "is not authorized", "expected SSRF error message")
+require.Contains(t, string(out), "failed to fetch feed:", "expected SSRF error message")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e/e2e_test.go` around lines 423 - 424, Update the assertion in e2e_test.go to check for the repo-owned error prefix instead of the SDK message: replace the require.Contains that expects "is not authorized" with one that looks for "failed to fetch feed:" (the wrapper string emitted by internal/rss/rss.go when fetch fails). This ensures the test validates the repo's wrapped error from the fetch helper (see internal/rss/rss.go) and remains stable against upstream SDK message changes; modify the require.Contains call in the test accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e/e2e_test.go`:
- Around line 411-423: The test spawns subprocesses with cmd :=
exec.CommandContext(...) and currently uses cmd.Env = append(os.Environ(),
"NO_COLOR=1"), which can inherit BLOGWATCHER_UNSAFE_CLIENT from the environment;
update both places where cmd.Env is set (the "add" and "scan" invocations) to
remove or override BLOGWATCHER_UNSAFE_CLIENT (e.g. filter it out from
os.Environ() or explicitly set "BLOGWATCHER_UNSAFE_CLIENT=" or
"BLOGWATCHER_UNSAFE_CLIENT=false" before appending "NO_COLOR=1") so the test
reliably exercises the safe client path when invoking the binaryPath with
dbPath.
In `@internal/scraper/scraper.go`:
- Around line 40-45: The Scraper currently panics when its HTTP client is nil
(e.g., NewScraper(nil)) because ScrapeBlog calls s.client.Do(req); update this
by validating the injected client and failing fast: in NewScraper return an
error if the provided http.Client is nil (or set a default client) and/or add a
nil-check at the start of ScrapeBlog to return a clear ScrapeError (include
context like "nil http client") instead of dereferencing s.client in ScrapeBlog;
reference the Scraper type, NewScraper function, ScrapeBlog method, and the
s.client.Do call when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e320a777-4963-4818-88c5-b9667bd32f88
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (11)
AGENTS.md, e2e/e2e_test.go, go.mod, internal/cli/commands.go, internal/cli/root.go, internal/rss/rss.go, internal/rss/rss_test.go, internal/scanner/scanner.go, internal/scanner/scanner_test.go, internal/scraper/scraper.go, internal/scraper/scraper_test.go
```diff
 func (s *Scraper) ScrapeBlog(ctx context.Context, blogURL string, selector string) ([]ScrapedArticle, error) {
 	req, err := http.NewRequestWithContext(ctx, http.MethodGet, blogURL, nil)
 	if err != nil {
 		return nil, ScrapeError{Message: fmt.Sprintf("failed to create request: %v", err)}
 	}
-	response, err := client.Do(req)
+	response, err := s.client.Do(req)
```
Fail fast when the scraper client is missing.
NewScraper(nil) currently panics later at s.client.Do(req). Since the injected client now controls timeout and SSRF posture, this should return a regular error instead of dereferencing nil.
🐛 Proposed fix
```diff
 func (s *Scraper) ScrapeBlog(ctx context.Context, blogURL string, selector string) ([]ScrapedArticle, error) {
+	if s == nil || s.client == nil {
+		return nil, ScrapeError{Message: "http client is not configured"}
+	}
 	req, err := http.NewRequestWithContext(ctx, http.MethodGet, blogURL, nil)
 	if err != nil {
 		return nil, ScrapeError{Message: fmt.Sprintf("failed to create request: %v", err)}
 	}
 	response, err := s.client.Do(req)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
func (s *Scraper) ScrapeBlog(ctx context.Context, blogURL string, selector string) ([]ScrapedArticle, error) {
	if s == nil || s.client == nil {
		return nil, ScrapeError{Message: "http client is not configured"}
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, blogURL, nil)
	if err != nil {
		return nil, ScrapeError{Message: fmt.Sprintf("failed to create request: %v", err)}
	}
	response, err := s.client.Do(req)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scraper/scraper.go` around lines 40 - 45, The Scraper currently
panics when its HTTP client is nil (e.g., NewScraper(nil)) because ScrapeBlog
calls s.client.Do(req); update this by validating the injected client and
failing fast: in NewScraper return an error if the provided http.Client is nil
(or set a default client) and/or add a nil-check at the start of ScrapeBlog to
return a clear ScrapeError (include context like "nil http client") instead of
dereferencing s.client in ScrapeBlog; reference the Scraper type, NewScraper
function, ScrapeBlog method, and the s.client.Do call when making the change.
…v leak
- DiscoverFeedURL now propagates real errors (network, context cancel,
parse) instead of swallowing them as ("", nil)
- Replace scanner worker pool with errgroup to prevent deadlock when a
worker returns early on error while sender is still pushing jobs
- Filter BLOGWATCHER_UNSAFE_CLIENT from env in SSRF e2e test to ensure
the safe client path is always exercised
- Check our own wrapped error message in SSRF test instead of SDK string
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- SSRF-safe HTTP client via DataDog's go-secure-sdk — blocks requests to private IPs, loopback, link-local, and cloud metadata endpoints (CWE-918)
- `rss.Fetcher` and `scraper.Scraper` refactored as structs holding `*http.Client` (dependency injection, no global state)
- `Scanner` accepts `Fetcher` + `Scraper` as dependencies
- `ScanBlog` now returns hard errors instead of soft error strings
- `--unsafe-client` / `BLOGWATCHER_UNSAFE_CLIENT` flag to disable SSRF protection for local dev

Inspired by Hyaxia/blogwatcher#8 by @mihajlo-jovanovic, reimplemented with dependency injection and DataDog's go-secure-sdk.
Test plan
- `golangci-lint run` passes
- `gotestsum -- ./...` passes (23 tests)
- `TestSSRFProtection` verifies loopback is blocked without `--unsafe-client`

🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests