Add retry with backoff and per-blog error reporting #16

Merged
JulienTant merged 3 commits into main from worktree-agent-a6656e19
Apr 5, 2026
Conversation

Owner

@JulienTant JulienTant commented Apr 4, 2026

Summary

  • HTTP fetches (feed discovery, feed parse, scrape) now retry up to 3 times with exponential backoff via cenkalti/backoff/v5
  • ScanAllBlogs collects per-blog errors into ScanResult.Error instead of failing fast — successful scans are no longer lost when one blog is temporarily unreachable
  • Scan output shows per-blog errors in red and a summary line when failures occur

Test plan

  • 4 new unit tests: retry success, retry exhaustion, partial failure (concurrent + sequential)
  • 1 new e2e test: TestScanPartialFailure with golden file
  • All 55 tests pass (existing + new)
  • golangci-lint reports 0 issues
  • CodeRabbit review: no findings

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Retries with exponential backoff for network operations to improve resilience.
    • Scan now reports per‑blog success/failure counts and continues when individual blogs fail; silent mode and exit-code behavior adjusted for partial/total failures.
  • Bug Fixes

    • Server (5xx) responses are treated as retryable/transient errors, preventing silent data loss and enabling retries.
  • Tests

    • Added extensive end-to-end and unit tests for retries, partial/total failure semantics, and context cancellation.

Transient HTTP failures (timeouts, 503s) now retry up to 3 times with
exponential backoff via cenkalti/backoff/v5. ScanAllBlogs collects
per-blog errors instead of failing fast, so successful scans are not
lost when one blog is temporarily unreachable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Apr 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7beb9bff-890f-45cc-8ee7-d6f695ac9609

📥 Commits

Reviewing files that changed from the base of the PR and between ed53cb9 and 8396f1d.

📒 Files selected for processing (1)
  • internal/cli/commands.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/cli/commands.go

📝 Walkthrough

Walkthrough

Adds exponential-backoff retries for network operations in the scanner, records per-blog errors in a new ScanResult.Error field, changes scanning to continue on non-fatal per-blog failures, and updates CLI scan reporting and exit semantics to reflect partial vs total failures. New tests and a backoff dependency were added.

Changes

  • Scanner core & tests (internal/scanner/scanner.go, internal/scanner/scanner_test.go): Introduce retryHTTP with exponential backoff and retry constants; add ScanResult.Error; wrap discover/parse/scrape calls with retries; treat non-fatal per-blog errors as results (don’t fail-fast); add tests for retries, partial failures, and context cancellation.
  • RSS feed discovery (internal/rss/rss.go, internal/rss/rss_test.go): Treat HTTP 5xx responses from feed discovery as retryable errors (return FeedParseError), while 4xx remains “no feed”; add tests for 503 (error) and 404 (no feed).
  • CLI reporting (internal/cli/commands.go): Count per-blog failures using result.Error; exclude errored results from new-article totals; short-circuit per-blog printing on error; print succeeded/failed summary; in silent mode return error only when all blogs fail.
  • End-to-end tests (e2e/e2e_test.go): Adjust TestSSRFProtection expectations for partial-success behavior; add TestScanPartialFailure, TestScanSilentPartialFailure, and TestScanSilentTotalFailure using httptest servers to validate partial/total failure semantics.
  • Module dependencies (go.mod): Add github.com/cenkalti/backoff/v5 v5.0.3 for backoff logic.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI: scan command
    participant Scanner as Scanner
    participant Retry as retryHTTP
    participant HTTP as Remote Feed (HTTP)
    participant DB as Repo/Results

    CLI->>Scanner: ScanAllBlogs(ctx, blogs)
    loop per blog
        Scanner->>Retry: retryHTTP(discover/parse/scrape)
        Retry->>HTTP: HTTP request
        alt transient 5xx
            HTTP-->>Retry: 5xx
            Retry->>Retry: backoff + retry
            Retry->>HTTP: retry request
            HTTP-->>Retry: 200 (success)
            Retry-->>Scanner: Success (data)
            Scanner->>DB: Record result (Error empty, NewArticles...)
        else persistent failure or retries exhausted
            HTTP-->>Retry: repeated 5xx / failure
            Retry-->>Scanner: Error ("failed to fetch feed")
            Scanner->>DB: Record result (Error set)
        end
    end
    Scanner-->>CLI: []ScanResult (includes Error fields)
    CLI->>CLI: Print per-blog output and summary (X succeeded, Y failed)
    alt silent mode and all failed
        CLI-->>CLI: exit non-zero
    else
        CLI-->>CLI: exit 0
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hopping retries with gentle cheer,
Three tries I give each distant tier.
Some feeds will fail, some feeds will sing,
I count the wins and what they bring.
Carrots, backoff, and a hopeful spring.

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 12.50%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title accurately summarizes the main changes: adding HTTP retry logic with backoff and per-blog error reporting in scan output.
  • Description check ✅ Passed: The description covers the required sections with implementation details, test plan with checkboxes, and verification results, though slightly informal with the Claude Code footer.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/scanner/scanner.go (1)

61-69: ⚠️ Potential issue | 🟠 Major

Feed auto-discovery still won't retry transient 5xxs.

retryHTTP only retries when DiscoverFeedURL returns an error, but internal/rss/rss.go currently maps every non-2xx discovery response to ("", nil). A homepage that briefly returns 500 is therefore treated as “no feed found” on the first attempt, so blogs that rely on auto-discovery can still fall through to scraping or Source == "none" without any retry. Please surface retryable discovery statuses as errors before this wrapper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/scanner/scanner.go` around lines 61 - 69, The DiscoverFeedURL call
is being wrapped by retryHTTP but non-2xx HTTP discovery responses are currently
returned as ("", nil) in internal/rss/rss.go so retryHTTP never retries
transient 5xxs; change DiscoverFeedURL (or the response handling in
internal/rss/rss.go) to return a retryable error for transient HTTP statuses
(e.g., 5xx and other retryable codes) instead of ("", nil) so that retryHTTP and
the call site using s.fetcher.DiscoverFeedURL will retry; ensure the returned
error type/message makes it clear it’s a transient HTTP status while preserving
the existing "no feed found" behavior for true 2xx/404 cases.
internal/cli/commands.go (1)

189-223: ⚠️ Potential issue | 🟠 Major

--silent hides partial and total scan failures.

With the new exit-0 partial-failure flow, this is now the only path that surfaces result.Error. In --silent, even an all-failed scan ends with “scan done” and no error output, so automation has no signal that anything went wrong. Please emit at least a failure summary to stderr in silent mode, and consider returning non-zero when failed == len(results).

As per coding guidelines, "Never ignore errors. For production errors, log them."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/cli/commands.go` around lines 189 - 223, The silent path currently
swallows all scan failures; after calling sc.ScanAllBlogs and iterating results,
when silent is true still compute failed and totalNew from results (as done in
the non-silent branch) and emit a concise failure summary to stderr using
fmt.Fprintln(os.Stderr, ...) (referencing results, result.Error and totalNew),
and if failed == len(results) return a non-zero error (e.g. return
fmt.Errorf("scan failed: %d/%d blogs failed", failed, len(results))) so
automation can detect total failure; keep the existing "scan done" message only
for fully-successful or partial-success cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/scanner/scanner.go`:
- Around line 171-175: The code currently converts every error from s.ScanBlog
into a per-blog ScanResult error, but storage failures and context cancellations
from calls like db.UpdateBlog, db.GetExistingArticleURLs, db.AddArticlesBulk,
and db.UpdateBlogLastScanned should abort the overall scan instead of being
folded into result.Error; change the error handling where s.ScanBlog is called
(the blocks around the current result, scanErr handling) to detect and propagate
fatal errors—specifically treat context.Canceled and context.DeadlineExceeded
and any DB/storage-specific error types/markers as non-recoverable by returning
the error (or re-raising it) rather than setting result.Error, while continuing
to fold only fetch/feed/scrape-related errors into result.Error and
result.BlogName.

---

Outside diff comments:
In `@internal/cli/commands.go`:
- Around line 189-223: The silent path currently swallows all scan failures;
after calling sc.ScanAllBlogs and iterating results, when silent is true still
compute failed and totalNew from results (as done in the non-silent branch) and
emit a concise failure summary to stderr using fmt.Fprintln(os.Stderr, ...)
(referencing results, result.Error and totalNew), and if failed == len(results)
return a non-zero error (e.g. return fmt.Errorf("scan failed: %d/%d blogs
failed", failed, len(results))) so automation can detect total failure; keep the
existing "scan done" message only for fully-successful or partial-success cases.

In `@internal/scanner/scanner.go`:
- Around line 61-69: The DiscoverFeedURL call is being wrapped by retryHTTP but
non-2xx HTTP discovery responses are currently returned as ("", nil) in
internal/rss/rss.go so retryHTTP never retries transient 5xxs; change
DiscoverFeedURL (or the response handling in internal/rss/rss.go) to return a
retryable error for transient HTTP statuses (e.g., 5xx and other retryable
codes) instead of ("", nil) so that retryHTTP and the call site using
s.fetcher.DiscoverFeedURL will retry; ensure the returned error type/message
makes it clear it’s a transient HTTP status while preserving the existing "no
feed found" behavior for true 2xx/404 cases.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 67fe71f9-c6ca-4350-9e1f-0c6c4e30f0c4

📥 Commits

Reviewing files that changed from the base of the PR and between a7b3f82 and 2dec123.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • e2e/e2e_test.go
  • go.mod
  • internal/cli/commands.go
  • internal/scanner/scanner.go
  • internal/scanner/scanner_test.go

…discovery

- DiscoverFeedURL now returns a retryable error for 5xx responses instead
  of silently treating them as "no feed found"
- ScanAllBlogs propagates context cancellation and DB errors as fatal
  instead of folding them into per-blog result.Error
- Silent scan mode reports failures to stderr and exits non-zero on
  total failure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/rss/rss.go`:
- Around line 99-104: DiscoverFeedURL currently only returns FeedParseError for
5xx responses; expand this so transient failures (HTTP 408, 429 and any errors
from client.Do or reading the body) also return a FeedParseError that wraps the
underlying error so callers using rss.IsFeedError see them as fatal; modify
DiscoverFeedURL to return FeedParseError for response.StatusCode >=500 OR
response.StatusCode == 408 || response.StatusCode == 429, and when client.Do or
ioutil.ReadAll (or equivalent) returns an error wrap that error in
FeedParseError (preserving the original error via fmt.Errorf("%w", err) or
similar) so timeouts/cancellations propagate correctly.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 185fcde4-4619-469f-8ea0-214a960a413a

📥 Commits

Reviewing files that changed from the base of the PR and between 2dec123 and ed53cb9.

📒 Files selected for processing (6)
  • e2e/e2e_test.go
  • internal/cli/commands.go
  • internal/rss/rss.go
  • internal/rss/rss_test.go
  • internal/scanner/scanner.go
  • internal/scanner/scanner_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/scanner/scanner_test.go
  • internal/cli/commands.go

Comment on lines +99 to 104

```go
if response.StatusCode >= 500 {
	return "", FeedParseError{Message: fmt.Sprintf("discover feed: server error status %d", response.StatusCode)}
}
if response.StatusCode < 200 || response.StatusCode >= 300 {
	// Client errors (4xx) are not transient — just means no feed at this URL.
	return "", nil
```

⚠️ Potential issue | 🟠 Major

Broaden retryable discovery failures beyond 5xx.

Only the 5xx path becomes a typed FeedParseError here. DiscoverFeedURL still returns plain wrapped errors for client.Do / body-read failures elsewhere in this function, and 408/429 still fall through to ("", nil). Because internal/scanner/scanner.go Lines 45-56 only folds rss.IsFeedError into ScanResult.Error, a single blog without a stored FeedURL can still abort scan all or be silently treated as “no feed found.” Please surface transient discovery failures through a typed error that preserves the underlying cause so cancellation/timeouts remain fatal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/rss/rss.go` around lines 99 - 104, DiscoverFeedURL currently only
returns FeedParseError for 5xx responses; expand this so transient failures
(HTTP 408, 429 and any errors from client.Do or reading the body) also return a
FeedParseError that wraps the underlying error so callers using rss.IsFeedError
see them as fatal; modify DiscoverFeedURL to return FeedParseError for
response.StatusCode >=500 OR response.StatusCode == 408 || response.StatusCode
== 429, and when client.Do or ioutil.ReadAll (or equivalent) returns an error
wrap that error in FeedParseError (preserving the original error via
fmt.Errorf("%w", err) or similar) so timeouts/cancellations propagate correctly.

@JulienTant JulienTant merged commit f16b636 into main Apr 5, 2026
2 checks passed
@JulienTant JulienTant deleted the worktree-agent-a6656e19 branch April 5, 2026 05:23