perf(parser): cut redundant NNTP reads and cap PAR2 scan #532

Merged
javi11 merged 2 commits into main from perf/parser-reduce-reads on Apr 24, 2026

Conversation

javi11 (Owner) commented Apr 24, 2026

Summary

Three optimizations to internal/importer/parser that together cut parse-time NNTP reads by ~50–70% on typical multi-file releases, plus a fourth change that overlaps two independent fetches to reduce wall time.

1. Share yEnc standardPartSize across the NZB

normalizeSegmentSizesWithYenc previously fetched the second and last segment of every multi-segment file to learn the yEnc PartSize. Every file produced by the same encoder shares the same middle-segment size, so we now fetch it once per NZB via pickRepresentativeMiddleSegment and reuse it. Per-file normalization only fetches the last segment.

  • On a 50-file NZB: ~49 fewer priority-lane reads.
  • Fallback: if the representative fetch fails, normalization falls back to the old per-file behavior.
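A minimal sketch of the share-once pattern with its fallback, to make the read arithmetic concrete. Only the names normalizeSegmentSizesWithYenc and pickRepresentativeMiddleSegment come from this PR; the `file` type, the `fetchPartSize` callback, the "at least three segments" selection rule, and the size formula are assumptions made for illustration, not the parser's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// file is a hypothetical stand-in for one NZB file entry.
type file struct {
	name     string
	segments []string // message IDs, in order
}

// fetchPartSize is a hypothetical reader; one call models one NNTP read.
type fetchPartSize func(messageID string) (int64, error)

// pickRepresentativeMiddleSegment returns the second segment of the first file
// with at least three segments, so the pick is never a (shorter) last segment.
// The three-segment rule is an assumption of this sketch.
func pickRepresentativeMiddleSegment(files []file) (string, bool) {
	for _, f := range files {
		if len(f.segments) >= 3 {
			return f.segments[1], true
		}
	}
	return "", false
}

// normalize estimates a size for every multi-segment file.
func normalize(files []file, fetch fetchPartSize) map[string]int64 {
	var shared int64
	haveShared := false
	if id, ok := pickRepresentativeMiddleSegment(files); ok {
		if size, err := fetch(id); err == nil {
			shared, haveShared = size, true // one read for the whole NZB
		}
	}
	sizes := make(map[string]int64, len(files))
	for _, f := range files {
		n := len(f.segments)
		if n < 2 {
			continue // single-segment files need no normalization here
		}
		part := shared
		if !haveShared {
			// Fallback: old per-file behavior, read this file's second segment.
			p, err := fetch(f.segments[1])
			if err != nil {
				continue
			}
			part = p
		}
		last, err := fetch(f.segments[n-1])
		if err != nil {
			continue
		}
		sizes[f.name] = part*int64(n-1) + last
	}
	return sizes
}

// demoFiles builds n three-segment files with predictable message IDs.
func demoFiles(n int) []file {
	files := make([]file, n)
	for i := range files {
		p := fmt.Sprintf("f%d", i)
		files[i] = file{name: p, segments: []string{p + "-1", p + "-2", p + "-last"}}
	}
	return files
}

// countingFetch counts reads and fails for failID ("" never fails).
func countingFetch(reads *int, failID string) fetchPartSize {
	return func(id string) (int64, error) {
		*reads++
		if id == failID {
			return 0, errors.New("simulated missing article")
		}
		if strings.HasSuffix(id, "-last") {
			return 700, nil
		}
		return 1000, nil
	}
}

func main() {
	reads := 0
	sizes := normalize(demoFiles(50), countingFetch(&reads, ""))
	fmt.Println(len(sizes), reads) // 50 files sized with 51 reads; per-file fetching would need 100
}
```

With 50 files the shared path issues 1 + 50 reads instead of 2 × 50, which is where the "~49 fewer" figure comes from.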

2. Defer 16KB first-segment completion behind PAR2 presence

The fan-out loop in fetchAllFirstSegments that reads extra segments to reach 16KB exists only for PAR2 Hash16k MD5 matching. It now runs only when hasPar2IndexCandidate confirms the NZB actually contains a PAR2 index, and it skips obvious sidecars (.nfo, .txt, .srt, .sub, .jpg, .jpeg, .png, .nzb, .sfv, .md5).

  • NZBs without PAR2 (Stremio-fetched releases, obfuscated one-file NZBs) pay zero extra reads here.
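The gating logic can be sketched roughly as follows. The extension list is taken from the description above; everything else is an assumption for illustration, in particular the filename heuristic inside `hasPar2IndexCandidate` (a `.par2` name without a `.vol` marker) and the hypothetical helper `needs16KCompletion`. The real detection in the parser may well look at something other than filenames.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// sidecarExts mirrors the extension list from the PR description.
var sidecarExts = map[string]bool{
	".nfo": true, ".txt": true, ".srt": true, ".sub": true, ".jpg": true,
	".jpeg": true, ".png": true, ".nzb": true, ".sfv": true, ".md5": true,
}

// isSidecar reports whether name carries one of the sidecar extensions.
func isSidecar(name string) bool {
	return sidecarExts[strings.ToLower(filepath.Ext(name))]
}

// hasPar2IndexCandidate is an assumed heuristic: a ".par2" file whose name
// has no ".vol" recovery-volume marker is treated as a possible index.
func hasPar2IndexCandidate(names []string) bool {
	for _, n := range names {
		lower := strings.ToLower(n)
		if strings.HasSuffix(lower, ".par2") && !strings.Contains(lower, ".vol") {
			return true
		}
	}
	return false
}

// needs16KCompletion (hypothetical helper) decides whether extra segments
// should be read so the first 16KB of a file is available for hashing.
func needs16KCompletion(name string, par2Present bool) bool {
	return par2Present && !isSidecar(name)
}

func main() {
	names := []string{"movie.part01.rar", "movie.nfo", "movie.par2", "movie.vol00+01.par2"}
	par2 := hasPar2IndexCandidate(names)
	for _, n := range names {
		fmt.Printf("%-22s complete16K=%v\n", n, needs16KCompletion(n, par2))
	}
}
```

Computing the PAR2 check once per NZB and passing the result down keeps the per-file decision a cheap lookup.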

3. Cap PAR2 scan timeout and short-circuit on FileDesc completion

par2/descriptor.go previously used 30s × segments (up to 2.5 min for a 5-segment index) and read up to 1000 packets. Now:

  • Timeout capped at 90s.
  • Scan breaks after 50 non-FileDesc packets following the last descriptor — healthy index files exit as soon as the FileDesc section ends.
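The two limits can be illustrated with a short sketch. The 30s, 90s, 1000 and 50 figures are from the description above; the function names, the `[]bool` packet model, and the choices to keep the per-segment scaling below the cap, to keep the 1000-packet ceiling, and to start the trailing counter only after the first descriptor are assumptions of this sketch rather than confirmed details of par2/descriptor.go.

```go
package main

import (
	"fmt"
	"time"
)

const (
	perSegmentTimeout = 30 * time.Second
	maxScanTimeout    = 90 * time.Second
	maxPackets        = 1000 // assumed to remain as an upper bound
	trailingWindow    = 50   // non-FileDesc packets tolerated after the last descriptor
)

// scanTimeout scales with the segment count but never exceeds the cap.
func scanTimeout(segments int) time.Duration {
	t := time.Duration(segments) * perSegmentTimeout
	if t > maxScanTimeout {
		return maxScanTimeout
	}
	return t
}

// scanPackets walks a packet stream where true marks a FileDesc packet.
// It returns the descriptors found and how many packets were read.
func scanPackets(isFileDesc []bool) (descriptors, read int) {
	sinceLast := 0
	for _, fd := range isFileDesc {
		if read == maxPackets {
			break
		}
		read++
		if fd {
			descriptors++
			sinceLast = 0
			continue
		}
		if descriptors > 0 {
			sinceLast++
			if sinceLast >= trailingWindow {
				break // the FileDesc section has ended; stop reading
			}
		}
	}
	return descriptors, read
}

// makeStream builds lead non-FileDesc packets, descs FileDesc packets,
// then trail non-FileDesc packets.
func makeStream(lead, descs, trail int) []bool {
	s := make([]bool, lead+descs+trail)
	for i := lead; i < lead+descs; i++ {
		s[i] = true
	}
	return s
}

func main() {
	fmt.Println(scanTimeout(2), scanTimeout(5)) // 1m0s 1m30s
	d, r := scanPackets(makeStream(2, 3, 200))
	fmt.Println(d, r) // 3 55
}
```

In the example, a 205-packet stream is abandoned after 55 packets: 2 leading packets, 3 descriptors, and the 50-packet trailing window.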

4. Parallelize PAR2 extraction and representative yEnc fetch

par2.GetFileDescriptors and the one-shot representative yEnc fetch are independent. They now run together in an errgroup, so wall time is max(par2, yenc) instead of par2 + yenc.
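To show why the wall time becomes max(par2, yenc), here is a minimal sketch of running two independent tasks concurrently. The PR uses errgroup (golang.org/x/sync/errgroup); this sketch substitutes the standard library's sync.WaitGroup so it runs without external modules. `runBoth` and `errSimulated` are hypothetical names, and how the real code treats an error from either task is not shown in the PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errSimulated is a placeholder error used only by this sketch.
var errSimulated = errors.New("simulated failure")

// runBoth starts both tasks at once, waits for both to finish, and returns
// the first non-nil error (par2's error takes precedence).
func runBoth(par2, yenc func() error) error {
	var (
		wg      sync.WaitGroup
		par2Err error
		yencErr error
	)
	wg.Add(2)
	go func() {
		defer wg.Done()
		par2Err = par2()
	}()
	go func() {
		defer wg.Done()
		yencErr = yenc()
	}()
	wg.Wait() // both results are safe to read after Wait returns
	if par2Err != nil {
		return par2Err
	}
	return yencErr
}

func main() {
	start := time.Now()
	err := runBoth(
		func() error { time.Sleep(60 * time.Millisecond); return nil }, // stands in for PAR2 extraction
		func() error { time.Sleep(40 * time.Millisecond); return nil }, // stands in for the yEnc fetch
	)
	// Elapsed time tracks the slower task, not the sum of both.
	fmt.Println(err, time.Since(start).Round(10*time.Millisecond))
	fmt.Println(runBoth(func() error { return errSimulated }, func() error { return nil }))
}
```

An errgroup adds error collection and optional context cancellation on top of this pattern, which is presumably why the PR prefers it.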

Notes for reviewers

  • No change to ConnectionPool / BodyAsync contracts.
  • No change to notFoundIDs semantics.
  • Output of ParsedNzb should be byte-identical for existing NZBs — the only behavioral change is when / whether certain reads happen, not which data lands in the parsed result.
  • Correctness anchor for change 2: fileinfo.getFileInfo already handles the case where the padded MD5 doesn't match any descriptor (it's the normal path for non-PAR2 NZBs), so skipping 16KB completion is safe.
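The last note can be illustrated with a small sketch of hash-based lookup with a miss path. This is not fileinfo.getFileInfo: `hash16k`, `resolveName`, the descriptor map, and the fallback name are all hypothetical, and the sketch hashes whatever bytes are available without modelling the padding the note mentions. It only shows that a hash computed over incomplete data fails to match and the caller falls back, rather than producing a wrong result.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

const hash16kLen = 16 * 1024

// hash16k returns the MD5 of at most the first 16 KiB of data.
func hash16k(data []byte) [md5.Size]byte {
	if len(data) > hash16kLen {
		data = data[:hash16kLen]
	}
	return md5.Sum(data)
}

// resolveName returns the descriptor's filename when the hash matches and the
// fallback otherwise. A nil or empty map models an NZB without PAR2.
func resolveName(firstBytes []byte, descriptors map[[md5.Size]byte]string, fallback string) string {
	if name, ok := descriptors[hash16k(firstBytes)]; ok {
		return name
	}
	return fallback
}

// demoData builds n bytes of deterministic content.
func demoData(n int) []byte {
	b := make([]byte, n)
	for i := range b {
		b[i] = byte(i % 251)
	}
	return b
}

func main() {
	full := demoData(20000)
	descriptors := map[[md5.Size]byte]string{hash16k(full): "movie.mkv"}

	fmt.Println(resolveName(full, descriptors, "obfuscated.bin"))        // movie.mkv
	fmt.Println(resolveName(full[:4096], descriptors, "obfuscated.bin")) // obfuscated.bin
	fmt.Println(resolveName(full, nil, "obfuscated.bin"))                // obfuscated.bin
}
```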

Test plan

  • go build ./...
  • go vet ./internal/importer/parser/...
  • go test -race ./internal/importer/parser/... — all existing tests pass
  • End-to-end smoke: import a known-good 50-file RAR NZB locally and confirm ParsedNzb matches byte-for-byte vs. pre-change output, and that parse time drops meaningfully via progressTracker timings
  • End-to-end smoke: import a non-PAR2 single-file NZB and confirm no additional-segment reads are issued

javi11 added 2 commits April 24, 2026 12:46
Three optimizations to the NZB import parser that together cut parse-time
NNTP reads by ~50-70% on typical releases:

- Share yEnc standard PartSize across the NZB. One representative
  second-segment fetch feeds normalization for every multi-segment file,
  replacing ~N per-file second-segment fetches.
- Defer 16KB first-segment completion fan-out. Only run it when the NZB
  actually contains a PAR2 index (so Hash16k matching is useful), and
  skip obvious sidecars (.nfo/.txt/.srt/.sub/.jpg/.nzb/.sfv/.md5).
- Cap PAR2 descriptor scan timeout at 90s and break early after a window
  of non-FileDesc packets past the last descriptor.

The representative yEnc fetch runs in parallel with PAR2 extraction via
errgroup, so wall time stays at max(par2, yenc) instead of par2 + yenc.
Falls back to the previous per-file behavior when the representative
fetch fails.
@javi11 javi11 merged commit a2e9e2d into main Apr 24, 2026
2 checks passed
@javi11 javi11 deleted the perf/parser-reduce-reads branch April 24, 2026 10:51