
Fix: export OOM on large crawls — add streaming export endpoint#48

Open
liquidpurple wants to merge 2 commits into PhialsBasement:main from liquidpurple:feature/streaming-export

Conversation

@liquidpurple

The current export flow multiplies the exported data roughly 4× in memory,
which causes OOM kills on memory-constrained instances:

  1. Frontend fetches ALL URLs from /api/crawl_status (full data in browser)
  2. Frontend sends ALL data back in POST body to /api/export_data
  3. Backend generates full export string in memory
  4. Backend wraps in jsonify() JSON envelope
  5. Frontend parses JSON, extracts content, creates Blob

For a crawl of ~2200 URLs (~45MB in SQLite), this consistently OOMs on a
2GB RAM instance with systemd MemoryMax.

This PR adds a streaming GET endpoint (/api/export_stream) that uses Python
generators to yield CSV/JSON/XML rows one at a time via Flask Response.
The browser downloads the file directly — no JSON envelope, no Blob,
no round-trip of all data.
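
A minimal sketch of the generator-based streaming described above (function and
column names here are illustrative, not the PR's actual code): a generator
yields one encoded CSV line per crawl record, and Flask's `Response` streams
those chunks to the browser so the full export string never exists in memory.

```python
import csv
import io

from flask import Flask, Response

app = Flask(__name__)


def iter_csv(rows):
    """Yield a CSV header line, then one encoded CSV line per record."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "status", "content"])  # hypothetical columns
    yield buf.getvalue()
    for row in rows:
        buf.seek(0)
        buf.truncate(0)
        writer.writerow([row["url"], row["status"], row["content"]])
        yield buf.getvalue()


@app.route("/api/export_stream")
def export_stream():
    # fetch_rows_from_sqlite() is a stand-in for lazily iterating the
    # crawl table (e.g. a sqlite3 cursor), so rows are never all in RAM.
    rows = fetch_rows_from_sqlite()
    return Response(
        iter_csv(rows),
        mimetype="text/csv",
        headers={"Content-Disposition": "attachment; filename=export.csv"},
    )
```

Because the response body is a generator, Flask sends each yielded chunk as it
is produced; peak memory stays near one row's size regardless of crawl size.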

The old /api/export_data endpoint is preserved for backward compatibility.

Changes:

  • main.py: new /api/export_stream endpoint + 8 streaming generator functions
  • web/static/js/app.js: exportData() rewritten to use streaming endpoint

