Some agentic coding tools automatically delete sessions older than 30 days. I've confirmed that Claude Code and Gemini CLI do this by default. We could tell users to configure these tools to keep sessions longer, which would preserve more data.
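For Claude Code, as far as I know the retention window is controlled by the `cleanupPeriodDays` key in its `settings.json`; please check that against the current docs before we recommend it. A minimal sketch of a helper that raises the value while leaving other settings alone (the function name and the path in the usage note are my assumptions, not something dataclaw ships):

```python
import json
from pathlib import Path


def set_cleanup_period(settings_path: Path, days: int) -> dict:
    """Set `cleanupPeriodDays` in a JSON settings file, keeping other keys."""
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)

    # Assumed key name; verify against the tool's settings documentation.
    settings["cleanupPeriodDays"] = days

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

A user would call it as `set_cleanup_period(Path.home() / ".claude" / "settings.json", 3650)`, assuming that is where their user-level settings live. I haven't checked the equivalent setting for Gemini CLI, so it isn't shown here.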
Alternatively, dataclaw could collect data incrementally, with a cron job that runs the collection periodically. For now, though, I don't think incremental processing is strictly needed for data under 100 MB.
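If we do go the incremental route later, a rough sketch of what I have in mind: copy only session files that are new or changed into an archive directory the tools don't clean up. This is not how dataclaw works today; `sync_new_sessions` and both directory arguments are hypothetical names for illustration.

```python
import shutil
from pathlib import Path


def sync_new_sessions(source_dir: Path, archive_dir: Path) -> list[Path]:
    """Copy files from source_dir that are missing or stale in archive_dir."""
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        dest = archive_dir / src.relative_to(source_dir)
        if dest.exists():
            s, d = src.stat(), dest.stat()
            # Skip files whose archived copy has the same size and is at least
            # as recent (compared at one-second granularity).
            if d.st_size == s.st_size and int(d.st_mtime) >= int(s.st_mtime):
                continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves the modification time
        copied.append(dest)
    return copied
```

A cron entry could then run a small script that calls this once a day. Files deleted upstream stay in the archive, which is the point: the archive outlives the 30-day cleanup.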