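The crawler and harvesters time out almost systematically on a specific set of domains.
The following SQL query lists the 30 domains with the highest timeout rate over the last 30 days, among domains with at least 100 checks: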
SELECT
    domain,
    COUNT(*) FILTER (WHERE timeout) AS timeouts,
    COUNT(*) AS total_checks,
    ROUND(100.0 * COUNT(*) FILTER (WHERE timeout) / COUNT(*), 2) AS pct_timeout
FROM checks
WHERE created_at >= now() - interval '30 days'
  AND domain IS NOT NULL
  AND domain <> ''
GROUP BY domain
HAVING COUNT(*) >= 100
ORDER BY pct_timeout DESC, timeouts DESC
LIMIT 30;
To do
[ ] Clean up or archive the records/entries associated with these unreachable domains?
[ ] Contact the administrators of those domains?
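
To help decide between archiving and contacting an administrator, it may be useful to know when each affected domain last answered without a timeout. The query below is a minimal sketch that reuses only the columns of the query above (`domain`, `timeout`, `created_at`). The 90% threshold and the `last_ok_check` alias are assumptions chosen for illustration, not values taken from the crawler configuration.

```sql
-- Sketch: domains with >= 100 checks in the last 30 days, of which
-- at least 90% timed out (assumed threshold), with the time of the
-- most recent check that did not time out.
SELECT
    domain,
    COUNT(*) AS total_checks,
    ROUND(100.0 * COUNT(*) FILTER (WHERE timeout) / COUNT(*), 2) AS pct_timeout,
    -- NULL when no check succeeded within the 30-day window
    MAX(created_at) FILTER (WHERE NOT timeout) AS last_ok_check
FROM checks
WHERE created_at >= now() - interval '30 days'
  AND domain IS NOT NULL
  AND domain <> ''
GROUP BY domain
HAVING COUNT(*) >= 100
   AND COUNT(*) FILTER (WHERE timeout) >= 0.9 * COUNT(*)
-- Domains with no successful check at all come first
ORDER BY last_ok_check ASC NULLS FIRST, pct_timeout DESC;
```

Domains where `last_ok_check` is NULL never responded during the window and are the likeliest candidates for archiving. Domains with a recent successful check may only be intermittently unreachable.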