[hydra] Investigate the crawl frequency of certain unmodified resources #411

@maudetes

Description

Some resources are crawled every day even though they do not change.

For example, this resource, whose last-modified date has stayed the same since 2025-06-02.

Leads

  1. This is a harvested resource, whose change detection relies on the harvest_modified_at metadata in the DB.
  2. detected_last_modified_at is apparently not filled in on every check:
```
hydra-hydra=> select created_at, detected_last_modified_at, next_check_at from checks where resource_id='51f43b0e-0c0a-4e26-802d-099c92237fea' order by created_at desc limit 30;
          created_at           | detected_last_modified_at |         next_check_at
-------------------------------+---------------------------+-------------------------------
 2026-03-26 09:59:19.66375+01  | 2025-06-02 13:34:23+02    | 2026-03-26 23:00:43.202917+01
 2026-03-25 09:58:58.79402+01  | 2025-06-02 13:34:23+02    | 2026-03-27 18:03:01.272372+01
 2026-03-24 21:49:50.613426+01 | 2025-06-02 13:34:23+02    | 2026-03-27 12:15:47.233965+01
 2026-03-24 09:49:31.667101+01 |                           | 2026-03-24 21:49:31.665276+01
 2026-03-23 09:48:52.076539+01 | 2025-06-02 13:34:23+02    | 2026-03-24 09:48:52.075813+01
 2026-03-22 21:46:57.131644+01 | 2025-06-02 13:34:23+02    | 2026-03-23 09:47:00.738925+01
 2026-03-22 09:28:54.660618+01 |                           | 2026-03-22 21:28:54.659371+01
 2026-03-15 09:25:40.962614+01 | 2025-06-02 13:34:23+02    | 2026-03-22 09:25:40.961469+01
 2026-03-14 09:22:03.155941+01 | 2025-06-02 13:34:23+02    | 2026-03-15 09:22:03.153909+01
 2026-03-13 21:21:27.168271+01 | 2025-06-02 13:34:23+02    | 2026-03-14 09:21:31.178236+01
 2026-03-13 09:18:41.672842+01 |                           | 2026-03-13 21:18:41.665959+01
 2026-03-12 21:18:07.3616+01   | 2025-06-02 13:34:23+02    | 2026-03-13 09:18:10.985572+01
 2026-03-12 09:17:10.667508+01 |                           | 2026-03-12 21:17:10.665704+01
 2026-02-10 09:15:31.996792+01 | 2025-06-02 13:34:23+02    | 2026-03-12 09:15:31.995633+01
 2026-02-03 09:13:50.54324+01  | 2025-06-02 13:34:23+02    | 2026-02-10 09:13:50.541999+01
 2026-02-01 20:50:25.371683+01 | 2025-06-02 13:34:23+02    | 2026-02-03 09:11:08.258191+01
 2026-02-01 08:37:37.123381+01 |                           | 2026-02-01 20:37:37.121973+01
 2026-01-31 20:35:56.700192+01 | 2025-06-02 13:34:23+02    | 2026-02-01 08:36:05.252104+01
 2026-01-31 08:35:31.123692+01 |                           | 2026-01-31 20:35:31.122142+01
 2026-01-30 08:34:53.639138+01 | 2025-06-02 13:34:23+02    | 2026-01-31 08:34:53.637514+01
 2026-01-29 20:04:31.900053+01 | 2025-06-02 13:34:23+02    | 2026-01-30 08:34:21.31749+01
 2026-01-29 08:03:45.11962+01  |                           | 2026-01-29 20:03:45.118208+01
 2026-01-28 18:57:15.14563+01  | 2025-06-02 13:34:23+02    | 2026-01-29 06:57:18.688821+01
 2026-01-28 06:57:05.130972+01 |                           | 2026-01-28 18:57:05.121627+01
 2026-01-27 06:54:51.774682+01 | 2025-06-02 13:34:23+02    | 2026-01-28 06:54:51.772775+01
 2026-01-26 18:54:08.35111+01  | 2025-06-02 13:34:23+02    | 2026-01-27 06:54:11.809474+01
 2026-01-26 06:53:48.126232+01 |                           | 2026-01-26 18:53:48.124631+01
 2026-01-25 18:51:13.936244+01 | 2025-06-02 13:34:23+02    | 2026-01-26 06:51:41.563942+01
 2026-01-25 06:49:37.122641+01 |                           | 2026-01-25 18:49:37.121599+01
 2026-01-24 18:46:44.271838+01 | 2025-06-02 13:34:23+02    | 2026-01-25 06:46:46.12746+01
```
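
One way to read this output is to compute, for each check, the gap between created_at and next_check_at. The sketch below does that for six rows copied from the query result; the helper names (`parse_pg_timestamp`, `delay_hours`, `summarize`) are hypothetical and not part of hydra. It only describes what the data shows and assumes nothing about how hydra schedules checks.

```python
from datetime import datetime


def parse_pg_timestamp(value: str) -> datetime:
    """Parse a psql-style timestamp such as '2026-03-24 09:49:31.667101+01'."""
    # psql prints the UTC offset as '+01'; strptime's %z expects '+0100'.
    return datetime.strptime(value + "00", "%Y-%m-%d %H:%M:%S.%f%z")


def delay_hours(created_at: str, next_check_at: str) -> int:
    """Gap between a check and its scheduled next check, rounded to hours."""
    delta = parse_pg_timestamp(next_check_at) - parse_pg_timestamp(created_at)
    return round(delta.total_seconds() / 3600)


LAST_MODIFIED = "2025-06-02 13:34:23+02"

# (created_at, detected_last_modified_at, next_check_at), copied from the
# query output; None stands for an empty detected_last_modified_at cell.
ROWS = [
    ("2026-03-24 09:49:31.667101+01", None, "2026-03-24 21:49:31.665276+01"),
    ("2026-03-23 09:48:52.076539+01", LAST_MODIFIED, "2026-03-24 09:48:52.075813+01"),
    ("2026-03-22 21:46:57.131644+01", LAST_MODIFIED, "2026-03-23 09:47:00.738925+01"),
    ("2026-03-22 09:28:54.660618+01", None, "2026-03-22 21:28:54.659371+01"),
    ("2026-03-15 09:25:40.962614+01", LAST_MODIFIED, "2026-03-22 09:25:40.961469+01"),
    ("2026-02-10 09:15:31.996792+01", LAST_MODIFIED, "2026-03-12 09:15:31.995633+01"),
]


def summarize(rows):
    """Return (created_at, last_modified_filled, delay_in_hours) per check."""
    return [
        (created, detected is not None, delay_hours(created, nxt))
        for created, detected, nxt in rows
    ]


if __name__ == "__main__":
    for created, filled, hours in summarize(ROWS):
        status = "filled" if filled else "EMPTY"
        print(f"{created}  {status:6}  next check in ~{hours} h")
```

In this sample, the delay grows from 24 hours to 7 days to 30 days while detected_last_modified_at is filled, and falls back to about 12 hours on each check where it is empty, as well as on the check right after it. This suggests, without proving it, that a missing value restarts the backoff.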

Labels

bug (Something isn't working)

Projects

Status

🛠 Doing
