miningape/huntress

Huntress

Huntress is a highly scalable job scheduler that can be configured to run complex workflows. Currently the "frontend" is a Postgres database, but complex tasks can still be configured easily through it. Jobs can be "orchestrated", meaning a job can wait for another job to complete before it runs. This allows for complex flows of data where the order of completion is not known in advance.

Jobs are described by JSON job definitions, which determine the type of job, its schedule, and any other necessary parameters. Huntress uses a microservice architecture so that it can be scaled out to massive workloads.
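To make the idea of a JSON job definition concrete, here is a minimal sketch in Python of what one might look like and how it could be validated. The field names (`name`, `type`, `schedule`, `waits_for`, `params`), the cron-style schedule string, and the `parse_job` helper are all assumptions made for illustration; they are not taken from Huntress's actual schema.

```python
import json
from dataclasses import dataclass, field

# Hypothetical job definition: every field name below is an assumption,
# not the real Huntress schema.
RAW_JOB = """
{
  "name": "refresh-price-view",
  "type": "materialise",
  "schedule": "0 * * * *",
  "waits_for": ["scrape-listings"],
  "params": {"view": "rental_prices"}
}
"""

@dataclass
class JobDefinition:
    name: str
    type: str
    schedule: str
    waits_for: list = field(default_factory=list)  # jobs that must finish first
    params: dict = field(default_factory=dict)     # job-type specific settings

def parse_job(raw: str) -> JobDefinition:
    """Parse a JSON string and check that the required fields are present."""
    data = json.loads(raw)
    missing = {"name", "type", "schedule"} - data.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return JobDefinition(**data)

job = parse_job(RAW_JOB)
```

Keeping the optional fields (`waits_for`, `params`) defaulted means a simple scheduled job only needs a name, a type, and a schedule.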

  • Job Scheduler
    • Materialised Views
    • Web Scraper
  • Orchestrate jobs
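Orchestration, as described above, boils down to asking which jobs have had all of their prerequisites complete. The sketch below shows one way that check could be written in Python. The `runnable_jobs` function, the `WAITS_FOR` mapping, and the job names are hypothetical and only illustrate the idea, not how Huntress implements it.

```python
def runnable_jobs(waits_for: dict[str, set[str]], completed: set[str]) -> list[str]:
    """Return unfinished jobs whose prerequisites have all completed."""
    return sorted(
        job
        for job, deps in waits_for.items()
        if job not in completed and deps <= completed  # deps is a subset of completed
    )

# Hypothetical workflow: the view refresh waits for both scrapers,
# whichever of them happens to finish first.
WAITS_FOR = {
    "scrape-listings": set(),
    "scrape-prices": set(),
    "refresh-price-view": {"scrape-listings", "scrape-prices"},
}
```

Because the check only looks at the set of completed jobs, it does not matter in which order the two scrapers finish.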

The Dockerfile is finicky! Chromium (used by the pipeline-worker / webscraper module) and Docker do not play nicely together, which exposes odd quirks where the underlying architecture (e.g. ARM) can determine whether the app runs at all.

Roadmap:

  • orchestration
  • "jobs" / non pipeline
    • Made workers (see below)
  • notifier
  • break into many services
  • scan for specific conditions
  • scan other websites
  • remove dead listings
  • frontend to see statuses / manage jobs
  • Cycle detection (make sure orchestrated jobs do not have an infinite run time - there should always be a start and end and no loops)
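The cycle detection item could be approached with a topological sort. The sketch below uses Kahn's algorithm in Python to find jobs that can never start because they sit on, or wait behind, a dependency loop. This is one possible approach rather than a planned implementation; the `find_blocked_jobs` name and the `waits_for` mapping (job name to the set of jobs it waits for) are assumptions.

```python
from collections import deque

def find_blocked_jobs(waits_for: dict[str, set[str]]) -> set[str]:
    """Return jobs that can never run because of a dependency cycle.

    An empty result means the orchestration graph has no loops.
    """
    # Include jobs that only appear as someone else's prerequisite.
    jobs = set(waits_for)
    for deps in waits_for.values():
        jobs |= set(deps)

    # Number of unfinished prerequisites per job, and the reverse edges.
    remaining = {job: len(waits_for.get(job, ())) for job in jobs}
    dependents = {job: [] for job in jobs}
    for job, deps in waits_for.items():
        for dep in deps:
            dependents[dep].append(job)

    # Repeatedly "run" every job that has nothing left to wait for.
    queue = deque(job for job, count in remaining.items() if count == 0)
    while queue:
        done = queue.popleft()
        del remaining[done]
        for job in dependents[done]:
            remaining[job] -= 1
            if remaining[job] == 0:
                queue.append(job)

    # Whatever is left is on a cycle or waits on one.
    return set(remaining)
```

A check like this could run when a job definition is saved, so a looping workflow is rejected before it is ever scheduled.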

Planned Workers:

  • pipeline (streaming data: source -> destination)
  • materialise (refresh complex and large tables in postgres as needed)
  • notify
  • analyse (ai / analytics to find desired data)
  • generic (point at a docker container online)

Planned Integrations For pipeline:

  • Files (pipeline source / destination)
  • Postgres (pipeline source / destination)
  • BoligPortal.dk (pipeline source)
  • BoligZonen.dk (pipeline source)
  • FindBoliger.dk (pipeline source)
  • Generic / simple scraper for anything

Possible Ideas:

  • refactor materialise to generic job
  • docker worker (reads any git repo and runs the dockerfile)

About

A job scheduler with materialise and webscraping jobs - currently being used to scan Copenhagen's rental market prices
