entask

a university project in online informational systems design. it models a microservice-oriented platform for scalable, durable, and consistent content conversion/transcoding. the system applies durable execution (conductor-oss), pub/sub state models (redis streams), message brokerage (nats jetstream), and an api gateway (traefik) for auth checks, request routing, and per-service load balancing, with the frontend built in angular v19.
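to illustrate the pub/sub state model: a converter could publish task-state events onto a stream, and downstream services (e.g. the notifier) consume them. the `TaskEvent` shape, field names, and `TaskState` enum below are hypothetical, not taken from the repo — just a minimal stdlib sketch of what a redis-streams entry (a flat string-to-string field map) might carry:

```python
from dataclasses import dataclass
from enum import Enum

class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"

@dataclass
class TaskEvent:
    task_id: str
    converter: str
    state: TaskState
    detail: str = ""

    def to_stream_fields(self) -> dict[str, str]:
        # redis streams entries are flat string->string maps,
        # so everything gets serialized down to strings
        return {
            "task_id": self.task_id,
            "converter": self.converter,
            "state": self.state.value,
            "detail": self.detail,
        }

evt = TaskEvent("task-42", "thumbnailer-converter", TaskState.RUNNING)
print(evt.to_stream_fields()["state"])  # "running"
```

in the real system these fields would be passed to an `XADD` call and read back by a consumer group; the sketch only shows the event payload, not the transport.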

schema

entask-diagram: a very rough-looking schema that will evolve as the project itself evolves. currently it only serves as development guidance, so i don't steer off into the unknown.

usage

deployment is supported via docker swarm (todo hihi) or docker compose.

prerequisites:

  • docker
  • docker model runner
  • at least 16gb of working memory (for all 4 services up and running without turning your pc into a ticking time bomb)

installation:

  1. clone the repo (git clone https://github.com/komadiina/entask.git)
  2. initialize .env with all required values (user/password credentials, hosts, ports)
  3. modify config files in /core/* according to the set envvars (still haven't migrated them into a generation script)
  4. run with docker compose up (use -d for detached)
  5. stop with docker compose down -v --remove-orphans
  • if any package.json or requirements.txt changes haven't been picked up, pass the --force-recreate flag to docker compose up
  6. the text-recognizer converter requires an instantiated docker model (i.e. ai/gemma3:latest, ai/gemma3-qat:latest)
    1. go to docker desktop
    2. navigate to the Models (BETA) tab
    3. download any model (gemma3 should suffice; if low on resources, you can use any other smaller-form quantized model)
    4. since it is running outside of the entask network, requests will need to target the docker internal network (model-runner.docker.internal, see DOCKER_MODEL_RUNNER_LISTEN)
    5. (note) to enable GPU inference, see the official docs
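since the model runner sits outside the entask network, a converter has to target docker's internal host. the sketch below only builds the OpenAI-style chat-completion URL and payload — the `/engines/v1/chat/completions` path follows the docker model runner docs, while the helper name and env-var fallback are assumptions for illustration, not the repo's actual client code:

```python
import json
import os

# assumed default; override via DOCKER_MODEL_RUNNER_LISTEN in the envfile
HOST = os.environ.get("DOCKER_MODEL_RUNNER_LISTEN", "model-runner.docker.internal")

def build_chat_request(model: str, system_prompt: str, text: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for an OpenAI-compatible chat completion."""
    url = f"http://{HOST}/engines/v1/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text},
        ],
    }).encode()
    return url, body

url, body = build_chat_request("ai/gemma3:latest", "fix OCR artifacts only", "teh qu1ck fox")
```

the actual request would then be sent with httpx (or any HTTP client) from inside the container.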

envfile

below is a list of some important env-vars; see default.dev.env for more info.

| envvar | description |
| --- | --- |
| PGADMIN_EMAIL | use this email with POSTGRES_PASSWORD to log into the pgAdmin console |
| FRONTEND_HOST | hardcoded, used for client-side redirects (0.0.0.0 does not work here) |
| CLIENT_SECRET_FILE | your Google API client secrets file |
| GOOGLE_OAUTH_CLIENT_ID | extracted from the secret file or via the Google Cloud console |
| GOOGLE_KEYS_URL | public Google endpoint for fetching public keys (if provider == 'google') |
| DOCKER_MODEL_RUNNER_LISTEN | docker model runner host/listen, used as a local LLM host |
| LLM_SERVICE_SYSTEM_PROMPT | characterize your LLM model w/ harsh instructions |
| ... | others are pretty self-explanatory |
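a quick sanity check before bringing the stack up can save a broken boot: parse the envfile and flag missing keys. the parser below handles simple KEY=VALUE lines only (no quoting or interpolation), and the required-key list and sample values are an illustrative subset, not the full set the services actually need:

```python
def parse_envfile(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

# illustrative subset of required keys
REQUIRED = ["PGADMIN_EMAIL", "FRONTEND_HOST", "GOOGLE_OAUTH_CLIENT_ID"]

sample = """# dev overrides
PGADMIN_EMAIL=admin@example.com
FRONTEND_HOST=http://localhost:4200
GOOGLE_OAUTH_CLIENT_ID=changeme
"""
env = parse_envfile(sample)
missing = [k for k in REQUIRED if k not in env]
```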

default hosts/listens

take note that some require authenticated URLs (user:pass@host:port):

| docker-service | listen |
| --- | --- |
| MinIO | minio:9000 |
| NATS | 0.0.0.0:{4222, 6222, 8222} |
| PostgreSQL | postgres:5432 |
| pgBouncer | pgbouncer:6432 |
| Redis | redis:6379 |
| Traefik | 0.0.0.0:{80, 443, 8080} |
| Conductor | conductor-server:{5000, 8080} |
| angular client | frontend:4200 |
| auth-service | auth-service:5201 |
| user-details-service | user-details-service:5202 |
| file-service | file-service:5204 |
| conversion-service | conversion-service:5205 |
| notifier-service | notifier-service:5206 |
| llm-service | llm-service:5207 |
| thumbnailer-converter | thumbnailer-converter:7401 |
| waveformer-converter | waveformer-converter:7402 |
| term-extractor-converter | term-extractor-converter:7403 |
| text-recognizer-converter | text-recognizer-converter:7404 |
| ws-proxy | ws-proxy:9202 |
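for the services that require authenticated URLs, credentials with special characters must be percent-encoded or the URL parser will misread them. a small helper sketch (the function name and sample credentials are made up for illustration):

```python
from urllib.parse import quote

def authed_url(scheme: str, user: str, password: str, host: str, port: int) -> str:
    # percent-encode credentials so characters like '@' or ':' survive the URL
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"

print(authed_url("redis", "app", "s3cr@t", "redis", 6379))
# redis://app:s3cr%40t@redis:6379
```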

components

review TODO for the roadmap, issues, etc.

converters

all converters use minio and httpx for file-based communication (no ftp yet)
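the shared fetch/convert/store shape can be sketched as a pure pipeline with the I/O injected — the stubs below stand in for an httpx GET against minio and a minio put, and the function names and key convention are hypothetical, not the repo's actual converter code:

```python
from typing import Callable

def run_conversion(
    object_key: str,
    fetch: Callable[[str], bytes],        # stand-in for an httpx GET of the source object
    convert: Callable[[bytes], bytes],    # the converter-specific transformation
    store: Callable[[str, bytes], None],  # stand-in for a minio put_object
) -> str:
    """Fetch the source object, convert it, store the result under a derived key."""
    data = fetch(object_key)
    out = convert(data)
    out_key = f"converted/{object_key}"   # hypothetical output-key convention
    store(out_key, out)
    return out_key

# stub wiring for illustration: a dict plays the role of the bucket
bucket: dict[str, bytes] = {"uploads/a.txt": b"hello"}
key = run_conversion(
    "uploads/a.txt",
    fetch=bucket.__getitem__,
    convert=bytes.upper,
    store=bucket.__setitem__,
)
```

keeping the transport behind callables like this makes each converter's flow unit-testable without a live minio instance.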


thumbnailer uses:

  • ffmpeg & imageio
  • moviepy

(flow-thumbnailer diagram)

waveformer uses:

  • pedalboard

(flow-waveformer diagram)

term-extractor uses:

  • sentence-transformers
  • spacy

(flow-term-extractor diagram)

text-recognizer uses:

  • easyocr
  • pyspellchecker
  • openai
  • fpdf2

(flow-text-recognizer diagram)

About

a microservice-oriented data conversion platform
