Introduction to workflow orchestration with Kestra
In this repo, I explore Kestra, a workflow orchestration tool, by writing an ETL pipeline that ingests NY Taxi data into Postgres (locally) and into BigQuery (on GCP). All of this was part of Data Engineering Zoomcamp 2025. I have also explored building ELT pipelines with Kestra that pull data from the Google Maps Places API into BigQuery, using Python scripts for the extraction.
The flows follow a simple architecture. I am also making a video, linked below, for anyone who wants to work with Python and BigQuery in Kestra.
Find the YouTube tutorial here:
https://www.youtube.com/watch?v=l5k9GxaUYYI&t=5s
Kestra is an event-driven workflow orchestration tool. Check out the Kestra website (https://kestra.io) to learn more.
You can use Docker to start your project with Kestra quickly. The same command can also be found in the Kestra documentation.
docker run --pull=always --rm -it -p 8080:8080 --user=root -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/tmp kestra/kestra:latest server local
This is only to get you started quickly. To persist your workflows in your Kestra instance, use Docker Compose instead. The following command downloads a docker-compose.yml file that sets up Kestra and Postgres:
curl -o docker-compose.yml \
https://raw.githubusercontent.com/kestra-io/kestra/develop/docker-compose.yml

Run docker compose up to start your project, then head to localhost:8080. You can paste the minimal flow sketched below to confirm the instance works, and follow the code in this repo to build your own workflows.
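The following is a minimal first flow to paste into the Kestra editor. It is only a sketch: the flow id, namespace, and message are placeholders, and the Log task type assumes a recent Kestra release.

id: hello_kestra
namespace: zoomcamp

tasks:
  - id: say_hello
    type: io.kestra.plugin.core.log.Log
    message: Kestra is up and running!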
For this repo, though, you can use the docker-compose file below, which runs all the services (Kestra, its Postgres backend, a separate Postgres for the Zoomcamp data, and pgAdmin) under one Docker Compose project:
volumes:
  postgres-data:
    driver: local
  kestra-data:
    driver: local
  zoomcamp-data:
    driver: local

services:
  postgres:
    image: postgres
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: kestra
      POSTGRES_USER: kestra
      POSTGRES_PASSWORD: k3str4
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d $${POSTGRES_DB} -U $${POSTGRES_USER}"]
      interval: 30s
      timeout: 10s
      retries: 10

  kestra:
    image: kestra/kestra:latest
    pull_policy: always
    # Note that this setup with a root user is intended for development purpose.
    # Our base image runs without root, but the Docker Compose implementation needs root to access the Docker socket
    # To run Kestra in a rootless mode in production, see: https://kestra.io/docs/installation/podman-compose
    user: "root"
    command: server standalone
    volumes:
      - kestra-data:/app/storage
      - /var/run/docker.sock:/var/run/docker.sock
      - /tmp/kestra-wd:/tmp/kestra-wd
    environment:
      KESTRA_CONFIGURATION: |
        datasources:
          postgres:
            url: jdbc:postgresql://postgres:5432/kestra
            driverClassName: org.postgresql.Driver
            username: kestra
            password: k3str4
        kestra:
          server:
            basicAuth:
              enabled: false
              username: "admin@kestra.io" # it must be a valid email address
              password: kestra
          repository:
            type: postgres
          storage:
            type: local
            local:
              basePath: "/app/storage"
          queue:
            type: postgres
          tasks:
            tmpDir:
              path: /tmp/kestra-wd/tmp
          url: http://localhost:8080/
    ports:
      - "8080:8080"
      - "8081:8081"
    depends_on:
      postgres:
        condition: service_started

  postgres_zoomcamp:
    image: postgres
    environment:
      POSTGRES_USER: kestra
      POSTGRES_PASSWORD: k3str4
      POSTGRES_DB: postgres-zoomcamp
    ports:
      - "5432:5432"
    volumes:
      - zoomcamp-data:/var/lib/postgresql/data
    depends_on:
      kestra:
        condition: service_started

  pgadmin:
    image: dpage/pgadmin4
    environment:
      - PGADMIN_DEFAULT_EMAIL=admin@admin.com
      - PGADMIN_DEFAULT_PASSWORD=root
    ports:
      - "8085:80"
    depends_on:
      postgres_zoomcamp:
        condition: service_started
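Save the file as docker-compose.yml and run docker compose up -d. With the port mappings above, the Kestra UI is served at http://localhost:8080, pgAdmin at http://localhost:8085 (admin@admin.com / root), and the Zoomcamp Postgres listens on localhost:5432 (user kestra, password k3str4).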
You can import the flows from this repo into your Kestra instance and run them for a few ETL tasks related to DE Zoomcamp 2025; a minimal sketch of such a flow follows.
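As a rough idea of what these flows look like, here is a sketch that downloads a CSV over HTTP and copies it into the Zoomcamp Postgres defined in the compose file above. The flow id, dataset URL, and table name are placeholders, the target table is assumed to already exist, and the task types assume the HTTP and PostgreSQL JDBC plugins bundled with recent Kestra releases.

id: taxi_to_postgres
namespace: zoomcamp

tasks:
  # Download a CSV file; the URL is a placeholder for a real NY Taxi dataset
  - id: extract
    type: io.kestra.plugin.core.http.Download
    uri: https://example.com/yellow_tripdata_2021-01.csv

  # COPY the downloaded file into Postgres; assumes the table already exists
  - id: load
    type: io.kestra.plugin.jdbc.postgresql.CopyIn
    url: jdbc:postgresql://postgres_zoomcamp:5432/postgres-zoomcamp
    username: kestra
    password: k3str4
    table: yellow_tripdata
    format: CSV
    header: true
    from: "{{ outputs.extract.uri }}"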
If you already have a Python script for your job, this video shows how to run it from Kestra: https://www.youtube.com/watch?v=s4GjfRqlfmg. A minimal sketch of a Python task in Kestra follows.
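This sketch assumes the Kestra Python scripts plugin; the flow id, installed package, and script body are placeholders, and by default the task runs inside a Python Docker container.

id: python_in_kestra
namespace: zoomcamp

tasks:
  - id: extract_places
    type: io.kestra.plugin.scripts.python.Script
    beforeCommands:
      - pip install requests
    script: |
      # Placeholder logic: replace with your Places API extraction and BigQuery load
      import requests

      resp = requests.get("https://example.com/api/places")
      print(resp.status_code)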