Curated by Fourth Industrial Systems (4th.is), this guide highlights open-source tools and patterns for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, and Analytics designed to run natively on Kubernetes and Docker. We work across languages (Python, R, Scala, Java, C#, Go, Julia, C++), with practical emphasis on Kubeflow, Seldon Core, Pachyderm, Banzai Pipeline, H2O, TensorFlow, CNTK, XGBoost, MXNet, PyTorch, ONNX, Argo, Airflow, Apache Beam, Apache Spark, Intel BigDL, Rook, and Ambassador.
“The wind and the waves are always on the side of the ablest navigator.” - Edward Gibbon
Across industry, Kubernetes has become the standard for orchestrating distributed systems, whether on-prem, in a single cloud, or spanning many. In contrast, many ML and data workflows still begin on laptops or ad-hoc notebook servers. This repository shows how to elevate those experiments into reliable, scalable, reproducible, and portable Kubernetes deployments.
- Elastic scale for CPU/GPU resources with automated orchestration.
- Portability: the same workloads run across all major clouds and on-prem.
- Ecosystem momentum: wide adoption, reflected in the CNCF membership.
- Self-healing: immutable containers plus controllers enable resilient apps.
- Containerization: package apps as small, efficient images for rapid scale-out.
- Immutability: predictable rollouts, easy rollbacks, and reproducibility.
- Persistent storage: plan PVCs/CSI drivers early; projects like Rook are popular.
- Service connectivity: ephemeral pods change the way services talk; a service mesh helps (see http://layer5.io/service-meshes/).
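As a rough illustration of how several of the points above (replica scaling, GPU limits, pinned images, a persistent volume claim) show up in practice, here is a minimal sketch that builds a Deployment and a PersistentVolumeClaim as plain Python dicts and prints them as JSON, which `kubectl apply -f` accepts. The names `trainer` and `example.com/trainer:1.0`, the `/data` mount path, and the storage size are placeholders, and the `nvidia.com/gpu` resource assumes the NVIDIA device plugin is installed on the cluster.

```python
import json


def training_manifests(name, image, replicas=1, gpus=1, storage="10Gi"):
    """Build a PVC and a Deployment that mounts it, wrapped in a v1 List."""
    labels = {"app": name}
    claim_name = f"{name}-data"

    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name},
        "spec": {
            # ReadWriteMany lets several replicas share the volume; this
            # assumes a storage backend that supports it (a shared filesystem).
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": storage}},
        },
    }

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,  # scale out by changing this number
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,  # pin a tag for reproducible rollouts
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                        }
                    ],
                    "volumes": [
                        {
                            "name": "data",
                            "persistentVolumeClaim": {"claimName": claim_name},
                        }
                    ],
                },
            },
        },
    }

    return {"apiVersion": "v1", "kind": "List", "items": [pvc, deployment]}


if __name__ == "__main__":
    # Placeholder name and image; substitute your own.
    print(json.dumps(training_manifests("trainer", "example.com/trainer:1.0"), indent=2))
```

Saving the output to a file and running `kubectl apply -f <file>` should create both objects, provided the cluster has a suitable storage class and GPU nodes.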
We focus on AI/ML/Data Science OSS that scales horizontally in Kubernetes environments. For a broader view of orchestration and operations, see Awesome Machine Learning Operations.
You may also want domain-specific “awesome” lists:
- awesome-kubernetes
- Awesome Helm
- Awesome Operators
- Awesome Docker
- container-security-awesome
- Awesome Linux Containers
- awesome-AIOps
- Awesome Julia
- Awesome R
- Awesome Bioinformatics
- Awesome Recurrent Neural Networks
- Awesome Reinforcement Learning
- Awesome Artificial Intelligence
- Awesome Machine Learning
- Awesome StarCraft AI
- Awesome Quantum Machine Learning
- Awesome AI
- Awesome Feature Engineering
- Awesome WindowsML ONNX Models
- Awesome-ONNX-Models
- Awesome TensorFlow
- Awesome Blockchain AI
- Awesome Deep Learning
- Awesome Deep Learning Resources
- awesome-nlp
- Awesome-Pytorch-list
- awesome-ai-services
- awesome_list_ai_bot_programming
- ML & DL Tutorials
- ML on Source Code
- ML Interpretability
- Interpretable ML
- AutoML Papers
- Awesome H2O
- Awesome MXNet
- Awesome Bots
- Awesome ChatOps
- Awesome Apache Airflow
- Awesome Big Data
- Awesome-BigData
- awesome-datasets
- Awesome Analytics
- Awesome Data Science
- Awesome Pipeline
- Awesome Nextflow
- awesome-etl
- Awesome Business Intelligence
Open-source projects are maintained by people and teams of all sizes. If you find value here, please star upstream repos, file issues/PRs, and thank maintainers. If something's missing, open a discussion and we'll add it.
Kubernetes translates roughly to “helmsman” and draws design lineage from Google's Borg. The internal codename Project Seven nods to Seven of Nine from Star Trek; the logo's seven spokes reference that origin. More background: https://en.wikipedia.org/wiki/Kubernetes.
“The duties of the ruler are like those of the helmsman of a great ship…” - Han Fei
“If you want to build a ship… teach them to yearn for the vast and endless sea.” - Antoine de Saint-Exupéry
http://kubeflow.org/ - Cloud-native ML platform.
- Training: TFJob controller for CPU/GPU scaling (tf-operator).
- Serving: TensorFlow Serving and integrations with Seldon Core.
- Multi-framework: operators for PyTorch, MXNet, Chainer, with ingress via Ambassador and pipelines with Pachyderm.
- Extras: Kubeflow Labs (Azure), H2O + Kubeflow.
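To make the training item above more concrete, the sketch below assembles a TFJob custom resource as a Python dict. It assumes the `kubeflow.org/v1` API version; older tf-operator releases used alpha and beta versions with a slightly different layout, so check what your cluster serves. The job name and image are placeholders.

```python
import json


def tfjob(name, image, workers=2, gpus_per_worker=0):
    """Build a TFJob manifest with a single Worker replica group."""
    # The training operator expects the training container to be named "tensorflow".
    container = {"name": "tensorflow", "image": image}
    if gpus_per_worker:
        # Assumes the NVIDIA device plugin exposes this resource name.
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus_per_worker}}

    return {
        "apiVersion": "kubeflow.org/v1",  # assumption: v1 CRD is installed
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [container]}},
                }
            }
        },
    }


if __name__ == "__main__":
    # Placeholder job name and image.
    print(json.dumps(tfjob("mnist-train", "example.com/mnist:1.0", workers=4), indent=2))
```

Scaling the job is then a matter of changing `workers` (and `gpus_per_worker`) and re-submitting the manifest.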
https://www.seldon.io/ - Kubernetes-native model serving: https://github.com/SeldonIO/seldon-core.
http://pachyderm.io/ - Versioned data pipelines for production ML: https://github.com/pachyderm/pachyderm.
Multi-framework deep learning on Kubernetes (TensorFlow, Caffe, PyTorch).
Docs: https://developer.ibm.com/patterns/deploy-and-use-a-multi-framework-deep-learning-platform-on-kubernetes/
Code: https://github.com/IBM/FfDL
Platform for building, training, and monitoring large-scale DL apps.
https://polyaxon.com/ • https://github.com/polyaxon/polyaxon
Big Data Science on Kubernetes.
https://github.com/datalayer/datalayer • https://datalayer.io • https://docs.datalayer.io
Accelerators for building ML containers and K8s objects.
https://github.com/IntelAI/mlt
“Impossible is a word humans use far too often.” - Seven of Nine
Real-time enterprise AI platform with K8s quickstart:
https://github.com/PipelineAI/pipeline • https://pipeline.ai
- Dask scales Python for analytics: https://github.com/dask/dask • http://dask.pydata.org/en/latest/
- Examples: https://github.com/dask/dask-examples • Tutorial: https://github.com/dask/dask-tutorial
- Dask-Kubernetes: https://github.com/dask/dask-kubernetes • Docs: https://dask-kubernetes.readthedocs.io/en/latest/
- Helm: https://github.com/dask/helm-chart • Docker: https://github.com/dask/dask-docker
- Dask-ML: https://github.com/dask/dask-ml • http://ml.dask.org/
- Dask-XGBoost: https://github.com/dask/dask-xgboost
https://github.com/Landoop/kafka-helm-charts • Connectors: https://github.com/Landoop/stream-reactor
End-to-end sample stack (K8s, Spark/Flink/Beam, Kafka, etc.):
https://github.com/Chabane/bigdata-playground
From commit to scale on Kubernetes (CI/CD, logging, monitoring, autoscaling):
https://github.com/banzaicloud/pipeline
Container-native workflows; cloud-agnostic; runs on any Kubernetes cluster:
https://argoproj.github.io/ • https://github.com/argoproj/argo
Events: https://github.com/argoproj/argo-events
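For a feel of what a container-native workflow looks like, here is a small sketch that generates an Argo Workflow manifest in which each step runs in its own container, one after another. It assumes the `argoproj.io/v1alpha1` Workflow resource; the step names, images, and commands are made up for illustration.

```python
import json


def sequential_workflow(prefix, steps):
    """Build an Argo Workflow that runs `steps` one after another.

    `steps` is a list of (name, image, command) tuples, where command is
    a list of strings.
    """
    # In Argo's `steps` syntax the outer list runs sequentially and each
    # inner list runs in parallel, so one item per inner list gives a chain.
    main = {
        "name": "main",
        "steps": [[{"name": name, "template": name}] for name, _, _ in steps],
    }
    containers = [
        {"name": name, "container": {"image": image, "command": command}}
        for name, image, command in steps
    ]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        # generateName yields a unique name per submission; it works with
        # `kubectl create`, not `kubectl apply`.
        "metadata": {"generateName": f"{prefix}-"},
        "spec": {"entrypoint": "main", "templates": [main] + containers},
    }


if __name__ == "__main__":
    # Hypothetical two-step pipeline.
    wf = sequential_workflow(
        "train",
        [
            ("prepare", "example.com/prep:1.0", ["python", "prepare.py"]),
            ("fit", "example.com/train:1.0", ["python", "train.py"]),
        ],
    )
    print(json.dumps(wf, indent=2))
```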
Author, schedule, and monitor DAGs for ETL/ML: https://airflow.apache.org/
Best practices: https://gtoonstra.github.io/etl-with-airflow/
K8s tools: https://github.com/mumoshu/kube-airflow • Operator: https://github.com/GoogleCloudPlatform/airflow-operator
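The core idea behind a DAG scheduler is that a task runs only once everything it depends on has finished, and tasks with no mutual dependencies can run in parallel. The sketch below shows that ordering logic with Python's standard-library `graphlib` (Python 3.9+). It is not the Airflow API, and the task names are invented for the example.

```python
from graphlib import TopologicalSorter


def run_batches(dependencies):
    """Group tasks into batches; tasks in the same batch can run in parallel.

    `dependencies` maps each task name to the set of tasks it depends on.
    A cyclic graph raises graphlib.CycleError.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark finished so dependents become ready
    return batches


# Hypothetical ETL/ML pipeline: two independent steps fan out from "extract".
pipeline = {
    "extract": set(),
    "validate": {"extract"},
    "features": {"extract"},
    "train": {"validate", "features"},
}

if __name__ == "__main__":
    for number, batch in enumerate(run_batches(pipeline), start=1):
        print(number, batch)
```

Here `validate` and `features` land in the same batch because both depend only on `extract`, while `train` waits for both.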
- Beam Operator: https://github.com/aleksdjuricin/beam-operator
- Cron-scheduled Beam Jobs: https://github.com/sanderploegsma/beam-scheduling-kubernetes
- Google Cloud Dataflow Templates: https://github.com/GoogleCloudPlatform/DataflowTemplates
Cloud-native storage orchestration: https://rook.io/ • https://github.com/rook/rook
Container-attached block storage (Go), with SLAs, tiering, and multi-AZ replica policies:
https://www.openebs.io/ • https://github.com/openebs/openebs
Maya orchestration: https://github.com/openebs/maya • Helm: https://github.com/openebs/charts
“Only those who brave its dangers comprehend its mystery.” - Longfellow
T.S. Eliot, The Waste Land (for perspective).
Note: Native K8s support arrived in Spark 2.3 and has matured since, but always check your target version's capabilities.
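With native support, a job is submitted by pointing `spark-submit` at the Kubernetes API server using a `k8s://` master URL. The helper below only assembles that command line; it does not run anything. The API server address, image, and jar path are placeholders, and the available `spark.kubernetes.*` options vary by Spark version.

```python
import shlex


def spark_submit_args(api_server, image, app_jar, main_class,
                      executors=2, name="spark-app"):
    """Assemble a spark-submit command for cluster mode on Kubernetes."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",  # the k8s:// prefix selects the K8s backend
        "--deploy-mode", "cluster",         # the driver runs in a pod as well
        "--name", name,
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_jar,                            # local:// means the jar is inside the image
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your cluster address, image, and jar.
    args = spark_submit_args(
        "https://203.0.113.10:6443",
        "example.com/spark:latest",
        "local:///opt/spark/examples/jars/spark-examples.jar",
        "org.apache.spark.examples.SparkPi",
    )
    print(shlex.join(args))
```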
- Spark Operator: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
- Spark on PKS (multi-cloud): https://github.com/SnappyDataInc/spark-on-k8s
- Sparknetes: https://github.com/hypnosapos/sparknetes
- HDFS on K8s (Helm charts): https://github.com/apache-spark-on-k8s/kubernetes-HDFS
- Stable Helm chart (Spark): https://github.com/helm/charts/tree/master/stable/spark
- Helm chart (Spark Operator): https://github.com/helm/charts/tree/master/incubator/sparkoperator
- Kubernetes examples (Spark): https://github.com/kubernetes/examples/tree/master/staging/spark (may be out of date)
- BigDL: https://bigdl-project.github.io/ • https://github.com/intel-analytics/BigDL
- Analytics Zoo: https://analytics-zoo.github.io/ • https://github.com/intel-analytics/analytics-zoo
- Rad Analytics Spark Operator: https://github.com/radanalyticsio/spark-operator
- OpenShift Spark Images: https://github.com/radanalyticsio/openshift-spark
- SparkPi (Vert.x) tutorial: https://github.com/radanalyticsio/tutorial-sparkpi-java-vertx
- EFK (Elastic/Fluentd/Kibana) Helm: https://github.com/cdwv/efk-stack-helm
- Draft (Azure): https://draft.sh/ • https://github.com/Azure/draft
- Pack repo plugin: https://github.com/draftcreate/draft-pack-repo
- Brigade (event-driven pipelines): https://brigade.sh/ • https://github.com/Azure/brigade
- Dashboard: https://github.com/Azure/kashti
- Terminal UI: https://github.com/slok/brigadeterm
- Prometheus exporter: https://github.com/slok/brigade-exporter
- Gateways: BitBucket, GitLab, K8s events, Event Grid, Cron, Trello
- Build your own gateway: https://github.com/technosophos/draft-brigade
- ksonnet (historical): https://ksonnet.io/
The podder-ai ecosystem offers related components:
- Kubeb (CLI for building/deploying to K8s): https://github.com/podder-ai/kubeb
- pipeline-framework (Airflow-based scheduling/monitoring): https://github.com/podder-ai/pipeline-framework
- pipeline-generator and sample repos:
https://github.com/podder-ai/pipeline-generator •
https://github.com/podder-ai/pipeline-framework-sample •
https://github.com/podder-ai/poc-base-sample •
https://github.com/podder-ai/poc-base
Fourth Industrial Systems builds scalable, ethical AI solutions and agentic workflows that move seamlessly from prototype to production on Kubernetes.
Contact: freeman@4th.is • Learn: learn.4th.is • News: news.4th.is
Trademarks: Kubernetes®, Apache®, NVIDIA®, and other names are the property of their respective owners; references are for identification only and imply no endorsement.