🐳 4th — Cloud-Native AI/ML on Kubernetes. A curated guide from Fourth Industrial Systems (4th.is) to open-source frameworks and workflows for AI, ML, DL, CV, and data science on K8s & Docker—featuring Kubeflow, Seldon Core, Pachyderm, Banzai, H2O, TensorFlow, PyTorch, MXNet, XGBoost, ONNX, and Spark.

4th/cloudnative-ai-ml

❄️🐳 4th: Cloud‑Native AI/ML on Kubernetes

Curated by Fourth Industrial Systems (4th.is), this guide highlights open‑source tools and patterns for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, and Analytics designed to run natively on Kubernetes and Docker. We work across languages — Python, R, Scala, Java, C#, Go, Julia, C++ — with practical emphasis on Kubeflow, Seldon Core, Pachyderm, Banzai Pipeline, H2O, TensorFlow, CNTK, XGBoost, MXNet, PyTorch, ONNX, Argo, Airflow, Apache Beam, Apache Spark, Intel BigDL, Rook, and Ambassador.

“The wind and the waves are always on the side of the ablest navigator.” - Edward Gibbon


Introduction

Across industry, Kubernetes has become the standard for orchestrating distributed systems — whether on‑prem, in a single cloud, or spanning many. In contrast, many ML and data workflows still begin on laptops or ad‑hoc notebook servers. This repository shows how to elevate those experiments into reliable, scalable, reproducible, and portable Kubernetes deployments.

Why Kubernetes for ML

  • Elastic scale for CPU/GPU resources with automated orchestration.
  • Portability: the same workloads run across all major clouds and on‑prem.
  • Ecosystem momentum: wide adoption across the CNCF membership.
  • Self‑healing: immutable containers plus controllers enable resilient apps.

Practical Considerations

  • Containerization: package apps as small, efficient images for rapid scale‑out.
  • Immutability: predictable rollouts, easy rollbacks, and reproducibility.
  • Persistent storage: plan PVCs/CSI drivers early; projects like Rook are popular.
  • Service connectivity: ephemeral pods change the way services talk; a service mesh helps (see http://layer5.io/service-meshes/).
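
As a rough illustration of the storage and containerization points above, the sketch below builds a PersistentVolumeClaim and a one-off training Job as plain Python dicts and prints them as JSON, which `kubectl apply -f -` accepts. The names `train-data`, `example.org/trainer:latest`, the `/data` mount path, and the helper functions are hypothetical placeholders, and the `nvidia.com/gpu` limit assumes the NVIDIA device plugin is installed on the cluster.

```python
import json


def pvc_manifest(name: str, size: str = "10Gi") -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: mountable read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


def training_job_manifest(name: str, image: str, claim: str, gpus: int = 1) -> dict:
    """Build a batch/v1 Job that mounts the claim and requests GPUs."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 2,  # retry a failed pod at most twice
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "train",
                            "image": image,  # hypothetical image name
                            # Assumes the NVIDIA device plugin exposes this resource.
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                        }
                    ],
                    "volumes": [
                        {"name": "data", "persistentVolumeClaim": {"claimName": claim}}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    bundle = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            pvc_manifest("train-data"),
            training_job_manifest("train-model", "example.org/trainer:latest", "train-data"),
        ],
    }
    print(json.dumps(bundle, indent=2))
```

Saved as, say, `manifests.py`, the output could be piped to a cluster with `python manifests.py | kubectl apply -f -`. Generating manifests from code is only one option; hand-written YAML or Helm charts express the same objects.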

We focus on AI/ML/Data Science OSS that thrives in horizontally scalable Kubernetes environments. For a broader view of orchestration and operations, see Awesome Machine Learning Operations.

You may also want domain‑specific “awesome” lists:

Kubernetes

Spark

AI/ML

Other Data/ETL/Analytics


Community & Attribution

Open‑source projects are maintained by people and teams of all sizes. If you find value here, please star upstream repos, file issues/PRs, and thank maintainers. If something’s missing, open a discussion and we’ll add it.


Kubernetes — Name & Heritage

Kubernetes translates roughly to “helmsman” and draws design lineage from Google’s Borg. The internal codename Project Seven nods to Seven of Nine from Star Trek; the logo’s seven spokes reference that origin. More background: https://en.wikipedia.org/wiki/Kubernetes.

“The duties of the ruler are like those of the helmsman of a great ship…” - Han Fei


ML Built for Kubernetes (Native Kube)

“If you want to build a ship… teach them to yearn for the vast and endless sea.” - Antoine de Saint‑Exupéry

Kubeflow

http://kubeflow.org/ — Cloud‑native ML platform.

Seldon Core

https://www.seldon.io/ — Kubernetes‑native model serving: https://github.com/SeldonIO/seldon-core.

Pachyderm

http://pachyderm.io/ — Versioned data pipelines for production ML: https://github.com/pachyderm/pachyderm.

Fabric for Deep Learning (FfDL)

Multi‑framework deep learning on Kubernetes (TensorFlow, Caffe, PyTorch).
Docs: https://developer.ibm.com/patterns/deploy-and-use-a-multi-framework-deep-learning-platform-on-kubernetes/
Code: https://github.com/IBM/FfDL

Polyaxon

Platform for building, training, and monitoring large‑scale DL apps.
https://polyaxon.com/ • https://github.com/polyaxon/polyaxon

Datalayer

Big Data Science on Kubernetes.
https://github.com/datalayer/datalayer • https://datalayer.io • https://docs.datalayer.io

IntelAI Machine Learning Container Templates

Accelerators for building ML containers and K8s objects.
https://github.com/IntelAI/mlt


ML Adapted to Kubernetes

“Impossible is a word humans use far too often.” - Seven of Nine

Pipeline.AI

Real‑time enterprise AI platform with K8s quickstart:
https://github.com/PipelineAI/pipeline • https://pipeline.ai

Dask & Friends

Kafka on K8s (Helm)

https://github.com/Landoop/kafka-helm-charts • Connectors: https://github.com/Landoop/stream-reactor

Big Data Playground

End‑to‑end sample stack (K8s, Spark/Flink/Beam, Kafka, etc.):
https://github.com/Chabane/bigdata-playground


Pipeline & Data Flow

Banzai Pipeline

From commit to scale on Kubernetes (CI/CD, logging, monitoring, autoscaling):
https://github.com/banzaicloud/pipeline

Argo

Container‑native workflows; cloud‑agnostic; runs on any Kubernetes cluster:
https://argoproj.github.io/ • https://github.com/argoproj/argo
Events: https://github.com/argoproj/argo-events

Apache Airflow

Author, schedule, and monitor DAGs for ETL/ML: https://airflow.apache.org/
Best practices: https://gtoonstra.github.io/etl-with-airflow/
K8s tools: https://github.com/mumoshu/kube-airflow • Operator: https://github.com/GoogleCloudPlatform/airflow-operator

Apache Beam / Dataflow


Storage for Kubernetes

Rook

Cloud‑native storage orchestration: https://rook.io/ • https://github.com/rook/rook

OpenEBS

Container‑attached block storage (Go), with SLAs, tiering, and multi‑AZ replica policies:
https://www.openebs.io/ • https://github.com/openebs/openebs
Maya orchestration: https://github.com/openebs/maya • Helm: https://github.com/openebs/charts


Spark at Sea (on K8s)

“Only those who brave its dangers comprehend its mystery.” - Longfellow
T.S. Eliot, The Waste Land (for perspective).

Note: Native K8s support arrived in Spark 2.3 and has matured since, but always check your target version’s capabilities.
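
For orientation, a native submission uses a `k8s://` master URL, cluster deploy mode, and a container image setting. The sketch below only assembles such a `spark-submit` command line; the helper name, API server address, image, and jar path are hypothetical placeholders, and the exact options available should be checked against the documentation for the Spark version in use.

```python
import shlex
from typing import List, Optional


def spark_submit_args(
    api_server: str,
    image: str,
    app: str,
    main_class: Optional[str] = None,
    executors: int = 2,
    name: str = "spark-app",
) -> List[str]:
    """Assemble spark-submit arguments for Spark's native Kubernetes mode."""
    args = [
        "spark-submit",
        "--master", f"k8s://{api_server}",  # the k8s:// prefix selects the Kubernetes scheduler
        "--deploy-mode", "cluster",         # the driver runs in a pod on the cluster
        "--name", name,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
    ]
    if main_class:
        args += ["--class", main_class]
    args.append(app)  # application jar or script, e.g. a local:// path inside the image
    return args


if __name__ == "__main__":
    cmd = spark_submit_args(
        api_server="https://kube.example.org:6443",    # hypothetical API server
        image="example.org/spark:latest",              # hypothetical image
        app="local:///opt/spark/examples/example.jar", # hypothetical jar path
        main_class="org.apache.spark.examples.SparkPi",
    )
    # Print a shell-safe command line rather than running it.
    print(shlex.join(cmd))
```

Printing the command instead of executing it keeps the sketch runnable without a cluster; the list could be passed to `subprocess.run` once the placeholders are replaced.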

Intel BigDL & Analytics Zoo


Spark on OKD / OpenShift


Utilities & Accessories


Odds & Ends

The podder‑ai ecosystem offers related components:


About Fourth Industrial Systems

Fourth Industrial Systems builds scalable, ethical AI solutions and agentic workflows that move seamlessly from prototype to production on Kubernetes.
Contact: freeman@4th.is • Learn: learn.4th.is • News: news.4th.is

Trademarks: Kubernetes®, Apache®, NVIDIA®, and other names are the property of their respective owners; references are for identification only and imply no endorsement.
