kubernetes-talos

Complete operational knowledge for deploying, managing, and operating production Kubernetes clusters using Talos Linux. An agentic stack that turns AI agents into Talos experts.

What This Is

This repository is a structured knowledge base, not code. It teaches AI agents (Claude, Gemini, GPT, etc.) how to operate Kubernetes clusters on Talos Linux across the full lifecycle, from initial infrastructure provisioning through production operations, upgrades, and disaster recovery.

When an agent reads the CLAUDE.md entry point, it gains:

Deep understanding of Talos's immutable, API-driven architecture
Step-by-step procedures with exact talosctl and kubectl commands
Decision frameworks for choosing CNI, storage, ingress, and other components
Safety guardrails that prevent destructive operations without approval
Symptom-based troubleshooting decision trees

Coverage

Area	What's Covered
Deployment targets	Bare metal, Proxmox, VMware, libvirt, AWS, GCP, Azure, Hetzner, DigitalOcean
CNI	Cilium (7 install methods), Flannel, Calico, Multus
Storage	Rook-Ceph, Longhorn, OpenEBS, Local Path, NFS, Cloud CSI (EBS/PD/Azure Disk)
Platform	Flux, ArgoCD, NGINX/Traefik/Cilium Gateway API, cert-manager, Prometheus, Loki, OpenTelemetry, Kyverno/OPA, Sealed Secrets/ESO/Vault, Cilium Mesh/Istio/Linkerd
Operations	Health checks, node scaling, Talos + K8s upgrades, etcd backup/DR, CA rotation
Topologies	Single-node dev, 3 CP + N workers, 5 CP + N workers, multi-site
Talos versions	1.8.x, 1.9.x with version-specific known issues

Quick Start

For AI Agent Users

Pull this stack into your project:

agentic-stacks init ./my-cluster --name my-cluster --namespace my-org --from kubernetes-talos

Or clone directly:

git clone https://github.com/agentic-stacks/kubernetes-talos.git .stacks/kubernetes-talos

Then point your agent to CLAUDE.md (or .stacks/kubernetes-talos/CLAUDE.md if using the stacks workflow). The agent will use the routing table to navigate to the right skill for any task.

For Humans

Browse the skills directly:

New to Talos? Start with skills/foundation/concepts
Building a cluster? Follow the new cluster workflow
Choosing components? Check the decision guides
Something broken? Jump to troubleshooting

Skills

Foundation

Skill	Description
`foundation/concepts`	Talos architecture, immutable OS model, API-driven operations
`foundation/machine-config`	Config generation, patching, secrets management, system extensions
`foundation/infrastructure`	Platform-specific provisioning guides for 9 platforms

Deploy

Skill	Description
`deploy/bootstrap`	Cluster creation, `talosctl bootstrap`, kubeconfig retrieval
`deploy/networking`	CNI selection, comparison, and installation (Cilium, Flannel, Calico, Multus)
`deploy/storage`	CSI selection, comparison, and installation (6 options)

Platform

Skill	Description
`platform/gitops`	Flux and ArgoCD bootstrap, repo structure patterns
`platform/ingress`	NGINX, Traefik, Cilium Gateway API, cert-manager
`platform/observability`	Prometheus, Loki, OpenTelemetry, Talos-native metrics
`platform/security`	Pod security, secrets management, RBAC, network policy
`platform/service-mesh`	Cilium mesh, Istio, Linkerd

Operations

Skill	Description
`operations/health-check`	Cluster validation procedures and health report format
`operations/scaling`	Adding and removing nodes, topology changes
`operations/upgrades`	Talos OS, Kubernetes, and component rolling upgrades
`operations/backup-restore`	etcd backup, Velero, disaster recovery procedures
`operations/certificate-mgmt`	Talos PKI, CA rotation, expiry monitoring

Diagnose

Skill	Description
`diagnose/troubleshooting`	Symptom-based decision trees for 8 common scenarios

Reference

Skill	Description
`reference/known-issues`	Version-specific bugs and workarounds
`reference/compatibility`	Talos/K8s/CNI/CSI compatibility matrices
`reference/decision-guides`	Trade-off analyses for CNI, CSI, topology, HA, GitOps

Workflows

New Cluster

foundation/concepts → foundation/machine-config → foundation/infrastructure
→ deploy/bootstrap → deploy/networking → deploy/storage
→ platform/* (as needed) → operations/health-check
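
As a rough illustration of the deploy/bootstrap step in the chain above, the sketch below lists the talosctl calls for a first control plane node. The node address `10.0.0.10`, the `CP_IP` and `DRY_RUN` variables, and the `run` and `bootstrap_first_node` helpers are all assumptions made for this example and do not come from the stack. The skill files remain the reference for exact procedures. The script defaults to printing commands instead of running them.

```shell
#!/bin/sh
# Hypothetical sketch: bootstrap the first control plane node.
# CP_IP is a placeholder address; DRY_RUN=1 (the default) only prints commands.
CP_IP=${CP_IP:-10.0.0.10}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it unchanged.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

bootstrap_first_node() {
  # Push the machine config to a node that is still in maintenance mode.
  run talosctl apply-config --insecure --nodes "$CP_IP" --file controlplane.yaml
  # Bootstrap etcd once, on a single control plane node only.
  run talosctl --talosconfig ./talosconfig --endpoints "$CP_IP" --nodes "$CP_IP" bootstrap
  # Write the cluster kubeconfig into the project directory.
  run talosctl --talosconfig ./talosconfig --endpoints "$CP_IP" --nodes "$CP_IP" kubeconfig ./kubeconfig
}
```

Calling `bootstrap_first_node` with the defaults prints the three commands for review. Setting `DRY_RUN=0` runs them against the node in `CP_IP`.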

Existing Cluster

Jump directly to the relevant operations/, diagnose/, or platform/ skill.

Required Tools

Tool	Purpose
`talosctl`	Talos API client (version must match target Talos version)
`kubectl`	Kubernetes CLI
`helm`	Helm package manager
`flux`	Flux CLI (optional, for GitOps with Flux)
`argocd`	ArgoCD CLI (optional, for GitOps with ArgoCD)
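
A quick way to confirm the tools in the table are installed is a small PATH check. This is a minimal sketch: the `missing_tools` helper is a name invented for this example, and it only checks that each binary exists, not that its version is suitable.

```shell
#!/bin/sh
# Print the name of every tool passed as an argument that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# talosctl, kubectl, and helm are required; flux and argocd are optional.
required_missing=$(missing_tools talosctl kubectl helm)
if [ -n "$required_missing" ]; then
  printf 'missing required tools:\n%s\n' "$required_missing" >&2
fi
```

Because talosctl should match the target Talos version, it is worth following this with `talosctl version --client` and comparing the result against the cluster's Talos release.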

Project Structure

When using this stack, your operator project should look like this:

my-cluster/
├── CLAUDE.md                # Points to .stacks/kubernetes-talos/
├── stacks.lock
├── .stacks/
│   └── kubernetes-talos/    # This stack
├── controlplane.yaml        # Generated machine config
├── controlplane.yaml.orig   # Stock config for diffing
├── worker.yaml
├── worker.yaml.orig
├── secrets.yaml             # Cluster secrets (keep secure)
├── patches/                 # Per-role and per-node patches
├── talosconfig              # Talos client config
├── kubeconfig               # K8s client config
├── manifests/               # Platform component manifests
└── scripts/                 # Operational scripts
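
The `.orig` files in the layout above exist so that edits to the generated machine configs can be reviewed with a diff. The sketch below shows one way to maintain them. The helper names `snapshot_stock_configs` and `show_config_drift` are hypothetical and not part of the stack. The sketch assumes the configs were already generated into the project directory, for example with `talosctl gen config`.

```shell
#!/bin/sh
# Keep an untouched copy of each generated machine config for later diffing.
snapshot_stock_configs() {
  dir=$1
  for role in controlplane worker; do
    src="$dir/$role.yaml"
    # Skip roles with no config yet and never overwrite an existing snapshot.
    if [ -f "$src" ] && [ ! -e "$src.orig" ]; then
      cp "$src" "$src.orig"
    fi
  done
}

# Print a unified diff between each stock config and its current version.
# diff exits with 1 when files differ, which is expected here, so only
# exit codes above 1 are treated as errors.
show_config_drift() {
  dir=$1
  for role in controlplane worker; do
    if [ -f "$dir/$role.yaml.orig" ]; then
      diff -u "$dir/$role.yaml.orig" "$dir/$role.yaml" || [ $? -eq 1 ]
    fi
  done
}
```

Run `snapshot_stock_configs .` once, right after generating the configs, and `show_config_drift .` whenever the working copies have been edited.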

Contributing

This stack follows the agentic-stacks format. Each skill is a directory under skills/ with a README.md entry point and optional sub-files for specific topics.

To add or update content:

Follow the existing writing style (imperative headings, exact commands, decision trees)
Verify commands against the official Talos docs
Add version-specific notes where behavior differs between Talos releases
Update stack.yaml if adding new skills

License

MIT
