kubernetes-ontology

English | 中文说明 (Chinese)


Screenshot (topology viewer): a kind-style Helm workload graph with Service, config, identity/RBAC, PVC, PV, StorageClass, CSI driver, provisioner, Node, and Event evidence.

kubernetes-ontology is a read-only Kubernetes topology service for diagnostics, graph exploration, and AI-agent workflows.

It builds an in-memory ontology graph from Kubernetes objects, keeps the graph fresh with informers or polling, and exposes stable CLI and HTTP queries for entities, relations, neighbors, and diagnostic subgraphs.

The open-source MVP is intentionally lightweight:

  • no controller or mutating webhook for the workloads being observed
  • no runtime writes to observed Kubernetes resources
  • no persistent database requirement
  • no external graph backend requirement
  • no CRD installation requirement

For the standard server and client workflow, start with QUICKSTART.md.

Why This Exists

Kubernetes troubleshooting usually starts with scattered object reads: kubectl get pod, then owner references, then services, events, PVCs, RBAC, webhooks, CSI drivers, and controller pods.

This project turns those object reads into a graph:

  • pods, workloads, services, nodes, storage, RBAC, events, images, webhooks, Helm releases, and Helm charts become typed entities
  • Kubernetes references and inferred dependencies become typed relations
  • diagnostic queries return a focused subgraph instead of a flat object dump
  • AI agents can ask stable read-only questions without crawling the cluster from scratch every time
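
For example, a single Pod diagnostic query (the CLI flags are documented later in this README; the server address comes from the Quickstart config) returns that focused subgraph in one call:

kubernetes-ontology \
  --server "http://127.0.0.1:18080" \
  --diagnose-pod \
  --namespace default \
  --name my-pod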

Current Capabilities

Diagnostic Entrypoints

  • Pod
  • Workload
  • PVC
  • PV
  • StorageClass
  • CSIDriver
  • HelmRelease
  • HelmChart

Runtime

  • full bootstrap snapshot from the Kubernetes API
  • long-running daemon with runtime status
  • informer-first continuous refresh with polling fallback
  • bounded CLI observe mode
  • category-aware change planning
  • scoped graph mutation for common update categories

Current narrow strategies:

  • service-narrow
  • event-narrow
  • storage-narrow
  • identity/security-narrow
  • pod-narrow
  • workload-narrow

Unsupported categories fall back to a full rebuild.

Graph Recovery

The graph can recover and correlate:

  • recursive owner chains, including Pod -> ReplicaSet -> Deployment
  • custom workload resources configured from CRDs, such as Kruise ASTS or Redis clusters
  • display-only controller ownership rules for controller pods that Kubernetes does not expose through owner references
  • service selector matches
  • pod to node placement
  • pod to Secret, ConfigMap, ServiceAccount, image, PVC, PV, StorageClass, and CSI driver paths
  • ServiceAccount to RoleBinding and ClusterRoleBinding evidence
  • Kubernetes Event and admission webhook evidence
  • PV CSI metadata
  • Helm release and chart provenance from standard Helm labels and annotations

CSI Correlation

CSI storage topology follows PVC -> PV/StorageClass -> CSIDriver. Component correlation is configured with csiComponentRules; driver-specific controller and node-agent inference is not enabled unless a matching rule is configured.

Recovered evidence can include relations such as:

  • provisioned_by_csi_driver
  • implemented_by_csi_controller
  • implemented_by_csi_node_agent
  • managed_by_csi_controller
  • served_by_csi_node_agent
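
Once a matching rule has produced these edges, they can be queried like any other relation; a sketch with the filtered-relations CLI documented later (the --from value is a placeholder for the storage entity's global ID):

kubernetes-ontology \
  --server "http://127.0.0.1:18080" \
  --list-filtered-relations \
  --from 'your/entityGlobalId' \
  --relation-kind provisioned_by_csi_driver \
  --limit 50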

Helm Provenance

Resources labeled with standard Helm metadata produce HelmRelease and HelmChart nodes. The graph adds managed_by_helm_release and installs_chart edges with label_evidence provenance and confidence scores. These are ownership hints from labels, not exact manifest membership.
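
The standard metadata Helm applies is typically the app.kubernetes.io/managed-by=Helm label together with the meta.helm.sh/release-name and meta.helm.sh/release-namespace annotations; whether this project keys on exactly this set is an assumption here. A plain kubectl way to see which objects carry that label:

kubectl get deployments,statefulsets,services \
  --all-namespaces \
  -l app.kubernetes.io/managed-by=Helm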

Helm Upgrade Failure Triage

The Incident Context Pack recipe flags in this section are available in v0.1.6 and newer release archives.

When a user only says "helm upgrade failed" and does not have the Helm CLI output, kubernetes-ontology can still diagnose the current cluster state for that release:

kubernetes-ontology \
  --server "http://127.0.0.1:18080" \
  --diagnose-helm-release \
  --namespace default \
  --name my-release

The response expands the probable release-owned resources and chart evidence. It also marks the missing Helm-side evidence explicitly:

  • helm_cli_output_not_observed: template, values, repository, client, hook, and --atomic rollback errors are outside current Kubernetes object state.
  • helm_manifest_evidence_not_collected: default Helm ownership is label and annotation evidence, not exact release manifest membership.

For rollout failures that reached the cluster, follow the release graph into the affected Workload or Pod diagnostic. For render/client failures, ask the user to paste the helm upgrade stderr or helm status/history output.
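
The standard Helm commands to request from the user for that render/client-side evidence, run against the same release and namespace:

helm status my-release --namespace default
helm history my-release --namespace default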

Incident Context Pack v1 adds an optional recipe label for this workflow:

kubernetes-ontology \
  --server "http://127.0.0.1:18080" \
  --entry-kind Pod \
  --namespace default \
  --name bad-pod \
  --recipe helm-upgrade-runtime-failure

The checked-in sample at samples/helm-upgrade-failure/ can be opened in the viewer without a live cluster and demonstrates ranked evidence, freshness, budget metadata, Helm caveats, and clickable evidence references.

Agent Onboarding

This repository provides a Codex-style skill: skills/kubernetes-ontology-access. Install it directly from GitHub when you want an AI agent to guide the whole onboarding flow instead of reading the docs manually. Users do not need to clone this repository before installing the skill.

npx skills add https://github.com/Colvin-Y/kubernetes-ontology/tree/main/skills/kubernetes-ontology-access -g --agent codex

You can also install from the repository root and select the skill by name:

npx skills add Colvin-Y/kubernetes-ontology -s kubernetes-ontology-access -g --agent codex

Skill marketplace links intentionally point at the default branch so agents get the latest onboarding instructions. Use tagged releases for runtime binaries, container images, and Helm chart versions.

Restart Codex after installing the skill, then ask for a guided setup, for example:

Use the kubernetes-ontology-access skill to onboard my cluster with Helm,
install the CLI, run a Pod diagnostic query, and open the viewer path.

The skill connects the three intended access modes:

  • AI-agent automatic troubleshooting with daemon-backed diagnostic subgraphs.
  • CLI queries for status, entity resolution, relations, neighbors, expansion, and Pod/Workload diagnosis.
  • Human visual inspection through the topology viewer and exported graph JSON.

Agent implementers should also read AI_CONTRACT.md for the diagnostic subgraph contract and safe downstream reasoning rules.

Safety Model

kubernetes-ontology is read-only with respect to the Kubernetes resources it observes.

At runtime, the daemon does not:

  • create, patch, update, or delete observed Kubernetes resources
  • write annotations or status fields
  • install CRDs or controllers for observed workloads
  • mutate RBAC policy in the observed cluster

There are three deployment modes:

  • Source/local mode uses your kubeconfig and performs read-only Kubernetes API calls.
  • Release binary mode uses the published archive to run kubernetes-ontologyd on your workstation or a bastion host. It creates no Kubernetes resources and only needs network access from that host to the Kubernetes API server.
  • Helm mode installs this project's own Deployment, Service, ServiceAccount, ConfigMap, and read-only RBAC so the daemon and viewer can run in-cluster. That install-time footprint is expected. The granted RBAC is limited to get, list, and watch for collected resources. Secret reads are enabled by default so Secret nodes and uses_secret edges can be collected; set rbac.readSecrets=false to disable them.

The HTTP API is intended for local or controlled environments, not public multi-tenant exposure.
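
If the daemon runs on a bastion or jump host, one common pattern (not a project feature) is to keep it bound to 127.0.0.1 and reach it over an SSH tunnel instead of exposing the port; a sketch assuming a reachable host named bastion:

ssh -N -L 18080:127.0.0.1:18080 bastion
kubernetes-ontology --server "http://127.0.0.1:18080" --status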

Installation

Option 1: Release Binary Server + Client

Use this path when the target cluster is private, air-gapped, or cannot pull the published GHCR image. The release archive includes the server kubernetes-ontologyd, the CLI client kubernetes-ontology, and the optional viewer kubernetes-ontology-viewer.

export KO_VERSION=v0.1.6
curl -LO "https://github.com/Colvin-Y/kubernetes-ontology/releases/download/${KO_VERSION}/kubernetes-ontology_${KO_VERSION}_linux_amd64.tar.gz"
tar -xzf "kubernetes-ontology_${KO_VERSION}_linux_amd64.tar.gz"
cd "kubernetes-ontology_${KO_VERSION}_linux_amd64"

Create kubernetes-ontology.yaml with a kubeconfig path and collection scope:

kubeconfig: /absolute/path/to/kubeconfig.yaml
cluster: your-logical-cluster
contextNamespaces:
  - default
  - kube-system
server:
  addr: 127.0.0.1:18080
bootstrapTimeout: 2m
streamMode: informer

Start the server:

./kubernetes-ontologyd --config ./kubernetes-ontology.yaml

Query it from another terminal:

./kubernetes-ontology --server "http://127.0.0.1:18080" --status

This mode starts only host-local processes. Stop foreground server or viewer processes with Ctrl-C; if you background them, store the PID and kill it when the diagnostic session ends.
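
A minimal sketch of that background pattern, using plain shell rather than anything project-specific:

./kubernetes-ontologyd --config ./kubernetes-ontology.yaml > kubernetes-ontologyd.log 2>&1 &
echo $! > kubernetes-ontologyd.pid

# when the diagnostic session ends
kill "$(cat kubernetes-ontologyd.pid)"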

Option 2: Helm + Release CLI

Use this path when you want to run the server in Kubernetes without compiling from source and cluster nodes can pull the configured image. For private clusters, mirror ghcr.io/colvin-y/kubernetes-ontology to an internal registry and set KO_IMAGE to that mirror, or use the release binary path above.

export KO_VERSION=v0.1.6
export KO_IMAGE=ghcr.io/colvin-y/kubernetes-ontology

helm upgrade --install kubernetes-ontology ./charts/kubernetes-ontology \
  --namespace kubernetes-ontology \
  --create-namespace \
  --set image.repository="${KO_IMAGE}" \
  --set image.tag="${KO_VERSION}" \
  --set cluster="your-logical-cluster" \
  --set contextNamespaces='{default,kube-system}'
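
If the daemon should not read Secrets (see the Safety Model section), the same install can drop that permission with the documented chart value, for example:

helm upgrade --install kubernetes-ontology ./charts/kubernetes-ontology \
  --namespace kubernetes-ontology \
  --create-namespace \
  --set image.repository="${KO_IMAGE}" \
  --set image.tag="${KO_VERSION}" \
  --set cluster="your-logical-cluster" \
  --set contextNamespaces='{default,kube-system}' \
  --set rbac.readSecrets=false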

Expose the server locally:

kubectl -n kubernetes-ontology port-forward svc/kubernetes-ontology 18080:18080

Download the kubernetes-ontology CLI from GitHub Releases (set KO_VERSION to the release tag you want to install), then query the server:

kubernetes-ontology --server "http://127.0.0.1:18080" --status
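
If you still need the CLI binary on this workstation, it ships in the same per-platform release archive as Option 1; a download sketch assuming the linux_amd64 archive name used there:

curl -LO "https://github.com/Colvin-Y/kubernetes-ontology/releases/download/${KO_VERSION}/kubernetes-ontology_${KO_VERSION}_linux_amd64.tar.gz"
tar -xzf "kubernetes-ontology_${KO_VERSION}_linux_amd64.tar.gz"
./kubernetes-ontology_${KO_VERSION}_linux_amd64/kubernetes-ontology --server "http://127.0.0.1:18080" --status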

The Helm chart creates the project Deployment, Service, ServiceAccount, ConfigMap, and read-only RBAC required to run in-cluster. It also deploys the topology viewer by default:

kubectl -n kubernetes-ontology port-forward svc/kubernetes-ontology-viewer 8765:8765

Open http://127.0.0.1:8765.

Stop short-lived kubectl port-forward processes with Ctrl-C. Remove the in-cluster footprint with:

helm uninstall kubernetes-ontology --namespace kubernetes-ontology

Option 3: Run From Source

Use this path for local development or when you want to run the daemon from your workstation.

make build
cp local/kubernetes-ontology.yaml.example local/kubernetes-ontology.yaml

Edit local/kubernetes-ontology.yaml, then start the daemon:

make serve

In another terminal:

make status-server
make list-entities-server ENTITY_KIND=Pod NAMESPACE=default LIMIT=20

See QUICKSTART.md for the full walkthrough.

Configuration

YAML config is the recommended way to keep cluster-specific settings:

kubeconfig: /absolute/path/to/kubeconfig.yaml
cluster: your-logical-cluster
namespace: default
contextNamespaces:
  - default
  - kube-system

server:
  addr: 127.0.0.1:18080
  url: http://127.0.0.1:18080
bootstrapTimeout: 2m
streamMode: informer
pollInterval: 5s

Custom workload resources and display-only controller rules are optional:

workloadResources:
  - group: apps.kruise.io
    version: v1beta1
    resource: statefulsets
    kind: StatefulSet
    namespaced: true

controllerRules:
  - apiVersion: apps.kruise.io/*
    kind: "*"
    namespace: kruise-system
    controllerPodPrefixes:
      - kruise-controller-manager
    nodeDaemonPodPrefixes:
      - kruise-daemon

csiComponentRules:
  - driver: diskplugin.csi.alibabacloud.com
    namespace: kube-system
    controllerPodPrefixes:
      - csi-provisioner-
    nodeAgentPodPrefixes:
      - csi-plugin-

If a configured custom resource is not installed in the cluster, the daemon logs the missing resource and skips that informer. This is expected on a clean kind cluster that does not have OpenKruise, Redis operators, or similar CRDs installed.

More detail: local/README.md.

CLI Examples

Query daemon status:

./bin/kubernetes-ontology --server "http://127.0.0.1:18080" --status

Resolve a pod entity:

./bin/kubernetes-ontology \
  --server "http://127.0.0.1:18080" \
  --resolve-entity \
  --entity-kind Pod \
  --namespace default \
  --name my-pod

Diagnose a pod:

./bin/kubernetes-ontology \
  --server "http://127.0.0.1:18080" \
  --diagnose-pod \
  --namespace default \
  --name my-pod \
  --max-nodes 200 \
  --max-edges 400

Diagnose a Helm release after a failed upgrade:

Requires v0.1.6 or newer.

./bin/kubernetes-ontology \
  --server "http://127.0.0.1:18080" \
  --diagnose-helm-release \
  --namespace default \
  --name my-release

Diagnostic responses include additive schemaVersion, recipe, lanes, partial, warnings, budgets, rankedEvidence, degradedSources, and conflicts fields. Agents should use those fields to distinguish bounded evidence from complete cluster truth.
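
A sketch of pulling just those top-level fields out of a response, assuming the CLI prints the diagnostic JSON on stdout and jq is installed:

./bin/kubernetes-ontology \
  --server "http://127.0.0.1:18080" \
  --diagnose-pod \
  --namespace default \
  --name my-pod \
  | jq '{schemaVersion, recipe, partial, warnings, budgets, degradedSources, conflicts}'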

Expand one graph node:

./bin/kubernetes-ontology \
  --server "http://127.0.0.1:18080" \
  --expand-entity \
  --entity-id 'your/entityGlobalId' \
  --expand-depth 1 \
  --limit 100

List filtered relations:

./bin/kubernetes-ontology \
  --server "http://127.0.0.1:18080" \
  --list-filtered-relations \
  --from 'your/entityGlobalId' \
  --relation-kind scheduled_on \
  --limit 50

For machine-readable server query failures:

./bin/kubernetes-ontology \
  --server "http://127.0.0.1:18080" \
  --machine-errors \
  --resolve-entity \
  --entity-kind Pod \
  --namespace default \
  --name missing-pod

HTTP API

The daemon exposes the current in-memory ontology database over HTTP:

  • GET /healthz
  • GET /status
  • GET /entity?entityGlobalId=...
  • GET /entity?kind=Pod&namespace=default&name=my-pod
  • GET /entities?kind=Pod&namespace=default&limit=50
  • GET /relations?from=...&kind=scheduled_on
  • GET /neighbors?entityGlobalId=...&direction=out
  • GET /expand?entityGlobalId=...&depth=1
  • GET /diagnostic?kind=Pod&namespace=default&name=my-pod&recipe=pod-incident
  • GET /diagnostic/pod?namespace=default&name=my-pod&maxNodes=200&maxEdges=400
  • GET /diagnostic/workload?namespace=default&name=my-deployment

Graph and list responses include additive freshness metadata when daemon runtime status is available. Error responses include code, message, status, retryable, and source alongside the historical error string. Diagnostic responses additionally include explicit partial/budget metadata and ranked evidence for downstream agents.
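
A curl sketch against two of the listed endpoints, assuming a daemon listening on 127.0.0.1:18080:

curl -s "http://127.0.0.1:18080/healthz"
curl -s "http://127.0.0.1:18080/diagnostic/pod?namespace=default&name=my-pod&maxNodes=200&maxEdges=400"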

Visualization

The repository includes a local topology viewer:

  • kubernetes-ontology-viewer, a release binary with embedded static assets
  • tools/visualize/server.py, a development server
  • tools/visualize/index.html, the browser UI

Start the daemon first:

make serve

Start the viewer:

make visualize

Open http://127.0.0.1:8765.

The viewer can load live topology, query focused diagnostic graphs, expand and collapse nodes, filter by node or relation metadata, inspect provenance, and export the visible subgraph as JSON. Focused diagnostic graphs show a Diagnostic Signals panel with budget truncation, warnings, conflicts, degraded sources, and ranked evidence before lower-level explanation text.

Architecture

Core layers:

  • internal/collect/k8s: read-only Kubernetes collection, informers, and polling fallback
  • internal/runtime: bootstrap, lifecycle, status, and stream application
  • internal/ontology: entity and relation storage abstraction
  • internal/server: HTTP API for status, ontology queries, and diagnostics
  • internal/reconcile: full rebuild and scoped mutation reconcilers
  • internal/graph: graph builder, kernel, and index
  • internal/query: query facade
  • internal/service/diagnostic: diagnostic subgraph query implementation
  • tools/visualize: local graph viewer

Owner-chain recovery prefers controller owner references, resolves by UID first, falls back to namespace/kind/name, guards against cycles, and supports deeper chains beyond Pod -> ReplicaSet -> Deployment.

Development

Build:

make build

Run tests:

make test

make test runs:

go test -p 1 ./...

After code changes that touch the daemon or viewer, use the fixed local verification flow:

make verify
make serve
make visualize
make live-check NAMESPACE=default NAME=my-pod

Release Publishing

Tagged releases publish:

  • per-platform archives containing kubernetes-ontology, kubernetes-ontologyd, kubernetes-ontology-viewer, Quickstart docs, release notes, and a local config example
  • a packaged Helm chart archive, for example kubernetes-ontology-0.1.6.tgz
  • a multi-architecture image at ghcr.io/colvin-y/kubernetes-ontology:<tag>
  • SemVer aliases without the leading v, plus latest
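
For example, pulling the published image for one tagged release (using the v0.1.6 tag referenced elsewhere in this README):

docker pull ghcr.io/colvin-y/kubernetes-ontology:v0.1.6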

See docs/release.md for the release checklist. See CHANGELOG.md for release notes.

The agent skill is published from the default branch rather than from release archives, so marketplace pages should link to the live repository path: skills/kubernetes-ontology-access.

Known Limitations

  • Graph state is in memory only.
  • HTTP auth and TLS are not implemented yet.
  • Persistent graph backends and external graph adapters are outside the open-source MVP.
  • RBAC topology is represented for ServiceAccount subjects and binding objects; it is not a full permission reasoning engine.
  • Evidence ranking currently starts with returned Event evidence and will grow into richer signal ranking over time.
  • Runtime RDF/OWL materialization is not implemented.

Roadmap

  1. Extend informer and scoped-reconcile coverage for more topology categories.
  2. Add HTTP auth/TLS and longer daemon soak tests.
  3. Improve diagnostic evidence ranking for downstream AI agents.
  4. Broaden RBAC interpretation without turning the MVP into a full authorization engine.
  5. Keep persistent stores and external graph adapters as post-MVP research.

Documentation

  • QUICKSTART.md: standard server and client workflow
  • AI_CONTRACT.md: diagnostic subgraph contract and downstream reasoning rules for agents
  • local/README.md: configuration detail
  • docs/release.md: release checklist
  • CHANGELOG.md: release notes

License

Licensed under the Apache License, Version 2.0. See LICENSE.