Merged
23 changes: 23 additions & 0 deletions .github/workflows/docs-ci.yml
@@ -0,0 +1,23 @@
name: CI

on:
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v6
        with:
          node-version: "20"
          cache: npm

      - run: npm ci

      - run: npm run build
4 changes: 2 additions & 2 deletions docs/advanced/parallel-fan-out.md
@@ -87,7 +87,7 @@ A lower-level tool for advanced patterns where the agent needs fine-grained cont

## How it works with autoscaling

Fan-out naturally increases the pending task count on the agent's queue. When [KEDA-based autoscaling](/integrations/autoscaling) is enabled, this triggers pod scale-up:
Fan-out naturally increases the pending task count on the agent's queue. When [KEDA-based autoscaling](/scaling/autoscaling) is enabled, this triggers pod scale-up:

1. Agent submits 10 subtasks via `spawn_and_collect`
2. 10 pending messages appear on the Redis Stream
@@ -100,7 +100,7 @@ No changes to the autoscaling configuration are needed. The existing pending-tas

## How it works with budgets

Each subtask is a separate task execution that consumes tokens from the agent's [SwarmBudget](/advanced/budget-management). The originating agent's own token usage for the fan-out/collect cycle is minimal (tool call overhead only). The real cost is in the subtask executions, which are tracked individually.
Each subtask is a separate task execution that consumes tokens from the agent's [SwarmBudget](/scaling/budget-management). The originating agent's own token usage for the fan-out/collect cycle is minimal (tool call overhead only). The real cost is in the subtask executions, which are tracked individually.

## Example

2 changes: 1 addition & 1 deletion docs/faq.md
@@ -32,7 +32,7 @@ No. kubeswarm uses native Kubernetes Secrets only. No wrapper CRD.

## How does budget enforcement work?

The operator tracks rolling 24h token usage per agent. When `spec.guardrails.limits.dailyTokens` is exceeded, replicas are scaled to 0 and a `BudgetExceeded` condition is set. Replicas restore automatically when the window rotates. See [Budget Management](/advanced/budget-management).
The operator tracks rolling 24h token usage per agent. When `spec.guardrails.limits.dailyTokens` is exceeded, replicas are scaled to 0 and a `BudgetExceeded` condition is set. Replicas restore automatically when the window rotates. See [Budget Management](/scaling/budget-management).
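
The enforcement behavior described above can be modeled roughly as follows. This is a minimal sketch, not kubeswarm operator code: the class `RollingTokenBudget`, its method names, and the choice to treat "exceeded" as strictly greater than the limit are all assumptions made for illustration.

```python
from collections import deque

WINDOW_SECONDS = 24 * 60 * 60  # rolling 24h window, as described in the FAQ answer


class RollingTokenBudget:
    """Hypothetical model of a rolling-window token budget (not kubeswarm code)."""

    def __init__(self, daily_tokens):
        self.daily_tokens = daily_tokens
        self._usage = deque()  # (timestamp_seconds, tokens), oldest first

    def record(self, now, tokens):
        """Record tokens consumed at time `now` (seconds)."""
        self._usage.append((now, tokens))

    def used(self, now):
        """Sum usage inside the window, dropping entries that have rotated out."""
        while self._usage and self._usage[0][0] <= now - WINDOW_SECONDS:
            self._usage.popleft()
        return sum(tokens for _, tokens in self._usage)

    def desired_replicas(self, now, normal_replicas):
        """Scale to 0 while the limit is exceeded; restore the normal count otherwise."""
        return 0 if self.used(now) > self.daily_tokens else normal_replicas


budget = RollingTokenBudget(daily_tokens=1000)
budget.record(now=0, tokens=800)
budget.record(now=3600, tokens=300)
print(budget.desired_replicas(now=3600, normal_replicas=3))           # over the limit
print(budget.desired_replicas(now=WINDOW_SECONDS, normal_replicas=3)) # first entry rotated out
```

In this model the replica count recovers without any manual reset, which mirrors the "restore automatically when the window rotates" behavior in the answer above.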

## Can agents call other agents?

6 changes: 3 additions & 3 deletions docs/features.md
@@ -71,7 +71,7 @@ swarm audit tree evt-abc123
Audit events emit to a configurable sink (stdout, Redis Stream, or webhook). Opt-in at cluster, namespace, or agent level.

- [Audit Trail](/observability/audit-trail) - full configuration guide, event schema, and CLI reference
- [Budget Management](/advanced/budget-management) - per-action token tracking and cost attribution
- [Budget Management](/scaling/budget-management) - per-action token tracking and cost attribution
- [Custom Resources: SwarmRun](/custom-resources/) - run status field reference

---
@@ -94,7 +94,7 @@ spec:

The operator creates KEDA ScaledObjects automatically. No KEDA YAML to write - just set the fields on your SwarmAgent.

- [Autoscaling (KEDA)](/integrations/autoscaling) - full configuration guide and prerequisites
- [Autoscaling (KEDA)](/scaling/autoscaling) - full configuration guide and prerequisites

---

@@ -167,7 +167,7 @@ spec:

Budget alerts fire via Slack, email, or webhook before you hit the wall. Per-action token tracking in the audit trail lets you identify which tools and agents drive cost.

- [Budget Management](/advanced/budget-management) - full configuration and enforcement modes
- [Budget Management](/scaling/budget-management) - full configuration and enforcement modes
- [Custom Resources: SwarmBudget](/custom-resources/) - budget field reference

---
46 changes: 46 additions & 0 deletions docs/integrations/artifact-storage.md
@@ -0,0 +1,46 @@
---
sidebar_position: 8
sidebar_label: "Artifact Storage"
description: "kubeswarm artifact storage - S3 and GCS backends for storing and passing file artifacts between agent pipeline steps on Kubernetes."
---

# kubeswarm Artifact Storage - S3 and GCS for Agent Pipelines

kubeswarm SwarmTeam pipelines can store and pass file artifacts between steps using S3 or GCS backends on Kubernetes.

## Supported Backends

| Backend | Endpoint | Auth |
| ------- | ------------------------------ | -------------------------------- |
| **S3** | Any S3-compatible (AWS, MinIO) | Secret with access key |
| **GCS** | Google Cloud Storage | Secret with service account JSON |

## Configuration

```yaml
spec:
  artifactStore:
    type: s3
    s3:
      bucket: swarm-artifacts
      region: us-east-1
      endpoint: http://minio.kubeswarm-system:9000 # omit for AWS S3
      credentialsSecret:
        name: s3-credentials
```

## Pipeline Usage

Steps declare output artifacts and reference other steps' artifacts:

```yaml
pipeline:
  - role: analyst
    outputArtifacts:
      - name: report.md
        contentType: text/markdown
  - role: reviewer
    dependsOn: [analyst]
    inputArtifacts:
      report: "{{ .steps.analyst.artifacts.report.md }}"
```
4 changes: 2 additions & 2 deletions docs/integrations/index.md
@@ -13,6 +13,6 @@ kubeswarm connects your Kubernetes agents to external services for LLM inference
- [MCP Servers](./mcp-servers) - Model Context Protocol tool servers
- [Vector Stores](./vector-stores) - Qdrant, Pinecone, Weaviate
- [Notifications](./notifications) - Slack, webhooks
- [Observability](./observability) - OpenTelemetry, Prometheus
- [Autoscaling](./autoscaling) - KEDA
- [Observability](/observability/overview) - OpenTelemetry, Prometheus
- [Autoscaling](/scaling/autoscaling) - KEDA
- [Artifact Storage](./artifact-storage) - S3, GCS
64 changes: 64 additions & 0 deletions docs/integrations/notifications.md
@@ -0,0 +1,64 @@
---
sidebar_position: 6
sidebar_label: "Notifications"
description: "kubeswarm notification integrations - Slack and webhook alerts for agent budget exceeded, degraded and pipeline failure events on Kubernetes."
---

# kubeswarm Notifications - Slack and Webhook Alerts for Agents

kubeswarm sends alerts via the SwarmNotify CRD when agents degrade, budgets are exceeded, or pipeline runs fail on Kubernetes.

## Supported Channels

| Channel | Configuration | Use case |
| ----------- | ----------------------- | --------------------------- |
| **Slack** | Webhook URL from Secret | Team chat alerts |
| **Webhook** | Any HTTP endpoint | PagerDuty, Opsgenie, custom |

## Configuration

```yaml
apiVersion: kubeswarm.io/v1alpha1
kind: SwarmNotify
metadata:
  name: ops-alerts
spec:
  channel:
    type: slack
    slack:
      webhookUrlSecretRef:
        name: slack-secrets
        key: webhook-url
  events:
    - type: BudgetExceeded
      template: ":warning: Budget exceeded for {{ .agent }}: {{ .totalTokens }} tokens"
    - type: AgentDegraded
      template: ":red_circle: Agent degraded: {{ .agent }} - {{ .reason }}"
    - type: TeamFailed
      template: ":x: Pipeline failed: {{ .team }} run {{ .run }}"
    - type: TeamSucceeded
      template: ":white_check_mark: Pipeline completed: {{ .team }}"
  rateLimiting:
    windowSeconds: 300
    maxPerWindow: 5
```
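
The `rateLimiting` block above caps how many alerts go out per window. The page does not say whether the window is fixed or sliding, so the following is only a sketch under the assumption of a fixed window that starts at the first alert; `WindowRateLimiter` and its methods are hypothetical names, not part of kubeswarm.

```python
class WindowRateLimiter:
    """Hypothetical fixed-window limiter; actual SwarmNotify semantics may differ."""

    def __init__(self, window_seconds, max_per_window):
        self.window_seconds = window_seconds
        self.max_per_window = max_per_window
        self._window_start = None  # start time of the current window, in seconds
        self._sent = 0             # alerts already sent in the current window

    def allow(self, now):
        """Return True if an alert at time `now` (seconds) may be sent."""
        expired = (
            self._window_start is None
            or now - self._window_start >= self.window_seconds
        )
        if expired:
            # Open a fresh window starting at this alert.
            self._window_start = now
            self._sent = 0
        if self._sent < self.max_per_window:
            self._sent += 1
            return True
        return False


# Values taken from the SwarmNotify example: at most 5 alerts per 300 seconds.
limiter = WindowRateLimiter(window_seconds=300, max_per_window=5)
decisions = [limiter.allow(now=t) for t in range(7)]
print(decisions)               # the sixth and seventh alerts are suppressed
print(limiter.allow(now=300))  # a new window has opened
```

With these settings, a burst of repeated `AgentDegraded` events would produce at most five Slack messages in any five-minute window under this model.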

## Event Types

| Event | Trigger |
| ---------------- | ----------------------------------------------- |
| `BudgetExceeded` | Daily token limit reached, replicas scaled to 0 |
| `AgentDegraded` | MCP server unreachable or health check failed |
| `TeamFailed` | Pipeline run reached terminal failure |
| `TeamSucceeded` | Pipeline run completed successfully |
| `TeamTimedOut` | Pipeline run exceeded `timeoutSeconds` |

## Referencing from Agents

```yaml
spec:
  observability:
    healthCheck:
      notifyRef:
        name: ops-alerts
```
6 changes: 3 additions & 3 deletions docs/observability/audit-trail.md
@@ -435,9 +435,9 @@ The audit trail complements - not replaces - existing observability signals.
For full observability coverage, use the audit trail alongside OTel tracing and structured logging:

- **OTel** for latency analysis and cross-service correlation
- **Structured logging** for runtime debugging (see [Observability](/integrations/observability))
- **Structured logging** for runtime debugging (see [Observability](/observability/overview))
- **Audit trail** for behavior reconstruction, compliance, and cost attribution
- **SwarmBudget** for aggregate spend limits (see [Budget Management](/advanced/budget-management))
- **SwarmBudget** for aggregate spend limits (see [Budget Management](/scaling/budget-management))

---

@@ -460,7 +460,7 @@ Example: 50 agents, 10 tasks/hour each, `actions` mode, 7-day retention:
- 500 tasks/hour * 5 events * 1.5 KB = 3.75 MB/hour
- 3.75 MB * 168h = 630 MB + 30% headroom = ~820 MB
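
The arithmetic in the example above can be reproduced with a small helper. This is a sketch for checking your own numbers: the function name and parameters are made up for illustration, and it follows the example's convention of 1 MB = 1000 KB.

```python
def estimate_audit_storage_mb(agents, tasks_per_agent_hour, events_per_task,
                              kb_per_event, retention_hours, headroom=0.30):
    """Return (mb_per_hour, mb_retained, mb_with_headroom), using 1 MB = 1000 KB."""
    tasks_per_hour = agents * tasks_per_agent_hour
    mb_per_hour = tasks_per_hour * events_per_task * kb_per_event / 1000
    mb_retained = mb_per_hour * retention_hours
    return mb_per_hour, mb_retained, mb_retained * (1 + headroom)


# 50 agents, 10 tasks/hour each, 5 events per task at 1.5 KB, 7-day retention.
per_hour, retained, sized = estimate_audit_storage_mb(50, 10, 5, 1.5, 7 * 24)
print(f"{per_hour:.2f} MB/hour, {retained:.0f} MB retained, {sized:.0f} MB with headroom")
```

For the inputs in the example this gives 3.75 MB/hour, 630 MB retained, and about 819 MB with headroom, which the text rounds to ~820 MB.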

For detailed sizing - including worked examples for verbose mode, split topologies, and the `maxDetailBytes` knob - see the [Redis in Production](/operations/redis-production#capacity-estimation) guide.
For detailed sizing - including worked examples for verbose mode, split topologies, and the `maxDetailBytes` knob - see the [Redis in Production](/scaling/redis-production#capacity-estimation) guide.

---
