87 changes: 73 additions & 14 deletions README.md
@@ -1,19 +1,44 @@
# Fint Kafkarator

Fint Kafkarator is an operator that creates a service user and ACL in Aiven for Kafka.
Username, password, ACL id and access certificate and -key will be stored in kubernetes secrets
Fint Kafkarator is a Kubernetes operator that provisions Kafka service users and ACLs in Aiven, and publishes the client configuration and certificate material as Kubernetes secrets.

Runtime stack:

- Spring Boot `3.5.x`
- Java `25`
- Gradle `9.4.x`

## What does the operator do?

When a `KafkaUserAndAcl` CR is **created**:
* The operator will create a service user and ACL in Aiven.
* Username, password and ACL id will be generated and stored in secrets along with access certificate and -key.
When a `KafkaUserAndAcl` resource is created:
- The operator creates a service user and ACLs in Aiven.
- The operator creates a `-kafka` secret with Spring Kafka SSL configuration.
- The operator creates a `-kafka-certificates` secret with `client.keystore.p12` and `client.truststore.jks`.

When a `KafkaUserAndAcl` resource is deleted:
- The operator deletes the user and ACLs from Aiven.
- The operator deletes the managed secrets from Kubernetes.

When an existing certificate secret is reconciled:
- The operator inspects the current client certificate expiry date.
- The operator rotates the keystore and truststore if the certificate is missing, unreadable, or inside the configured rotation threshold.
- The operator annotates the secret with the observed certificate expiry and last rotation time.

## Operational Improvements

Operationally relevant improvements:

When a `KafkaUserAndAcl` CR is **deleted**:
* The operator will delete the user and ACL from Aiven.
* The operator will delete the secrets from Kubernetes.
- Expiry-aware certificate handling instead of only verifying that the keystore can be opened.
- Configurable certificate rotation threshold via `fint.aiven.certificate-rotation-threshold`.
- Prometheus metrics for certificate expiry, rotation pressure, inspections, rotations and reconcile duration.
- Grafana/PromQL documentation for dashboards and alerting.

## How to use the operator:
See:

- [PromQL examples](docs/metrics-promql.md)
- [Grafana dashboard JSON](docs/kafkarator-grafana-dashboard.json)

## Custom Resource

### KafkaUserAndAcl
```yaml
@@ -56,9 +81,43 @@ spec:
topic: '*sample-test2'
```

#### Prerequisites
* Aiven account, project and service
* Aiven token and Aiven api base url in application.yaml
## Prerequisites

- Aiven account, project and service
- Aiven token and Aiven API base URL in `application.yaml`

## Configuration

Relevant application properties:

### Using the operator
TODO
```yaml
fint:
  aiven:
    base-url: https://api.aiven.io/v1
    project: fintlabs
    service: kafka-alpha
    kafka-bootstrap-servers: broker-1:9092,broker-2:9092
    certificate-rotation-threshold: 30d
```

## Metrics

Kafkarator exposes Prometheus metrics on `/actuator/prometheus`.

Key metrics:

- `kafkarator_certificate_expiry_seconds`
- `kafkarator_certificate_days_until_expiry`
- `kafkarator_certificate_rotation_due`
- `kafkarator_certificate_oldest_days_until_expiry`
- `kafkarator_certificate_inspections_total`
- `kafkarator_certificate_rotations_total`
- `kafkarator_certificate_secret_reconcile_duration_seconds`

## Building And Testing

Run the full test suite:

```bash
./gradlew test
```
2 changes: 2 additions & 0 deletions build.gradle
@@ -33,7 +33,9 @@ repositories {
}

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
    implementation 'org.springframework.boot:spring-boot-starter-webflux'
    runtimeOnly 'io.micrometer:micrometer-registry-prometheus'

    implementation 'no.fintlabs:flais-operator-starter:1.0.0'

100 changes: 100 additions & 0 deletions docs/certificate-rotation-flow.md
@@ -0,0 +1,100 @@
# Certificate Rotation Behavior

This document describes how Kafkarator handles existing certificate secrets, expiry-aware rotation, and metadata updates.

## Reconcile Decision Flow

```mermaid
flowchart TD
A["KafkaUserAndAcl reconcile starts"] --> B["Load KafkaUserAndAcl secondary resource"]
B --> C["Load <name>-kafka secret"]
C --> D["Read key store password and trust store password"]
D --> E["Load existing <name>-kafka-certificates secret if present"]
E --> F{"Existing client.keystore.p12 present?"}

F -- "No" --> G["Mark rotation required"]
F -- "Yes" --> H["Inspect keystore and extract leaf certificate notAfter"]

H --> I{"Keystore readable?"}
I -- "No" --> G
I -- "Yes" --> J{"Certificate expires within rotation threshold?"}
J -- "Yes" --> G
J -- "No" --> K["Reuse existing keystore"]

G --> L["Generate new keystore from Aiven access_cert/access_key and CA"]
L --> M["Generate new truststore from Aiven CA"]

K --> N{"Existing truststore reusable?"}
N -- "Yes" --> O["Reuse existing truststore"]
N -- "No" --> P["Generate new truststore from Aiven CA"]

M --> Q["Inspect resulting keystore"]
O --> Q
P --> Q

Q --> R["Update annotations"]
R --> R1["Set certificate-not-after"]
R1 --> R2{"Did rotation happen?"}
R2 -- "Yes" --> R3["Set last-rotated-at"]
R2 -- "No" --> S["Keep existing last-rotated-at as-is"]
R3 --> T["Write <name>-kafka-certificates secret"]
S --> T

T --> U["Publish metrics for inspection, rotation, and reconcile duration"]
U --> V["Reconcile completes"]
```
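
The rotation decision in the flow above can be sketched with nothing but the JDK's `KeyStore` API. This is a minimal illustration, not Kafkarator's actual implementation: the class `CertificateRotationSketch` and its method names are hypothetical, and it assumes a PKCS12 keystore in which the first alias that carries an X.509 certificate holds the client's leaf certificate.

```java
import java.io.ByteArrayInputStream;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import java.time.Duration;
import java.time.Instant;
import java.util.Collections;
import java.util.Optional;

// Hypothetical helper; names are illustrative and not taken from the Kafkarator code base.
final class CertificateRotationSketch {

    private CertificateRotationSketch() {
    }

    // Returns the leaf certificate's notAfter, or empty when the keystore is
    // missing, unreadable, or contains no X.509 certificate.
    static Optional<Instant> readNotAfter(byte[] keystoreBytes, char[] password) {
        if (keystoreBytes == null || keystoreBytes.length == 0) {
            return Optional.empty();
        }
        try {
            KeyStore keyStore = KeyStore.getInstance("PKCS12");
            keyStore.load(new ByteArrayInputStream(keystoreBytes), password);
            for (String alias : Collections.list(keyStore.aliases())) {
                // For key entries this is the first certificate in the chain.
                Certificate certificate = keyStore.getCertificate(alias);
                if (certificate instanceof X509Certificate) {
                    return Optional.of(((X509Certificate) certificate).getNotAfter().toInstant());
                }
            }
            return Optional.empty();
        } catch (Exception e) {
            // Wrong password, corrupt bytes, unsupported format: all count as unreadable.
            return Optional.empty();
        }
    }

    // Rotation is required when no expiry could be read, or when the certificate
    // expires at or before now + threshold (this also covers expired certificates).
    static boolean rotationRequired(Optional<Instant> notAfter, Instant now, Duration threshold) {
        return notAfter
                .map(expiry -> !expiry.isAfter(now.plus(threshold)))
                .orElse(true);
    }
}
```

Folding "missing" and "unreadable" into an empty `Optional` keeps the decision to a single check, which mirrors how both branches in the diagram lead to the same "Mark rotation required" node.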

## Existing Secret Adoption After Deploy

This diagram shows what happens after deploying the new Kafkarator version into a cluster with existing secrets that do not yet have the new annotations.

```mermaid
sequenceDiagram
participant O as "Kafkarator"
participant CR as "KafkaUserAndAcl"
participant KS as "<name>-kafka Secret"
participant CS as "<name>-kafka-certificates Secret"
participant A as "Aiven"
participant M as "Prometheus Metrics"

O->>CR: Reconcile custom resource
O->>KS: Read Kafka SSL passwords
O->>CS: Read existing keystore/truststore and annotations

alt "No certificate annotations on existing secret"
O->>CS: Inspect client.keystore.p12
alt "Keystore readable and cert outside threshold"
O->>CS: Patch secret metadata with certificate-not-after
Note over O,CS: last-rotated-at remains absent until an actual rotation happens
else "Keystore unreadable, expired, missing, or inside threshold"
O->>A: Use current Aiven credentials and CA
O->>CS: Regenerate keystore and truststore
O->>CS: Write certificate-not-after and last-rotated-at
end
else "Annotations already present"
O->>CS: Re-evaluate actual keystore state
Note over O,CS: annotations are informative, not the source of truth
end

O->>M: Publish inspection counters and resource gauges
O->>M: Publish rotation counters if rotation was attempted
O->>M: Publish reconcile duration
```

## Backward Compatibility Summary

```mermaid
flowchart LR
A["Existing secret without annotations"] --> B["Secret is still accepted"]
B --> C["Kafkarator inspects actual keystore content"]
C --> D{"Healthy certificate?"}
D -- "Yes" --> E["Backfill certificate-not-after annotation only"]
D -- "No" --> F["Rotate keystore/truststore and write both annotations"]
```
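
The annotation handling summarized above can be sketched as a pure function over the secret's annotation map. This is an assumption-laden sketch rather than the operator's code: the class `CertificateAnnotationsSketch` is hypothetical, the keys use only the short names this document mentions (the real keys may carry a domain prefix), and ISO-8601 timestamps are an assumed value format.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not taken from the Kafkarator code base.
final class CertificateAnnotationsSketch {

    // Assumed keys: short names as used in this document; real keys may be prefixed.
    static final String NOT_AFTER = "certificate-not-after";
    static final String LAST_ROTATED_AT = "last-rotated-at";

    private CertificateAnnotationsSketch() {
    }

    // Returns a copy of the existing annotations with certificate-not-after always
    // refreshed, and last-rotated-at written only when a rotation actually happened.
    static Map<String, String> updated(Map<String, String> existing,
                                       Instant notAfter,
                                       boolean rotated,
                                       Instant now) {
        Map<String, String> result = new LinkedHashMap<>();
        if (existing != null) {
            result.putAll(existing);
        }
        result.put(NOT_AFTER, notAfter.toString());
        if (rotated) {
            result.put(LAST_ROTATED_AT, now.toString());
        }
        return result;
    }
}
```

For a healthy secret without annotations, `updated(Map.of(), notAfter, false, now)` yields only `certificate-not-after`, which matches the backfill branch; an existing `last-rotated-at` value is carried over untouched when no rotation occurs.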

## Notes

- The annotations are derived metadata, not required input.
- Missing annotations do not break reconcile.
- The actual keystore content remains the source of truth.
- A large number of old secrets may be patched shortly after rollout, either to backfill annotations or to rotate certificates that are already due.