
[DPE-8588] Automated tutorial testing#487

Open
izmalk wants to merge 61 commits into canonical:main from izmalk:spread-test-3

Conversation

Contributor

@izmalk izmalk commented Apr 14, 2026

Description

Created an automated solution for the tutorial end-to-end tests using Spread + Multipass.

  1. The extract_commands.py script parses the existing tutorial content and extracts the necessary commands in the correct order.
  2. The tutorial files are modified with special comment blocks, invisible to users, that adjust the flow for an automated run when needed.
  3. Spread tasks are also generated from the tutorial files.
  4. A spread.yaml configuration allows running the tests either locally (tox -e tutorial) or via CI/CD.

The tests run in a Multipass VM.
Estimated time from start to finish is ~1.5–2 hours.
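As an illustration of step 1, the core of the extraction can be sketched like this (a hypothetical minimal version, not the actual extract_commands.py, which also processes the test:* comment blocks):

```shell
# Hypothetical sketch of the extraction step: print the contents of every
# shell-fenced code block in a Markdown file, in document order.
extract_shell_blocks() {
    awk '
        /^```shell[[:space:]]*$/ { in_block = 1; next }  # opening fence of an executable block
        /^```[[:space:]]*$/      { in_block = 0; next }  # closing fence
        in_block                 { print }               # emit the commands as-is
    ' "$1"
}
```

A page containing one shell-fenced block and one text-fenced block would yield only the commands from the shell-fenced block.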

The run finished with output like this:

2026-04-14 11:17:17 Allocation results of multipass:ubuntu-24.04-64: map[string]string{}
2026-04-14 11:17:17 Worker terminated.
2026-04-14 11:17:17 Successful tasks: 8
2026-04-14 11:17:17 Aborted tasks: 0
  tutorial: OK (5253.81=setup[0.03]+cmd[0.02,0.03,0.02,0.02,0.02,0.03,0.02,0.02,5253.59] seconds)
  congratulations :) (5253.86 seconds)

https://warthogs.atlassian.net/browse/DPE-8588

I've added TESTING.md describing the solution and agents.md to improve AI-agent reliability, and updated the README.

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

@izmalk izmalk self-assigned this Apr 14, 2026
@izmalk izmalk added the documentation (Improvements or additions to documentation) and enhancement (New feature, UI change, or workload upgrade) labels Apr 14, 2026
@izmalk izmalk mentioned this pull request Apr 14, 2026
@izmalk izmalk changed the title Automated tutorial testing [DPE-8588] Automated tutorial testing Apr 14, 2026
Comment thread tests/tutorial/helpers.sh
Comment on lines +35 to +103
wait_idle() {
    local timeout=600
    local interval=30
    local allow_blocked=""

    # Parse named options, consuming two tokens per flag (name + value).
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --timeout) timeout="$2"; shift 2 ;;
            --interval) interval="$2"; shift 2 ;;
            --allow-blocked) allow_blocked="$2"; shift 2 ;;
            *) echo "wait_idle: unknown option: $1" >&2; return 1 ;;
        esac
    done

    local elapsed=0
    echo "Waiting for all Juju units to be active/idle (timeout=${timeout}s, poll=${interval}s)…"

    while [[ "$elapsed" -lt "$timeout" ]]; do
        local not_ready
        # Run the poll pipeline with pipefail disabled so a non-zero exit from
        # "juju status" (common while machines are still provisioning) does not
        # abort a calling script that has set -euo pipefail active.
        not_ready=$(
            set +o pipefail
            export ALLOW_BLOCKED="$allow_blocked"
            juju status --format=json 2>/dev/null | python3 -c '
import json, sys, os
try:
    data = json.load(sys.stdin)
    allowed = set(os.environ.get("ALLOW_BLOCKED", "").split(",")) - {""}
    not_ready = 0
    total_units = 0
    for app_name, app in data.get("applications", {}).items():
        for unit in app.get("units", {}).values():
            total_units += 1
            ws = unit.get("workload-status", {}).get("current", "")
            js = unit.get("juju-status", {}).get("current", "")
            if ws == "active" and js == "idle":
                continue
            if ws == "blocked" and js == "idle" and app_name in allowed:
                continue
            not_ready += 1
    if total_units == 0:
        print("provisioning")
    else:
        print(not_ready)
except Exception:
    print("provisioning")
'
        ) || not_ready="provisioning"

        if [[ "$not_ready" == "0" ]]; then
            echo "All units active/idle after ${elapsed}s."
            juju status
            return 0
        elif [[ "$not_ready" == "provisioning" ]]; then
            echo "[${elapsed}s elapsed] still provisioning – rechecking in ${interval}s…"
        else
            echo "[${elapsed}s elapsed] ${not_ready} unit(s) not yet active/idle – rechecking in ${interval}s…"
        fi
        sleep "$interval"
        elapsed=$(( elapsed + interval ))
    done

    echo "Timed out after ${timeout}s. Final status:"
    juju status
    return 1
}
Contributor Author

@izmalk izmalk Apr 14, 2026

This is one of the most important features: the ability to test the Juju model and detect its active/idle state. We rely on it frequently, so its reliability and configuration directly affect both the reliability of the tests and their overall duration.

I experimented with replacing it with juju wait-for, but it was far too unstable. It often failed to detect the active/idle state at all, timed out, and then, when I checked manually, the status was actually already active/idle. It simply did not detect it.

There is probably a simple reason for this: juju wait-for works differently from juju status (see source):

The wait-for command streams delta changes from the underlying database, unlike the status command which performs a full query of the database.

Contributor

I think your observation needs more investigation (and maybe reporting upstream) since we rely on wait-for in almost all our integration tests.

Anyway, as I mentioned in the comments, we could achieve this much more simply in Python, either using jubilant.Juju.wait(), or, if you don't trust that, using a simple retry-and-polling mechanism in Python:

import tenacity

# `juju` here is assumed to be a jubilant.Juju instance
for attempt in tenacity.Retrying(
    wait=tenacity.wait_fixed(10),
    stop=tenacity.stop_after_delay(600),
    reraise=True,
):
    with attempt:
        status = juju.status()
        assert all(
            app.app_status.current == "active" and app.juju_status.current == "idle"
            for app in status.apps.values()
        )

Contributor

I agree. Do we have examples of the instability? It's the same logic we're using in tests so I don't think it's unstable.

Contributor Author

@izmalk izmalk Apr 15, 2026

Here is the log output.

Sorry it's long; wait-for spams model "tutorial" found, waiting... quite often.
Search for:

ERROR timed out waiting for "tutorial" to reach goal state

Basically, juju wait-for model runs until it reaches its timeout. I've tried making the maximum time quite large; it didn't help. And when I added a status check after the wait-for, it showed active/idle. The runs were even slower overall, most likely because each wait ran to its full timeout. So, after many attempts at making it work, I dropped the idea and reverted to the well-working solution for now.

After discussing the jubilant approach with Iman, I feel committed to implementing alternative functions, so that we have all three options and can test and compare them.

Contributor

@imanenami imanenami left a comment

First of all, I really liked the idea of using inline HTML metadata for test automation. Good job on that 👍

Before doing a detailed review, I need some input for my own clarification so that I don't spam the review with generic comments:

  1. While I like the HTML inline comments idea, why are we inventing another mini testing language here? Since jubilant is the current charm engineering standard, I think we can safely use it with a leaner syntax and remove much of the "test compiling" logic in this PR.

  2. With the current implementation, I have a more general concern about raising the bar even higher than it is now for future documentation contributions, especially with a brand-new test DSL (domain-specific language) and inline shell commands now required to pass the docs tests...

Comment thread tests/tutorial/helpers.sh
Contributor

@imanenami imanenami Apr 15, 2026

question: why are we using a bash script here? We could write this in a much more readable and maintainable form in Python 🤔

Contributor

I agree, I don't think that we should be encapsulating this logic here.

Contributor Author

@izmalk izmalk Apr 20, 2026

Summarising the discussion we had: yes, Python looks much nicer and easier to maintain. But our tutorial tells our users to run bash commands, and I'd like to keep the tests as close as possible to the tutorial for now, as that is the objective.

And to address the valid concern about the shell-script implementation being rather ugly and hard to maintain: I have already implemented a jubilant alternative to wait_idle() in the helpers. I'm planning to implement the juju wait-for alternative too, as soon as I can make it actually work =D. You can see the jubilant version in this other branch. But it will be merged a bit later, as, let's say, version 0.2 of the solution, where I will specifically test and compare which implementation works better and why. For now, I'd like to stick with the old-reliable shell script, since it has shown the best results so far.

Comment thread tox.ini
Comment on lines +89 to +101
[testenv:tutorial-extract]
description = Generate tutorial test scripts from Markdown sources
skip_install = True
commands =
python3 tests/tutorial/extract_commands.py docs/tutorial/environment.md tests/tutorial/01_environment.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/deploy.md tests/tutorial/02_deploy.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/integrate-with-client-applications.md tests/tutorial/03_client.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/manage-passwords.md tests/tutorial/04_passwords.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/enable-encryption.md tests/tutorial/05_encryption.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/use-kafka-connect.md tests/tutorial/06_kafka_connect.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/rebalance-partitions.md tests/tutorial/07_rebalance.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/cleanup.md tests/tutorial/08_cleanup.sh

Contributor

todo: I'd highly suggest keeping the tox file lean and moving all this into a separate script.

Contributor

We can probably have a bash one-liner to list files and run them with xargs or something in sorted order
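For illustration, the sorted-list idea could look roughly like this (hypothetical sketch; note that plain alphabetical order would not match the tutorial's reading order, so the source pages would need a sortable prefix for this to be fully correct):

```shell
# Hypothetical replacement for the per-file tox commands: feed every tutorial
# page, in sorted (glob) order, to the extractor with a numbered output name.
generate_tutorial_scripts() {
    local extract=${EXTRACT:-tests/tutorial/extract_commands.py}
    local i=0 md out
    for md in docs/tutorial/*.md; do
        [ -e "$md" ] || continue          # no-op when the glob matches nothing
        i=$((i + 1))
        out=$(printf 'tests/tutorial/%02d_%s.sh' "$i" "$(basename "$md" .md | tr '-' '_')")
        python3 "$extract" "$md" "$out"
    done
}
```

The EXTRACT override is only there to make the sketch testable in isolation.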

Contributor Author

Fixed in 365ade9

Comment thread tox.ini
Comment on lines +107 to +117
commands =
python3 tests/tutorial/extract_commands.py docs/tutorial/environment.md tests/tutorial/01_environment.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/deploy.md tests/tutorial/02_deploy.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/integrate-with-client-applications.md tests/tutorial/03_client.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/manage-passwords.md tests/tutorial/04_passwords.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/enable-encryption.md tests/tutorial/05_encryption.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/use-kafka-connect.md tests/tutorial/06_kafka_connect.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/rebalance-partitions.md tests/tutorial/07_rebalance.sh
python3 tests/tutorial/extract_commands.py docs/tutorial/cleanup.md tests/tutorial/08_cleanup.sh
spread -vv multipass:ubuntu-24.04-64:tests/tutorial/

Contributor

same comment as above

Contributor Author

Fixed in 365ade9

Comment thread docs/agents.md
Comment on lines +48 to +50
Commands are extracted **only** from `` ```shell `` fenced blocks.
Use `` ```bash `` or `` ```text `` for output examples or commands that should
not be executed by the test harness.
Contributor

I think we should only use text for command outputs and not recommend using bash. After all, bash is a form of shell.

Contributor Author

I’m not sure I understand. This tells the agent that shell blocks are extracted, while other block types are not. text and bash are examples: bash is for commands, and text is for output (or any other text that is not syntax-highlighted).

Outside of the extractor script I built, bash and shell code blocks are functionally identical, as far as I know.

Comment thread docs/tutorial/rebalance-partitions.md Outdated
Comment on lines +123 to +144
<!-- test:run
# Cruise Control needs time to collect metrics from the cluster. Rather than
# a fixed sleep, retry the dryrun rebalance until CC reports ready (up to 40
# minutes, polling every 2 minutes).
elapsed=0
timeout=2400
interval=120
echo "Waiting for Cruise Control to be ready (timeout=${timeout}s, poll=${interval}s)…"
while [ "$elapsed" -lt "$timeout" ]; do
    if juju run kraft/leader rebalance mode=add brokerid=103 --wait=2m 2>&1; then
        echo "Cruise Control ready after ${elapsed}s."
        break
    fi
    echo "[${elapsed}s elapsed] Cruise Control not ready – retrying in ${interval}s…"
    sleep "$interval"
    elapsed=$((elapsed + interval))
done
if [ "$elapsed" -ge "$timeout" ]; then
    echo "ERROR: Cruise Control did not become ready within ${timeout}s"
    exit 1
fi
-->
Contributor

todo: This is a no-go, it's far too complex. We need a better way of abstracting this to make contributions more streamlined.

Contributor Author

We had a great discussion about this. I'll be implementing a helpers.sh function to make invoking such commands trivial. The most direct approach (to keep it close to the tutorial instructions) is to call the command every 20 minutes or so, for ~3 attempts. The exact command will stay here; the rest will be automated in the helpers.sh file.
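The helper idea can be sketched as a generic retry wrapper (hypothetical; the name retry_until and its interface are illustrative, not the final helpers.sh API):

```shell
# Hypothetical helpers.sh addition: retry a command until it succeeds or a
# deadline passes, so a tutorial page only needs a one-line annotation such as:
#   retry_until 2400 120 juju run kraft/leader rebalance mode=add brokerid=103
retry_until() {
    local timeout=$1 interval=$2 elapsed=0
    shift 2
    while [ "$elapsed" -lt "$timeout" ]; do
        if "$@"; then
            echo "Succeeded after ${elapsed}s."
            return 0
        fi
        echo "[${elapsed}s elapsed] not ready – retrying in ${interval}s…"
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "ERROR: command did not succeed within ${timeout}s" >&2
    return 1
}
```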

Comment thread docs/tutorial/rebalance-partitions.md Outdated
Comment on lines +313 to +333
<!-- test:run
# CC may need time to recollect metrics after the topology change. Retry the
# dryrun until it succeeds (up to 20 minutes, polling every 2 minutes).
elapsed=0
timeout=1200
interval=120
echo "Waiting for Cruise Control to be ready for full rebalance (timeout=${timeout}s)…"
while [ "$elapsed" -lt "$timeout" ]; do
    if juju run kraft/leader rebalance mode=full --wait=3m 2>&1; then
        echo "Full rebalance dryrun succeeded after ${elapsed}s."
        break
    fi
    echo "[${elapsed}s elapsed] Not ready – retrying in ${interval}s…"
    sleep "$interval"
    elapsed=$((elapsed + interval))
done
if [ "$elapsed" -ge "$timeout" ]; then
    echo "ERROR: Cruise Control full rebalance dryrun did not succeed within ${timeout}s"
    exit 1
fi
-->
Contributor

todo: Same as above.


@imanenami
Contributor

imanenami commented Apr 16, 2026

I had a productive discussion yesterday with @izmalk and it helped clarify some of the underlying decisions made in the PR. Posting my summary here for future reference:

  • Because tutorial tests sit at the top of the testing pyramid, it's justifiable that we use the same tools available to normal users for the tests, i.e. bash commands in this case. Although it'd also be desirable to add options for other flavors of testing. Now that I think about it, magics in IPython (Jupyter) could be a good source of inspiration. For example, if I put %%jubilant in a test annotation, it means I'm using jubilant; same for bash, python, and other usable testing flavors.

  • Vladimir mentioned that some of the complex bash commands in the test blocks of this PR are there for demonstrative purposes. That also alleviates my concern about raising the skills bar for documentation work (writing tutorials).
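The magic-style dispatch could be prototyped roughly like this (purely a sketch of the proposal; the flavor names and dispatch targets are illustrative):

```shell
# Hypothetical dispatcher for test annotations whose first line is an
# IPython-style magic such as "%%bash" or "%%jubilant".
run_test_block() {
    local block=$1 flavor body
    flavor=$(printf '%s\n' "$block" | head -n 1)
    body=$(printf '%s\n' "$block" | tail -n +2)
    case "$flavor" in
        '%%bash')     bash -euo pipefail -c "$body" ;;
        '%%jubilant') python3 -c "$body" ;;  # assumes jubilant is importable
        *)            echo "unknown flavor: $flavor" >&2; return 1 ;;
    esac
}
```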

Contributor

@imanenami imanenami left a comment

I can break this PR into three pieces:

  1. The testing syntax (we can call it the Frontend): This involves the Annotation reference section in the TESTING.md file. This is the part that I care most about at the current stage, and I think it needs to be set right from the ground up, to be reusable across projects. This involves the HTML inline comments and the proposed syntax.
  2. The test compiler (or we can call it the Backend): This involves extract_commands.py, helpers.sh, and the make utility. I'm not going to be nitpicky at this stage; I believe we can improve this in the future once we agree on the testing syntax. Ideally, imo, this should be in pure, type-hinted, quality-checked Python so that it can be maintained by Python developers :)
  3. The implementation of the syntax + compiler for Kafka tutorials: This is already good in my opinion 👍

With that mental model, the only major change I want to see in this PR before approving is the one that Vladimir already mentioned, and I reiterated in my previous comment: the possibility of adding different flavors of test assertions via a syntax that could, for example, be inspired by IPython magics.


Labels

documentation (Improvements or additions to documentation), enhancement (New feature, UI change, or workload upgrade)


3 participants