[DPE-8588] Automated tutorial testing#487
Conversation
Fix the tmp folder not being accessible due to Juju snap isolation. Fix prompting in commands. Fix the rebalancing command to call the correct Juju app.
```shell
wait_idle() {
    local timeout=600
    local interval=30
    local allow_blocked=""

    # Parse named options, consuming two tokens per flag (name + value).
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --timeout) timeout="$2"; shift 2 ;;
            --interval) interval="$2"; shift 2 ;;
            --allow-blocked) allow_blocked="$2"; shift 2 ;;
            *) echo "wait_idle: unknown option: $1" >&2; return 1 ;;
        esac
    done

    local elapsed=0
    echo "Waiting for all Juju units to be active/idle (timeout=${timeout}s, poll=${interval}s)…"

    while [[ "$elapsed" -lt "$timeout" ]]; do
        local not_ready
        # Run the poll pipeline with pipefail disabled so a non-zero exit from
        # "juju status" (common while machines are still provisioning) does not
        # abort a calling script that has set -euo pipefail active.
        not_ready=$(
            set +o pipefail
            export ALLOW_BLOCKED="$allow_blocked"
            juju status --format=json 2>/dev/null | python3 -c '
import json, sys, os
try:
    data = json.load(sys.stdin)
    allowed = set(os.environ.get("ALLOW_BLOCKED", "").split(",")) - {""}
    not_ready = 0
    total_units = 0
    for app_name, app in data.get("applications", {}).items():
        for unit in app.get("units", {}).values():
            total_units += 1
            ws = unit.get("workload-status", {}).get("current", "")
            js = unit.get("juju-status", {}).get("current", "")
            if ws == "active" and js == "idle":
                continue
            if ws == "blocked" and js == "idle" and app_name in allowed:
                continue
            not_ready += 1
    if total_units == 0:
        print("provisioning")
    else:
        print(not_ready)
except Exception:
    print("provisioning")
'
        ) || not_ready="provisioning"

        if [[ "$not_ready" == "0" ]]; then
            echo "All units active/idle after ${elapsed}s."
            juju status
            return 0
        elif [[ "$not_ready" == "provisioning" ]]; then
            echo "[${elapsed}s elapsed] still provisioning – rechecking in ${interval}s…"
        else
            echo "[${elapsed}s elapsed] ${not_ready} unit(s) not yet active/idle – rechecking in ${interval}s…"
        fi
        sleep "$interval"
        elapsed=$(( elapsed + interval ))
    done

    echo "Timed out after ${timeout}s. Final status:"
    juju status
    return 1
}
```
This is one of the most important features: the ability to test the Juju model and detect its active/idle state. We rely on it frequently, so its reliability and configuration directly affect both the reliability of the tests and their overall duration.
I experimented with replacing it with `juju wait-for`, but it was far too unstable. It often failed to detect the active/idle state at all, timed out, and then, when I checked manually, the status was actually already active/idle. It simply did not detect it.
There is probably a simple reason for this: `juju wait-for` works differently from `juju status` (see source):

> The wait-for command streams delta changes from the underlying database, unlike the status command which performs a full query of the database.
I think your observation needs more investigation (and maybe reporting upstream) since we rely on wait-for in almost all our integration tests.
Anyway, as I mentioned in the comments, we could achieve this much more simply in Python, either using `jubilant.Juju.wait()`, or, if you don't trust that, using a simple retry & polling mechanism in Python:
```python
import tenacity

# "juju" is a jubilant.Juju instance
for attempt in tenacity.Retrying(
    wait=tenacity.wait_fixed(10),
    stop=tenacity.stop_after_delay(600),
    reraise=True,
):
    with attempt:
        status = juju.status()
        assert all(
            app.app_status.current == "active" and app.juju_status.current == "idle"
            for app in status.apps.values()
        )
```
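For reference, a minimal sketch of the `jubilant.Juju.wait()` route mentioned above, assuming jubilant's `all_active`/`all_agents_idle` status helpers and taking the model name `tutorial` from the logs; treat the exact calls as illustrative:

```python
import jubilant

juju = jubilant.Juju(model="tutorial")  # model name assumed from the logs above

# Block until every unit reports an active workload and an idle agent,
# failing fast if any unit enters an error state.
juju.wait(
    lambda status: jubilant.all_active(status) and jubilant.all_agents_idle(status),
    error=jubilant.any_error,
    timeout=600,
)
```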
I agree. Do we have examples of the instability? It's the same logic we're using in tests, so I don't think it's unstable.
Here is the log output.
Sorry it's long; `wait-for` spams `model "tutorial" found, waiting...` quite often.
Search for:
`ERROR timed out waiting for "tutorial" to reach goal state`
Basically, `juju wait-for model` runs until it reaches its timeout. I've tried making the maximum time rather large; it didn't help. And when I added a status check after the `wait-for`, it showed active/idle. It even ran slower overall, most likely because each wait ran to its full timeout. So, after many attempts to make it work, I dropped the idea and reverted to the well-working solution for now.
After discussing the jubilant approach with Iman, I feel committed to implementing alternative functions, so that we have all three options to test and compare.
First of all, I really liked the idea of using inline HTML metadata for test automation. Good job on that 👍
Before doing a detailed review, I need some input for my own clarification so that I don't spam the review with generic comments:

- While I like the HTML inline comments idea, why are we inventing another mini testing language here? Since `jubilant` is the current charm engineering standard, I think we can safely use it with a leaner syntax and remove much of the "test compiling" logic in this PR.
- With the current implementation, I have a more general concern about raising the bar even higher than it is now for future documentation contributions, especially with the brand-new test DSL (domain-specific language) and inline shell commands now required to pass the docs tests...
question: why are we using a bash script here? We could write this in a much more readable & maintainable form in Python 🤔
I agree, I don't think that we should be encapsulating this logic here.
Summarising the discussion we had: yes, Python looks much nicer and easier to maintain. But our tutorial tells our users to run bash commands, and I'd like to keep the tests as close as possible to the tutorial for now, as that is the objective.
And to address the valid concern about the shell script implementation being rather ugly and hard to maintain, I already implemented a jubilant alternative to the `wait_idle()` helper. I'm planning to implement the `juju wait-for` alternative too, as soon as I can make it actually work =D. You can see the jubilant version in this other branch. It will be merged a bit later, as, let's say, version 0.2 of the solution, where I will specifically test and compare which implementation works better and why. For now, I'd like to settle on the old reliable shell script, since it has shown the best results so far.
```ini
[testenv:tutorial-extract]
description = Generate tutorial test scripts from Markdown sources
skip_install = True
commands =
    python3 tests/tutorial/extract_commands.py docs/tutorial/environment.md tests/tutorial/01_environment.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/deploy.md tests/tutorial/02_deploy.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/integrate-with-client-applications.md tests/tutorial/03_client.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/manage-passwords.md tests/tutorial/04_passwords.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/enable-encryption.md tests/tutorial/05_encryption.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/use-kafka-connect.md tests/tutorial/06_kafka_connect.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/rebalance-partitions.md tests/tutorial/07_rebalance.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/cleanup.md tests/tutorial/08_cleanup.sh
```
todo: I'd highly suggest keeping the tox file lean and moving all of this into a separate script.
We can probably have a bash one-liner to list the files and run them in sorted order with xargs or something.
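As an illustration of the separate-script idea, a minimal Python sketch that reuses the exact file pairs from the tox block above (note the tutorial's reading order is not alphabetical, so a plain sorted glob would not preserve it):

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: regenerate all tutorial test scripts outside tox.ini."""
import subprocess
import sys

# Tutorial pages in reading order, paired with their numbered output scripts
# (taken verbatim from the tox block above).
PAGES = [
    ("environment", "01_environment"),
    ("deploy", "02_deploy"),
    ("integrate-with-client-applications", "03_client"),
    ("manage-passwords", "04_passwords"),
    ("enable-encryption", "05_encryption"),
    ("use-kafka-connect", "06_kafka_connect"),
    ("rebalance-partitions", "07_rebalance"),
    ("cleanup", "08_cleanup"),
]

for page, script in PAGES:
    subprocess.run(
        [sys.executable, "tests/tutorial/extract_commands.py",
         f"docs/tutorial/{page}.md", f"tests/tutorial/{script}.sh"],
        check=True,
    )
```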
```ini
commands =
    python3 tests/tutorial/extract_commands.py docs/tutorial/environment.md tests/tutorial/01_environment.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/deploy.md tests/tutorial/02_deploy.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/integrate-with-client-applications.md tests/tutorial/03_client.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/manage-passwords.md tests/tutorial/04_passwords.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/enable-encryption.md tests/tutorial/05_encryption.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/use-kafka-connect.md tests/tutorial/06_kafka_connect.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/rebalance-partitions.md tests/tutorial/07_rebalance.sh
    python3 tests/tutorial/extract_commands.py docs/tutorial/cleanup.md tests/tutorial/08_cleanup.sh
    spread -vv multipass:ubuntu-24.04-64:tests/tutorial/
```
Commands are extracted **only** from `` ```shell `` fenced blocks.
Use `` ```bash `` or `` ```text `` for output examples or commands that should
not be executed by the test harness.
I think we should only use `text` for command outputs and not recommend using `bash`. After all, bash is a form of shell.
I'm not sure I understand. This tells the agent that `shell` blocks are extracted, while other block types are not. `text` and `bash` are examples: `bash` is for commands, and `text` is for output (or any other text that is not syntax highlighted).
Outside of the extractor script I built, `bash` and `shell` code blocks are functionally identical, as far as I know.
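For illustration, a minimal sketch of the fence-selection rule described here; the real extract_commands.py likely differs in its details:

```python
import re
from pathlib import Path

# Match only ```shell fenced blocks; ```bash and ```text fences are ignored,
# so they can safely hold output examples or non-executable commands.
SHELL_FENCE = re.compile(r"^```shell\n(.*?)^```\s*$", re.MULTILINE | re.DOTALL)

def extract_shell_blocks(markdown_path: str) -> list[str]:
    """Return the body of every ```shell block in the given Markdown file."""
    text = Path(markdown_path).read_text()
    return [block.strip() for block in SHELL_FENCE.findall(text)]
```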
```markdown
<!-- test:run
# Cruise Control needs time to collect metrics from the cluster. Rather than
# a fixed sleep, retry the dryrun rebalance until CC reports ready (up to 40
# minutes, polling every 2 minutes).
elapsed=0
timeout=2400
interval=120
echo "Waiting for Cruise Control to be ready (timeout=${timeout}s, poll=${interval}s)…"
while [ "$elapsed" -lt "$timeout" ]; do
  if juju run kraft/leader rebalance mode=add brokerid=103 --wait=2m 2>&1; then
    echo "Cruise Control ready after ${elapsed}s."
    break
  fi
  echo "[${elapsed}s elapsed] Cruise Control not ready – retrying in ${interval}s…"
  sleep "$interval"
  elapsed=$((elapsed + interval))
done
if [ "$elapsed" -ge "$timeout" ]; then
  echo "ERROR: Cruise Control did not become ready within ${timeout}s"
  exit 1
fi
-->
```
todo: This is a no-go; it's far too complex. We need a better way of abstracting this to make contributions more streamlined.
We had a great discussion about this. I'll be implementing a `helpers.sh` function to make invoking such a command trivial. The most direct approach (to keep it close to the tutorial instructions) is to call it every 20 minutes or so, for ~3 attempts. The exact command will stay here; the rest will be automated in the `helpers.sh` file.
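Purely for comparison (and since Python was suggested elsewhere in this review), a sketch of the same retry pattern using tenacity; the command, timings, and attempt count are illustrative, not the planned `helpers.sh` interface:

```python
import subprocess

import tenacity

# Retry the dryrun rebalance a few times with long gaps between attempts,
# mirroring the "every ~20 minutes, ~3 attempts" idea described above.
@tenacity.retry(
    wait=tenacity.wait_fixed(20 * 60),    # 20 minutes between attempts
    stop=tenacity.stop_after_attempt(3),  # give up after 3 attempts
    reraise=True,                         # surface the final CalledProcessError
)
def rebalance_dryrun() -> None:
    subprocess.run(
        ["juju", "run", "kraft/leader", "rebalance",
         "mode=add", "brokerid=103", "--wait=2m"],
        check=True,
    )

rebalance_dryrun()
```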
```markdown
<!-- test:run
# CC may need time to recollect metrics after the topology change. Retry the
# dryrun until it succeeds (up to 20 minutes, polling every 2 minutes).
elapsed=0
timeout=1200
interval=120
echo "Waiting for Cruise Control to be ready for full rebalance (timeout=${timeout}s)…"
while [ "$elapsed" -lt "$timeout" ]; do
  if juju run kraft/leader rebalance mode=full --wait=3m 2>&1; then
    echo "Full rebalance dryrun succeeded after ${elapsed}s."
    break
  fi
  echo "[${elapsed}s elapsed] Not ready – retrying in ${interval}s…"
  sleep "$interval"
  elapsed=$((elapsed + interval))
done
if [ "$elapsed" -ge "$timeout" ]; then
  echo "ERROR: Cruise Control full rebalance dryrun did not succeed within ${timeout}s"
  exit 1
fi
-->
```
todo: Same as above.
I agree, I don't think that we should be encapsulating this logic here.
I had a productive discussion yesterday with @izmalk and it helped clarify some of the underlying decisions made in the PR. Posting my summary here for future reference:
imanenami left a comment:
I can break this PR into three pieces:
- The testing syntax (we can call it the Frontend): this involves the `Annotation reference` section in the `TESTING.md` file. This is the part I care most about at the current stage, and I think it needs to be set right from the ground up, to be reusable across projects. It covers the HTML inline comments & the proposed syntax.
- The test compiler (or we can call it the Backend): this involves `extract_commands.py`, `helpers.sh`, and the `make` utility. I'm not going to be nitpicky at this stage; I believe we can improve this in the future once we agree on the testing syntax. Ideally, IMO, this should be in pure, type-hinted, quality-checked Python so that it can be maintained by Python developers :)
- The implementation of the syntax + compiler for the Kafka tutorials: this is already good in my opinion 👍

With that mental model, the only major change I want to see in this PR before approving is the one that Vladimir already mentioned, and that I reiterated in my previous comment: the possibility of adding different flavours of test assertions using a syntax that could, for example, be inspired by IPython magics.
Description
Created an automated solution for the tutorial end-to-end tests using Spread + Multipass.
- The `extract_commands.py` script parses the existing tutorial content and extracts the necessary commands in the correct order.
- Tests can be run locally (`tox -e tutorial`) or via CI/CD.
- The tests are run in a Multipass VM.
Estimated time from start to finish: ~1.5-2 hours.
The output finished like this:
https://warthogs.atlassian.net/browse/DPE-8588
I've added `TESTING.md` with the description of the solution and `agents.md` to improve AI reliability, and updated the README.

Checklist