Skip to content

fix(tests): make integration tests idempotent against live models [LNDENG-4308]#119

Draft
jansdhillon wants to merge 3 commits intocanonical:mainfrom
jansdhillon:feat/LNDENG-4308-idempotent-integration-tests
Draft

fix(tests): make integration tests idempotent against live models [LNDENG-4308]#119
jansdhillon wants to merge 3 commits intocanonical:mainfrom
jansdhillon:feat/LNDENG-4308-idempotent-integration-tests

Conversation

@jansdhillon
Copy link
Copy Markdown
Contributor

Implements LNDENG-4308: Make landscape-server-operator integration tests fully idempotent against live models.

Required before integration-tests.yaml in canonical/landscape-saas (#141) can run safely against PS7.

Changes

juju.sshjuju.exec (JAAS incompatibility)

JAAS does not proxy SSH to units — juju.ssh simply doesn't work on PS7 live models. juju.exec runs the command directly over the Juju API and works correctly.

Affected:

  • helpers.pywait_for_service
  • test_landscape_schema_migrated — psql schema check
  • test_ubuntu_installer_attach_toggle_no_maintenance — check service inactive

test_modern_database_relation + test_legacy_db_relation — try/finally

restore_db_relations() was only called on the happy path. If the assertion failed the model was left with a broken DB relation. Wrapped in try/finally.

test_upgrade_action_updates_ppa — skip on live model

This test mutates PPA state (add-apt-repository) on the live unit and runs juju.ssh internally. Not safe against a live PS7 model. Skips when USE_HOST_JUJU_MODEL=True; belongs in a dedicated upgrade pipeline.

USE_HOST_JUJU_MODEL / USE_HOST_LBAAS_MODEL type fix

Changed from os.getenv(..., False) (returns str | None) to bool(os.getenv(...)) (returns bool). Behaviour is identical but the type annotation is now correct.

…DENG-4308]

- Replace all juju.ssh() calls with juju.exec() — JAAS does not proxy SSH
  to units, so juju.ssh is broken on live PS7 models. juju.exec runs the
  command directly over the Juju API.
  Affected: wait_for_service (helpers.py), test_landscape_schema_migrated,
  test_ubuntu_installer_attach_toggle_no_maintenance

- Wrap test_modern_database_relation and test_legacy_db_relation in
  try/finally so restore_db_relations() is guaranteed to run even if the
  assertion fails, leaving the model in a known-good state.

- Skip test_upgrade_action_updates_ppa when USE_HOST_JUJU_MODEL=True.
  This test mutates PPA state on the live unit and is unsuitable for live
  model runs; it belongs in a dedicated upgrade pipeline.

- Fix USE_HOST_JUJU_MODEL and USE_HOST_LBAAS_MODEL to use bool(os.getenv(...))
  instead of os.getenv(..., False) — the old form returned None or str,
  which made the type misleading even though truthiness was correct.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Landscape Server charm integration tests to be safe and repeatable when executed against live Juju models (notably JAAS/PS7), avoiding non-idempotent mutations and replacing SSH-dependent checks with API-driven execution.

Changes:

  • Replace juju.ssh usages with juju.exec for service/psql checks to work on JAAS-backed live models.
  • Make DB-relation mutation tests idempotent by ensuring relations are restored via try/finally.
  • Skip the PPA-mutating upgrade test when running against a host/live model; fix host-model env flags to be actual booleans.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
tests/integration/test_bundle.py Switches unit command execution to juju.exec, adds try/finally restoration for DB relation tests, and skips a non-idempotent upgrade test on live models.
tests/integration/helpers.py Updates wait_for_service to use juju.exec instead of SSH for JAAS compatibility.
tests/integration/conftest.py Converts host-model env flags to bool for correct typing/usage in test logic.

elif "database" not in initial_relations:
juju.integrate("landscape-server:database", "postgresql:database")
juju.wait(lambda status: has_modern_pg(juju), timeout=120)
juju.integrate("landscape-server:database", "postgresql:database")
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After switching from the legacy db relation to database, this branch calls juju.integrate(...) but doesn’t wait for the modern relation to actually appear before reading juju.status() and asserting. This is inconsistent with the elif branch (which waits for has_modern_pg) and can lead to flaky failures if relation establishment is not immediate. Consider waiting for has_modern_pg(juju) (or jubilant.all_active) after the integrate in this branch as well.

Suggested change
juju.integrate("landscape-server:database", "postgresql:database")
juju.integrate("landscape-server:database", "postgresql:database")
juju.wait(lambda status: has_modern_pg(juju), timeout=120)

Copilot uses AI. Check for mistakes.
Comment on lines +239 to +249
juju.remove_relation(
"landscape-server:database", "postgresql:database", force=True
)
juju.wait(lambda status: not has_modern_pg(juju), timeout=120)
juju.integrate("landscape-server:db", "postgresql:db-admin")

relations = set(juju.status().apps["landscape-server"].relations)
elif "db" not in initial_relations:
juju.integrate("landscape-server:db", "postgresql:db-admin")
juju.wait(lambda status: has_legacy_pg(juju), timeout=120)

assert "db" in relations
relations = set(juju.status().apps["landscape-server"].relations)
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the branch that removes the modern database relation, the test immediately calls juju.integrate("landscape-server:db", ...) and then proceeds to read status/assert without waiting for has_legacy_pg(juju) (unlike the elif branch). This can be racey/flaky if the relation doesn’t show up in juju.status() immediately. Consider adding a wait for has_legacy_pg(juju) after the integrate in this branch too.

Copilot uses AI. Check for mistakes.
Comment thread tests/integration/test_bundle.py Outdated
Comment on lines 409 to 413
with pytest.raises(Exception):
juju.ssh(
name,
juju.exec(
f"systemctl is-active {LANDSCAPE_UBUNTU_INSTALLER_ATTACH}.service",
unit=name,
)
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pytest.raises(Exception) is very broad here and can mask unexpected failures (e.g., command transport errors) that aren’t actually “service is inactive”. Since wait_for_service() already treats jubilant.CLIError as the expected failure mode for non-zero exits, it would be safer to assert on jubilant.CLIError (and optionally check the output/return code if available) so the test only passes for the intended reason.

Copilot uses AI. Check for mistakes.
…4308]

- test_modern_database_relation: add juju.wait(has_modern_pg) after
  integrating database relation when switching from legacy, so status is
  polled until the relation actually appears before asserting
- test_legacy_db_relation: same — add juju.wait(has_legacy_pg) after
  integrating db relation when switching from modern
- test_ubuntu_installer_attach_toggle_no_maintenance: narrow
  pytest.raises(Exception) to pytest.raises(jubilant.CLIError) so only
  the expected non-zero exit failure is caught and transport errors are not
…ailover tests [LNDENG-4308]

- test_leader_services_not_on_non_leaders: assert LEADER_SERVICES are NOT
  active on non-leader units (explicit negative; safe on live models)
- test_service_ports_listening: verify all-unit ports and leader-only ports
  via ss; guards against HAProxy charm config regressions (safe on live models)
- test_leadership_change_service_follows: remove leader unit, verify LEADER_SERVICES
  migrate to new leader and DEFAULT_SERVICES stay up (skip on live model)
- test_scale_out_and_in: add/remove a unit, verify service placement throughout
  (skip on live model)
- test_postgresql_failover_services_recover: stop pg leader, verify landscape-server
  recovers automatically and appserver stays healthy (skip on live model)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants