Note: This issue was generated with AI assistance (GitHub Copilot) based on automated log analysis and triage.
Filed by @canonical/solutions-qa
Summary
mysql-k8s charm revision 404 (channel 8.4/edge) fails scale-down operations due to an unhandled KeyError: 'cluster-name' in the database-relation-broken hook handler. This prevents units from being removed and causes integration tests to time out.
Root Cause
The _on_database_broken() handler in /src/relations/mysql_provider.py at line 272 attempts to access self.app_peer_data["cluster-name"] without checking if the key exists:
# File: src/relations/mysql_provider.py, line 272
def _on_database_broken(self, event: RelationBrokenEvent):
    # ... code ...
    if self.charm._mysql.does_mysql_user_exist(self._get_username(relation_id), "%"):
        # ... (fails in the _mysql property access below)

# File: src/charm.py, line 204
@property
def _mysql(self) -> MySQL:
    return MySQL(
        self.app_peer_data["cluster-name"],  # ← KeyError when key doesn't exist
        # ... other params ...
    )
Exception Traceback:
  File "/var/lib/juju/agents/unit-target-0/charm/src/relations/mysql_provider.py", line 272, in _on_database_broken
    if self.charm._mysql.does_mysql_user_exist(self._get_username(relation_id), "%"):
       ^^^^^^^^^^^^^^^^^
  File "/var/lib/juju/agents/unit-target-0/charm/src/charm.py", line 204, in _mysql
    self.app_peer_data["cluster-name"],
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'cluster-name'
Impact
- Scale-down operations fail - Units cannot be removed
- Unit enters error state - target/0* error idle hook failed: "database-relation-broken"
- Juju continuously retries - The hook is retried 6+ times, all failing with the same error
- Integration tests time out - Tests wait 10 minutes for unit removal, then fail
- Blocks deployments - Prevents cleanup and redeployment of mysql-k8s charm
Test Failure Details
- Failed Test: test_scale_in_and_scale_out_charm
- Execution ID: 443500
- Test Result ID: 10174540
- Charm: mysql-k8s
- Revision: 404
- Channel: 8.4/edge
- Failure Rate: 100% (consistent failure on this revision)
- Error: JujuWaitTimeoutError: Timed out while waiting for unit removal (applications: ['target'], units: ['target/0'])
Evidence from Juju Debug Logs
Hook Execution Failure (repeats 6+ times):
2026-03-31T19:37:51.500Z [container-agent] 2026-03-31 19:37:51 ERROR juju-log database:5: root:Uncaught exception while in charm code:
2026-03-31T19:37:51.500Z [container-agent] Traceback (most recent call last):
2026-03-31T19:37:51.500Z [container-agent] File "/var/lib/juju/agents/unit-target-0/charm/src/charm.py", line 1108, in <module>
2026-03-31T19:37:51.500Z [container-agent] main(MySQLOperatorCharm)
2026-03-31T19:37:51.500Z [container-agent] ...
2026-03-31T19:37:51.500Z [container-agent] File "/var/lib/juju/agents/unit-target-0/charm/src/relations/mysql_provider.py", line 272, in _on_database_broken
2026-03-31T19:37:51.500Z [container-agent] if self.charm._mysql.does_mysql_user_exist(self._get_username(relation_id), "%"):
2026-03-31T19:37:51.500Z [container-agent] ^^^^^^^^^^^^^^^^^
2026-03-31T19:37:51.500Z [container-agent] File "/var/lib/juju/agents/unit-target-0/charm/src/charm.py", line 204, in _mysql
2026-03-31T19:37:51.500Z [container-agent] self.app_peer_data["cluster-name"],
2026-03-31T19:37:51.500Z [container-agent] ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
2026-03-31T19:37:51.500Z [container-agent] KeyError: 'cluster-name'
2026-03-31T19:37:51.974Z [container-agent] 2026-03-31 19:37:51 ERROR juju.worker.uniter.operation runhook.go:180 hook "database-relation-broken" (via hook dispatching script: dispatch) failed: exit status 1
Hook Retry Timeline:
19:37:51 - Attempt 1: KeyError 'cluster-name'
19:37:58 - Attempt 2: KeyError 'cluster-name'
19:38:11 - Attempt 3: KeyError 'cluster-name'
19:38:34 - Attempt 4: KeyError 'cluster-name'
19:39:18 - Attempt 5: KeyError 'cluster-name'
19:40:46 - Attempt 6: KeyError 'cluster-name'
... (continues)
19:47:31 - Test timeout
19:48:02 - Unit status shows error state
Unit Status at Failure:
target/0:
  workload-status:
    current: error
    message: 'hook failed: "database-relation-broken" for neighbor:database'
    since: 31 Mar 2026 19:48:02Z
  juju-status:
    current: idle
    since: 31 Mar 2026 19:48:02Z
Regression Analysis
- Revision 404 (current): ✗ FAILS (100% failure rate on scale-down)
- Previous revisions: Likely PASS (needs verification)
- Conclusion: Bug introduced in revision 404
Probable Cause
During scale-down of a mysql-k8s cluster:
- The relation-broken hook is triggered when removing the relation to the remote application (e.g., wordpress-k8s)
- The hook tries to access the MySQL object to check if the remote user exists
- The _mysql property attempts to read self.app_peer_data["cluster-name"]
- At this point in the scale-down lifecycle, the cluster-name key may not be available in peer data
- An unhandled KeyError is raised, causing hook failure
- Juju puts the unit into an error state and retries the hook (up to 10 times)
- Unit cannot transition to removed state, causing test timeout
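The sequence above reduces to a direct key lookup on a databag that may already be empty. A minimal sketch of the failure mode (a plain dict stands in for the real peer relation databag, and the two helper functions are invented for illustration) contrasts the failing direct access with defensive access:

```python
# A plain dict stands in for the peer relation databag, which may
# already have been emptied by the time relation-broken fires.
app_peer_data = {}

def get_cluster_name_unsafe(data):
    # Direct key access, as in the current charm code: raises KeyError
    # when the key is absent.
    return data["cluster-name"]

def get_cluster_name_safe(data):
    # Defensive access, as in the recommended fix: returns None instead,
    # letting the caller decide to skip cleanup.
    return data.get("cluster-name")

try:
    get_cluster_name_unsafe(app_peer_data)
except KeyError as e:
    print(f"hook would fail here: KeyError {e}")  # → hook would fail here: KeyError 'cluster-name'

print(get_cluster_name_safe(app_peer_data))  # → None
```

The defensive variant turns a crash into an observable condition the handler can branch on, which is exactly what the options below do.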
Recommended Fix
The charm should use defensive dictionary access in the _mysql property or the _on_database_broken hook:
Option 1 (Recommended - in _mysql property):
@property
def _mysql(self) -> MySQL:
    cluster_name = self.app_peer_data.get("cluster-name")
    if not cluster_name:
        # During scale-down the key may be absent; raise a clear,
        # catchable error instead of an opaque KeyError
        raise RuntimeError("Cluster name not available - cluster may be scaling down")
    return MySQL(
        cluster_name,
        # ... other params ...
    )
Option 2 (Guard in hook handler):
def _on_database_broken(self, event: RelationBrokenEvent):
    relation_id = event.relation.id
    # Check if we can access cluster before attempting to remove users
    if "cluster-name" not in self.app_peer_data:
        # Cluster name unavailable, likely during scale-down - skip user cleanup
        logger.warning("Skipping user removal during scale-down: cluster-name not available")
        return
    if self.charm._mysql.does_mysql_user_exist(self._get_username(relation_id), "%"):
        # ... rest of the handler
Option 3 (Catch the KeyError in the hook handler):
def _on_database_broken(self, event: RelationBrokenEvent):
    relation_id = event.relation.id
    try:
        if self.charm._mysql.does_mysql_user_exist(self._get_username(relation_id), "%"):
            ...  # remove the user
    except KeyError as e:
        # During scale-down, peer data may be unavailable
        logger.warning(f"Skipping user cleanup during relation break: {e}")
        return
The root issue is that the code assumes the cluster-name key exists, when it may never have been set or may have been cleared during scale-down.
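Whichever option lands, a small regression test can lock in the guard behavior. A hypothetical sketch following the Option 2 shape: `FakeCharm`, `on_database_broken`, and the module-level `logger` are invented stand-ins for the real charm objects, not the charm's actual test scaffolding:

```python
import logging

logger = logging.getLogger("mysql_provider")

class FakeCharm:
    """Stand-in for MySQLOperatorCharm; _mysql raising proves the guard worked."""
    @property
    def _mysql(self):
        raise AssertionError("handler must not touch _mysql without cluster-name")

def on_database_broken(charm, app_peer_data):
    # Guard from Option 2: skip user cleanup when peer data is incomplete.
    if "cluster-name" not in app_peer_data:
        logger.warning("Skipping user removal: cluster-name not available")
        return "skipped"
    return charm._mysql  # would continue with user cleanup here

result = on_database_broken(FakeCharm(), {})
print(result)  # → skipped: no KeyError, the hook exits cleanly
```

Because `FakeCharm._mysql` raises on any access, the test fails loudly if a future refactor reorders the guard after the property access, which is the regression this issue describes.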
Test Observer Link
View the failure with complete juju logs:
https://test-observer.canonical.com/#/charms/406079?testExecutionId=443500&testResultId=10174540
Related Issues
This issue follows the same pattern as Issue #202 (mysql-k8s r400: logging-relation-broken hook fails with KeyError 'logs_synced' on scale-down), which reports a similar unhandled KeyError in a different relation-broken hook. The fix pattern is identical - use defensive dictionary access instead of direct key access.
Related Files
- Source: src/relations/mysql_provider.py (line 272)
- Source: src/charm.py (line 204)
- Test: charm-integration-testing/test_scale_in_and_scale_out_charm
- Charm: canonical/mysql-operators (mysql-k8s package)