mysql-k8s r404: database-relation-broken hook fails with KeyError 'cluster-name' on scale-down #205

@dmvdm

Description

Note: This issue was generated with AI assistance (GitHub Copilot) based on automated log analysis and triage.
Filed by @canonical/solutions-qa


Summary

mysql-k8s charm revision 404 (channel 8.4/edge) fails scale-down operations due to an unhandled KeyError: 'cluster-name' in the database-relation-broken hook handler. This prevents units from being removed and causes integration tests to time out.

Root Cause

The _on_database_broken() handler in src/relations/mysql_provider.py (line 272) calls the charm's _mysql property, which at src/charm.py line 204 accesses self.app_peer_data["cluster-name"] without checking whether the key exists:

# File: src/relations/mysql_provider.py, line 272
def _on_database_broken(self, event: RelationBrokenEvent):
    # ... code ...
    # Accessing self.charm._mysql here invokes the property from src/charm.py:
    if self.charm._mysql.does_mysql_user_exist(self._get_username(relation_id), "%"):
        ...

# File: src/charm.py, line 204
@property
def _mysql(self) -> MySQL:
    return MySQL(
        self.app_peer_data["cluster-name"],  # ← KeyError when the key doesn't exist
        # ... other params ...
    )

Exception Traceback:

File "/var/lib/juju/agents/unit-target-0/charm/src/relations/mysql_provider.py", line 272, in _on_database_broken
    if self.charm._mysql.does_mysql_user_exist(self._get_username(relation_id), "%"):
       ^^^^^^^^^^^^^^^^^
File "/var/lib/juju/agents/unit-target-0/charm/src/charm.py", line 204, in _mysql
    self.app_peer_data["cluster-name"],
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 'cluster-name'

Impact

  • Scale-down operations fail - Units cannot be removed
  • Unit enters error state - target/0* error idle hook failed: "database-relation-broken"
  • Juju continuously retries - Hook is retried 6+ times, all fail with same error
  • Integration tests timeout - Tests wait 10 minutes for unit removal, then fail
  • Blocks deployments - Prevents cleanup and redeployment of mysql-k8s charm

Test Failure Details

  • Failed Test: test_scale_in_and_scale_out_charm
  • Execution ID: 443500
  • Test Result ID: 10174540
  • Charm: mysql-k8s
  • Revision: 404
  • Channel: 8.4/edge
  • Failure Rate: 100% (consistent failure on this revision)
  • Error: JujuWaitTimeoutError: Timed out while waiting for unit removal (applications: ['target'], units: ['target/0'])

Evidence from Juju Debug Logs

Hook Execution Failure (repeats 6+ times):

2026-03-31T19:37:51.500Z [container-agent] 2026-03-31 19:37:51 ERROR juju-log database:5: root:Uncaught exception while in charm code:
2026-03-31T19:37:51.500Z [container-agent] Traceback (most recent call last):
2026-03-31T19:37:51.500Z [container-agent]   File "/var/lib/juju/agents/unit-target-0/charm/src/charm.py", line 1108, in <module>
2026-03-31T19:37:51.500Z [container-agent]     main(MySQLOperatorCharm)
2026-03-31T19:37:51.500Z [container-agent]   ...
2026-03-31T19:37:51.500Z [container-agent]   File "/var/lib/juju/agents/unit-target-0/charm/src/relations/mysql_provider.py", line 272, in _on_database_broken
2026-03-31T19:37:51.500Z [container-agent]     if self.charm._mysql.does_mysql_user_exist(self._get_username(relation_id), "%"):
2026-03-31T19:37:51.500Z [container-agent]        ^^^^^^^^^^^^^^^^^
2026-03-31T19:37:51.500Z [container-agent]   File "/var/lib/juju/agents/unit-target-0/charm/src/charm.py", line 204, in _mysql
2026-03-31T19:37:51.500Z [container-agent]     self.app_peer_data["cluster-name"],
2026-03-31T19:37:51.500Z [container-agent]     ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
2026-03-31T19:37:51.500Z [container-agent] KeyError: 'cluster-name'
2026-03-31T19:37:51.974Z [container-agent] 2026-03-31 19:37:51 ERROR juju.worker.uniter.operation runhook.go:180 hook "database-relation-broken" (via hook dispatching script: dispatch) failed: exit status 1

Hook Retry Timeline:

19:37:51 - Attempt 1: KeyError 'cluster-name'
19:37:58 - Attempt 2: KeyError 'cluster-name'
19:38:11 - Attempt 3: KeyError 'cluster-name'
19:38:34 - Attempt 4: KeyError 'cluster-name'
19:39:18 - Attempt 5: KeyError 'cluster-name'
19:40:46 - Attempt 6: KeyError 'cluster-name'
... (continues)
19:47:31 - Test timeout
19:48:02 - Unit status shows error state

Unit Status at Failure:

target/0:
  workload-status:
    current: error
    message: 'hook failed: "database-relation-broken" for neighbor:database'
    since: 31 Mar 2026 19:48:02Z
  juju-status:
    current: idle
    since: 31 Mar 2026 19:48:02Z

Regression Analysis

  • Revision 404 (current): ✗ FAILS (100% failure rate on scale-down)
  • Previous revisions: Likely PASS (needs verification)
  • Conclusion: Bug most likely introduced in revision 404 (pending verification against earlier revisions)

Probable Cause

During scale-down of a mysql-k8s cluster:

  1. The relation-broken hook is triggered when removing the relation to the remote application (e.g., wordpress-k8s)
  2. The hook tries to access the MySQL object to check if the remote user exists
  3. The _mysql property attempts to read self.app_peer_data["cluster-name"]
  4. At this point in the scale-down lifecycle, the cluster-name key may no longer be present in peer data (a minimal sketch of this sequence follows the list)
  5. An unhandled KeyError is raised, causing hook failure
  6. Juju marks the unit as in error state and retries (up to 10 times)
  7. Unit cannot transition to removed state, causing test timeout
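
The sequence above can be condensed into a minimal, self-contained sketch. The class and function names below are hypothetical stand-ins (the real charm stores this data in a peer relation application databag, not a plain dict), but they show why indexing the databag from a property turns a cleared "cluster-name" entry into a failed hook:

# Hypothetical stand-ins for the failing pattern; not the charm's real classes.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class CharmStandIn:
    def __init__(self, app_peer_data: dict):
        # Stand-in for the peer relation application databag.
        self.app_peer_data = app_peer_data

    @property
    def _mysql(self) -> str:
        # Direct indexing, mirroring src/charm.py line 204.
        return self.app_peer_data["cluster-name"]


def on_database_broken(charm: CharmStandIn) -> None:
    # Mirrors the handler's first use of the property in _on_database_broken.
    cluster = charm._mysql
    logger.info("would clean up relation users on cluster %s", cluster)


on_database_broken(CharmStandIn({"cluster-name": "cluster-1"}))  # normal case: works
try:
    on_database_broken(CharmStandIn({}))  # scale-down case: databag already cleared
except KeyError as err:
    # This is the uncaught exception that fails the real hook.
    logger.error("hook would fail: KeyError %s", err)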

Recommended Fix

The charm should use defensive dictionary access in the _mysql property or the _on_database_broken hook:

Option 1 (Recommended - in _mysql property):

@property
def _mysql(self) -> MySQL:
    cluster_name = self.app_peer_data.get("cluster-name")
    if not cluster_name:
        # Handle gracefully during scale-down when cluster-name may not be available
        raise RuntimeError("Cluster name not available - cluster may be scaling down")
    return MySQL(
        cluster_name,
        # ... other params ...
    )
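
Note that Option 1 by itself only swaps the KeyError for a clearer RuntimeError; any relation-broken handler that can run during scale-down would still need to treat that error as "skip cleanup" rather than letting it fail the hook. A possible (hypothetical) guard in the handler, assuming the property raises as sketched above:

def _on_database_broken(self, event: RelationBrokenEvent):
    relation_id = event.relation.id
    try:
        mysql = self.charm._mysql
    except RuntimeError:
        # Raised by the Option 1 property when cluster-name is missing,
        # e.g. during scale-down; skip user cleanup instead of failing the hook.
        logger.warning("Skipping user removal: cluster metadata unavailable")
        return
    if mysql.does_mysql_user_exist(self._get_username(relation_id), "%"):
        ...  # rest of the handler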

Option 2 (Guard in hook handler):

def _on_database_broken(self, event: RelationBrokenEvent):
    relation_id = event.relation.id
    # Check if we can access cluster before attempting to remove users
    if "cluster-name" not in self.app_peer_data:
        # Cluster name unavailable, likely during scale-down - skip user cleanup
        logger.warning("Skipping user removal during scale-down: cluster-name not available")
        return
    
    if self.charm._mysql.does_mysql_user_exist(self._get_username(relation_id), "%"):
        ...  # rest of the handler

Option 3 (Conditional property access):

def _on_database_broken(self, event: RelationBrokenEvent):
    relation_id = event.relation.id
    try:
        if self.charm._mysql.does_mysql_user_exist(self._get_username(relation_id), "%"):
            ...  # remove the user
    except KeyError as e:
        # During scale-down, peer data may be unavailable
        logger.warning(f"Skipping user cleanup during relation break: {e}")
        return

The root issue is that the code assumes the cluster-name key always exists, when it may not yet have been set or may already have been cleared during scale-down.
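
Whichever option lands, a regression test should cover the missing-key path. A minimal sketch of what such a test could assert, written against a hypothetical standalone helper that mirrors the defensive pattern above (not the charm's real API):

# Hypothetical helper and tests; the real fix would exercise the charm classes instead.
import logging

logger = logging.getLogger(__name__)


def cleanup_relation_user(app_peer_data: dict, username: str) -> bool:
    """Drop the relation user; return False (skip) when cluster-name is missing."""
    cluster_name = app_peer_data.get("cluster-name")
    if not cluster_name:
        logger.warning("cluster-name missing from peer data; skipping cleanup of %s", username)
        return False
    # ... connect to `cluster_name` and drop `username` here ...
    return True


def test_cleanup_skipped_when_cluster_name_missing():
    # The scale-down case from this issue: peer databag already cleared.
    assert cleanup_relation_user({}, "relation-5_user") is False


def test_cleanup_runs_when_cluster_name_present():
    assert cleanup_relation_user({"cluster-name": "cluster-1"}, "relation-5_user") is True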

Test Observer Link

View the failure with complete juju logs:
https://test-observer.canonical.com/#/charms/406079?testExecutionId=443500&testResultId=10174540

Related Issues

This issue follows the same pattern as Issue #202 (mysql-k8s r400: logging-relation-broken hook fails with KeyError 'logs_synced' on scale-down), which reports a similar unhandled KeyError in a different relation-broken hook. The fix pattern is identical - use defensive dictionary access instead of direct key access.

Related Files

  • Source: src/relations/mysql_provider.py (line 272)
  • Source: src/charm.py (line 204)
  • Test: charm-integration-testing/test_scale_in_and_scale_out_charm
  • Charm: canonical/mysql-operators (mysql-k8s package)

Metadata

Labels

bug (Something isn't working as expected)
