Kibana health check should fail fast on permanent errors #127

## Problem

The Kibana readiness health check in `roles/kibana/tasks/main.yml` and `roles/kibana/tasks/restart_and_verify_kibana.yml` retries blindly for up to 5 minutes (60 retries × 5 s delay) regardless of the failure reason. When the cause is a permanent error that will never resolve on its own, every deploy waits out the full timeout before failing.

## Current behavior

```yaml
- name: Wait for Kibana readiness
  shell: |
    HTTP_CODE=$(curl -sk ... https://localhost:5601/api/status) || true
    if [ "$HTTP_CODE" = "200" ] || [ "$HTTP_CODE" = "401" ]; then exit 0; fi
    exit 1
  retries: 60
  delay: 5
```

This treats every response other than 200 or 401 the same way: retry and hope. If Kibana cannot start because of a misconfiguration, the task still waits the full 5 minutes before failing.

## Proposed behavior

The health check should distinguish between transient and permanent errors:

| Scenario | Response | Action |
|----------|----------|--------|
| Kibana starting up | Connection refused / no response | Retry (transient) |
| Kibana ready | 200 | Success |
| Kibana ready but auth required | 401 | Success |
| ES backend unreachable | Kibana log shows connection errors | **Fail immediately** with "Elasticsearch not reachable" |
| Wrong kibana_system password | Kibana log shows auth failures | **Fail immediately** with "check kibana_system password" |
| Kibana crashed | systemctl shows inactive/failed | **Fail immediately** with journal output |
| Startup error | Kibana log shows FATAL | **Fail immediately** with the error |
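The log-based rows of the table could be handled by a small classifier that reads recent log text and prints the matching failure message. This is only a sketch: the function name `classify_kibana_log` and the auth-failure pattern are assumptions for illustration, and the patterns should be checked against the messages your Kibana version actually logs.

```shell
# Classify Kibana log text (read from stdin).
# Prints a diagnostic and returns 2 for a permanent error, returns 1 otherwise.
# ASSUMPTION: the grep patterns are illustrative, not taken from Kibana docs.
classify_kibana_log() {
  _kb_log=$(cat)

  # Auth failures are checked first because they are the more specific diagnosis.
  if printf '%s\n' "$_kb_log" | grep -Eqi 'security_exception|unable to authenticate'; then
    echo "check kibana_system password"
    return 2
  fi

  # String taken from the implementation suggestion in this issue.
  if printf '%s\n' "$_kb_log" | grep -Eq 'Unable to connect to Elasticsearch'; then
    echo "Elasticsearch not reachable"
    return 2
  fi

  if printf '%s\n' "$_kb_log" | grep -Eq 'FATAL'; then
    echo "Kibana startup error"
    return 2
  fi

  # Nothing permanent found: treat as transient so the caller retries.
  return 1
}
```

A caller might pipe `journalctl -u kibana --no-pager -n 50` into the function and exit with its return code, printing the message to stderr when the code is 2.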

## Implementation suggestion

```bash
# Check if service is dead (permanent)
if ! systemctl is-active --quiet kibana; then
  journalctl -u kibana --no-pager -n 20 >&2
  exit 2  # permanent failure
fi

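# Check for permanent errors in recent logs.
# Note: the last 50 lines may include entries from before the latest restart.
# grep -E is used because "\|" in a basic regex is a GNU extension.
if journalctl -u kibana --no-pager -n 50 | grep -Eq 'FATAL|Unable to connect to Elasticsearch'; then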
  journalctl -u kibana --no-pager -n 20 >&2
  exit 2
fi

# Transient: still starting up
HTTP_CODE=$(curl -sk -o /dev/null -w '%{http_code}' ...)
if [ "$HTTP_CODE" = "200" ] || [ "$HTTP_CODE" = "401" ]; then exit 0; fi
exit 1  # retry
```

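In the Ansible task, register the result and drive the retry loop with `until: result.rc != 1`, so retries stop on success (rc 0) or on a permanent failure (rc 2), and set `failed_when: result.rc != 0` so the permanent case fails the task. Note that `until: result.rc == 0` on its own would keep retrying on rc 2, because the retry loop ends only when `until` is satisfied or the retries run out.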
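The intended retry semantics (stop on success, stop on permanent failure, retry only on transient failure) can also be sketched outside Ansible as a plain shell loop. The function name `retry_until_decided` is hypothetical. It assumes the exit-code convention of the script above: 0 means ready, 1 means transient, anything else means permanent.

```shell
# Run a probe command up to MAX_TRIES times, sleeping DELAY seconds between tries.
# Usage: retry_until_decided MAX_TRIES DELAY command [args...]
# Returns 0 on success, the probe's code on permanent failure,
# and 1 if every attempt was transient.
retry_until_decided() {
  _max_tries=$1
  _delay=$2
  shift 2
  _try=1
  while [ "$_try" -le "$_max_tries" ]; do
    _rc=0
    "$@" || _rc=$?
    case "$_rc" in
      0) return 0 ;;        # ready
      1) ;;                 # transient: sleep and try again
      *) return "$_rc" ;;   # permanent: stop immediately
    esac
    _try=$((_try + 1))
    if [ "$_try" -le "$_max_tries" ]; then
      sleep "$_delay"
    fi
  done
  return 1  # retries exhausted while still transient
}
```

For example, `retry_until_decided 60 5 ./check_kibana.sh` would mirror the current 60 × 5 s budget, where `check_kibana.sh` is a hypothetical file holding the script above.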

## Impact

- Deploys fail in seconds instead of 5 minutes on configuration errors
- Diagnostic output is shown immediately instead of after a long timeout
- Transient startup delays still retry normally

## Related

- #119 — Kibana health check uses HTTP when kibana_tls is enabled (fixed)
