From 45c322952a7c5ec40917c3d03f0fb22e777000ed Mon Sep 17 00:00:00 2001
From: Sam Crauwels
Date: Thu, 23 Apr 2026 19:51:05 +0200
Subject: [PATCH 1/2] docs: refresh platforms table, cert var names, and last-month additions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pull docs back into sync with what the roles actually do after the last month of merges. Three classes of fix:

Platform table in the introduction page still listed Debian 11 and RHEL 8 and was missing Ubuntu 26.04. That table is supposed to mirror roles/elasticsearch/meta/main.yml, and now it does.

The external-cert example in the deployment how-to referenced variables that don't exist: kibana_tls_cert/key/ca, elasticsearch_*_tls_cert, the non-existent elasticsearch_http_tls_ca and elasticsearch_transport_tls_ca, and elasticsearch_tls_cacerts in a tip. Anyone copy-pasting the example was getting a silent no-op. Swapped to the real names (kibana_tls_certificate_file, elasticsearch_*_tls_certificate, elasticsearch_tls_ca_certificate).

Added reference sections for the two variable families that landed recently without docs: the elasticsearch_config_restart_strategy family from the rolling-config-restart work, and elasticsearch_os_tuning from the sysctl/THP tuning work.

Also rewrote the Handler guards section: it described a single handler with four guards, but the handler is now split into direct/rolling dispatch paths with five guard conditions each (the ansible_check_mode guard had been omitted too).
---
 docs/how-to/deployment.md       | 17 +++++++-------
 docs/introduction/index.md      |  6 ++---
 docs/reference/elasticsearch.md | 41 ++++++++++++++++++++++++++++-----
 3 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/docs/how-to/deployment.md b/docs/how-to/deployment.md
index 2d21361..12467de 100644
--- a/docs/how-to/deployment.md
+++ b/docs/how-to/deployment.md
@@ -74,9 +74,9 @@ When you have certificates from an external CA (Let's Encrypt, corporate PKI, et
 kibana_tls: true
 kibana_cert_source: external
 
-kibana_tls_cert: /etc/pki/kibana/kibana.crt
-kibana_tls_key: /etc/pki/kibana/kibana.key
-kibana_tls_ca: /etc/pki/kibana/ca-chain.crt
+kibana_tls_certificate_file: /etc/pki/kibana/kibana.crt
+kibana_tls_key_file: /etc/pki/kibana/kibana.key
+kibana_tls_ca_file: /etc/pki/kibana/ca-chain.crt
 
 # Optional: key passphrase if the private key is encrypted
 # kibana_tls_key_passphrase: "{{ vault_kibana_key_pass }}"
@@ -85,7 +85,7 @@ kibana_tls_ca: /etc/pki/kibana/ca-chain.crt
 The files must already exist on the Kibana host before running the playbook. The role configures Kibana to use them but does not manage the certificate lifecycle — renewal is your responsibility.
 
 !!! tip
-    If your external CA is not the same as the Elasticsearch CA, you also need to configure Elasticsearch to trust it. Add the CA certificate to `elasticsearch_tls_cacerts` on all ES nodes.
+    If your external CA is not the same as the Elasticsearch CA, you also need to configure Elasticsearch to trust it. Set `elasticsearch_tls_ca_certificate` on all ES nodes.
 
 ## Elasticsearch with external certificates
 
@@ -95,14 +95,15 @@ For environments where certificates come from an external PKI:
 elasticsearch_cert_source: external
 
 # HTTP (client-facing) certificates
-elasticsearch_http_tls_cert: /etc/pki/elasticsearch/http.crt
+elasticsearch_http_tls_certificate: /etc/pki/elasticsearch/http.crt
 elasticsearch_http_tls_key: /etc/pki/elasticsearch/http.key
-elasticsearch_http_tls_ca: /etc/pki/elasticsearch/ca-chain.crt
 
 # Transport (inter-node) certificates
-elasticsearch_transport_tls_cert: /etc/pki/elasticsearch/transport.crt
+elasticsearch_transport_tls_certificate: /etc/pki/elasticsearch/transport.crt
 elasticsearch_transport_tls_key: /etc/pki/elasticsearch/transport.key
-elasticsearch_transport_tls_ca: /etc/pki/elasticsearch/ca-chain.crt
+
+# Shared CA for HTTP and transport
+elasticsearch_tls_ca_certificate: /etc/pki/elasticsearch/ca-chain.crt
 ```
 
 Each node needs its own certificate with the node's hostname or IP in the Subject Alternative Names (SAN). The transport certificate must include all node hostnames since nodes verify each other's identity during cluster formation.
diff --git a/docs/introduction/index.md b/docs/introduction/index.md
index e65d4af..bc32501 100644
--- a/docs/introduction/index.md
+++ b/docs/introduction/index.md
@@ -19,9 +19,9 @@ The collection provides six roles that cover each layer of the stack. They work
 
 | Category | Versions |
 |----------|----------|
-| Debian | 11 (Bullseye), 12 (Bookworm), 13 (Trixie) |
-| Ubuntu | 22.04 (Jammy), 24.04 (Noble) |
-| Rocky Linux / RHEL | 8, 9, 10 |
+| Debian | 12 (Bookworm), 13 (Trixie) |
+| Ubuntu | 22.04 (Jammy), 24.04 (Noble), 26.04 (Resolute) |
+| Rocky Linux / RHEL | 9, 10 |
 | Elastic Stack | 8.x, 9.x |
 | Ansible | 2.18+ |
 
diff --git a/docs/reference/elasticsearch.md b/docs/reference/elasticsearch.md
index 482c042..d24ca1f 100644
--- a/docs/reference/elasticsearch.md
+++ b/docs/reference/elasticsearch.md
@@ -415,6 +415,30 @@ elasticsearch_extra_config:
 
 Keys that conflict with settings managed by dedicated role variables (like `cluster.name`, `network.host`, security/TLS settings, `bootstrap.memory_lock`) are silently filtered out, and the role emits a warning telling you to use the dedicated variable instead.
 
+### Config-triggered restarts
+
+When a run changes `elasticsearch.yml` or JVM options, the Restart Elasticsearch handler fires. On multi-node clusters the role restarts nodes one at a time and waits for cluster health to recover between nodes; on single-node clusters it restarts in place.
+
+```yaml
+elasticsearch_config_restart_strategy: rolling
+elasticsearch_config_restart_flush: true
+elasticsearch_config_restart_wait_status: green
+elasticsearch_config_restart_health_retries: 50
+elasticsearch_config_restart_health_delay: 30
+elasticsearch_config_restart_node_retries: 200
+elasticsearch_config_restart_node_delay: 3
+```
+
+`elasticsearch_config_restart_strategy` picks between `rolling` (default — restart one node at a time, gate on cluster health) and `direct` (legacy all-at-once restart from a normal handler). Single-node clusters always take the direct path regardless of this setting.
+
+`elasticsearch_config_restart_flush` runs a synced flush before each node restart during a rolling restart. Set to `false` only if you have a specific reason to skip it.
+
+`elasticsearch_config_restart_wait_status` is the minimum cluster health colour the role waits for before and after each node restart. `green` is strictly safer; set to `yellow` if you have unassigned replicas that are expected and you don't want the restart to block on them.
+
+`elasticsearch_config_restart_health_retries` and `elasticsearch_config_restart_health_delay` control how long the role waits for the cluster to regain the chosen health status between nodes. Defaults give ~25 minutes per node (50 × 30s), which is generous for large clusters with lots of shard recovery.
+
+`elasticsearch_config_restart_node_retries` and `elasticsearch_config_restart_node_delay` control how long the role waits for the node it just restarted to rejoin the cluster. Defaults give ~10 minutes per node (200 × 3s).
+
 ### Rolling Upgrades
 
 The role validates the upgrade path before any work begins. When `elasticstack_release` is 9 or higher and Elasticsearch is currently installed, the role checks that the installed version is at least 8.19.0. If it finds an older 8.x version, the play fails immediately -- you must step through 8.19.x first. This matches [Elastic's official upgrade requirements](https://www.elastic.co/docs/deploy-manage/upgrade/deployment-or-cluster).
@@ -455,6 +479,10 @@ The default heap formula is `min(max(memtotal_mb / 1024 / 2, 1), 30)` -- half of
 
 The role sets `nofile=65535` for the `elasticsearch` user via PAM (`/etc/security/limits.d/`). This is required for production but was historically unreliable in the RPM post-install scripts. Controlled by `elasticsearch_pamlimits` (default `true`).
 
+### OS-level tuning
+
+`elasticsearch_os_tuning` (default `true`) applies the sysctl and kernel settings Elasticsearch expects in production: raises `vm.max_map_count` for the mmapfs directory (required for large shard counts), drops `vm.swappiness` to 1, tightens TCP retry counts for faster fault detection, and disables Transparent Huge Pages at runtime. The tuning is skipped automatically in container environments (`virtualization_type` in `docker`, `container`, `containerd`, `lxc`, `podman`), where these sysctls typically can't be set and should be inherited from the host. Set to `false` if your host is managed by a separate tuning policy and you don't want the role writing `/etc/sysctl.d/`.
+
 ### JNA tmpdir workaround
 
 On systems where `/tmp` is mounted with `noexec`, Java Native Access fails to load native libraries. Set `elasticsearch_jna_workaround: true` to redirect JNA's temp directory to `{{ elasticsearch_datapath }}/tmp` via the sysconfig file (`/etc/default/elasticsearch` on Debian, `/etc/sysconfig/elasticsearch` on RedHat).
@@ -494,14 +522,15 @@ In container environments (`virtualization_type` in `container`, `docker`, `lxc`
 
 ### Handler guards
 
-The "Restart Elasticsearch" handler has four guards that prevent it from firing when a restart would be redundant or harmful:
+Notifications of `Restart Elasticsearch` are dispatched to one of two paths depending on `elasticsearch_config_restart_strategy` and cluster size: a direct restart-in-place (single-node or explicitly direct), or a rolling restart that run_once-orchestrates node-by-node restarts across the cluster (multi-node + rolling, the default). Every handler in that dispatch chain applies the same five guard conditions to prevent a restart that would be redundant or harmful:
 
-1. `elasticsearch_enable` must be true
-2. NOT during a fresh install (service already started naturally)
-3. NOT during security initialization (service already started)
-4. NOT after a rolling upgrade (upgrade did its own restart)
+1. NOT in check mode (`ansible_check_mode` is false)
+2. `elasticsearch_enable` must be true
+3. NOT during a fresh install (service already started naturally)
+4. NOT during security initialization (service already started)
+5. NOT after a rolling upgrade (upgrade did its own restart)
 
-The handler also triggers a Kibana restart on all Kibana hosts (if `elasticstack_full_stack` is enabled) since Kibana may need to reconnect after an ES restart. This Kibana restart is skipped during CA renewal.
+A separate handler on the same notification triggers a Kibana restart on all Kibana hosts (if `elasticstack_full_stack` is enabled) since Kibana may need to reconnect after an ES restart. The Kibana restart is skipped during CA renewal.
 
 ### Double config write
 

From 214ef1cba4a071d814f8a7df61d825d46d4d0d7f Mon Sep 17 00:00:00 2001
From: Sam Crauwels
Date: Thu, 23 Apr 2026 20:05:33 +0200
Subject: [PATCH 2/2] docs: address CodeRabbit review on external-cert passphrase and restart skip
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two findings from the review on #145:

The external-cert Kibana example in the deployment how-to still showed kibana_tls_key_passphrase, but that variable belongs to the role-generated (elasticstack_ca source) cert flow. The external-cert path uses kibana_tls_certificate_passphrase per the defaults annotation. Other occurrences of kibana_tls_key_passphrase in docs are all in role-managed-cert contexts where they're correct, so only this one line needed the swap.

Handler-guards section only mentioned CA-renewal as the Kibana restart skip condition. The handler actually skips on both the renew_ca tag and elasticstack_ca_will_expire_soon — both are in roles/elasticsearch/handlers/main.yml. Updated to mention both and why (those paths coordinate their own Kibana restart).
---
 docs/how-to/deployment.md       | 4 ++--
 docs/reference/elasticsearch.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/how-to/deployment.md b/docs/how-to/deployment.md
index 12467de..4ddb4e5 100644
--- a/docs/how-to/deployment.md
+++ b/docs/how-to/deployment.md
@@ -78,8 +78,8 @@ kibana_tls_certificate_file: /etc/pki/kibana/kibana.crt
 kibana_tls_key_file: /etc/pki/kibana/kibana.key
 kibana_tls_ca_file: /etc/pki/kibana/ca-chain.crt
 
-# Optional: key passphrase if the private key is encrypted
-# kibana_tls_key_passphrase: "{{ vault_kibana_key_pass }}"
+# Optional: passphrase for an encrypted private key or P12 file
+# kibana_tls_certificate_passphrase: "{{ vault_kibana_key_pass }}"
 ```
 
 The files must already exist on the Kibana host before running the playbook. The role configures Kibana to use them but does not manage the certificate lifecycle — renewal is your responsibility.
diff --git a/docs/reference/elasticsearch.md b/docs/reference/elasticsearch.md
index d24ca1f..070020a 100644
--- a/docs/reference/elasticsearch.md
+++ b/docs/reference/elasticsearch.md
@@ -530,7 +530,7 @@ Notifications of `Restart Elasticsearch` are dispatched to one of two paths depe
 4. NOT during security initialization (service already started)
 5. NOT after a rolling upgrade (upgrade did its own restart)
 
-A separate handler on the same notification triggers a Kibana restart on all Kibana hosts (if `elasticstack_full_stack` is enabled) since Kibana may need to reconnect after an ES restart. The Kibana restart is skipped during CA renewal.
+A separate handler on the same notification triggers a Kibana restart on all Kibana hosts (if `elasticstack_full_stack` is enabled) since Kibana may need to reconnect after an ES restart. The Kibana restart is skipped when the `renew_ca` tag is active or when `elasticstack_ca_will_expire_soon` is true, since those paths have their own coordinated Kibana restart.
 
 ### Double config write
 
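
Review note, not part of either patch: the wait windows quoted in the new "Config-triggered restarts" section (50 × 30s and 200 × 3s) can be sanity-checked with a short sketch. It assumes the total wait is simply retries × delay, which only approximates how Ansible retry loops behave. `RetryBudget` and `worst_case_minutes` are hypothetical helpers made up for this illustration and do not exist in the collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryBudget:
    """Approximate worst-case wait implied by a retries/delay pair."""

    retries: int
    delay_seconds: int

    @property
    def total_seconds(self) -> int:
        # Assumption: total wait is retries * delay, ignoring task run time.
        return self.retries * self.delay_seconds

    @property
    def total_minutes(self) -> float:
        return self.total_seconds / 60


# Default values quoted in the docs/reference/elasticsearch.md hunk.
health_wait = RetryBudget(retries=50, delay_seconds=30)
node_wait = RetryBudget(retries=200, delay_seconds=3)


def worst_case_minutes(node_count: int) -> float:
    """Rough upper bound for a rolling restart.

    Hypothetical model: every node exhausts the node-rejoin budget and the
    cluster-health budget exactly once. The real handler may wait more or
    fewer times per node.
    """
    per_node = health_wait.total_minutes + node_wait.total_minutes
    return node_count * per_node


print(health_wait.total_minutes)  # 25.0
print(node_wait.total_minutes)    # 10.0
print(worst_case_minutes(3))      # 105.0
```

Under these assumptions, a three-node cluster that hits every timeout would block for well over an hour. That is worth keeping in mind when choosing between `green` and `yellow` for `elasticsearch_config_restart_wait_status`.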