document locking/unlocking instances

bertiethorpe · bertiethorpe · commit af27191a3556 · 2025-11-12T18:33:34.000Z
diff --git a/.github/workflows/stackhpc.yml b/.github/workflows/stackhpc.yml
@@ -154,6 +154,7 @@ jobs:
         run: |
           . venv/bin/activate
           . environments/.stackhpc/activate
+          ansible-playbook --limit login,control ansible/adhoc/lock_unlock_instances.yml -e "appliances_server_action=unlock"
           cd "$STACKHPC_TF_DIR"
           tofu init
           tofu apply -auto-approve -var-file="${{ env.CI_CLOUD }}.tfvars"
diff --git a/docs/experimental/compute-init.md b/docs/experimental/compute-init.md
@@ -22,7 +22,7 @@ login and control nodes. The process follows
 1. Compute nodes are reimaged:
 
 ```shell
-ansible-playbook -v --limit compute ansible/adhoc/rebuild.yml
+ansible-playbook -v ansible/adhoc/rebuild-via-slurm.yml
 ```
 
 2. Ansible-init runs against newly reimaged compute nodes
diff --git a/docs/experimental/slurm-controlled-rebuild.md b/docs/experimental/slurm-controlled-rebuild.md
@@ -12,14 +12,16 @@ In summary, the way this functionality works is as follows:
 
 1. The image references(s) are manually updated in the OpenTofu configuration
    in the normal way.
+2. `ansible/adhoc/lock_unlock_instances.yml --limit control,login -e "appliances_server_action=unlock"`
+   is run to unlock the control and login nodes so that they can be reimaged.
 2. `tofu apply` is run which rebuilds the login and control nodes to the new
    image(s). The new image reference for compute nodes is ignored, but is
    written into the hosts inventory file (and is therefore available as an
    Ansible hostvar).
-3. The `site.yml` playbook is run which reconfigures the cluster as normal. At
-   this point the cluster is functional, but using a new image for the login
-   and control nodes and the old image for the compute nodes. This playbook
-   also:
+3. The `site.yml` playbook is run, which locks the instances again and reconfigures
+   the cluster as normal. At this point the cluster is functional, but using a new
+   image for the login and control nodes and the old image for the compute nodes.
+   This playbook also:
    - Writes cluster configuration to the control node, using the
      [compute_init](../../ansible/roles/compute_init/README.md) role.
    - Configures an application credential and helper programs on the control
diff --git a/docs/sequence.md b/docs/sequence.md
@@ -100,6 +100,7 @@ sequenceDiagram
     participant cloud as Cloud
     participant nodes as Cluster Instances
     note over ansible: Update OpenTofu cluster_image variable [1]
+    ansible->>cloud: Unlock control and login nodes
     rect rgb(204, 232, 250)
     note over ansible: $ tofu apply ....
     ansible<<->>cloud: Check login/compute current vs desired images