Commit af27191

document locking/unlocking instances
1 parent 0fa67e5 commit af27191

4 files changed (+9 -5 lines changed)

.github/workflows/stackhpc.yml

Lines changed: 1 addition & 0 deletions
@@ -154,6 +154,7 @@ jobs:
  run: |
  . venv/bin/activate
  . environments/.stackhpc/activate
+ ansible-playbook --limit login,control ansible/adhoc/lock_unlock_instances.yml -e "appliances_server_action=unlock"
  cd "$STACKHPC_TF_DIR"
  tofu init
  tofu apply -auto-approve -var-file="${{ env.CI_CLOUD }}.tfvars"

docs/experimental/compute-init.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ login and control nodes. The process follows
  1. Compute nodes are reimaged:

  ```shell
- ansible-playbook -v --limit compute ansible/adhoc/rebuild.yml
+ ansible-playbook -v ansible/adhoc/rebuild-via-slurm.yml
  ```

  2. Ansible-init runs against newly reimaged compute nodes

docs/experimental/slurm-controlled-rebuild.md

Lines changed: 6 additions & 4 deletions
@@ -12,14 +12,16 @@ In summary, the way this functionality works is as follows:

  1. The image references(s) are manually updated in the OpenTofu configuration
  in the normal way.
+ 2. `lock_unlock_instances.yml --limit control,login -e "appliances_server_action=unlock"`
+ is run to unlock the control and login nodes for reimaging.
  2. `tofu apply` is run which rebuilds the login and control nodes to the new
  image(s). The new image reference for compute nodes is ignored, but is
  written into the hosts inventory file (and is therefore available as an
  Ansible hostvar).
- 3. The `site.yml` playbook is run which reconfigures the cluster as normal. At
- this point the cluster is functional, but using a new image for the login
- and control nodes and the old image for the compute nodes. This playbook
- also:
+ 3. The `site.yml` playbook is run which locks the instances again and reconfigures
+ the cluster as normal. At this point the cluster is functional, but using a new
+ image for the login and control nodes and the old image for the compute nodes.
+ This playbook also:
  - Writes cluster configuration to the control node, using the
  [compute_init](../../ansible/roles/compute_init/README.md) role.
  - Configures an application credential and helper programs on the control

docs/sequence.md

Lines changed: 1 addition & 0 deletions
@@ -100,6 +100,7 @@ sequenceDiagram
  participant cloud as Cloud
  participant nodes as Cluster Instances
  note over ansible: Update OpenTofu cluster_image variable [1]
+ ansible->>cloud: Unlock control and and login nodes
  rect rgb(204, 232, 250)
  note over ansible: $ tofu apply ....
  ansible<<->>cloud: Check login/compute current vs desired images
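
Taken together, the hunks in this commit describe an ordering: unlock the control and login nodes, run `tofu apply`, then run `site.yml`, which locks the instances again. A minimal sketch of that ordering follows. It assumes the virtualenv and environment are already activated and that the playbook lives at `ansible/site.yml` (the diff names only `site.yml`). The `run` helper and the `DRY_RUN` variable are hypothetical and not part of the repository. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the documented sequence: unlock, tofu apply, site.yml (which re-locks).
set -eu

# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
# Both DRY_RUN and run() are illustrative helpers, not part of the repository.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

rebuild_sequence() {
    # Unlock the control and login nodes so that they can be reimaged.
    run ansible-playbook --limit login,control ansible/adhoc/lock_unlock_instances.yml -e "appliances_server_action=unlock"
    # Rebuild the login and control nodes to the new image(s).
    run tofu apply
    # site.yml locks the instances again and reconfigures the cluster.
    # The ansible/site.yml path is an assumption; the diff only says `site.yml`.
    run ansible-playbook ansible/site.yml
}

rebuild_sequence
```

Running the script as-is prints the three commands in order. Setting `DRY_RUN=0` executes them, so it should only be used from an activated environment against a cluster that is meant to be reimaged.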
