56 commits
ba96992
Delete environments/.caas/ansible.cfg (#766)
bertiethorpe Aug 28, 2025
73f614a
Add filesystems docs (#710)
MoteHue Aug 29, 2025
cb4ca3c
CaaS pre-hook fix for galaxy requirements validation (#767)
bertiethorpe Aug 29, 2025
21ef880
Production end to end deployment docs (#678)
MoteHue Aug 29, 2025
cbf990a
Fix inventory parsing of cookiecutter env (#768)
MoteHue Sep 4, 2025
275da83
Refactor Pulp repo definitions and add more Pulp documentation (#760)
wtripp180901 Sep 4, 2025
2984292
temp fix: add alertmanager passwd to persist_openhpc_secrets template
bertiethorpe Sep 5, 2025
a3be3c9
missing ','
bertiethorpe Sep 5, 2025
32e9838
alertmanager admin passwd group_var
bertiethorpe Sep 5, 2025
6e05021
fix incorrect use of partition in nodegroup variable definitions (#771)
sjpb Sep 5, 2025
109f584
make caas persist secrets idempotent
bertiethorpe Sep 8, 2025
7911ee0
Merge branch 'main' into fix/caas-secrets
bertiethorpe Sep 8, 2025
60d531d
Bump Pulp snapshots for RL 9.6 (#772)
priteau Sep 9, 2025
c94d134
add support for setting server group (#773)
sjpb Sep 9, 2025
82897d4
Bump CUDA to 13.0.1 and NVIDIA driver to 580.82.07
priteau Sep 9, 2025
b2ed670
Merge pull request #776 from stackhpc/bump-cuda
priteau Sep 9, 2025
bc0c66c
Merge pull request #774 from stackhpc/fix/caas-secrets
bertiethorpe Sep 9, 2025
b42c2f8
Add validation for tofu-templated vars (#775)
sjpb Sep 11, 2025
919a7e2
Fix error message for state volume provisioning (#780)
sjpb Sep 11, 2025
c12ec99
Enable linting (#732)
maxstack Sep 18, 2025
fad0ff4
Define login subgroups in Ansible inventory (#727)
priteau Sep 18, 2025
eb1fb2d
Fix label in Jupyter Notebook form (#787)
priteau Sep 18, 2025
0da4041
Ignore changes to port binding and dhcp options (#778)
sjpb Sep 18, 2025
06857df
Expose vgpu group in site inventory (#786)
priteau Sep 18, 2025
535528f
Add documentation for OpenTofu remote state (#784)
sjpb Sep 19, 2025
5bedf73
Remove unused cloudalchemy alertmanager role (is in-repo role instead…
sjpb Sep 19, 2025
3b4be85
Fix various typos
priteau Sep 23, 2025
a4ea997
Merge pull request #796 from stackhpc/typo-fixes
priteau Sep 23, 2025
67b2658
Update dnf repo snapshots (+ source repos, removes RL8 Lustre build C…
sjpb Sep 24, 2025
dbf1422
Validate nodegroup names (#793)
sjpb Sep 24, 2025
4548b9b
Bump Open OnDemand to v4 & install apps in fatimage (#782)
bertiethorpe Sep 25, 2025
ab4a5ae
Support software raid root disks in stackhpc images (#785)
sjpb Sep 25, 2025
00c044f
Fix .caas secrets not persisting post-reimage + skip failing validati…
wtripp180901 Sep 29, 2025
9fb0bf8
Pin bcrypt to 4.3.0 to avoid passlib bug (#801)
wtripp180901 Sep 30, 2025
b7161de
Merge branch 'main' into fix/caas-upgrade
wtripp180901 Sep 30, 2025
d8b3cf6
changed variable name
wtripp180901 Sep 30, 2025
4bcc614
Merge pull request #798 from stackhpc/fix/caas-upgrade
wtripp180901 Sep 30, 2025
7e88f5a
move image download/conversion to runner's /mnt
sjpb Sep 30, 2025
4c36537
make image dir
sjpb Sep 30, 2025
63d918b
fix image upload
sjpb Oct 1, 2025
3f4ed03
Merge pull request #805 from stackhpc/fix/image-upload
bertiethorpe Oct 1, 2025
f351b9d
bump codeserver app version (#806)
bertiethorpe Oct 1, 2025
69f71fe
Use (group) syntax in access.conf (#804)
priteau Oct 1, 2025
81a2581
Remove extra lines in activate scripts (#803)
priteau Oct 1, 2025
82c814e
bump new fatimages (#808)
bertiethorpe Oct 2, 2025
b11696e
Improve build group definitions (#788)
sjpb Oct 3, 2025
b504f10
Expose FIPs in inventory hosts file (#807)
claudia-lola Oct 3, 2025
72aff75
Allow VS Code Remote SSH while blocking NFS mounts (#799)
priteau Oct 4, 2025
d9c5d8f
Delete build VMs in CI nightly cleanup (#777)
sjpb Oct 7, 2025
c40a383
Export state directory to OnDemand nodes in CaaS environment (#809)
wtripp180901 Oct 9, 2025
606219c
Merge tag 'v2.7' into update-to-2.7
technowhizz Nov 6, 2025
bc4e6ee
Bump image for v2.7
technowhizz Nov 7, 2025
d954cbc
Don't use login as name for login node object
technowhizz Nov 10, 2025
cd96d9d
Fix structure of ansible-ssh
technowhizz Nov 10, 2025
5812791
Add script to do all env related config at once
technowhizz Nov 10, 2025
06dede4
Change image to use id and volume size for v2.7 image build (packer)
technowhizz Nov 10, 2025
24 changes: 24 additions & 0 deletions .ansible-lint.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
---
skip_list:
- role-name
# Unresolved issues with parsing jinja in multiline strings
# https://github.com/ansible/ansible-lint/issues/3935
- jinja[spacing]
- galaxy[no-changelog]
- meta-runtime[unsupported-version]

warn_list:
- name[missing]
- name[play]
- var-naming

exclude_paths:
- actionlint.yml
- .ansible/
- .github/
# Rule 'syntax-check' is unskippable, you cannot use it in 'skip_list' or 'warn_list'.
# It breaks the parser which takes place before the linter, the only option is to exclude the file.
- ansible/roles/filebeat/tasks/runtime.yml
- environments/common/files/filebeat/filebeat.yml
# Rule 'load-failure[filenotfounderror]' is also unskippable
- ansible/roles/compute_init/files/compute-init.yml
4 changes: 4 additions & 0 deletions .checkov.yaml
@@ -0,0 +1,4 @@
---
skip-check:
# Requires all blocks to have rescue: - not considered appropriate
- CKV2_ANSIBLE_3
8 changes: 8 additions & 0 deletions .editorconfig
@@ -0,0 +1,8 @@
# This is primarily used to alter the behaviour of linters executed by super-linter.
# See https://editorconfig.org/

# shfmt will default to indenting shell scripts with tabs,
# define the indent as 2 spaces
[{.github/bin,dev}/*.sh]
indent_style = space
indent_size = 2
10 changes: 5 additions & 5 deletions .github/bin/create-merge-branch.sh
100644 → 100755
@@ -44,7 +44,7 @@ if git show-branch "remotes/origin/$BRANCH_NAME" >/dev/null 2>&1; then
 fi

 echo "[INFO] Merging release tag - $RELEASE_TAG"
-git merge --strategy recursive -X theirs --no-commit $RELEASE_TAG
+git merge --strategy recursive -X theirs --no-commit "$RELEASE_TAG"

 # Check if the merge resulted in any changes being staged
 if [ -n "$(git status --short)" ]; then
@@ -54,7 +54,7 @@ if [ -n "$(git status --short)" ]; then
 # NOTE(scott): The GitHub create-pull-request action does
 # the committing for us, so we only need to make branches
 # and commits if running outside of GitHub actions.
-if [ ! $GITHUB_ACTIONS ]; then
+if [ ! "$GITHUB_ACTIONS" ]; then
 echo "[INFO] Checking out temporary branch '$BRANCH_NAME'..."
 git checkout -b "$BRANCH_NAME"

@@ -74,8 +74,8 @@ if [ -n "$(git status --short)" ]; then

 # Write a file containing the branch name and tag
 # for automatic PR or MR creation that follows
-echo "BRANCH_NAME=\"$BRANCH_NAME\"" > .mergeenv
-echo "RELEASE_TAG=\"$RELEASE_TAG\"" >> .mergeenv
+echo "BRANCH_NAME=\"$BRANCH_NAME\"" >.mergeenv
+echo "RELEASE_TAG=\"$RELEASE_TAG\"" >>.mergeenv
 else
 echo "[INFO] Merge resulted in no changes"
 fi
 fi
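
The quoting changes in this script guard against word splitting of unquoted expansions. A minimal sketch of the effect, assuming a hypothetical helper `count_args` and a hypothetical tag value containing a space (neither appears in the repository):

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Hypothetical tag value containing a space, chosen to expose word splitting.
RELEASE_TAG="v2.7 rc1"

# Unquoted expansion is split on whitespace into two arguments.
# shellcheck disable=SC2086
unquoted=$(count_args $RELEASE_TAG)

# Quoted expansion is passed through as a single argument.
quoted=$(count_args "$RELEASE_TAG")

echo "unquoted=$unquoted quoted=$quoted"
```

With a real release tag that contains no whitespace or glob characters both forms behave the same, so the change is mainly defensive and keeps the shell linters quiet.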
14 changes: 7 additions & 7 deletions .github/bin/get-s3-image.sh
100644 → 100755
@@ -13,14 +13,14 @@ echo "Checking if image $image_name exists in OpenStack"
 image_exists=$(openstack image list --name "$image_name" -f value -c Name)

 if [ -n "$image_exists" ]; then
-echo "Image $image_name already exists in OpenStack."
+echo "Image $image_name already exists in OpenStack."
 else
-echo "Image $image_name not found in OpenStack. Getting it from S3."
+echo "Image $image_name not found in OpenStack. Getting it from S3."

-wget https://leafcloud.store/swift/v1/AUTH_f39848421b2747148400ad8eeae8d536/$bucket_name/$image_name --progress=dot:giga
+wget "https://leafcloud.store/swift/v1/AUTH_f39848421b2747148400ad8eeae8d536/$bucket_name/$image_name" --progress=dot:giga

-echo "Uploading image $image_name to OpenStack..."
-openstack image create --file $image_name --disk-format qcow2 $image_name --progress
+echo "Uploading image $image_name to OpenStack..."
+openstack image create --file "$image_name" --disk-format qcow2 "$image_name" --progress

-echo "Image $image_name has been uploaded to OpenStack."
-fi
+echo "Image $image_name has been uploaded to OpenStack."
+fi
1 change: 1 addition & 0 deletions .github/linters/.checkov.yaml
1 change: 1 addition & 0 deletions .github/linters/.python-lint
1 change: 1 addition & 0 deletions .github/linters/.shellcheckrc
1 change: 1 addition & 0 deletions .github/linters/.yamllint.yml
1 change: 1 addition & 0 deletions .github/linters/actionlint.yml
49 changes: 21 additions & 28 deletions .github/workflows/extra.yml
@@ -1,38 +1,31 @@
---

# Test building extra images on OpenStack.
# This workflow can run standalone or as part of the main CI workflow.
# See the workflow file 'main.yml' for how this is CI triggered.

name: Test extra build
on:
workflow_call:
workflow_dispatch:
push:
branches:
- main
paths:
- 'environments/.stackhpc/tofu/cluster_image.auto.tfvars.json'
- 'ansible/roles/doca/**'
- 'ansible/roles/cuda/**'
- 'ansible/roles/slurm_recompile/**' # runs on cuda group
- 'ansible/roles/lustre/**'
- '.github/workflows/extra.yml'
pull_request:
paths:
- 'environments/.stackhpc/tofu/cluster_image.auto.tfvars.json'
- 'ansible/roles/doca/**'
- 'ansible/roles/cuda/**'
- 'ansible/roles/lustre/**'
- '.github/workflows/extra.yml'

permissions:
contents: read
packages: write
# To report GitHub Actions status checks
statuses: write

jobs:
doca:
name: extra-build
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.build.image_name }} # to branch/PR + OS
cancel-in-progress: true
runs-on: ubuntu-22.04
strategy:
fail-fast: false # allow other matrix jobs to continue even if one fails
matrix: # build RL8, RL9
build:
- image_name: openhpc-extra-RL8
source_image_name_key: RL8 # key into environments/.stackhpc/tofu/cluster_image.auto.tfvars.json
inventory_groups: doca,cuda,lustre
inventory_groups: doca,cuda # lustre disabled due to https://github.com/stackhpc/ansible-slurm-appliance/pull/759
volume_size: 35 # needed for cuda
- image_name: openhpc-extra-RL9
source_image_name_key: RL9
@@ -46,7 +39,7 @@ jobs:
 PACKER_GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}

 steps:
-- uses: actions/checkout@v2
+- uses: actions/checkout@v4

 - name: Load current fat images into GITHUB_ENV
 # see https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#example-of-a-multiline-string
@@ -60,7 +53,7 @@ jobs:
 - name: Record settings
 run: |
 echo CI_CLOUD: ${{ env.CI_CLOUD }}
-echo FAT_IMAGES: ${FAT_IMAGES}
+echo "FAT_IMAGES: ${FAT_IMAGES}"

 - name: Setup ssh
 run: |
@@ -99,7 +92,7 @@ jobs:

 PACKER_LOG=1 packer build \
 -on-error=${{ vars.PACKER_ON_ERROR }} \
--var-file=$PKR_VAR_environment_root/${{ env.CI_CLOUD }}.pkrvars.hcl \
+-var-file="$PKR_VAR_environment_root/${{ env.CI_CLOUD }}.pkrvars.hcl" \
 -var "source_image_name=${{ fromJSON(env.FAT_IMAGES)['cluster_image'][matrix.build.source_image_name_key] }}" \
 -var "image_name=${{ matrix.build.image_name }}" \
 -var "inventory_groups=${{ matrix.build.inventory_groups }}" \
@@ -111,14 +104,14 @@ jobs:
 run: |
 . venv/bin/activate
 IMAGE_ID=$(jq --raw-output '.builds[-1].artifact_id' packer/packer-manifest.json)
-while ! openstack image show -f value -c name $IMAGE_ID; do
+while ! openstack image show -f value -c name "$IMAGE_ID"; do
 sleep 5
 done
-IMAGE_NAME=$(openstack image show -f value -c name $IMAGE_ID)
+IMAGE_NAME=$(openstack image show -f value -c name "$IMAGE_ID")
 echo "image-name=${IMAGE_NAME}" >> "$GITHUB_OUTPUT"
 echo "image-id=$IMAGE_ID" >> "$GITHUB_OUTPUT"
-echo $IMAGE_ID > image-id.txt
-echo $IMAGE_NAME > image-name.txt
+echo "$IMAGE_ID" > image-id.txt
+echo "$IMAGE_NAME" > image-name.txt

 - name: Make image usable for further builds
 run: |
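
The wait loop in the hunk above retries `openstack image show` until it succeeds, with no upper bound. A bounded variant could look like the sketch below. `fake_image_show`, the retry limit, the image ID and the image name are all hypothetical stand-ins so the example runs without an OpenStack cloud; this is not what the workflow currently does.

```shell
#!/usr/bin/env bash
# Counter kept in a file because command substitution runs in a subshell.
counter_file=$(mktemp)
echo 0 > "$counter_file"

# Hypothetical stand-in for `openstack image show -f value -c name "$IMAGE_ID"`:
# fails on the first two calls, then prints an image name.
fake_image_show() {
  local n
  n=$(($(cat "$counter_file") + 1))
  echo "$n" > "$counter_file"
  [ "$n" -ge 3 ] && echo "openhpc-extra-RL9"
}

max_tries=5   # assumed limit, not taken from the workflow
tries=0
IMAGE_NAME=""
until IMAGE_NAME=$(fake_image_show "hypothetical-image-id"); do
  tries=$((tries + 1))
  if [ "$tries" -ge "$max_tries" ]; then
    echo "image not visible after $max_tries attempts" >&2
    break
  fi
  sleep 0   # the workflow sleeps 5 seconds between attempts
done

rm -f "$counter_file"
echo "IMAGE_NAME=$IMAGE_NAME after $tries failed attempts"
```

Capturing the name inside the loop condition also avoids the second `openstack image show` call that the workflow makes after the loop.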
26 changes: 17 additions & 9 deletions .github/workflows/fatimage.yml
@@ -1,6 +1,7 @@
 name: Build fat image
 on:
 workflow_dispatch:
+# checkov:skip=CKV_GHA_7: "The build output cannot be affected by user parameters other than the build entry point and the top-level source location. GitHub Actions workflow_dispatch inputs MUST be empty. "
 inputs:
 ci_cloud:
 description: 'Select the CI_CLOUD'
@@ -16,6 +17,12 @@ on:
 required: true
 default: true

+permissions:
+contents: read
+packages: write
+# To report GitHub Actions status checks
+statuses: write
+
 jobs:
 openstack:
 name: openstack-imagebuild
@@ -29,10 +36,10 @@ jobs:
 build:
 - image_name: openhpc-RL8
 source_image_name: Rocky-8-GenericCloud-Base-8.10-20240528.0.x86_64.raw
-inventory_groups: control,compute,login,update
+inventory_groups: fatimage
 - image_name: openhpc-RL9
 source_image_name: Rocky-9-GenericCloud-Base-9.6-20250531.0.x86_64.qcow2
-inventory_groups: control,compute,login,update
+inventory_groups: fatimage
 env:
 ANSIBLE_FORCE_COLOR: True
 OS_CLOUD: openstack
@@ -42,11 +49,12 @@ jobs:
 PACKER_GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}

 steps:
-- uses: actions/checkout@v2
+- uses: actions/checkout@v4

 - name: Record settings for CI cloud
 run: |
 echo CI_CLOUD: ${{ env.CI_CLOUD }}
+echo cleanup_on_failure: ${{ github.event.inputs.cleanup_on_failure }}

 - name: Setup ssh
 run: |
@@ -84,8 +92,8 @@ jobs:
 packer init .

 PACKER_LOG=1 packer build \
--on-error=${{ github.event.inputs.cleanup_on_failure && 'cleanup' || 'abort' }} \
--var-file=$PKR_VAR_environment_root/${{ env.CI_CLOUD }}.pkrvars.hcl \
+-on-error=${{ github.event.inputs.cleanup_on_failure == 'true' && 'cleanup' || 'abort' }} \
+-var-file="$PKR_VAR_environment_root/${{ env.CI_CLOUD }}.pkrvars.hcl" \
 -var "source_image_name=${{ matrix.build.source_image_name }}" \
 -var "image_name=${{ matrix.build.image_name }}" \
 -var "inventory_groups=${{ matrix.build.inventory_groups }}" \
@@ -96,14 +104,14 @@ jobs:
 run: |
 . venv/bin/activate
 IMAGE_ID=$(jq --raw-output '.builds[-1].artifact_id' packer/packer-manifest.json)
-while ! openstack image show -f value -c name $IMAGE_ID; do
+while ! openstack image show -f value -c name "$IMAGE_ID"; do
 sleep 5
 done
-IMAGE_NAME=$(openstack image show -f value -c name $IMAGE_ID)
+IMAGE_NAME=$(openstack image show -f value -c name "$IMAGE_ID")
 echo "image-name=${IMAGE_NAME}" >> "$GITHUB_OUTPUT"
 echo "image-id=$IMAGE_ID" >> "$GITHUB_OUTPUT"
-echo $IMAGE_ID > image-id.txt
-echo $IMAGE_NAME > image-name.txt
+echo "$IMAGE_ID" > image-id.txt
+echo "$IMAGE_NAME" > image-name.txt

 - name: Make image usable for further builds
 run: |
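
The `-on-error` change in this file compares the input against the string `'true'` instead of relying on truthiness. As far as I understand, `github.event.inputs` values arrive as strings and any non-empty string is truthy in a GitHub Actions expression, so the old form would select `cleanup` even when the input was `'false'`. A shell analogue of the two forms, with hypothetical function names, as a sketch only:

```shell
#!/usr/bin/env bash
# Old form: only checks that the input is non-empty (string truthiness).
on_error_old() {
  if [ -n "$1" ]; then echo cleanup; else echo abort; fi
}

# New form: explicit string comparison against 'true'.
on_error_new() {
  if [ "$1" = "true" ]; then echo cleanup; else echo abort; fi
}

old_false=$(on_error_old "false")
new_false=$(on_error_new "false")
new_true=$(on_error_new "true")

echo "old(false)=$old_false new(false)=$new_false new(true)=$new_true"
```

The analogue only mirrors the comparison logic; the actual evaluation happens in the GitHub Actions expression engine before the shell step runs.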
49 changes: 49 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,49 @@
---
name: Lint

on: # yamllint disable-line rule:truthy
workflow_call:

permissions:
contents: read
packages: read
# To report GitHub Actions status checks
statuses: write

jobs:
lint:
name: Lint
runs-on: ubuntu-latest
permissions:
contents: read
packages: read
# To report GitHub Actions status checks
statuses: write

steps:
- uses: actions/checkout@v4
with:
# super-linter needs the full git history to get the
# list of files that changed across commits
fetch-depth: 0
submodules: true

- name: Run ansible-lint
uses: ansible/ansible-lint@v25.4.0
env:
ANSIBLE_COLLECTIONS_PATH: .ansible/collections

- name: Load super-linter configuration
# Use grep inverse matching to exclude eventual comments in the .env file
# because the GitHub Actions command to set environment variables doesn't
# support comments.
# yamllint disable-line rule:line-length
# Ref: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#setting-an-environment-variable
run: grep -v '^#' super-linter.env >> "$GITHUB_ENV"
if: always()

- name: Run super-linter
uses: super-linter/super-linter@v7.3.0
if: always()
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
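
The "Load super-linter configuration" step above strips comment lines before appending to `$GITHUB_ENV`. A minimal local sketch of that filter follows; the contents of `super-linter.env` here are hypothetical (the real file is not shown in this diff), and `GITHUB_ENV` points at a temporary file rather than the one a runner provides.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Hypothetical env file contents, for illustration only.
cat > "$workdir/super-linter.env" <<'EOF'
# comment lines like this one must not reach GITHUB_ENV
VALIDATE_ALL_CODEBASE=false
# another comment
LOG_LEVEL=INFO
EOF

# Stand-in for the file GitHub Actions exposes as $GITHUB_ENV.
GITHUB_ENV="$workdir/github_env"
: > "$GITHUB_ENV"

# Same filter as the workflow step: drop lines that start with '#'.
grep -v '^#' "$workdir/super-linter.env" >> "$GITHUB_ENV"

line_count=$(wc -l < "$GITHUB_ENV" | tr -d ' ')
first_line=$(head -n 1 "$GITHUB_ENV")
echo "kept $line_count lines, first: $first_line"

rm -rf "$workdir"
```

The pattern `^#` only matches comments that start in the first column, so indented comments or blank lines in the env file would still be passed through.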