Update to 2.7 #3
Open
technowhizz wants to merge 54 commits into training/leafcloud from update-to-2.7
+6,486
−4,329
* Add filesystems docs
* Apply suggestions from code review
Co-authored-by: Steve Brasier <33413598+sjpb@users.noreply.github.com>
* Update Ceph instructions for Manila integrations
* Update overview
* Update docs/filesystems.md
Co-authored-by: Steve Brasier <33413598+sjpb@users.noreply.github.com>
* Update image build instructions for Manila
---------
Co-authored-by: Steve Brasier <33413598+sjpb@users.noreply.github.com>
* pre-hook to copy requirements.yml.last
* remove mention of CI in comments
* First draft of production end-to-end docs
* Ubuntu Jammy is also supported
* Add TODOs
* Accomplish TODOs
* Mention networks docs
* NFS
* Clarify image
* Formatting changes
* Apply suggestions from code review
Co-authored-by: Steve Brasier <33413598+sjpb@users.noreply.github.com>
* Suggestions from code review
* Update docs/production.md
Co-authored-by: Steve Brasier <33413598+sjpb@users.noreply.github.com>
* Add git remote instructions
* Update cookiecutter info
* Link filesystems docs
* Move tofu into define and deploy infra section
* Reorganise configuration
* Move tofu note
---------
Co-authored-by: Steve Brasier <33413598+sjpb@users.noreply.github.com>
Without any top-level inventory file, Ansible will fail with:
```
ERROR! Completely failed to parse inventory source /home/ubuntu/ansible-slurm-appliance/environments/$ENV/inventory
```
* WIP: refactor repos definitions
* add more repos and cope with CRB/PowerTools oddness
* add epel
* use pulp_server as a group
* add epel default
* wip: get pulp sync working
* fixed sync
* autodetect latest in adhoc script, refactored timestamps to allow gated ohpc repos, fixed pulp site
* fixed distributions + ohpc repos
* updated timestamps script + bumped rocky 9 timestamps
* removed pulp_repo_name fields
* updated docs, added gpg checks, simplified filters
* Added pulp systemd file + removed unused vars
* added READMEs + updated variable names
* disabled gpg checks for dnf_repos
* typo
* fixed disable repos task
* bump images
* remove dnf_repos extra index/key and make epel/openhpc special-cases simpler
* clarify pulp distro selection
* fixup sync vars
* fixup grafana vars
* revert latest timestamp changes for extra key level
* review suggestions
Co-authored-by: Steve Brasier <33413598+sjpb@users.noreply.github.com>
* updated README
* docs tweaks
* regularised group names
* updated operations guide for functionality requiring additional installs
* review changes from docs
Co-authored-by: Steve Brasier <33413598+sjpb@users.noreply.github.com>
* renamed timestamps.yml to dnf_repos_timestamps.yml
---------
Co-authored-by: Steve Brasier <steveb@stackhpc.com>
Co-authored-by: Steve Brasier <33413598+sjpb@users.noreply.github.com>
* Reorder repositories alphabetically
* Bump Pulp snapshots for RL 9.6
* Bump CI image (RL9 only)
Bump CUDA to 13.0.1 and NVIDIA driver to 580.82.07
Make CaaS specific role: `persist_openhpc_secrets` idempotent
* add validation for tofu-templated vars
* update error message iaw review
* Add Github Actions for running code linters
* Fix linting issues.
The super-linter.env currently has the following additions that are to be addressed in the future:
VALIDATE_GITHUB_ACTIONS=false
VALIDATE_SHELL_SHFMT=false
VALIDATE_YAML=false
Most of the linting for the above has been addressed with just a single issue remaining that blocks the linter from being enabled.
* Update GH workflow so linting always runs before any other jobs
* Fix linting issues on the merge of origin/main
* Use the head ref for workflow concurrency
* Output the path filter result of the workflow
* Tweak github action used to detect changed paths on push/pull request
* Troubleshooting: ansible.builtin.user
* Troubleshooting: debugging temporarily added
* Shift pylint invalid-name linting beyond python bang line
* Temporarily disable the ansible galaxy requirements validation
* Reverting changes made to ansible.builtin.user and ansible.builtin.group where the name parameter was added.
Reverting to ansible.builtin.group: <args>
because args aren't an expected label:
groupadd: '{'name': 'grafana', 'gid': 979}' is not a valid group name
* Arguments are dicts not labels
* Preserve file permissions on .ssh directory contents
* Wherever we use become_user set become: true, keeps the linter happy and maintains functionality
* Fix linting on merge of origin/main
* Update cluster image - using fatimage built from ci/linting branch
* Add comments to workflow files detailing the CI workflow and enable these workflows
* Fix workflow execution:
1. change trivvy to trivy
2. extra, stackhpc, and trivyscan workflows should trigger on workflow_call and workflow_dispatch
* Fix linting issues from merge of origin/main
* Exclude 'ansible/roles/compute_init/files/compute-init.yml' from ansible lint.
The parser can't load the 'tasks/tuned.yml' Ansible file, so it fails with:
load-failure[filenotfounderror]: [Errno 2] No such file or directory: 'ansible-slurm-appliance/tasks/main.yml'
tasks/main.yml:1
This failure can't be skipped because it's the output of the parser that's fed to the linter where such exceptions are made.
* Temporarily disable Rocky 8 to speed up testing and reduce CI resources
Temporarily disable ansible-lint:
Run ansible/ansible-lint@v25.4.0
Run if [[ -n "" ]]; then
Run action_ref="${GH_ACTION_REF_INPUT:-${GITHUB_ACTION_REF:-main}}"
Using ansible-lint ref: main
Run reqs_file=$(git rev-parse --show-toplevel)/.git/ansible-lint-requirements.txt
--2025-09-09 14:51:58-- https://raw.githubusercontent.com/ansible/ansible-lint/main/.config/requirements-lock.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2025-09-09 14:51:58 ERROR 404: Not Found.
* Fix some bad ansible-lint line-length markup
* Fix ansible-lint markup for line-length
* Bump CI image - FOR RL9 ONLY TO CONSERVE CI RESOURCES
* Revert ansible.builtin.command to ansible.builtin.shell due to missed comment "need login shell for module command" and mask ansible-lint error
* Disable extra-build.yml workflow which has previously passed so we can focus on the stackhpc.yml workflow
* Disable concurrency to see if this is killing stackhpc.yml
* Remove concurrency from extra.yml, stackhpc.yml, and trivyscan.yml as they're all being triggered from main.yml which has its own concurrency check - the trivyscan concurrency was also killing stackhpc
* Enable ansible-lint
* Enable triggering of all workflows from the main CI workflow
* Bump CI image - FOR RL9 ONLY TO CONSERVE CI RESOURCES
* Fix bad ansible-lint markup affecting the bang line
* Reduce workflow CI resources whilst fixing test deploy and reimage workflow
* Bump CI image - FOR RL9 ONLY TO CONSERVE CI RESOURCES
* Enable Rocky Linux 8 - disabled to speed up testing
* Enable all CI workflows
* Bump CI image - FOR RL9 ONLY TO CONSERVE CI RESOURCES
* Remove empty line between ansible "when" and "block" added by ansible-lint --fix, it's not required by the linter.
* Enable check for ansible galaxy requirements
* Revert the ansible collections path to ansible/collections so we don't inadvertently break any existing checkouts.
Direct ansible-lint to use .ansible/collections so downloads are excluded from linting by our .ansible-lint.yml
* Bump CI image
It resolves some limitations with login subgroups, such as the difficulty of binding the Open OnDemand service to a specific node when node names are not predictable. This replicates what is already done for compute subgroups.
* ignore port binding info; fixes tf when admin
* ignore port dhcp changes to fix networking-mlxn
* ignore port binding/dhcp options for caas
* fix TF linter errors
* Fix various comments in Ansible group files
* Expose vgpu group in site inventory
* wip: add TF remote state docs
* wip s3 remote state
* improve gitlab backend configuration
* automate s3 creds
* make s3 buckets clearer
* fix linting
* try to allow same headings at different levels in markdown
* fix tf lint errors
* fix prettier errors
Fix various typos
…I) (#792)
* update dnf_repos_timestamps.yml
* bump Ark timestamps
* update again
* make it possible NOT to clean up packer builds
* fixup source repo path typo
* add missing RL8 PowerTools source repo
* correct RL8 source repo files
* update timestamps
* bump CI image
* disable Lustre for RL8 extrabuild tests due to kernel mismatch
---------
Co-authored-by: bertiethorpe <bertie443@gmail.com>
* validate nodename groups
* add validation for nodegroup name clashes
* add validation for nodegroup name clashes
* fix linter whinges
* extend validation to cover additional_nodegroups
* fix TF linting
* fixup logic
* fix logic
* fix linter
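The commits above mention validating nodegroup name clashes, including `additional_nodegroups`. The actual check lives in the PR's OpenTofu configuration and is not shown here, so the following is only a minimal Python sketch of the general idea, assuming a "clash" means the same name being defined in more than one nodegroup mapping. The function name `find_name_clashes` and the example mappings are hypothetical.

```python
from collections import Counter


def find_name_clashes(**nodegroup_maps):
    """Return the sorted names that are defined in more than one mapping.

    Each keyword argument is a hypothetical mapping of nodegroup name to its
    definition, e.g. login=..., compute=..., additional_nodegroups=...
    """
    counts = Counter(
        name for groups in nodegroup_maps.values() for name in groups
    )
    return sorted(name for name, seen in counts.items() if seen > 1)


# Illustrative data only: "gpu" is defined both as a compute nodegroup and
# as an additional nodegroup, so it is reported as a clash.
clashes = find_name_clashes(
    login={"login": {}},
    compute={"general": {}, "gpu": {}},
    additional_nodegroups={"gpu": {}},
)
```

Returning the clashing names, rather than a boolean, makes it straightforward to build an error message that tells the user which names to change.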
* bump OSC's OOD v4.0.1
* pin ondemand 4.0.7 in common env
* install ood app packages in fatimage.yml
* make packer volume 20 GB to manage ood app packages
* fix typo
* bump images
* update ood cleanup paths triggering trivy errors
* bump fatimages
* noqa yaml[brackets] for OOD options
* fix linter warnings about flow-style
* remove wrong comment
* Add module FQDN
* pickup task name fixes from PR#794
* bump CI image
---------
Co-authored-by: Steve Brasier <steveb@stackhpc.com>
Co-authored-by: Steve Brasier <33413598+sjpb@users.noreply.github.com>
* support raid root disks in stackhpc-built images
* clarify image requirements
* bump CI image
* fixup grub for RL8
* fix linter issues
* fix raid kernel commandline configuration for RL8 [no ci]
* bump CI image
* fix handler ansible-lint errors
* bump CI image
…on check for .caas
Fix .caas secrets not persisting post-reimage and skip tofu vars validation for .caas
Fix image sync workflow for new larger fat images
From access.conf(5):
The second field, the users/group field, should be a list of one or
more login names, group names, or ALL (which always matches). To
differentiate user entries from group entries, group entries should
be written with brackets, e.g. (group).
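To illustrate the quoted rule, here is a minimal Python sketch that renders one `access.conf` line in the `permission:users/groups:origins` format, wrapping group names in brackets so they are not read as user names. The helper name `format_access_entry` and the example names are hypothetical and are not taken from this PR.

```python
def format_access_entry(permission, users=(), groups=(), origins=("ALL",)):
    """Render one access.conf line: permission:users/groups:origins.

    Group names are wrapped in brackets, as access.conf(5) recommends, to
    distinguish them from user names.
    """
    if permission not in ("+", "-"):
        raise ValueError("permission must be '+' or '-'")
    subjects = list(users) + [f"({group})" for group in groups]
    if not subjects:
        raise ValueError("at least one user or group is required")
    return f"{permission}:{' '.join(subjects)}:{' '.join(origins)}"


# Illustrative names only: allow user "root" and group "admins" from anywhere.
entry = format_access_entry("+", users=["root"], groups=["admins"])
```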
* support raid root disks in stackhpc-built images
* clarify image requirements
* bump CI image
* remove default build groups
* fixup doca/cuda inventory groups
* add fatimage inventory group
* update docs for image build
* minor docs tweaks
* fixup fatimage group definition
* fix build groups
* bump CI image
* minor docs tweak
* fix linter markdown error
* fix linter markdown error
* swap example site image build to normal case
* fix borked merge
* fixes after self-review
* bump CI image
* Expose FIPs in inventory hosts file
* adding output for "fip_address"
* changing 'fip_address' to 'nodegroup_fips'
* delete build VMs in CI nightly cleanup
* name build volumes and include in nightly cleanup
* simplify cleanup of volumes and include fatimage build VMs
---------
Co-authored-by: bertiethorpe <bertie443@gmail.com>
Co-authored-by: bertiethorpe <84867280+bertiethorpe@users.noreply.github.com>
* export state directory to ondemand nodes for caas
* fixed caas config
No description provided.