PRD: Update Jenkins jobs for qa-infra-automation refactor (issues #77 and #78) #568
Description
Problem Statement
The qa-infra-automation repository is being refactored under two RFCs:
- RFC: Decouple Tofu↔Ansible integration via canonical inventory paths and declarative schema (qa-infra-automation#77). Tofu will no longer generate Ansible inventory directly; instead, a Python bridge script (scripts/generate-inventory.py) converts raw Tofu JSON output into Ansible inventory at a canonical path (ansible/inventory/{env}/hosts.yml). The inventory.yaml.tftpl template and the Tofu inventory_output_path variable are removed.
- RFC: Restructure ansible/ from product-centric to concern-centric layout (qa-infra-automation#78). The ansible/ directory moves from a product-centric layout (rke2/airgap/, k3s/default/) to a concern-centric layout (playbooks/rke2/, playbooks/utils/, inventory/{env}/, group_vars/all/). A single playbook per distro handles all environments via Ansible tags (--tags airgap, --tags default). The Makefile becomes the single control point: make infra-up, make cluster, and make infra-down.
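To make the inventory contract in the first RFC concrete, here is a minimal sketch of what such a bridge step might look like. It is not the actual scripts/generate-inventory.py: the output name nodes and the per-node fields name, ip, and group are hypothetical, and the only parts taken from the RFC description are the canonical path and the use of raw Tofu JSON output as input.

```python
import json
from pathlib import PurePosixPath


def canonical_inventory_path(env: str) -> PurePosixPath:
    """Return the canonical inventory path for an environment."""
    return PurePosixPath("ansible/inventory") / env / "hosts.yml"


def build_inventory(tofu_output_json: str, output_name: str = "nodes") -> dict:
    """Convert `tofu output -json` text into an Ansible-style inventory dict.

    Assumption: a hypothetical Tofu output (default name "nodes") whose value
    is a list of {"name": ..., "ip": ..., "group": ...} objects.
    """
    outputs = json.loads(tofu_output_json)
    # `tofu output -json` wraps each output as an object with a "value" key.
    nodes = outputs[output_name]["value"]
    inventory = {"all": {"children": {}}}
    for node in nodes:
        children = inventory["all"]["children"]
        group = children.setdefault(node["group"], {"hosts": {}})
        group["hosts"][node["name"]] = {"ansible_host": node["ip"]}
    return inventory
```

The point of the sketch is the direction of the dependency: Tofu only emits data, and the bridge alone knows where the inventory lives.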
The current Jenkins jobs in rancher/tests hardcode the old directory paths and call Tofu and Ansible step-by-step as separate library calls. After the refactor, those paths will no longer exist and the step-by-step integration pattern will be incompatible with the new architecture.
Additionally, the qa-jenkins-library has no abstraction for running Makefile targets, and the rancherlabs/jenkins-job-builder job definitions define parameters that assume the old layout.
Solution
Jenkins jobs should interact with qa-infra-automation exactly as a human operator would: by calling Makefile targets. This makes CI behavior predictable from local testing, ensures the Makefile is the canonical control point, and reduces the surface area of path-coupling between repos.
The solution has three parts:
- Add a makefile module to qa-jenkins-library that can invoke Makefile targets inside the existing Docker container (rancher-infra-tools), passing variables as KEY=VALUE arguments.
- Rewrite the affected Jenkinsfiles in rancher/tests to use the new library module, write group_vars before invoking make, and archive group_vars and terraform.tfvars as reproducibility artifacts.
- Update rancherlabs/jenkins-job-builder job definitions to reflect any new or renamed parameters required by the refactored Makefile interface.
User Stories
- As a Jenkins job operator, I want the setup job to call make infra-up so that infrastructure provisioning behavior matches what I get when running locally.
- As a Jenkins job operator, I want the cluster deployment job to call make cluster ENV=airgap DISTRO=rke2 so that playbook selection and inventory path resolution are handled by the Makefile, not by Jenkins.
- As a Jenkins job operator, I want the destroy job to call make infra-down so that teardown mirrors local teardown behavior.
- As a QA engineer, I want the Jenkins job to archive ansible/group_vars/all/*.yml and terraform.tfvars as build artifacts so that I can reproduce the exact environment locally without reverse-engineering the job configuration.
- As a QA engineer, I want to download the archived group_vars and terraform.tfvars from a Jenkins build and re-run make cluster locally against the same infrastructure.
- As a QA engineer, I want the Jenkins job to write Ansible variables (versions, credentials, hostnames) into ansible/group_vars/all/ before invoking make, so that the Makefile receives a complete, pre-populated environment.
- As a pipeline maintainer, I want a makefile module in qa-jenkins-library with a consistent interface for invoking Makefile targets inside Docker so that I do not have to write raw sh commands in each Jenkinsfile.
- As a pipeline maintainer, I want the makefile module to support passing KEY=VALUE variable overrides to make so that versions, environment names, and distro selections can be injected by Jenkins.
- As a pipeline maintainer, I want the makefile module to run inside the existing rancher-infra-tools Docker image so that no new container or tool installation is required.
- As a pipeline maintainer, I want the Tofu workspace name to be generated by Jenkins, passed to make as a variable (WORKSPACE=...), and archived as a build artifact so that the destroy job can retrieve it and pass it back to make infra-down.
- As a pipeline maintainer, I want the S3 backend initialization to be handled inside make infra-up (via environment variables for credentials, bucket, and region) so that Jenkins does not need a separate tofu.initBackend() call.
- As a pipeline maintainer, I want the RKE2 airgap setup Jenkinsfile to be rewritten to use make infra-up and make cluster ENV=airgap DISTRO=rke2 so that the job works with the refactored repo layout.
- As a pipeline maintainer, I want the RKE2 airgap destroy Jenkinsfile to be rewritten to call make infra-down WORKSPACE=<name> so that teardown does not require knowledge of old Tofu module paths.
- As a pipeline maintainer, I want the airgap Go tests Jenkinsfile to be rewritten to provision infrastructure via make infra-up, deploy the cluster via make cluster, and then run Go tests, so that the full pipeline works with the new layout.
- As a pipeline maintainer, I want the elemental e2e Jenkinsfile (Jenkinsfile.elemental.e2e) to be refactored to work with the new ansible/ layout and Makefile interface.
- As a pipeline maintainer, I want the elemental Harvester e2e Jenkinsfile (Jenkinsfile.elemental.harvester.e2e) to be refactored to work with the new ansible/ layout and Makefile interface.
- As a pipeline maintainer, I want all Jenkinsfiles that invoke Ansible or Tofu directly (including validation/Jenkinsfile.e2e and validation/pipeline/tfp/Jenkinsfile.harvester.e2e) to be reviewed and updated if they reference qa-infra-automation paths, so that no jobs silently break after the refactor.
- As a Jenkins job builder maintainer, I want the JJB job definitions to expose DISTRO, ENV, and optionally CLUSTER as job parameters so that the Makefile's interface is surfaced through Jenkins job configuration.
- As a Jenkins job builder maintainer, I want any JJB job definitions that reference old qa-infra-automation paths or old Ansible variable file locations to be updated to remove those assumptions.
- As a downstream cluster operator, I want a clear path for destroying only downstream cluster infrastructure (distinct from management cluster teardown) so that multi-cluster QA environments can be cleaned up independently.
- As a developer running qa-infra-automation locally, I want the Jenkins job to behave identically to local Makefile usage so that I can debug Jenkins failures locally without special knowledge of CI-specific steps.
- As a security-conscious operator, I want credentials (AWS keys, SSH keys, registry passwords) to be injected into the Docker container as environment variables, not written to files that are then archived, so that secrets are not persisted in build artifacts.
Implementation Decisions
New makefile module in qa-jenkins-library
A new Groovy module (makefile.groovy or equivalent) is added to qa-jenkins-library. It exposes at minimum:
- makefile.run(target, vars, dir, image, workspace): runs make <target> KEY=VALUE... inside the specified Docker image, mounting the Jenkins workspace, in the given directory.
- The module handles Docker --env-file injection for credential environment variables and KEY=VALUE for Makefile variables separately, so secrets are not passed on the command line.
- The module mirrors the existing pattern used by the tofu.* and ansible.* modules in the library.
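The argument construction described above can be sketched independently of Groovy. The following Python illustration is not the library's implementation; the function names and the exact Docker flags chosen (--rm, -v, -w, --env-file) are assumptions about how such a wrapper might be assembled. It shows the separation the module is meant to enforce: Makefile variables go on the command line as KEY=VALUE, while credentials travel only through the env file.

```python
import shlex


def make_command(target: str, variables: dict) -> list:
    """Build `make <target> KEY=VALUE ...` as an argument list.

    Variables keep their insertion order so the command is predictable.
    """
    return ["make", target] + [f"{key}={value}" for key, value in variables.items()]


def docker_make_command(target, variables, directory, image, workspace, env_file=None):
    """Wrap the make invocation in `docker run`, mounting the workspace.

    Hypothetical layout: the workspace is mounted at the same path inside the
    container, and `directory` is relative to it.
    """
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{workspace}:{workspace}",
        "-w", f"{workspace}/{directory}",
    ]
    if env_file:
        # Credentials are read from the env file, never placed on the command line.
        cmd += ["--env-file", env_file]
    return cmd + [image] + make_command(target, variables)


if __name__ == "__main__":
    print(shlex.join(docker_make_command(
        "cluster", {"ENV": "airgap", "DISTRO": "rke2"},
        "qa-infra-automation", "rancher-infra-tools", "/var/jenkins/ws",
        env_file="/var/jenkins/ws/.env",
    )))
```

Returning an argument list rather than a single string avoids shell-quoting mistakes when a value contains spaces; shlex.join is used only for display.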
Jenkinsfiles write group_vars before invoking make
Jenkins remains responsible for injecting secrets and version pinning into ansible/group_vars/all/. The job writes one or more YAML files under that directory (e.g., a file containing rke2_version, rancher_version, private_registry_url, ssh_private_key_file, etc.) before calling make cluster. This mirrors the operator workflow of populating group_vars before running make locally.
After make cluster completes, Jenkins archives:
- ansible/group_vars/all/*.yml: all group_vars files (secrets redacted or excluded if needed)
- terraform.tfvars: the Tofu variable file used for provisioning
These artifacts allow any engineer to reproduce the environment.
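One way to keep secrets out of the archived files is to split the variables before anything is written. The sketch below is illustrative only: the SECRET_KEYS set and the function names are hypothetical, and the real list of secret variables is not defined in this PRD. It avoids a YAML dependency by relying on the fact that JSON-encoded scalars are also valid YAML scalars.

```python
import json

# Hypothetical set of variable names treated as secrets.
SECRET_KEYS = frozenset({"aws_secret_access_key", "registry_password"})


def split_vars(variables: dict, secret_keys=SECRET_KEYS):
    """Separate variables into (public, secret) dicts by key name."""
    public = {k: v for k, v in variables.items() if k not in secret_keys}
    secret = {k: v for k, v in variables.items() if k in secret_keys}
    return public, secret


def render_group_vars(public: dict) -> str:
    """Render simple scalar variables as YAML lines for group_vars/all/."""
    # json.dumps quotes strings safely; the result is valid YAML for scalars.
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in sorted(public.items()))


def render_env_file(secret: dict) -> str:
    """Render secrets as KEY=VALUE lines for a Docker env file (not archived)."""
    return "".join(f"{key.upper()}={value}\n" for key, value in sorted(secret.items()))
```

Only the output of render_group_vars would be written under ansible/group_vars/all/ and archived; the env file content stays outside the archived paths.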
Tofu workspace management
Workspace names are generated by Jenkins (same jenkins_airgap_ansible_workspace_<prefix> convention as today) and passed to make as WORKSPACE=<name>. The Makefile is expected to accept WORKSPACE as an override for the workspace name it uses when calling tofu workspace select/new. The workspace name is archived as a build artifact so the destroy job can retrieve it. This may require a small addition to the qa-infra-automation Makefile (adding WORKSPACE as an overridable variable), which should be tracked in that repo.
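The round-trip between the setup and destroy jobs might look like the following sketch. The artifact file name workspace_name.txt and the helper names are hypothetical; only the naming convention and the make infra-down WORKSPACE=<name> invocation come from this document.

```python
import tempfile
from pathlib import Path


def workspace_name(prefix: str) -> str:
    """Build the workspace name using the existing Jenkins convention."""
    return f"jenkins_airgap_ansible_workspace_{prefix}"


def save_workspace(name: str, artifact: Path) -> None:
    """Setup job: persist the name to a file that Jenkins then archives."""
    artifact.write_text(name + "\n")


def infra_down_command(artifact: Path) -> list:
    """Destroy job: read the archived name and build the make invocation."""
    name = artifact.read_text().strip()
    return ["make", "infra-down", f"WORKSPACE={name}"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "workspace_name.txt"  # hypothetical artifact name
        save_workspace(workspace_name("build42"), artifact)
        print(" ".join(infra_down_command(artifact)))
```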
S3 backend initialization inside the Makefile
The Makefile's infra-up target should handle tofu init with the S3 backend using environment variables (S3_BUCKET_NAME, S3_KEY_PREFIX, S3_BUCKET_REGION) passed from Jenkins. Jenkins sets these as environment variables in the Docker container. This removes the need for a separate tofu.initBackend() call in Jenkinsfiles.
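As a rough illustration of what infra-up would need to do with those variables, the sketch below assembles a tofu init command using -backend-config overrides. The mapping is an assumption: in particular, treating S3_KEY_PREFIX as the backend's key setting is a guess, and the real Makefile may map it differently or add further settings.

```python
import os

REQUIRED = ("S3_BUCKET_NAME", "S3_KEY_PREFIX", "S3_BUCKET_REGION")


def backend_init_command(env=os.environ) -> list:
    """Build `tofu init` with S3 backend settings taken from the environment.

    Raises ValueError listing any missing variables, so a misconfigured job
    fails before Tofu is invoked.
    """
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return [
        "tofu", "init",
        f"-backend-config=bucket={env['S3_BUCKET_NAME']}",
        # Assumption: the prefix is used directly as the state key.
        f"-backend-config=key={env['S3_KEY_PREFIX']}",
        f"-backend-config=region={env['S3_BUCKET_REGION']}",
    ]
```

AWS credentials are deliberately absent from the command: Tofu's S3 backend can pick them up from the standard AWS environment variables, which keeps them off the command line.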
Distro and environment parameters in JJB
JJB job definitions for affected jobs gain DISTRO (string, default rke2) and ENV (string, default airgap) parameters. These are passed to make cluster ENV=$ENV DISTRO=$DISTRO. Downstream cluster jobs additionally expose a CLUSTER parameter (string, the cluster name for ansible/inventory/downstream/<name>/).
Elemental jobs
Both elemental Jenkinsfiles (Jenkinsfile.elemental.e2e and Jenkinsfile.elemental.harvester.e2e) write config files into old paths (ansible/rancher/downstream/elemental/libvirt, ansible/rancher/downstream/elemental/harvester). These will be refactored to write into the new ansible/group_vars/all/ structure and invoke make targets for elemental deployment. The exact Makefile targets for elemental are to be defined in qa-infra-automation (likely make cluster DISTRO=elemental with appropriate tags or a dedicated target). This work is blocked on the qa-infra-automation refactor completing for elemental.
Jobs not using qa-infra-automation directly
validation/Jenkinsfile.e2e uses corral for infrastructure, not qa-infra-automation. It does not reference qa-infra-automation paths and requires no changes unless it is later migrated.
validation/pipeline/tfp/Jenkinsfile.harvester.e2e uses an older infra-repo pattern with direct Tofu/Ansible calls. It writes to ansible/vars.yaml (a root-level vars file) and ansible/k3s/default/roles/. This file references qa-infra-automation paths and must be reviewed and updated to use the new group_vars/all/ location and new playbook paths. It does not use the Makefile today and should be migrated to use it as part of this work.
Dockerfile
Dockerfile.infra and Dockerfile.airgap-go-tests are expected to include make already, since they are based on standard Linux images with build tools. No Dockerfile changes are anticipated, but this should be verified before the Jenkinsfile changes are merged.
Testing Decisions
A good test for these Jenkinsfiles verifies externally observable behavior: did the right Makefile target get called with the right arguments? Did the expected artifacts get archived? Did the job succeed end-to-end against real infrastructure? Implementation details (which Docker flags, how the env file is constructed) are not worth testing at the unit level.
What to test:
- Smoke test each updated Jenkinsfile against a real Jenkins instance with a test/sandbox qa-infra-automation branch that has the new layout. Verify the job reaches the same stages and produces equivalent infrastructure as before.
- Artifact verification: after a successful run of the setup job, verify that group_vars files and terraform.tfvars are present as archived artifacts and contain the expected variable names (not necessarily values, to avoid secrets in test assertions).
- Destroy job round-trip: verify that running the setup job followed by the destroy job leaves no orphaned Tofu workspace or AWS resources.
- makefile module unit test: if the qa-jenkins-library uses a test framework (e.g., Jenkins Pipeline Unit), add a test that verifies makefile.run('cluster', ['ENV': 'airgap', 'DISTRO': 'rke2'], ...) constructs the correct make cluster ENV=airgap DISTRO=rke2 invocation inside Docker.
Prior art in the codebase:
- validation/pipeline/Jenkinsfile.setup.airgap.rke2 is the primary reference implementation. Its stage structure (Checkout → Configure Variables → infra-up → Configure Ansible → Deploy Cluster → Archive) should be preserved with updated internals.
- The tofu.* and ansible.* modules in qa-jenkins-library are the pattern to follow when implementing the makefile module.
Modules to test:
- makefile module in qa-jenkins-library (unit test for argument construction)
- Jenkinsfile.setup.airgap.rke2 (end-to-end smoke test)
- Jenkinsfile.destroy.airgap.rke2 (end-to-end smoke test, round-trip with setup)
- Jenkinsfile.airgap.go-tests (end-to-end with Go test execution)
Out of Scope
- Refactoring validation/Jenkinsfile.e2e (uses corral, not qa-infra-automation)
- Adding downstream cluster teardown as a dedicated Makefile target (make infra-down-cluster CLUSTER=<name>); this is desirable but tracked separately in qa-infra-automation
- Migrating jobs that use neither Ansible nor Tofu from qa-infra-automation
- Changes to Tofu module structure (tofu/aws/modules/airgap); these are not changed by qa-infra-automation#77 or qa-infra-automation#78, so the Tofu paths in Jenkinsfiles remain valid
- Changes to the Rancher test Go code itself
Further Notes
- This work is blocked on qa-infra-automation#77 and qa-infra-automation#78 being merged. Jenkinsfile changes should be developed against a feature branch of qa-infra-automation and merged after the refactor lands.
- The QA_INFRA_REPO_BRANCH parameter present in all affected Jenkinsfiles allows pointing jobs at a qa-infra-automation feature branch for integration testing before the refactor is merged.
- The Makefile must accept WORKSPACE as an overridable variable for Tofu workspace selection. If it does not, a small addition to qa-infra-automation is required and should be tracked there.
- The S3 backend init should be absorbed into make infra-up in qa-infra-automation. If it is not, Jenkins falls back to calling the existing tofu.initBackend() step before invoking make, which is acceptable as a transitional measure.
- Secrets (AWS keys, registry credentials, SSH keys) must never appear in archived artifacts. Jenkins should write only non-secret variables (versions, hostnames, distro names) to group_vars files that are archived. Credential variables should be injected via Docker environment only.
Related:
- RFC: Decouple Tofu↔Ansible integration via canonical inventory paths and declarative schema qa-infra-automation#77 — Tofu↔Ansible decoupling (inventory path contract this work depends on)
- RFC: Restructure ansible/ from product-centric to concern-centric layout qa-infra-automation#78 — Ansible directory restructure (playbook and group_vars paths this work depends on)
- rancherlabs/jenkins-job-builder — JJB job definitions to update alongside this work
- rancher/qa-jenkins-library: shared Groovy library receiving the new makefile module