PRD: Update Jenkins jobs for qa-infra-automation refactor (issues #77 and #78) #568

@floatingman

Problem Statement

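The qa-infra-automation repository is being refactored under two RFCs:

  • RFC: Decouple Tofu↔Ansible integration via canonical inventory paths and declarative schema (qa-infra-automation#77)
  • RFC: Restructure ansible/ from product-centric to concern-centric layout (qa-infra-automation#78)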

The current Jenkins jobs in rancher/tests hardcode the old directory paths and call Tofu and Ansible step-by-step as separate library calls. After the refactor, those paths will no longer exist and the step-by-step integration pattern will be incompatible with the new architecture.

Additionally, qa-jenkins-library has no abstraction for running Makefile targets, and the job definitions in rancherlabs/jenkins-job-builder declare parameters that assume the old layout.

Solution

Jenkins jobs should interact with qa-infra-automation exactly as a human operator would: by calling Makefile targets. This makes CI behavior reproducible with local runs, establishes the Makefile as the canonical control point, and minimizes how much of the qa-infra-automation directory layout the Jenkins jobs depend on.

The solution has three parts:

  1. Add a makefile module to qa-jenkins-library that invokes Makefile targets inside the existing Docker image (rancher-infra-tools), passing variables as KEY=VALUE arguments.
  2. Rewrite the affected Jenkinsfiles in rancher/tests to use the new library module, write group_vars before invoking make, and archive group_vars and terraform.tfvars as reproducibility artifacts.
  3. Update rancherlabs/jenkins-job-builder job definitions to reflect any new or renamed parameters required by the refactored Makefile interface.

User Stories

  1. As a Jenkins job operator, I want the setup job to call make infra-up so that infrastructure provisioning behavior matches what I get when running locally.
  2. As a Jenkins job operator, I want the cluster deployment job to call make cluster ENV=airgap DISTRO=rke2 so that playbook selection and inventory path resolution are handled by the Makefile, not by Jenkins.
  3. As a Jenkins job operator, I want the destroy job to call make infra-down so that teardown mirrors local teardown behavior.
  4. As a QA engineer, I want the Jenkins job to archive ansible/group_vars/all/*.yml and terraform.tfvars as build artifacts so that I can reproduce the exact environment locally without reverse-engineering the job configuration.
  5. As a QA engineer, I want to download the archived group_vars and terraform.tfvars from a Jenkins build and re-run make cluster locally against the same infrastructure.
  6. As a QA engineer, I want the Jenkins job to write Ansible variables (versions, credentials, hostnames) into ansible/group_vars/all/ before invoking make, so that the Makefile receives a complete, pre-populated environment.
  7. As a pipeline maintainer, I want a makefile module in qa-jenkins-library with a consistent interface for invoking Makefile targets inside Docker so that I do not have to write raw sh commands in each Jenkinsfile.
  8. As a pipeline maintainer, I want the makefile module to support passing KEY=VALUE variable overrides to make so that versions, environment names, and distro selections can be injected by Jenkins.
  9. As a pipeline maintainer, I want the makefile module to run inside the existing rancher-infra-tools Docker image so that no new container or tool installation is required.
  10. As a pipeline maintainer, I want the Tofu workspace name to be generated by Jenkins, passed to make as a variable (WORKSPACE=...), and archived as a build artifact so that the destroy job can retrieve it and pass it back to make infra-down.
  11. As a pipeline maintainer, I want the S3 backend initialization to be handled inside make infra-up (via environment variables for credentials, bucket, and region) so that Jenkins does not need a separate tofu.initBackend() call.
  12. As a pipeline maintainer, I want the RKE2 airgap setup Jenkinsfile to be rewritten to use make infra-up and make cluster ENV=airgap DISTRO=rke2 so that the job works with the refactored repo layout.
  13. As a pipeline maintainer, I want the RKE2 airgap destroy Jenkinsfile to be rewritten to call make infra-down WORKSPACE=<name> so that teardown does not require knowledge of old Tofu module paths.
  14. As a pipeline maintainer, I want the airgap Go tests Jenkinsfile to be rewritten to provision infrastructure via make infra-up, deploy the cluster via make cluster, and then run Go tests, so that the full pipeline works with the new layout.
  15. As a pipeline maintainer, I want the elemental e2e Jenkinsfile (Jenkinsfile.elemental.e2e) to be refactored to work with the new ansible/ layout and Makefile interface.
  16. As a pipeline maintainer, I want the elemental Harvester e2e Jenkinsfile (Jenkinsfile.elemental.harvester.e2e) to be refactored to work with the new ansible/ layout and Makefile interface.
  17. As a pipeline maintainer, I want all Jenkinsfiles that invoke Ansible or Tofu directly (including validation/Jenkinsfile.e2e and validation/pipeline/tfp/Jenkinsfile.harvester.e2e) to be reviewed and updated if they reference qa-infra-automation paths, so that no jobs silently break after the refactor.
  18. As a Jenkins job builder maintainer, I want the JJB job definitions to expose DISTRO, ENV, and optionally CLUSTER as job parameters so that the Makefile's interface is surfaced through Jenkins job configuration.
  19. As a Jenkins job builder maintainer, I want any JJB job definitions that reference old qa-infra-automation paths or old Ansible variable file locations to be updated to remove those assumptions.
  20. As a downstream cluster operator, I want a clear path for destroying only downstream cluster infrastructure (distinct from management cluster teardown) so that multi-cluster QA environments can be cleaned up independently.
  21. As a developer running qa-infra-automation locally, I want the Jenkins job to behave identically to local Makefile usage so that I can debug Jenkins failures locally without special knowledge of CI-specific steps.
  22. As a security-conscious operator, I want credentials (AWS keys, SSH keys, registry passwords) to be injected into the Docker container as environment variables, not written to files that are then archived, so that secrets are not persisted in build artifacts.

Implementation Decisions

New makefile module in qa-jenkins-library

A new Groovy module (makefile.groovy or equivalent) is added to qa-jenkins-library. It exposes at minimum:

  • makefile.run(target, vars, dir, image, workspace) — runs make <target> KEY=VALUE... inside the specified Docker image, with the Jenkins workspace (the workspace argument) mounted and dir used as the working directory.
  • The module keeps credentials and Makefile variables on separate channels: credentials are injected as container environment variables through Docker's --env-file, while Makefile variables are passed as KEY=VALUE arguments, so secrets never appear on the command line.
  • The module mirrors the existing pattern used by the tofu.* and ansible.* modules in the library.
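A minimal sketch of the argument construction makefile.run would perform is shown below. It is written in Python so it can run standalone (the real module would be Groovy), and it assumes the module shells out to docker run; the function name build_make_command, the env_file argument, and the /workspace mount point are hypothetical.

```python
import posixpath
import re
import shlex
from typing import List, Mapping, Optional

# Conservative pattern for Makefile variable names passed as KEY=VALUE.
_VAR_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_make_command(
    target: str,
    make_vars: Mapping[str, str],
    directory: str,
    image: str,
    workspace: str,
    env_file: Optional[str] = None,
    mount_point: str = "/workspace",  # hypothetical in-container mount path
) -> List[str]:
    """Build the argv for `docker run ... <image> make <target> KEY=VALUE...`.

    Credentials belong in `env_file`, which Docker reads via --env-file, so
    secret values never appear in the returned argv; only Makefile variables do.
    """
    argv = [
        "docker", "run", "--rm",
        "-v", f"{workspace}:{mount_point}",
        "-w", posixpath.normpath(posixpath.join(mount_point, directory)),
    ]
    if env_file is not None:
        argv += ["--env-file", env_file]
    argv += [image, "make", target]
    # Keep the caller's ordering (dicts preserve insertion order).
    for key, value in make_vars.items():
        if not _VAR_NAME.fullmatch(key):
            raise ValueError(f"invalid Makefile variable name: {key!r}")
        argv.append(f"{key}={value}")
    return argv


if __name__ == "__main__":
    cmd = build_make_command(
        "cluster",
        {"ENV": "airgap", "DISTRO": "rke2"},
        directory="qa-infra-automation",
        image="rancher-infra-tools",
        workspace="/home/jenkins/workspace/job",
        env_file="secrets.env",
    )
    # shlex.join quotes each argument as needed for a Jenkins `sh` step.
    print(shlex.join(cmd))
```

Building an argv list and quoting it once with shlex.join keeps a value that contains spaces as a single KEY=VALUE argument to make.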

Jenkinsfiles write group_vars before invoking make

Jenkins remains responsible for version pinning and environment configuration in ansible/group_vars/all/, and for supplying secrets. Before calling make cluster, the job writes one or more YAML files under that directory containing non-secret values (e.g., rke2_version, rancher_version, private_registry_url, ssh_private_key_file); secret values are injected as container environment variables rather than written to these files. This mirrors the operator workflow of populating group_vars before running make locally.
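The group_vars step could be sketched in Python as follows (the real step would live in the Jenkinsfile). The file name jenkins.yml, the SECRET_KEYS deny-list, and the helper name are hypothetical; the point is that the writer refuses credential keys outright, so only non-secret values can reach a file that will later be archived.

```python
import json
import os
import tempfile
from typing import Mapping

# Hypothetical deny-list; real secrets go through the container environment.
SECRET_KEYS = frozenset({
    "aws_access_key",
    "aws_secret_key",
    "private_registry_password",
})


def write_group_vars(repo_root: str, values: Mapping[str, str],
                     filename: str = "jenkins.yml") -> str:
    """Write non-secret values to ansible/group_vars/all/<filename>.

    Each value is emitted as a JSON string, which is also a valid YAML
    double-quoted scalar, so no YAML library is required.
    """
    leaked = sorted(key for key in values if key.lower() in SECRET_KEYS)
    if leaked:
        raise ValueError(f"refusing to write secret keys to group_vars: {leaked}")
    target_dir = os.path.join(repo_root, "ansible", "group_vars", "all")
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, filename)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("---\n")
        for key, value in values.items():
            fh.write(f"{key}: {json.dumps(str(value))}\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        written = write_group_vars(repo, {
            "rke2_version": "v1.30.1+rke2r1",  # example values only
            "rancher_version": "v2.9.0",
        })
        with open(written, encoding="utf-8") as fh:
            print(fh.read())
```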

After make cluster completes, Jenkins archives:

  • ansible/group_vars/all/*.yml — all group_vars files (any file that could contain a secret must be redacted or excluded before archiving)
  • terraform.tfvars — the Tofu variable file used for provisioning

These artifacts allow any engineer to reproduce the environment.
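Before calling archiveArtifacts, a job could run a guard like the following Python sketch, which collects ansible/group_vars/all/*.yml plus terraform.tfvars and refuses to proceed if any assignment uses a key name that looks like a credential. The location of terraform.tfvars (repo root here), the secret-name heuristic, and the helper name are assumptions; the check is a coarse safety net, not a substitute for keeping secrets out of these files in the first place.

```python
import glob
import os
import re
import tempfile
from typing import List

# Heuristic (an assumption): key names that look like credentials. A credential
# word immediately followed by _file (e.g. ssh_private_key_file) names a path,
# so it is allowed.
SECRET_NAME = re.compile(r"(password|secret|token|access_key|private_key)(?!_file)",
                         re.IGNORECASE)
# `key: value` in YAML or `key = value` in tfvars, at any indentation.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*[:=]")


def archivable_files(repo_root: str, tfvars_path: str = "terraform.tfvars") -> List[str]:
    """Return repo-relative paths to archive, or raise if one looks like it holds a secret."""
    candidates = sorted(glob.glob(os.path.join(
        glob.escape(repo_root), "ansible", "group_vars", "all", "*.yml")))
    tfvars = os.path.join(repo_root, tfvars_path)  # location is an assumption
    if os.path.isfile(tfvars):
        candidates.append(tfvars)
    for path in candidates:
        with open(path, encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                match = ASSIGNMENT.match(line)
                if match and SECRET_NAME.search(match.group(1)):
                    raise RuntimeError(
                        f"{path}:{lineno}: {match.group(1)!r} looks like a secret")
    return [os.path.relpath(p, repo_root) for p in candidates]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        group_vars = os.path.join(repo, "ansible", "group_vars", "all")
        os.makedirs(group_vars)
        with open(os.path.join(group_vars, "jenkins.yml"), "w", encoding="utf-8") as fh:
            fh.write('rke2_version: "v1.30.1+rke2r1"\nssh_private_key_file: "/keys/id_rsa"\n')
        with open(os.path.join(repo, "terraform.tfvars"), "w", encoding="utf-8") as fh:
            fh.write('aws_region = "us-east-2"\n')
        print(archivable_files(repo))
```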

Tofu workspace management

Workspace names are generated by Jenkins (same jenkins_airgap_ansible_workspace_<prefix> convention as today) and passed to make as WORKSPACE=<name>. The Makefile is expected to accept WORKSPACE as an override for the workspace name it uses when calling tofu workspace select/new. The workspace name is archived as a build artifact so the destroy job can retrieve it. This may require a small addition to the qa-infra-automation Makefile (adding WORKSPACE as an overridable variable), which should be tracked in that repo.
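A sketch of the workspace hand-off between the setup and destroy jobs, in Python for illustration. The artifact file name tofu-workspace.txt, the helper names, and the character-set check are hypothetical; <prefix> stays whatever the jobs derive today. One detail worth verifying in the Makefile change: Jenkins itself exports an environment variable named WORKSPACE (the job's workspace directory), and GNU make turns environment variables into make variables, so a "WORKSPACE ?= ..." default in the Makefile would be overridden if that variable reaches the container. Passing WORKSPACE=<name> on the make command line, as planned here, takes precedence over both.

```python
import os
import re
import tempfile

# Same naming convention the jobs use today; <prefix> is supplied by the job.
WORKSPACE_TEMPLATE = "jenkins_airgap_ansible_workspace_{prefix}"
# Conservative character set (an assumption, possibly stricter than Tofu requires).
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def workspace_name(prefix: str) -> str:
    name = WORKSPACE_TEMPLATE.format(prefix=prefix)
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe workspace name: {name!r}")
    return name


def save_workspace(artifact_path: str, name: str) -> None:
    """Setup job: record the name so it can be archived as a build artifact."""
    with open(artifact_path, "w", encoding="utf-8") as fh:
        fh.write(name + "\n")


def infra_down_vars(artifact_path: str) -> dict:
    """Destroy job: read the archived name back and build the make variables."""
    with open(artifact_path, encoding="utf-8") as fh:
        name = fh.read().strip()
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"artifact does not contain a valid workspace name: {name!r}")
    return {"WORKSPACE": name}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifact = os.path.join(tmp, "tofu-workspace.txt")  # hypothetical artifact name
        save_workspace(artifact, workspace_name("abc123"))
        print(infra_down_vars(artifact))
        # {'WORKSPACE': 'jenkins_airgap_ansible_workspace_abc123'}
```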

S3 backend initialization inside the Makefile

The Makefile's infra-up target should handle tofu init with the S3 backend using environment variables (S3_BUCKET_NAME, S3_KEY_PREFIX, S3_BUCKET_REGION) passed from Jenkins. Jenkins sets these as environment variables in the Docker container. This removes the need for a separate tofu.initBackend() call in Jenkinsfiles.
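A sketch of how the Jenkins side could hand the backend settings and credentials to the container through a Docker --env-file, in Python for illustration (the Makefile would then presumably feed the S3_* values to tofu init, for example as -backend-config arguments). The helper name is hypothetical, and AWS_ACCESS_KEY_ID is the standard AWS SDK variable name, assumed here rather than taken from the current jobs. Docker reads each VAR=VAL line literally, without quote processing, so values containing newlines are rejected; the file is created owner-readable only and should be deleted after make returns, never archived.

```python
import os
import re
import tempfile
from typing import Mapping

# Backend settings the Makefile's infra-up target is expected to read.
REQUIRED_BACKEND_VARS = ("S3_BUCKET_NAME", "S3_KEY_PREFIX", "S3_BUCKET_REGION")
_ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def write_env_file(path: str, env: Mapping[str, str]) -> None:
    """Write VAR=VAL lines suitable for `docker run --env-file`."""
    missing = [name for name in REQUIRED_BACKEND_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"missing S3 backend variables: {missing}")
    lines = []
    for name, value in env.items():
        if not _ENV_NAME.fullmatch(name):
            raise ValueError(f"invalid environment variable name: {name!r}")
        if "\n" in value or "\r" in value:
            raise ValueError(f"{name} contains a newline, which --env-file cannot carry")
        lines.append(f"{name}={value}")
    # A newly created file gets mode 0600 so other users on the agent cannot read it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_path = os.path.join(tmp, "secrets.env")
        write_env_file(env_path, {
            "S3_BUCKET_NAME": "example-bucket",  # placeholder values
            "S3_KEY_PREFIX": "jenkins/airgap",
            "S3_BUCKET_REGION": "us-east-2",
            "AWS_ACCESS_KEY_ID": "placeholder",
        })
        with open(env_path, encoding="utf-8") as fh:
            print(fh.read())
```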

Distro and environment parameters in JJB

JJB job definitions for affected jobs gain DISTRO (string, default rke2) and ENV (string, default airgap) parameters. These are passed to make cluster ENV=$ENV DISTRO=$DISTRO. Downstream cluster jobs additionally expose a CLUSTER parameter (string, the cluster name for ansible/inventory/downstream/<name>/).
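A sketch of the parameter mapping, in Python for illustration. Blank parameters fall back to the documented defaults, ENV is emitted before DISTRO to match the make cluster invocation above, and CLUSTER is validated because it becomes a directory name. Passing CLUSTER to make as a variable is an assumption about the refactored Makefile interface, and the helper names are hypothetical.

```python
import re
from typing import Dict, Mapping

# Defaults mirror the JJB parameter defaults (ENV first, then DISTRO).
DEFAULTS = {"ENV": "airgap", "DISTRO": "rke2"}
# Must start with an alphanumeric and contain no "/", so "." / ".." are rejected.
_CLUSTER_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]*")


def cluster_make_vars(params: Mapping[str, str]) -> Dict[str, str]:
    """Map Jenkins job parameters to `make cluster` variables."""
    make_vars = {key: (params.get(key) or "").strip() or default
                 for key, default in DEFAULTS.items()}
    cluster = (params.get("CLUSTER") or "").strip()
    if cluster:
        # CLUSTER becomes a directory name under ansible/inventory/downstream/.
        if not _CLUSTER_NAME.fullmatch(cluster):
            raise ValueError(f"unsafe CLUSTER name: {cluster!r}")
        make_vars["CLUSTER"] = cluster
    return make_vars


def downstream_inventory_dir(cluster: str) -> str:
    return f"ansible/inventory/downstream/{cluster}/"


def render(target: str, make_vars: Mapping[str, str]) -> str:
    """Render the make invocation as a single string, e.g. for logging."""
    return " ".join(["make", target] + [f"{k}={v}" for k, v in make_vars.items()])


if __name__ == "__main__":
    print(render("cluster", cluster_make_vars({})))
    print(render("cluster", cluster_make_vars({"CLUSTER": "ds1"})))
```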

Elemental jobs

Both elemental Jenkinsfiles (Jenkinsfile.elemental.e2e and Jenkinsfile.elemental.harvester.e2e) write config files into old paths (ansible/rancher/downstream/elemental/libvirt, ansible/rancher/downstream/elemental/harvester). These will be refactored to write into the new ansible/group_vars/all/ structure and invoke make targets for elemental deployment. The exact Makefile targets for elemental are to be defined in qa-infra-automation (likely make cluster DISTRO=elemental with appropriate tags or a dedicated target). This work is blocked on the qa-infra-automation refactor completing for elemental.

Jobs not using qa-infra-automation directly

validation/Jenkinsfile.e2e uses corral for infrastructure, not qa-infra-automation. It does not reference qa-infra-automation paths and requires no changes unless it is later migrated.

validation/pipeline/tfp/Jenkinsfile.harvester.e2e uses an older infra-repo pattern with direct Tofu/Ansible calls. It writes to ansible/vars.yaml (a root-level vars file) and ansible/k3s/default/roles/. Because this Jenkinsfile references qa-infra-automation paths, it must be reviewed and updated to use the new group_vars/all/ location and new playbook paths. It does not use the Makefile today and should be migrated to it as part of this work.

Dockerfile

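Dockerfile.infra and Dockerfile.airgap-go-tests are expected to include make already, since they are based on standard Linux images with build tools, so no Dockerfile changes are anticipated. This should still be verified, for example by running make --version in each image, before the Jenkinsfiles are switched over.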

Testing Decisions

A good test for these Jenkinsfiles verifies externally observable behavior: did the right Makefile target get called with the right arguments? Did the expected artifacts get archived? Did the job succeed end-to-end against real infrastructure? Implementation details (which Docker flags, how the env file is constructed) are not worth testing at the unit level.

What to test:

  • Smoke test each updated Jenkinsfile against a real Jenkins instance with a test/sandbox qa-infra-automation branch that has the new layout. Verify the job reaches the same stages and produces equivalent infrastructure as before.
  • Artifact verification — after a successful run of the setup job, verify that group_vars files and terraform.tfvars are present as archived artifacts and contain the expected variable names (not necessarily values, to avoid secrets in test assertions).
  • Destroy job round-trip — verify that running the setup job followed by the destroy job leaves no orphaned Tofu workspace or AWS resources.
  • makefile module unit test — if the qa-jenkins-library uses a test framework (e.g., JenkinsPipelineUnit), add a test that verifies makefile.run('cluster', ['ENV': 'airgap', 'DISTRO': 'rke2'], ...) constructs the correct make cluster ENV=airgap DISTRO=rke2 invocation inside Docker.
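The artifact and round-trip assertions could be built on small helpers like these Python sketches. They check key names only, never values, in line with the artifact-verification guidance, and they parse tofu workspace list output assuming the Terraform-style format of one name per line with the current workspace marked by an asterisk. The helper names are hypothetical, and the flat-YAML key parser is deliberately simple (it only sees unindented keys).

```python
import re
from typing import Iterable, List, Set

# An unindented `name:` line in a flat group_vars YAML file.
_TOP_LEVEL_KEY = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*:")


def top_level_keys(yaml_text: str) -> Set[str]:
    keys = set()
    for line in yaml_text.splitlines():
        match = _TOP_LEVEL_KEY.match(line)  # match() anchors at column 0
        if match:
            keys.add(match.group(1))
    return keys


def missing_keys(yaml_text: str, expected: Iterable[str]) -> List[str]:
    """Names expected in an archived group_vars file but absent (values ignored)."""
    return sorted(set(expected) - top_level_keys(yaml_text))


def workspaces_from_list_output(output: str) -> Set[str]:
    """Parse `tofu workspace list` output; the current workspace is prefixed with '*'."""
    names = set()
    for line in output.splitlines():
        name = line.strip().lstrip("*").strip()
        if name:
            names.add(name)
    return names


if __name__ == "__main__":
    archived = '---\nrke2_version: "v1.30.1+rke2r1"\nrancher_version: "v2.9.0"\n'
    print(missing_keys(archived, ["rke2_version", "rancher_version", "private_registry_url"]))
    # After the destroy job, the setup job's workspace should be gone.
    after_destroy = "* default\n"
    print("jenkins_airgap_ansible_workspace_abc123" in workspaces_from_list_output(after_destroy))
```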

Prior art in the codebase:

  • validation/pipeline/Jenkinsfile.setup.airgap.rke2 is the primary reference implementation. Its stage structure (Checkout → Configure Variables → infra-up → Configure Ansible → Deploy Cluster → Archive) should be preserved with updated internals.
  • The tofu.* and ansible.* modules in qa-jenkins-library are the pattern to follow when implementing the makefile module.

Modules to test:

  • makefile module in qa-jenkins-library (unit test for argument construction)
  • Jenkinsfile.setup.airgap.rke2 (end-to-end smoke test)
  • Jenkinsfile.destroy.airgap.rke2 (end-to-end smoke test, round-trip with setup)
  • Jenkinsfile.airgap.go-tests (end-to-end with Go test execution)

Out of Scope

Further Notes

  • This work is blocked on RFC: Decouple Tofu↔Ansible integration via canonical inventory paths and declarative schema qa-infra-automation#77 and RFC: Restructure ansible/ from product-centric to concern-centric layout qa-infra-automation#78 being merged. Jenkinsfile changes should be developed against a feature branch of qa-infra-automation and merged after the refactor lands.
  • The QA_INFRA_REPO_BRANCH parameter present in all affected Jenkinsfiles allows pointing jobs at a qa-infra-automation feature branch for integration testing before the refactor is merged.
  • The Makefile must accept WORKSPACE as an overridable variable for Tofu workspace selection. If it does not, a small addition to qa-infra-automation is required and should be tracked there.
  • The S3 backend init should be absorbed into make infra-up in qa-infra-automation. If it is not, Jenkins falls back to calling the existing tofu.initBackend() step before invoking make, which is acceptable as a transitional measure.
  • Secrets (AWS keys, registry credentials, SSH keys) must never appear in archived artifacts. Jenkins should write only non-secret variables (versions, hostnames, distro names) to group_vars files that are archived. Credential variables should be injected via Docker environment only.

Related:

Labels: enhancement (New feature or request), team/pit-crew (slack notifier for pit crew)