feat: split airgap RKE2 infra pipeline into setup/destroy jobs #595

Open
floatingman wants to merge 18 commits into rancher:main from floatingman:feature/issue-590-airgap-rke2-infra-pipeline

floatingman commented Apr 1, 2026

Summary

  • Split unified Jenkinsfile.airgap-rke2-infra into separate setup (Jenkinsfile.setup.airgap-rke2-infra) and destroy (Jenkinsfile.destroy.airgap-rke2-infra) pipelines
  • Setup pipeline uses make.runTarget() for RKE2 cluster, private registry, and Rancher deployment via the qa-infra-automation Makefile
  • Destroy pipeline uses Jenkins library directly (not Makefile) due to the Makefile's interactive infra-down prompt and lack of workspace management
  • Added make to Dockerfile.infra Alpine packages
  • Archives Rancher admin token (rancher-admin-token.json) as a build artifact
  • Post-failure teardown via airgap.teardownInfrastructure()

Test Plan

  • Run the airgap-rke2-infra-setup job in Jenkins and verify that it provisions infrastructure, deploys the RKE2 cluster, and optionally deploys Rancher
  • Verify the Rancher admin token is archived as a build artifact
  • Run the airgap-rke2-infra-destroy job and verify that the infrastructure is torn down and the workspace is deleted
  • Verify post-failure cleanup works (DESTROY_ON_FAILURE=true)

Depends on:

…-infra)

Create a single Declarative Pipeline that replaces both
Jenkinsfile.setup.airgap.rke2 and Jenkinsfile.destroy.airgap.rke2,
controlled by an ACTION parameter (setup/destroy).

Consumes shared functions from qa-jenkins-library (rancher#589):
- airgap.standardCheckout, airgap.configureAnsible, airgap.deployRKE2,
  airgap.deployRancher, airgap.teardownInfrastructure
- s3.uploadArtifact, s3.downloadArtifact, s3.deleteArtifact
- tofu.initBackend, tofu.createWorkspace, tofu.apply, tofu.getOutputs

Original Jenkinsfiles remain untouched for parallel coexistence.

Refs rancher#590

Pass TESTS_REPO_URL and QA_INFRA_REPO_URL to airgap.standardCheckout()
so the pipeline can be pointed at a fork for testing.

Refs rancher#590

The per-stage useWithProperties wrapping caused AWS_SSH_PEM_KEY_NAME
to be unavailable when only AWS_SSH_PEM_KEY was loaded. Now all
credentials are loaded in a single wrapper around the stages block,
matching the original pipeline's approach.

Refs rancher#590

AWS credentials were unavailable inside Docker containers because the
property.useWithProperties wrapper was removed. Restructured to use
a parent stage with nested stages inside a single credentials wrapper,
ensuring all credentials are available throughout the pipeline.

Refs rancher#590

env.WORKSPACE_NAME was null inside nested closures (script >
useWithProperties > stage). Use a local variable which persists
correctly across all nested stages within the same script block.

Refs rancher#590

The s3.groovy shared functions default to config.groovy values
(rancher-qa-artifacts, us-east-1) but the pipeline uses
S3_BUCKET_NAME and S3_BUCKET_REGION from Jenkins job parameters.
Pass these as explicit overrides to uploadArtifact, downloadArtifact,
and deleteArtifact.

Also move env.WORKSPACE_NAME assignment after tofu.createWorkspace
succeeds so the post-failure cleanup block has a valid value.

The feature/decouple-tofu branch removed the local_file resource that
previously wrote inventory.yml during tofu apply. The pipeline now
generates the inventory by fetching the airgap_inventory_json output,
parsing it, and rendering the YAML inventory file before Ansible stages.

Replace the hand-rolled Groovy inventory generation with
qa-infra-automation's scripts/generate_inventory.py, which is the
canonical inventory renderer. Runs inside the Docker container that
already has Python and PyYAML.

Instead of uploading terraform.tfvars to S3 during setup and downloading
during destroy, reconstruct the tfvars file from TERRAFORM_CONFIG
environment variable (set via Jenkins folder properties) in both the
destroy path and post-failure cleanup.

Removes:
- Setup: Upload Terraform Variables to S3
- Destroy: Download Terraform Variables from S3
- Destroy: Delete S3 Artifacts

Adds:
- Destroy: Configure Tofu Variables (reconstructs tfvars)
- Post-failure cleanup now writes tfvars before teardown

The bastion node needs the SSH key to scp bundles to airgap nodes.
The original setup pipeline had this stage but it was missing from
the unified pipeline. Runs setup-ssh-keys.yml playbook after
Ansible configuration and before RKE2 deployment.

Add echo statements showing which conditions pass/fail so we can
diagnose why teardown isn't triggering when the build fails.

env.WORKSPACE_NAME doesn't persist from inside the
property.useWithProperties closure to the post block (shows 'null').
Instead, read workspace_name.txt which is written by
infrastructure.archiveWorkspaceName during the setup stages.
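A rough sketch of that lookup, assuming workspace_name.txt sits at the root of the Jenkins workspace and that the snippet runs inside a script block of post { failure { ... } }. The variable name wsName and the echo messages are illustrative only; fileExists and readFile are standard Jenkins pipeline steps.

```groovy
// Read the workspace name from the file written during setup instead of
// relying on env.WORKSPACE_NAME, which was observed to be 'null' here.
def wsName = fileExists('workspace_name.txt') ? readFile('workspace_name.txt').trim() : ''

if (wsName) {
    echo "Tearing down workspace ${wsName}"
    // The call to airgap.teardownInfrastructure(...) would go here.
} else {
    echo 'workspace_name.txt not found or empty; skipping teardown'
}
```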
Split unified Jenkinsfile.airgap-rke2-infra into separate setup and
destroy pipelines. Setup uses make.runTarget() for cluster/registry/
rancher Makefile targets; destroy uses Jenkins library directly.
Also adds make to Dockerfile.infra and archives Rancher admin token.

In Jenkinsfiles, shared library vars are global variables: use
make.runTarget() directly, not new make(). The new X() pattern
is only needed inside other vars/*.groovy files.

…troy Jenkinsfile

The `text` parameter `default: |` is YAML multiline syntax, not valid
Groovy. Replaced with `defaultValue: '''...'''` to fix the compilation
error: "unexpected token: | @ line 84".
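A minimal sketch of the corrected declaration, assuming a declarative pipeline. The parameter name ANSIBLE_VARIABLES and the YAML lines are borrowed from elsewhere in this PR, and the single stage is a placeholder added only so the example is complete.

```groovy
pipeline {
    agent any
    parameters {
        // `defaultValue` with a triple-single-quoted string is valid Groovy.
        // `default: |` is YAML block-scalar syntax and fails to compile here.
        text(
            name: 'ANSIBLE_VARIABLES',
            defaultValue: '''# Deploy Rancher
deploy_rancher: true
install_helm: true''',
            description: 'Ansible variables in YAML form'
        )
    }
    stages {
        stage('Show defaults') {   // placeholder stage, for illustration only
            steps {
                echo params.ANSIBLE_VARIABLES
            }
        }
    }
}
```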
lscalabrini01 previously approved these changes Apr 1, 2026

…t jobs)

The setup and destroy pipelines are now separate:
- Jenkinsfile.setup.airgap-rke2-infra
- Jenkinsfile.destroy.airgap-rke2-infra

Copilot AI left a comment

Pull request overview

Splits the airgap RKE2 infrastructure Jenkins pipeline into dedicated setup and destroy jobs to better isolate provisioning vs teardown flows, while updating the infra tools image to support Makefile-driven automation.

Changes:

  • Add Jenkinsfile.setup.airgap-rke2-infra to provision infra, deploy RKE2, optionally configure registry, and optionally deploy Rancher (via Makefile targets).
  • Add Jenkinsfile.destroy.airgap-rke2-infra to teardown infra for a specified workspace using shared Jenkins library functions.
  • Update Dockerfile.infra to install make for Makefile-driven automation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| validation/pipeline/Jenkinsfile.setup.airgap-rke2-infra | New setup pipeline: checkout, build tools image, provision infra via tofu, configure Ansible, run Make targets, archive token, and teardown-on-failure logic |
| validation/pipeline/Jenkinsfile.destroy.airgap-rke2-infra | New destroy pipeline: checkout, build tools image, init tofu backend, write tfvars, teardown infra, and cleanup artifacts |
| validation/pipeline/Jenkinsfile.airgap-rke2-infra | Unified pipeline file present in the PR despite the PR’s “split” goal |
| validation/pipeline/Dockerfile.infra | Adds make to the infra tools image packages |

Comment on lines +89 to +98
string(
    name: 'PRIVATE_REGISTRY_USERNAME',
    defaultValue: '',
    description: 'Private registry username'
)
string(
    name: 'PRIVATE_REGISTRY_PASSWORD',
    defaultValue: '',
    description: 'Private registry password'
)

Copilot AI Apr 1, 2026

PRIVATE_REGISTRY_PASSWORD is defined as a plain string parameter, which makes it visible in Jenkins job configuration/build parameters and potentially logs. This should be a masked password parameter (or sourced from Jenkins credentials/folder properties) to avoid credential exposure.
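One way this could look, sketched as a minimal declarative pipeline. The `password` parameter type is a standard Jenkins parameter; the placeholder stage is an assumption added only to make the example complete.

```groovy
pipeline {
    agent any
    parameters {
        string(
            name: 'PRIVATE_REGISTRY_USERNAME',
            defaultValue: '',
            description: 'Private registry username'
        )
        // `password` values are masked in the job UI, unlike `string` values.
        password(
            name: 'PRIVATE_REGISTRY_PASSWORD',
            defaultValue: '',
            description: 'Private registry password'
        )
    }
    stages {
        stage('Placeholder') {   // illustration only
            steps {
                echo "Registry user: ${params.PRIVATE_REGISTRY_USERNAME}"
            }
        }
    }
}
```

Sourcing the value from Jenkins credentials or folder properties, as the comment also suggests, would avoid passing the secret as a build parameter at all.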

Copilot uses AI. Check for mistakes.
Comment on lines +192 to +203
# Whether to enable private registry configuration
enable_private_registry: true

# Deploy Rancher
deploy_rancher: true
install_helm: true

rancher_hostname: "${HOSTNAME_PREFIX}.qa.rancher.space"
rancher_bootstrap_password: "rancherrocks"
rancher_admin_password: "rancherrocks"
rancher_image_tag: ${RANCHER_VERSION}
rancher_use_bundled_system_charts: true'''

Copilot AI Apr 1, 2026

The default ANSIBLE_VARIABLES hard-codes enable_private_registry: true and deploy_rancher: true, but the pipeline behavior makes both optional (PRIVATE_REGISTRY_URL gating + DEPLOY_RANCHER parameter). This mismatch can lead to Ansible/Make behavior diverging from the selected pipeline options; consider templating these values from parameters (or removing them if they’re not used).
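A small sketch of the templating idea in plain Groovy. The `params` map below is a stand-in for the Jenkins `params` global so the snippet runs outside Jenkins, and the rule "registry is enabled when PRIVATE_REGISTRY_URL is non-empty" is an assumption based on the gating described above.

```groovy
// Stand-in for the Jenkins `params` global (assumption for this sketch).
def params = [DEPLOY_RANCHER: true, PRIVATE_REGISTRY_URL: '']

// Treat the registry as enabled only when a URL was supplied.
def enablePrivateRegistry = params.PRIVATE_REGISTRY_URL?.trim() ? true : false

// Render the two flags from the selected options instead of hard-coding them.
def ansibleVariables = """\
enable_private_registry: ${enablePrivateRegistry}
deploy_rancher: ${params.DEPLOY_RANCHER}
install_helm: true
"""

println ansibleVariables
```

With an empty PRIVATE_REGISTRY_URL, the rendered text contains `enable_private_registry: false`, so the Ansible variables follow the pipeline options.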

Comment on lines +361 to +367
stage('Deploy RKE2 Cluster') {
    make.runTarget(
        target: 'cluster',
        dir: 'qa-infra-automation',
        makeArgs: 'ENV=airgap',
        passAwsCreds: false
    )

Copilot AI Apr 1, 2026

Several steps use hard-coded paths like dir: 'qa-infra-automation' / qa-infra-automation/tmp/... even though checkout returns env.INFRA_DIR. If airgap.standardCheckout ever changes its target directory (or the workspace layout differs), these stages will break; prefer using env.INFRA_DIR consistently.
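A sketch of the stage with the checkout directory taken from env.INFRA_DIR. This is a pipeline fragment that depends on the shared library's make.runTarget(), and it assumes env.INFRA_DIR has already been set by airgap.standardCheckout, as the comment describes.

```groovy
stage('Deploy RKE2 Cluster') {
    make.runTarget(
        target: 'cluster',
        dir: env.INFRA_DIR,       // follows the checkout location instead of a literal path
        makeArgs: 'ENV=airgap',
        passAwsCreds: false
    )
}
```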

Comment on lines +399 to +407
stage('Archive Rancher Admin Token') {
    if (params.DEPLOY_RANCHER) {
        def tokenPath = 'qa-infra-automation/tmp/rancher-admin-token.json'
        if (fileExists(tokenPath)) {
            archiveArtifacts artifacts: tokenPath, fingerprint: true
            echo "Archived Rancher admin token from ${tokenPath}"
        } else {
            echo "Warning: Rancher admin token file not found at ${tokenPath}"
        }

Copilot AI Apr 1, 2026

Archiving rancher-admin-token.json as a build artifact (with fingerprinting) exposes a highly privileged credential to anyone who can access artifacts (and potentially to downstream systems via fingerprinting). Consider storing it in a credentials store/secret manager instead, or at minimum avoid fingerprinting and ensure artifact access is tightly restricted and short-lived.
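As a sketch of the minimal mitigation only (it does not address who can read the artifact), the stage could archive the file without fingerprinting. The path is kept as in the excerpt above, and the fragment is meant to sit inside the existing scripted stages.

```groovy
stage('Archive Rancher Admin Token') {
    if (params.DEPLOY_RANCHER) {
        def tokenPath = 'qa-infra-automation/tmp/rancher-admin-token.json'
        if (fileExists(tokenPath)) {
            // fingerprint: false keeps the token out of the fingerprint database.
            archiveArtifacts artifacts: tokenPath, fingerprint: false
            echo "Archived Rancher admin token from ${tokenPath}"
        } else {
            echo "Warning: Rancher admin token file not found at ${tokenPath}"
        }
    }
}
```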

Comment on lines +469 to +494
try {
    property.useWithProperties([
        'AWS_ACCESS_KEY_ID',
        'AWS_SECRET_ACCESS_KEY'
    ]) {
        def tofuDir = "${env.INFRA_DIR}/tofu/aws/modules/airgap"

        def terraformConfig = infrastructure.parseAndSubstituteVars(
            content: env.TERRAFORM_CONFIG,
            envVars: [
                'AWS_ACCESS_KEY_ID': env.AWS_ACCESS_KEY_ID,
                'AWS_SECRET_ACCESS_KEY': env.AWS_SECRET_ACCESS_KEY,
                'HOSTNAME_PREFIX': env.HOSTNAME_PREFIX,
                'AWS_SSH_PEM_KEY_NAME': env.AWS_SSH_PEM_KEY_NAME
            ]
        )
        infrastructure.writeConfig(
            path: "${tofuDir}/terraform.tfvars",
            content: terraformConfig
        )

        airgap.teardownInfrastructure(
            dir: tofuDir,
            name: wsName,
            varFile: 'terraform.tfvars'
        )

Copilot AI Apr 1, 2026

In post { failure { ... } }, the teardown path rebuilds terraform.tfvars using env.AWS_SSH_PEM_KEY_NAME, but that variable is not loaded in the useWithProperties block used for cleanup (only AWS access/secret are). If TERRAFORM_CONFIG references ${AWS_SSH_PEM_KEY_NAME}, substitution may produce an incorrect tfvars and prevent cleanup; include the needed property in the cleanup useWithProperties list or avoid substituting it here.
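A sketch of the first option, loading the extra property in the cleanup wrapper. It is a fragment that relies on the shared library calls shown in the excerpt above and assumes AWS_SSH_PEM_KEY_NAME is available as a folder property, as it is for the setup stages.

```groovy
property.useWithProperties([
    'AWS_ACCESS_KEY_ID',
    'AWS_SECRET_ACCESS_KEY',
    'AWS_SSH_PEM_KEY_NAME'    // added so the substitution below sees a real value
]) {
    def terraformConfig = infrastructure.parseAndSubstituteVars(
        content: env.TERRAFORM_CONFIG,
        envVars: [
            'AWS_ACCESS_KEY_ID': env.AWS_ACCESS_KEY_ID,
            'AWS_SECRET_ACCESS_KEY': env.AWS_SECRET_ACCESS_KEY,
            'HOSTNAME_PREFIX': env.HOSTNAME_PREFIX,
            'AWS_SSH_PEM_KEY_NAME': env.AWS_SSH_PEM_KEY_NAME
        ]
    )
    // writeConfig and teardownInfrastructure follow as in the excerpt above.
}
```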

* rancher — Deploy Rancher via Helm
*/

def libraryBranch = params.QA_JENKINS_LIBRARY_BRANCH ?: 'main'

Copilot AI Apr 1, 2026

This Jenkinsfile loads the shared library branch using params.QA_JENKINS_LIBRARY_BRANCH at top-level. Elsewhere in this repo the convention is to use env.QA_JENKINS_LIBRARY_BRANCH for early Jenkinsfile evaluation (e.g. validation/pipeline/Jenkinsfile.setup.airgap.rke2:3); using params here can be brittle depending on when parameters are bound. Consider switching to env.QA_JENKINS_LIBRARY_BRANCH for consistency and reliability.

Suggested change
- def libraryBranch = params.QA_JENKINS_LIBRARY_BRANCH ?: 'main'
+ def libraryBranch = env.QA_JENKINS_LIBRARY_BRANCH ?: 'main'

* infrastructure.writeConfig, infrastructure.cleanupArtifacts
*/

def libraryBranch = params.QA_JENKINS_LIBRARY_BRANCH ?: 'main'

Copilot AI Apr 1, 2026

This Jenkinsfile loads the shared library branch using params.QA_JENKINS_LIBRARY_BRANCH at top-level. The established pattern in this repo is env.QA_JENKINS_LIBRARY_BRANCH (e.g. validation/pipeline/Jenkinsfile.destroy.airgap.rke2:3), which is safer during Jenkinsfile evaluation. Consider switching to env for consistency/reliability.

Suggested change
- def libraryBranch = params.QA_JENKINS_LIBRARY_BRANCH ?: 'main'
+ def libraryBranch = env.QA_JENKINS_LIBRARY_BRANCH ?: 'main'

Comment on lines +158 to +159
'HOSTNAME_PREFIX': env.HOSTNAME_PREFIX,
'AWS_SSH_PEM_KEY_NAME': env.AWS_SSH_PEM_KEY_NAME

Copilot AI Apr 1, 2026

Configure Tofu Variables substitutes AWS_SSH_PEM_KEY_NAME, but this pipeline never loads/provides AWS_SSH_PEM_KEY_NAME (it’s not in useWithProperties and not a parameter). If TERRAFORM_CONFIG references ${AWS_SSH_PEM_KEY_NAME}, substitution may yield an incorrect tfvars and break destroy; either load the property/parameter or drop it from the substitution map.

Suggested change
- 'HOSTNAME_PREFIX': env.HOSTNAME_PREFIX,
- 'AWS_SSH_PEM_KEY_NAME': env.AWS_SSH_PEM_KEY_NAME
+ 'HOSTNAME_PREFIX': env.HOSTNAME_PREFIX

Comment on lines +200 to +201
rancher_bootstrap_password: "rancherrocks"
rancher_admin_password: "rancherrocks"

Copilot AI Apr 1, 2026

ANSIBLE_VARIABLES defaults hard-code the Rancher bootstrap/admin password (rancherrocks). Even for ephemeral environments this is a weak, well-known credential and is stored in cleartext in the Jenkinsfile/job config. Prefer sourcing these from Jenkins credentials (or at least a masked parameter) and generating per-run values if possible.

Suggested change
- rancher_bootstrap_password: "rancherrocks"
- rancher_admin_password: "rancherrocks"
+ rancher_bootstrap_password: "${RANCHER_BOOTSTRAP_PASSWORD}"
+ rancher_admin_password: "${RANCHER_ADMIN_PASSWORD}"

- Use env instead of params for library branch resolution
- Change PRIVATE_REGISTRY_PASSWORD to password type parameter
- Template rancher bootstrap/admin passwords from parameters
- Template enable_private_registry and deploy_rancher from params
- Use env.INFRA_DIR consistently in make.runTarget and archiveArtifacts
- Remove fingerprint from archiveArtifacts for token file
- Add AWS_SSH_PEM_KEY_NAME to cleanup useWithProperties block
- Remove unused AWS_SSH_PEM_KEY_NAME from destroy substitution map

Labels

team/pit-crew slack notifier for pit crew
