RFC: Jenkins Pipeline Refactor — Complete Phase Plan #586
Summary
This RFC maps the complete phased implementation plan for the Jenkins pipeline refactoring described in #585. Each phase is an independent branch/PR that can be reviewed and merged in batches.
Related: #585 (PRD)
Phase Overview
| Phase | Focus | Files Affected | Est. Lines Removed | Dependencies |
|---|---|---|---|---|
| 0 | Shared library foundation | New: `vars/` + qa-jenkins-library PR | 0 (additive) | None |
| 1 | Airgap pipeline consolidation | 3 → 2 Jenkinsfiles | ~300 | Phase 0 |
| 2 | Simple test runner consolidation | 4 → 1 Jenkinsfile | ~350 | Phase 0 |
| 3 | Recurring/upgrade.e2e consolidation | 2 → 1 Jenkinsfile | ~150 | Phase 0 |
| 4 | Release upgrade pipeline consolidation | 2 → 1 Jenkinsfile | ~500 | Phase 0 |
| 5 | Elemental pipeline consolidation | 2 → 1 Jenkinsfile | ~100 | Phase 0 |
| 6 | TFP pipeline consolidation | 2 → 1 Jenkinsfile | ~60 | Phase 0 |
| 7 | Neuvector consolidation | 8 → 1 Jenkinsfile | ~70 | Phase 0 |
| 8 | Standalone pipeline refactor | 4 unique Jenkinsfiles | ~200 | Phase 0 |
| 9 | Shell script refactor | 17 scripts in `validation/pipeline/scripts/` | ~150 | Phase 0 |
| 10 | Cleanup and documentation | Deprecation, README | ~400 | Phases 1-9 |
| Total | | 37 → ~15 Jenkinsfiles | ~2,280 | |
Phase 0: Shared Library Foundation
Goal: Create the reusable abstractions that all subsequent phases consume.
Branch strategy: Two parallel branches — one for qa-jenkins-library (external PR), one for rancher/tests (local vars/).
0a. qa-jenkins-library additions (external PR)
New functions to add to the qa-jenkins-library shared library:
| Function | Purpose | Parameters |
|---|---|---|
| `airgap.standardCheckout` | Clone both tests and qa-infra-automation repos with parameterized branches | `testsRepo`, `testsBranch`, `qaInfraRepo`, `qaInfraBranch` |
| `airgap.teardownInfrastructure` | Tofu select workspace + destroy + delete workspace as a unit | `tofuModulePath`, `workspaceName`, `backendInitScript` |
| `s3.uploadArtifact` | Upload a file to S3 using the standard Docker+awscli pattern | `workspaceName`, `localPath`, `s3Key`, `awsCredentials` |
| `s3.downloadArtifact` | Download a file from S3 using the standard Docker+awscli pattern | `workspaceName`, `s3Key`, `localPath`, `awsCredentials` |
| `s3.deleteArtifact` | Delete a file from S3 | `workspaceName`, `s3Key`, `awsCredentials` |
These are generic infrastructure primitives — not airgap-specific. Any pipeline that clones repos, manages tofu state, or passes artifacts via S3 can use them.
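The teardown primitive bundles three tofu operations so that callers cannot forget the workspace cleanup. A minimal shell sketch of that sequence is shown here. The function name and argument order are hypothetical, the backend initialization that `backendInitScript` would perform is omitted, and the sketch assumes a `default` workspace exists to switch back to.

```shell
# Hypothetical shell equivalent of airgap.teardownInfrastructure; the name and
# argument order are assumptions, not the library's actual interface.
teardown_infrastructure() {
    local module_path="$1"
    local workspace="$2"

    # Select the workspace whose state should be destroyed.
    tofu -chdir="$module_path" workspace select "$workspace" || return 1
    # Destroy every resource tracked in that workspace without prompting.
    tofu -chdir="$module_path" destroy -auto-approve || return 1
    # The currently selected workspace cannot be deleted, so switch away first.
    tofu -chdir="$module_path" workspace select default || return 1
    tofu -chdir="$module_path" workspace delete "$workspace"
}
```

Stopping at the first failed step keeps the workspace (and its state) in place, so a failed destroy can be retried instead of orphaning resources.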
0b. Local vars/ directory (in rancher-tests)
New files in vars/:
| File | Purpose | Consumes |
|---|---|---|
| `resolvePipelineParams.groovy` | Parse job name, resolve BRANCH/REPO/TIMEOUT defaults | None |
| `standardDockerCleanup.groovy` | Docker stop/rm/rmi/volume rm sequence | None |
| `airgapInfraPipeline.groovy` | Shared parameters, credentials, path constants, checkout, tofu lifecycle, S3 artifact management for airgap infra | qa-jenkins-library: `tofu.*`, `property.*`, `infrastructure.*`, `ansible.*`, `airgap.*`, `s3.*` |
| `airgapTestPipeline.groovy` | Extends infra with Go test params, gotestsum invocation, Qase reporting | `airgapInfraPipeline`, qa-jenkins-library: `container.*`, `generate.*` |
| `simpleTestPipeline.groovy` | Shared parameters and stage flow for simple test runners | None |
| `recurringTestPipeline.groovy` | Shared parameters and stage flow for recurring/config-driven pipelines | qa-jenkins-library functions |
| `standardCredentialLoader.groovy` | Load the appropriate credential set based on environment/target | None |
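As an illustration of what `standardDockerCleanup.groovy` would ultimately shell out to, the stop/rm/rmi/volume rm sequence might look like the sketch below. The function and argument names are hypothetical. The key property is that every step tolerates a missing resource, so the cleanup can run from a `post` block after a failure at any stage.

```shell
# Hypothetical helper; argument names are assumptions for illustration.
standard_docker_cleanup() {
    local container="$1"
    local image="$2"
    local volume="$3"

    # Every step ignores failures: the container, image, or volume may not
    # exist if the pipeline failed before creating it.
    docker stop "$container" >/dev/null 2>&1 || true
    docker rm "$container" >/dev/null 2>&1 || true
    docker rmi "$image" >/dev/null 2>&1 || true
    docker volume rm "$volume" >/dev/null 2>&1 || true
}
```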
Verification: Each function is tested by the pipelines that consume it (live pipeline testing). No standalone unit tests for Groovy functions.
Merge criteria: Both 0a and 0b can be merged independently. Phase 1+ requires both to be available.
Phase 1: Airgap Pipeline Consolidation
Goal: Replace 3 airgap Jenkinsfiles with 2 refactored ones using Declarative Pipeline syntax.
Files Removed
- `validation/pipeline/Jenkinsfile.setup.airgap.rke2` (299 lines)
- `validation/pipeline/Jenkinsfile.destroy.airgap.rke2` (157 lines)
- `validation/pipeline/Jenkinsfile.airgap.go-tests` (405 lines)
- `validation/pipeline/tfp/Jenkinsfile.airgap.tests` (186 lines, deprecated)
Files Created
- `validation/pipeline/Jenkinsfile.airgap-rke2-infra`: Setup + Destroy with ACTION parameter
- `validation/pipeline/Jenkinsfile.airgap-rke2-tests`: Full test lifecycle
Jenkinsfile.airgap-rke2-infra Design
```groovy
pipeline {
    parameters {
        choice(name: 'ACTION', choices: ['setup', 'destroy'], description: 'Infrastructure action')
        string(name: 'DEPLOY_RANCHER', defaultValue: 'true', description: 'Deploy Rancher after K8s setup')
        string(name: 'DESTROY_ON_FAILURE', defaultValue: 'true')
        string(name: 'TARGET_WORKSPACE', description: 'Workspace name (required for destroy)')
        // ... harmonized parameters
    }
    stages {
        stage('Checkout') { steps { script { airgapInfraPipeline.checkout(params) } } }
        stage('Build Infrastructure Image') { ... }
        // Setup-specific stages (when { expression { params.ACTION == 'setup' } })
        stage('Setup Infrastructure') {
            when { expression { params.ACTION == 'setup' } }
            steps { script { airgapInfraPipeline.setup(params) } }
        }
        stage('Deploy RKE2') {
            when { expression { params.ACTION == 'setup' } }
            steps { script { airgapInfraPipeline.deployRKE2(params) } }
        }
        stage('Deploy Rancher') {
            when {
                allOf {
                    expression { params.ACTION == 'setup' }
                    expression { params.DEPLOY_RANCHER == 'true' }
                }
            }
            steps { script { airgapInfraPipeline.deployRancher(params) } }
        }
        // Destroy-specific stages (when { expression { params.ACTION == 'destroy' } })
        stage('Destroy Infrastructure') {
            when { expression { params.ACTION == 'destroy' } }
            steps { script { airgapInfraPipeline.destroy(params) } }
        }
    }
    post {
        failure {
            script { if (params.DESTROY_ON_FAILURE == 'true') airgapInfraPipeline.teardown(params) }
        }
    }
}
```
Jenkinsfile.airgap-rke2-tests Design
Retains the full lifecycle (setup → deploy → test → teardown) but delegates infra operations to airgapInfraPipeline and test operations to airgapTestPipeline.
Verification Checklist
- ACTION=setup produces the same AWS resources as the original `Jenkinsfile.setup.airgap.rke2`
- ACTION=destroy removes all resources as cleanly as the original `Jenkinsfile.destroy.airgap.rke2`
- S3 tfvars upload/download works across setup→destroy cycle
- DEPLOY_RANCHER=false skips Rancher helm deployment
- Test pipeline runs the same test packages and archives the same artifacts as the original `Jenkinsfile.airgap.go-tests`
- DESTROY_ON_FAILURE triggers teardown on failed setup
- DESTROY_AFTER_TESTS triggers teardown after successful test run
Migration Plan
- Create new Jenkinsfiles alongside originals
- Create new Jenkins jobs pointing to new Jenkinsfiles
- Run 2+ successful verification cycles
- Update production job configurations
- Delete original Jenkinsfiles
Phase 2: Simple Test Runner Consolidation
Goal: Replace 4 nearly-identical test runners with a single parameterized Declarative pipeline.
Files Removed
- `validation/Jenkinsfile` (189 lines)
- `validation/Jenkinsfile.e2e` (124 lines)
- `validation/Jenkinsfile.harvester` (158 lines)
- `validation/Jenkinsfile.vsphere` (148 lines)
Files Created
- `validation/Jenkinsfile.validation`: Single parameterized pipeline
Design
```groovy
pipeline {
    agent { label params.NODE_LABEL ?: '' }
    parameters {
        choice(name: 'NODE_LABEL', choices: ['', 'harvester-vpn-1', 'vsphere-vpn-1'],
               description: 'Target node (empty = unallocated)')
        string(name: 'TEST_PACKAGE', ...)
        string(name: 'CONFIG', ...)
        // ... standard parameters
    }
    stages {
        stage('Checkout') { steps { script { simpleTestPipeline.checkout(params) } } }
        stage('Configure and Build') { steps { script { simpleTestPipeline.configureAndBuild(params) } } }
        stage('Run Validation Tests') { steps { script { simpleTestPipeline.runTests(params) } } }
        stage('Test Report') { steps { junit '**/junit-report.xml' } }
    }
    post {
        always { script { standardDockerCleanup(...) } }
    }
}
```
The credential set is loaded dynamically based on NODE_LABEL via standardCredentialLoader.
Verification
- NODE_LABEL='' produces same results as original `Jenkinsfile`
- NODE_LABEL='harvester-vpn-1' produces same results as original `Jenkinsfile.harvester`
- NODE_LABEL='vsphere-vpn-1' produces same results as original `Jenkinsfile.vsphere`
- Same as `Jenkinsfile.e2e` with default parameters
Phase 3: Recurring/Upgrade.e2e Consolidation
Goal: Consolidate the recurring and upgrade.e2e pipelines that share the config-driven infrastructure pattern.
Files Removed
- `validation/pipeline/Jenkinsfile.recurring` (273 lines)
- `validation/Jenkinsfile.upgrade.e2e` (210 lines)
Files Created
- `validation/pipeline/Jenkinsfile.recurring`: Unified config-driven pipeline with optional upgrade stages
Shared Pattern
Both pipelines clone qa-infra-automation, use cattle-configs/patched-cattle-configs, config generator containers, and rancher_cleanup.sh. They differ in that upgrade.e2e adds upgrade stages. The unified pipeline uses a RUN_UPGRADE boolean parameter.
New vars/ Functions
- `recurringTestPipeline.groovy`: shared parameters and stage flow for config-driven pipelines
Verification
- RUN_UPGRADE=false matches original `Jenkinsfile.recurring` behavior
- RUN_UPGRADE=true matches original `Jenkinsfile.upgrade.e2e` behavior
- Qase reporting works in both modes
- Infrastructure cleanup runs correctly in both modes
Phase 4: Release Upgrade Pipeline Consolidation
Goal: Consolidate the two most complex pipelines (1,330 lines combined) that share ~400 lines of identical upgrade testing logic.
Files Removed
- `validation/pipeline/Jenkinsfile.release.upgrade.ha` (739 lines)
- `validation/pipeline/Jenkinsfile.release.upgrade.local` (591 lines)
Files Created
- `validation/pipeline/Jenkinsfile.release.upgrade`: Unified upgrade pipeline with DEPLOYMENT_TYPE parameter
Design
- `DEPLOYMENT_TYPE` parameter: `ha` or `local`
- Conditionally includes HA-specific stages (parallel provisioning, HA-specific cleanup) or local-specific stages (single-node setup)
- Shared stages: pre/post upgrade checks, config generation, test validation, cleanup
This is the highest-risk phase because:
- These are the largest files in the repo (739 + 591 = 1,330 lines)
- They have complex parallel provisioning, multi-version iteration, and cattle-configs patching
- Changes here affect release testing workflows
Verification
- DEPLOYMENT_TYPE='ha' matches original `Jenkinsfile.release.upgrade.ha` through full upgrade cycle
- DEPLOYMENT_TYPE='local' matches original `Jenkinsfile.release.upgrade.local` through full upgrade cycle
- Pre/post upgrade checks run correctly for both types
- Parallel provisioning works for both types
Phase 5: Elemental Pipeline Consolidation
Goal: Consolidate the two nearly-identical Elemental pipelines.
Files Removed
- `validation/pipeline/qainfra/Jenkinsfile.elemental.e2e` (193 lines)
- `validation/pipeline/qainfra/Jenkinsfile.elemental.harvester.e2e` (192 lines)
Files Created
- `validation/pipeline/qainfra/Jenkinsfile.elemental.e2e`: Unified Elemental pipeline with HARVESTER parameter
Design
- `HARVESTER` boolean parameter controls whether Harvester-specific stages and cleanup are included
- Node label dynamically set: `harvester-vpn-1` when HARVESTER=true, unallocated otherwise
- Destroy script selected based on HARVESTER flag
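The two HARVESTER-dependent choices can be sketched as small selector functions. The helper names are hypothetical. The label and script names come from this plan, and an empty label is assumed to mean "unallocated", as in the Phase 2 design.

```shell
# Hypothetical selectors; only the label and script names come from the plan.
elemental_node_label() {
    if [ "${HARVESTER:-false}" = "true" ]; then
        echo "harvester-vpn-1"
    else
        echo ""   # assumed: an empty label means any unallocated node
    fi
}

elemental_destroy_script() {
    if [ "${HARVESTER:-false}" = "true" ]; then
        echo "destroy_elemental_harvester_qa_infra.sh"
    else
        echo "destroy_elemental_qa_infra.sh"
    fi
}
```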
Verification
- HARVESTER=false matches original `Jenkinsfile.elemental.e2e`
- HARVESTER=true matches original `Jenkinsfile.elemental.harvester.e2e`
- Correct destroy scripts run for each mode
Phase 6: TFP Pipeline Consolidation
Goal: Consolidate the TFP setup/delete pair.
Files Removed
- `validation/pipeline/tfp/Jenkinsfile.tfp.setup` (126 lines)
- `validation/pipeline/tfp/Jenkinsfile.tfp.setup.delete` (108 lines)
Files Created
- `validation/pipeline/tfp/Jenkinsfile.tfp`: Unified TFP pipeline with ACTION parameter
Design
- Same ACTION=setup/destroy pattern as the airgap infra pipeline
- S3 artifact upload during setup, download during destroy
- Shares the `s3.*` functions from qa-jenkins-library
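The setup-to-destroy handoff depends on the same file being uploaded under a key that the destroy run can later reconstruct. A rough shell sketch of the Docker+awscli pattern follows. The `S3_BUCKET` variable, the `/data` mount point, the helper name, and the use of the public `amazon/aws-cli` image are all assumptions for illustration, not details taken from the existing pipelines.

```shell
# Hypothetical helper: S3_BUCKET, the /data mount point, and the
# amazon/aws-cli image are assumptions, not taken from the existing pipelines.
s3_artifact() {
    local action="$1"
    local local_path="$2"
    local s3_key="$3"
    local dir file uri src dst

    if [ -z "${S3_BUCKET:-}" ]; then
        echo "S3_BUCKET must be set" >&2
        return 2
    fi
    # Docker bind mounts need an absolute host directory.
    dir="$(cd "$(dirname "$local_path")" && pwd)" || return 1
    file="$(basename "$local_path")"
    uri="s3://${S3_BUCKET}/${s3_key}"

    case "$action" in
        upload)   src="/data/${file}"; dst="$uri" ;;
        download) src="$uri"; dst="/data/${file}" ;;
        *) echo "unknown action: ${action}" >&2; return 2 ;;
    esac

    # AWS credentials are passed through from the caller's environment.
    docker run --rm \
        -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION \
        -v "${dir}:/data" \
        amazon/aws-cli s3 cp "$src" "$dst"
}
```

A setup run would call something like `s3_artifact upload ./terraform.tfvars "$WORKSPACE_NAME/terraform.tfvars"` and the destroy run would call `download` with the same key, where `WORKSPACE_NAME` is again a placeholder.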
Verification
- ACTION=setup matches original `Jenkinsfile.tfp.setup`
- ACTION=destroy matches original `Jenkinsfile.tfp.setup.delete`
- S3 tf file upload/download works across setup→destroy
Phase 7: Neuvector Consolidation
Goal: Collapse 8 trivially-identical Neuvector Jenkinsfiles into one parameterized pipeline.
Files Removed
- `validation/neuvector/pipeline/allinone-pass/Jenkinsfile.nv_allinone_auto_01` through `_07` (7 files, 70 lines)
- `validation/neuvector/pipeline/allinone-pass/Jenkinsfile.Neuvector_allinone_pass_regression` (10 lines)
Files Created
- `validation/neuvector/pipeline/allinone-pass/Jenkinsfile.neuvector`: Single parameterized pipeline
Design
- `SERVER_CREDENTIAL_ID` parameter selects which Neuvector server to target
- `TEST_SCRIPT` parameter selects the script to run (default: `allinone_script.sh`)
- Single stage: SSH to target server, execute script
Verification
- Can target all 7 servers via SERVER_CREDENTIAL_ID parameter
- Regression test mode works
Phase 8: Standalone Pipeline Refactor
Goal: Apply shared library patterns to the 4 unique pipelines without consolidation.
Each pipeline gets its own micro-phase. The refactor applies shared library functions for boilerplate (checkout, cleanup, parameter resolution) while preserving the unique pipeline logic.
8a. validation/Jenkinsfile.rc (378 lines)
- Apply `resolvePipelineParams`, `standardDockerCleanup`, Declarative syntax
- Preserve: corral packages checkout, multi-SCM, config generator container, cattle-configs patching
- Convert to Declarative Pipeline with `script { }` blocks for complex logic
8b. validation/pipeline/tfp/Jenkinsfile.harvester.e2e (518 lines)
- Apply `resolvePipelineParams`, `standardDockerCleanup`, Declarative syntax
- Preserve: bare-metal provisioning, seeder setup, Harvester install, infra-repo symlink pattern
- This pipeline differs most from the others and may share the least logic with them
8c. validation/Jenkinsfile_pre_post_upgrade (181 lines)
- Apply `simpleTestPipeline` functions for shared stages
- Preserve: dual gotestsum run pattern (main test + upgrade test)
8d. validation/Jenkinsfile_support_matrix_os (249 lines)
- Apply `resolvePipelineParams`, Declarative syntax
- Preserve: job dispatcher pattern, parameter construction for downstream jobs
- Note: This and `Jenkinsfile.multibranch.recurring` could potentially be consolidated as "job dispatchers" but they have different parameter structures
8e. scripts/custodian/Jenkinsfile (134 lines)
- Minimal refactor: apply Declarative syntax, `standardDockerCleanup`
- Preserve: multi-cloud credential injection, custodian YAML execution
- This pipeline is completely unrelated to test execution
Phase 9: Shell Script Refactor
Goal: Refactor the 17 shell scripts in validation/pipeline/scripts/ to use shared functions, standardize patterns, and replace inline tofu/ansible calls with qa-jenkins-library function calls where possible.
Current Issues
- Inconsistent error handling: some use `set -e`, some `set -ex`, some `set -euo`, some none
- Unquoted variables: `$ELEMENTAL_TFVARS_FILE` without quotes
- Hardcoded paths: `/root/go/src/github.com/rancher/qa-infra-automation`
- Duplicated tofu/ansible invocation patterns
- No shared utility library
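To make the unquoted-variable issue concrete, the small demonstration below shows how word splitting changes what a command receives. The path value is invented for illustration and is not a path used by the real scripts.

```shell
# Count how many arguments a command actually receives.
count_args() { echo "$#"; }

# Illustrative value only: a path containing a space.
ELEMENTAL_TFVARS_FILE="/tmp/elemental vars/terraform.tfvars"

unquoted=$(count_args $ELEMENTAL_TFVARS_FILE)    # word-split into two arguments
quoted=$(count_args "$ELEMENTAL_TFVARS_FILE")    # passed as one argument

echo "unquoted=${unquoted} quoted=${quoted}"     # prints: unquoted=2 quoted=1
```

With the unquoted form, a tool such as tofu would see two separate, nonexistent file names instead of one path.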
Approach
9a. Create shared shell library
New file: validation/pipeline/scripts/lib/common.sh
Provides:
- `tofu_cmd()`: wrapper for tofu with `-chdir` pattern, alias for terraform compatibility
- `ansible_run()`: wrapper for ansible-playbook with standard retry and error handling
- `go_cross_compile()`: standard `env GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build`
- `cleanup_on_failure()`: trap-based cleanup that runs `tofu destroy` if `CLEANUP=true`
- `resolve_qa_infra_path()`: resolves `QAINFRA_SCRIPT_PATH` with fallback
- `s3_upload()` / `s3_download()`: shell-level S3 operations (for scripts called from older pipelines)
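A possible starting point for three of these helpers is sketched below. The signatures are assumptions. In particular, the sketch reads "terraform compatibility" as falling back to a `terraform` binary when `tofu` is not installed, and uses the currently hardcoded checkout path as the fallback for `QAINFRA_SCRIPT_PATH`. Neither choice is confirmed by the plan.

```shell
# Sketch of validation/pipeline/scripts/lib/common.sh; signatures are assumptions.

# Prefer QAINFRA_SCRIPT_PATH, otherwise fall back to the path that the
# current scripts hardcode (assumed fallback).
resolve_qa_infra_path() {
    echo "${QAINFRA_SCRIPT_PATH:-/root/go/src/github.com/rancher/qa-infra-automation}"
}

# Run tofu against a module directory; use terraform if tofu is unavailable
# (one possible reading of "terraform compatibility").
tofu_cmd() {
    local module_dir="$1"
    shift
    local bin="tofu"
    command -v tofu >/dev/null 2>&1 || bin="terraform"
    "$bin" -chdir="$module_dir" "$@"
}

# Build a static Linux/amd64 binary regardless of the host platform.
go_cross_compile() {
    local output="$1"
    local package="$2"
    env GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o "$output" "$package"
}
```

A script would source the file and then call, for example, `tofu_cmd "$(resolve_qa_infra_path)/some/module" apply -auto-approve`, where the module path is a placeholder.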
9b. Refactor infrastructure scripts
- `build_qa_infra.sh`: Use `tofu_cmd()`, `ansible_run()` with retry, `cleanup_on_failure()`, quote all variables
- `destroy_qa_infra.sh`: Use `tofu_cmd()`, add proper error handling
- `upgrade_qa_infra.sh`: Use `ansible_run()`
- `register_downstream_cluster.sh`: Use `tofu_cmd()`, quote variables
- `build_elemental_qa_infra.sh`: Use `tofu_cmd()`, `ansible_run()`, `cleanup_on_failure()`
- `build_elemental_harvester_qa_infra.sh`: Use `tofu_cmd()`, `ansible_run()`, `cleanup_on_failure()`
- `destroy_elemental_qa_infra.sh`: Use `tofu_cmd()`
- `destroy_elemental_harvester_qa_infra.sh`: Use `tofu_cmd()`
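The retry and trap-based cleanup behavior that these scripts would rely on could be sketched as follows. The environment variables `ANSIBLE_RETRIES`, `ANSIBLE_RETRY_DELAY`, `ANSIBLE_PLAYBOOK_BIN`, and `TOFU_MODULE_DIR` are hypothetical knobs introduced for the example, and the default of three attempts is an assumed value.

```shell
# Hypothetical retry wrapper; ANSIBLE_RETRIES, ANSIBLE_RETRY_DELAY, and
# ANSIBLE_PLAYBOOK_BIN are assumed knobs, not existing variables.
ansible_run() {
    local attempts="${ANSIBLE_RETRIES:-3}"
    local delay="${ANSIBLE_RETRY_DELAY:-10}"
    local bin="${ANSIBLE_PLAYBOOK_BIN:-ansible-playbook}"
    local try=1

    # Re-run the playbook until it succeeds or the attempt budget is spent.
    until "$bin" "$@"; do
        if [ "$try" -ge "$attempts" ]; then
            echo "ansible-playbook failed after ${try} attempts" >&2
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}

# Destroy infrastructure when the script exits non-zero and CLEANUP=true.
# TOFU_MODULE_DIR is an assumed variable naming the module to destroy.
cleanup_on_failure() {
    local status="$1"
    if [ "$status" -ne 0 ] && [ "${CLEANUP:-false}" = "true" ]; then
        echo "exit status ${status}: destroying infrastructure" >&2
        tofu -chdir="$TOFU_MODULE_DIR" destroy -auto-approve
    fi
}

# A script would register the handler once, near the top:
#   trap 'cleanup_on_failure $?' EXIT
```

Passing `$?` into the handler explicitly, rather than reading it inside the function, keeps the exit status from being overwritten by an earlier command in the trap.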
9c. Refactor utility scripts
- Standardize `set -euo pipefail` across all scripts
- Replace hardcoded paths with `resolve_qa_infra_path()`
- Replace inline `env GOOS=linux GOARCH=amd64` with `go_cross_compile()`
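For readers less familiar with the strict-mode flags, the following demonstration shows what each part of `set -euo pipefail` changes. It assumes `bash` is installed. Each case runs in its own `bash -c` process so that the demonstration cannot abort the calling shell.

```shell
# Default: a pipeline's status is that of its last command, so the failure is hidden.
if bash -c 'false | true'; then plain="hidden"; else plain="caught"; fi

# pipefail: the pipeline fails if any stage fails.
if bash -c 'set -o pipefail; false | true'; then strict="hidden"; else strict="caught"; fi

# nounset (-u): expanding an unset variable is an error instead of an empty string.
if bash -c 'set -u; unset DEMO_VAR; echo "$DEMO_VAR"' >/dev/null 2>&1; then
    unset_var="hidden"
else
    unset_var="caught"
fi

# errexit (-e): the script stops at the first failing command.
if bash -c 'set -e; false; echo reached' | grep -q reached; then
    errexit="continued"
else
    errexit="stopped"
fi

echo "plain=${plain} pipefail=${strict} nounset=${unset_var} errexit=${errexit}"
# prints: plain=hidden pipefail=caught nounset=caught errexit=stopped
```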
Verification
- All refactored scripts pass shellcheck
- `build_qa_infra.sh` produces identical infrastructure output
- `destroy_qa_infra.sh` cleanly removes all resources
- Elemental scripts work for both GCP and Harvester paths
- Qase reporter scripts build correctly
Phase 10: Cleanup and Documentation
Goal: Remove deprecated files, add documentation, and establish patterns for future pipeline development.
Tasks
10a. Delete deprecated files
- Remove all original Jenkinsfiles that have been replaced (moved to `deprecated/` first during migration, deleted after verification period)
- Remove `tfp/Jenkinsfile.airgap.tests` (deprecated in Phase 1)
10b. Create pipeline documentation
- `validation/pipeline/README.md`: Overview of the shared library structure, how to create a new pipeline, naming conventions, parameter standards
- Update `validation/pipeline/scripts/README.md`: Document the shared shell library and script conventions
- Add `@param` documentation to all `vars/*.groovy` functions
10c. Establish contribution guidelines
- Document how to add a new shared function to `vars/`
- Document how to propose additions to `qa-jenkins-library`
- Document the naming convention: `Jenkinsfile.<category>-<variant>` (e.g., `airgap-rke2-infra`, `validation`, `elemental-e2e`)
- Document parameter naming conventions
10d. Final inventory
Produce a summary of the final state:
- Before: 37 Jenkinsfiles, 5,719 lines, 0 shared abstractions
- After: ~15 Jenkinsfiles, ~3,400 lines, 7+ shared functions in `vars/`, 5+ new functions in `qa-jenkins-library`
Dependency Graph
Phase 0a (qa-jenkins-library PR) ──────────────────────┐
Phase 0b (local vars/) ────────────────────────────────┤
│
Phase 1 (Airgap) ──────────────────────────────────────┤
Phase 2 (Simple test runners) ─────────────────────────┤ ← Can run
Phase 3 (Recurring/upgrade.e2e) ───────────────────────┤ in parallel
Phase 4 (Release upgrade HA/Local) ────────────────────┤ after Phase 0
Phase 5 (Elemental) ───────────────────────────────────┤
Phase 6 (TFP) ─────────────────────────────────────────┤
Phase 7 (Neuvector) ───────────────────────────────────┤
Phase 8 (Standalone micro-phases) ─────────────────────┤
Phase 9 (Shell scripts) ───────────────────────────────┤
│
Phase 10 (Cleanup & docs) ← depends on Phases 1-9 ────┘
Phases 1-9 can proceed as independent branches once Phase 0 is merged. They can be reviewed and merged in batches. Phase 10 is the final cleanup after all migrations are verified.
Risk Matrix
| Phase | Risk Level | Why | Mitigation |
|---|---|---|---|
| 0 | Low | Additive only, no existing behavior changes | N/A |
| 1 | Medium | Airgap pipelines are actively used for QA | Parallel coexistence, 2+ verification runs |
| 2 | Low | Simple pipelines, easy to verify | Quick rollback by switching Jenkins job config |
| 3 | Medium | Recurring pipelines run on schedule | Monitor first automated run after migration |
| 4 | High | Largest files, most complex logic, release testing | Extended verification period, pair review |
| 5 | Low | Near-identical files, simple consolidation | Quick rollback |
| 6 | Low | Paired setup/delete, well-understood pattern | Same pattern as Phase 1 |
| 7 | Very Low | Trivially simple files, no infrastructure | Can be done in an afternoon |
| 8 | Medium | Unique pipelines require careful preservation of logic | One file at a time, verify each independently |
| 9 | Medium | Shell scripts called by multiple pipeline types | Test each script individually |
| 10 | Low | Cleanup and documentation only | N/A |
Success Metrics
- Pipeline creation time: New pipeline variant takes < 30 minutes (currently days)
- Line count reduction: ~40% reduction across all Jenkinsfiles (5,719 → ~3,400)
- File count reduction: 37 → ~15 Jenkinsfiles (60% reduction)
- Shared abstractions: 0 → 12+ reusable functions (7 local, 5+ in qa-jenkins-library)
- Consistent syntax: All pipelines use Declarative Pipeline syntax
- Harmonized parameters: Same parameter names for same concepts across all pipelines
- Standardized scripts: All shell scripts use `set -euo pipefail`, shared library, quoted variables