RFC: Jenkins Pipeline Refactor — Complete Phase Plan #586

@floatingman

Description
Summary

This RFC maps the complete phased implementation plan for the Jenkins pipeline refactoring described in #585. Each phase is an independent branch/PR that can be reviewed and merged in batches.

Related: #585 (PRD)

Phase Overview

| Phase | Focus | Files Affected | Est. Lines Removed | Dependencies |
|-------|-------|----------------|--------------------|--------------|
| 0 | Shared library foundation | New: vars/ + qa-jenkins-library PR | 0 (additive) | None |
| 1 | Airgap pipeline consolidation | 3 → 2 Jenkinsfiles | ~300 | Phase 0 |
| 2 | Simple test runner consolidation | 4 → 1 Jenkinsfile | ~350 | Phase 0 |
| 3 | Recurring/upgrade.e2e consolidation | 2 → 1 Jenkinsfile | ~150 | Phase 0 |
| 4 | Release upgrade pipeline consolidation | 2 → 1 Jenkinsfile | ~500 | Phase 0 |
| 5 | Elemental pipeline consolidation | 2 → 1 Jenkinsfile | ~100 | Phase 0 |
| 6 | TFP pipeline consolidation | 2 → 1 Jenkinsfile | ~60 | Phase 0 |
| 7 | Neuvector consolidation | 8 → 1 Jenkinsfile | ~70 | Phase 0 |
| 8 | Standalone pipeline refactor | 5 unique Jenkinsfiles | ~200 | Phase 0 |
| 9 | Shell script refactor | 17 scripts in validation/pipeline/scripts/ | ~150 | Phase 0 |
| 10 | Cleanup and documentation | Deprecation, README | ~400 | Phases 1-9 |
| **Total** | | 37 → ~15 Jenkinsfiles | ~2,280 | |

Phase 0: Shared Library Foundation

Goal: Create the reusable abstractions that all subsequent phases consume.

Branch strategy: Two parallel branches — one for qa-jenkins-library (external PR), one for rancher/tests (local vars/).

0a. qa-jenkins-library additions (external PR)

New functions to add to the qa-jenkins-library shared library:

| Function | Purpose | Parameters |
|----------|---------|------------|
| airgap.standardCheckout | Clone both the tests and qa-infra-automation repos with parameterized branches | testsRepo, testsBranch, qaInfraRepo, qaInfraBranch |
| airgap.teardownInfrastructure | Tofu select workspace + destroy + delete workspace as a unit | tofuModulePath, workspaceName, backendInitScript |
| s3.uploadArtifact | Upload a file to S3 using the standard Docker+awscli pattern | workspaceName, localPath, s3Key, awsCredentials |
| s3.downloadArtifact | Download a file from S3 using the standard Docker+awscli pattern | workspaceName, s3Key, localPath, awsCredentials |
| s3.deleteArtifact | Delete a file from S3 | workspaceName, s3Key, awsCredentials |

These are generic infrastructure primitives — not airgap-specific. Any pipeline that clones repos, manages tofu state, or passes artifacts via S3 can use them.
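As a usage sketch (parameter names are taken from the table above; the call style and the credential ID are assumptions, not the final API), a pipeline could persist a tfvars file between setup and destroy runs like this:

```groovy
// Hypothetical use of the proposed s3.* primitives inside a script block.
// Only the parameter names come from the table; everything else is illustrative.
stage('Persist tfvars') {
    steps {
        script {
            s3.uploadArtifact(
                workspaceName:  params.TARGET_WORKSPACE,
                localPath:      'terraform.tfvars',
                s3Key:          "${params.TARGET_WORKSPACE}/terraform.tfvars",
                awsCredentials: 'aws-qa-credentials'   // assumed credential ID
            )
        }
    }
}
```

The matching destroy run would call s3.downloadArtifact with the same s3Key before tearing down, then s3.deleteArtifact once the workspace is gone.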

0b. Local vars/ directory (in rancher-tests)

New files in vars/:

| File | Purpose | Consumes |
|------|---------|----------|
| resolvePipelineParams.groovy | Parse job name; resolve BRANCH/REPO/TIMEOUT defaults | None |
| standardDockerCleanup.groovy | Docker stop/rm/rmi/volume rm sequence | None |
| airgapInfraPipeline.groovy | Shared parameters, credentials, path constants, checkout, tofu lifecycle, S3 artifact management for airgap infra | qa-jenkins-library: tofu.*, property.*, infrastructure.*, ansible.*, airgap.*, s3.* |
| airgapTestPipeline.groovy | Extends infra with Go test params, gotestsum invocation, Qase reporting | airgapInfraPipeline; qa-jenkins-library: container.*, generate.* |
| simpleTestPipeline.groovy | Shared parameters and stage flow for simple test runners | None |
| recurringTestPipeline.groovy | Shared parameters and stage flow for recurring/config-driven pipelines | qa-jenkins-library functions |
| standardCredentialLoader.groovy | Load the appropriate credential set based on environment/target | None |
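To make the vars/ shape concrete, a minimal resolvePipelineParams.groovy could look like the sketch below. The defaults and the job-name convention are illustrative assumptions, not decided values:

```groovy
// vars/resolvePipelineParams.groovy — sketch only; default values are placeholders.
// Jenkins exposes a vars/ script as a global step named after the file.
def call(Map params) {
    def resolved = [:]
    resolved.branch  = params.BRANCH  ?: 'main'
    resolved.repo    = params.REPO    ?: 'https://github.com/rancher/tests.git'  // assumed default
    resolved.timeout = params.TIMEOUT ?: '6h'
    // Derive a variant from the job name, e.g. "...-setup" -> "setup"
    resolved.variant = env.JOB_NAME?.tokenize('-')?.last()
    return resolved
}
```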

Verification: Each function is tested by the pipelines that consume it (live pipeline testing). No standalone unit tests for Groovy functions.

Merge criteria: Both 0a and 0b can be merged independently. Phase 1+ requires both to be available.


Phase 1: Airgap Pipeline Consolidation

Goal: Replace the 3 active airgap Jenkinsfiles (plus one already-deprecated tfp variant) with 2 refactored ones using Declarative Pipeline syntax.

Files Removed

  • validation/pipeline/Jenkinsfile.setup.airgap.rke2 (299 lines)
  • validation/pipeline/Jenkinsfile.destroy.airgap.rke2 (157 lines)
  • validation/pipeline/Jenkinsfile.airgap.go-tests (405 lines)
  • validation/pipeline/tfp/Jenkinsfile.airgap.tests (186 lines, deprecated)

Files Created

  • validation/pipeline/Jenkinsfile.airgap-rke2-infra — Setup + Destroy with ACTION parameter
  • validation/pipeline/Jenkinsfile.airgap-rke2-tests — Full test lifecycle

Jenkinsfile.airgap-rke2-infra Design

```groovy
pipeline {
    parameters {
        choice(name: 'ACTION', choices: ['setup', 'destroy'], description: 'Infrastructure action')
        string(name: 'DEPLOY_RANCHER', defaultValue: 'true', description: 'Deploy Rancher after K8s setup')
        string(name: 'DESTROY_ON_FAILURE', defaultValue: 'true')
        string(name: 'TARGET_WORKSPACE', description: 'Workspace name (required for destroy)')
        // ... harmonized parameters
    }

    stages {
        stage('Checkout') { steps { script { airgapInfraPipeline.checkout(params) } } }
        stage('Build Infrastructure Image') { ... }

        // Setup-specific stages (when { expression { params.ACTION == 'setup' } })
        stage('Setup Infrastructure') {
            when { expression { params.ACTION == 'setup' } }
            steps { script { airgapInfraPipeline.setup(params) } }
        }
        stage('Deploy RKE2') {
            when { expression { params.ACTION == 'setup' } }
            steps { script { airgapInfraPipeline.deployRKE2(params) } }
        }
        stage('Deploy Rancher') {
            when {
                allOf {
                    expression { params.ACTION == 'setup' }
                    expression { params.DEPLOY_RANCHER == 'true' }
                }
            }
            steps { script { airgapInfraPipeline.deployRancher(params) } }
        }

        // Destroy-specific stages (when { expression { params.ACTION == 'destroy' } })
        stage('Destroy Infrastructure') {
            when { expression { params.ACTION == 'destroy' } }
            steps { script { airgapInfraPipeline.destroy(params) } }
        }
    }

    post {
        failure {
            script { if (params.DESTROY_ON_FAILURE == 'true') airgapInfraPipeline.teardown(params) }
        }
    }
}
```

Jenkinsfile.airgap-rke2-tests Design

Retains the full lifecycle (setup → deploy → test → teardown) but delegates infra operations to airgapInfraPipeline and test operations to airgapTestPipeline.
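The delegation could reduce the stages to something like the following sketch (the airgapTestPipeline helper names are illustrative; only checkout/setup/deployRKE2 appear in the infra design above):

```groovy
// Sketch: the tests pipeline reuses the infra building blocks and adds test stages.
stages {
    stage('Checkout')       { steps { script { airgapInfraPipeline.checkout(params) } } }
    stage('Setup + Deploy') {
        steps {
            script {
                airgapInfraPipeline.setup(params)
                airgapInfraPipeline.deployRKE2(params)
            }
        }
    }
    stage('Run Go Tests')   { steps { script { airgapTestPipeline.runTests(params) } } }       // hypothetical helper
    stage('Qase Report')    { steps { script { airgapTestPipeline.reportToQase(params) } } }   // hypothetical helper
}
```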

Verification Checklist

  • ACTION=setup produces identical AWS resources as original Jenkinsfile.setup.airgap.rke2
  • ACTION=destroy cleanly removes all resources as original Jenkinsfile.destroy.airgap.rke2
  • S3 tfvars upload/download works across setup→destroy cycle
  • DEPLOY_RANCHER=false skips Rancher helm deployment
  • Test pipeline runs same test packages and archives same artifacts as original Jenkinsfile.airgap.go-tests
  • DESTROY_ON_FAILURE triggers teardown on failed setup
  • DESTROY_AFTER_TESTS triggers teardown after successful test run

Migration Plan

  1. Create new Jenkinsfiles alongside originals
  2. Create new Jenkins jobs pointing to new Jenkinsfiles
  3. Run 2+ successful verification cycles
  4. Update production job configurations
  5. Delete original Jenkinsfiles

Phase 2: Simple Test Runner Consolidation

Goal: Replace 4 nearly-identical test runners with a single parameterized Declarative pipeline.

Files Removed

  • validation/Jenkinsfile (189 lines)
  • validation/Jenkinsfile.e2e (124 lines)
  • validation/Jenkinsfile.harvester (158 lines)
  • validation/Jenkinsfile.vsphere (148 lines)

Files Created

  • validation/Jenkinsfile.validation — Single parameterized pipeline

Design

```groovy
pipeline {
    agent { label params.NODE_LABEL ?: '' }
    parameters {
        choice(name: 'NODE_LABEL', choices: ['', 'harvester-vpn-1', 'vsphere-vpn-1'],
               description: 'Target node (empty = unallocated)')
        string(name: 'TEST_PACKAGE', ...)
        string(name: 'CONFIG', ...)
        // ... standard parameters
    }

    stages {
        stage('Checkout') { steps { script { simpleTestPipeline.checkout(params) } } }
        stage('Configure and Build') { steps { script { simpleTestPipeline.configureAndBuild(params) } } }
        stage('Run Validation Tests') { steps { script { simpleTestPipeline.runTests(params) } } }
        stage('Test Report') { steps { junit '**/junit-report.xml' } }
    }

    post {
        always { script { standardDockerCleanup(...) } }
    }
}
```

The credential set is loaded dynamically based on NODE_LABEL via standardCredentialLoader.
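A sketch of how standardCredentialLoader could map NODE_LABEL to a credential set (the credential IDs below are invented placeholders):

```groovy
// vars/standardCredentialLoader.groovy — illustrative only; credential IDs are placeholders.
def call(String nodeLabel) {
    switch (nodeLabel) {
        case 'harvester-vpn-1': return [configCredential: 'harvester-validation-config']
        case 'vsphere-vpn-1':   return [configCredential: 'vsphere-validation-config']
        default:                return [configCredential: 'default-validation-config']
    }
}
```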

Verification

  • NODE_LABEL='' produces same results as original Jenkinsfile
  • NODE_LABEL='harvester-vpn-1' produces same results as original Jenkinsfile.harvester
  • NODE_LABEL='vsphere-vpn-1' produces same results as original Jenkinsfile.vsphere
  • Same as Jenkinsfile.e2e with default parameters

Phase 3: Recurring/Upgrade.e2e Consolidation

Goal: Consolidate the recurring and upgrade.e2e pipelines that share the config-driven infrastructure pattern.

Files Removed

  • validation/pipeline/Jenkinsfile.recurring (273 lines)
  • validation/Jenkinsfile.upgrade.e2e (210 lines)

Files Created

  • validation/pipeline/Jenkinsfile.recurring — Unified config-driven pipeline with optional upgrade stages

Shared Pattern

Both pipelines clone qa-infra-automation, use cattle-configs/patched-cattle-configs, config generator containers, and rancher_cleanup.sh. They differ in that upgrade.e2e adds upgrade stages. The unified pipeline uses a RUN_UPGRADE boolean parameter.
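In Declarative syntax the optional upgrade path reduces to a when guard, along these lines (the recurringTestPipeline.runUpgrade helper name is a hypothetical, not yet in the vars/ list):

```groovy
// Sketch: upgrade stages gated behind the RUN_UPGRADE boolean parameter.
stage('Upgrade Cluster') {
    when { expression { params.RUN_UPGRADE == true } }
    steps { script { recurringTestPipeline.runUpgrade(params) } }  // hypothetical helper
}
```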

New vars/ Functions

  • recurringTestPipeline.groovy — shared parameters and stage flow for config-driven pipelines

Verification

  • RUN_UPGRADE=false matches original Jenkinsfile.recurring behavior
  • RUN_UPGRADE=true matches original Jenkinsfile.upgrade.e2e behavior
  • Qase reporting works in both modes
  • Infrastructure cleanup runs correctly in both modes

Phase 4: Release Upgrade Pipeline Consolidation

Goal: Consolidate the two most complex pipelines (1,330 lines combined) that share ~400 lines of identical upgrade testing logic.

Files Removed

  • validation/pipeline/Jenkinsfile.release.upgrade.ha (739 lines)
  • validation/pipeline/Jenkinsfile.release.upgrade.local (591 lines)

Files Created

  • validation/pipeline/Jenkinsfile.release.upgrade — Unified upgrade pipeline with DEPLOYMENT_TYPE parameter

Design

  • DEPLOYMENT_TYPE parameter: ha or local
  • Conditionally includes HA-specific stages (parallel provisioning, HA-specific cleanup) or local-specific stages (single-node setup)
  • Shared stages: pre/post upgrade checks, config generation, test validation, cleanup

This is the highest-risk phase because:

  • These are the largest files in the repo (739 + 591 = 1,330 lines)
  • They have complex parallel provisioning, multi-version iteration, and cattle-configs patching
  • Changes here affect release testing workflows

Verification

  • DEPLOYMENT_TYPE='ha' matches original Jenkinsfile.release.upgrade.ha through full upgrade cycle
  • DEPLOYMENT_TYPE='local' matches original Jenkinsfile.release.upgrade.local through full upgrade cycle
  • Pre/post upgrade checks run correctly for both types
  • Parallel provisioning works for both types

Phase 5: Elemental Pipeline Consolidation

Goal: Consolidate the two nearly-identical Elemental pipelines.

Files Removed

  • validation/pipeline/qainfra/Jenkinsfile.elemental.e2e (193 lines)
  • validation/pipeline/qainfra/Jenkinsfile.elemental.harvester.e2e (192 lines)

Files Created

  • validation/pipeline/qainfra/Jenkinsfile.elemental.e2e — Unified Elemental pipeline with HARVESTER parameter

Design

  • HARVESTER boolean parameter controls whether Harvester-specific stages and cleanup are included
  • Node label dynamically set: harvester-vpn-1 when HARVESTER=true, unallocated otherwise
  • Destroy script selected based on HARVESTER flag
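The dynamic node selection can be expressed directly in the agent directive, roughly as follows (an empty label means any unallocated agent, matching the original behavior):

```groovy
// Sketch: HARVESTER=true pins the build to the VPN node; otherwise run unallocated.
agent { label params.HARVESTER ? 'harvester-vpn-1' : '' }
```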

Verification

  • HARVESTER=false matches original Jenkinsfile.elemental.e2e
  • HARVESTER=true matches original Jenkinsfile.elemental.harvester.e2e
  • Correct destroy scripts run for each mode

Phase 6: TFP Pipeline Consolidation

Goal: Consolidate the TFP setup/delete pair.

Files Removed

  • validation/pipeline/tfp/Jenkinsfile.tfp.setup (126 lines)
  • validation/pipeline/tfp/Jenkinsfile.tfp.setup.delete (108 lines)

Files Created

  • validation/pipeline/tfp/Jenkinsfile.tfp — Unified TFP pipeline with ACTION parameter

Design

  • Same ACTION=setup/destroy pattern as the airgap infra pipeline
  • S3 artifact upload during setup, download during destroy
  • Shares the s3.* functions from qa-jenkins-library

Verification

  • ACTION=setup matches original Jenkinsfile.tfp.setup
  • ACTION=destroy matches original Jenkinsfile.tfp.setup.delete
  • S3 tf file upload/download works across setup→destroy

Phase 7: Neuvector Consolidation

Goal: Collapse 8 trivially-identical Neuvector Jenkinsfiles into one parameterized pipeline.

Files Removed

  • validation/neuvector/pipeline/allinone-pass/Jenkinsfile.nv_allinone_auto_01 through _07 (7 files, 70 lines)
  • validation/neuvector/pipeline/allinone-pass/Jenkinsfile.Neuvector_allinone_pass_regression (10 lines)

Files Created

  • validation/neuvector/pipeline/allinone-pass/Jenkinsfile.neuvector — Single parameterized pipeline

Design

  • SERVER_CREDENTIAL_ID parameter selects which Neuvector server to target
  • TEST_SCRIPT parameter selects the script to run (default: allinone_script.sh)
  • Single stage: SSH to target server, execute script
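Given how thin these files are, the consolidated pipeline can likely be close to this sketch. Parameter names come from the design above; TARGET_HOST and the SSH invocation details are assumptions added for illustration:

```groovy
// Sketch of Jenkinsfile.neuvector — single-stage parameterized runner.
pipeline {
    agent any
    parameters {
        string(name: 'SERVER_CREDENTIAL_ID', description: 'SSH credential for the target Neuvector server')
        string(name: 'TARGET_HOST', description: 'Neuvector server address (assumed parameter)')
        string(name: 'TEST_SCRIPT', defaultValue: 'allinone_script.sh', description: 'Script to execute')
    }
    stages {
        stage('Run Test Script') {
            steps {
                sshagent(credentials: [params.SERVER_CREDENTIAL_ID]) {
                    sh "ssh -o StrictHostKeyChecking=no ${params.TARGET_HOST} 'bash ${params.TEST_SCRIPT}'"
                }
            }
        }
    }
}
```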

Verification

  • Can target all 7 servers via SERVER_CREDENTIAL_ID parameter
  • Regression test mode works

Phase 8: Standalone Pipeline Refactor

Goal: Apply shared library patterns to the 5 standalone pipelines (8a-8e) without consolidating them.

Each pipeline gets its own micro-phase. The refactor applies shared library functions for boilerplate (checkout, cleanup, parameter resolution) while preserving the unique pipeline logic.

8a. validation/Jenkinsfile.rc (378 lines)

  • Apply resolvePipelineParams, standardDockerCleanup, Declarative syntax
  • Preserve: corral packages checkout, multi-SCM, config generator container, cattle-configs patching
  • Convert to Declarative Pipeline with script { } blocks for complex logic

8b. validation/pipeline/tfp/Jenkinsfile.harvester.e2e (518 lines)

  • Apply resolvePipelineParams, standardDockerCleanup, Declarative syntax
  • Preserve: bare-metal provisioning, seeder setup, Harvester install, infra-repo symlink pattern
  • This is the most unique pipeline and may have the least shared logic

8c. validation/Jenkinsfile_pre_post_upgrade (181 lines)

  • Apply simpleTestPipeline functions for shared stages
  • Preserve: dual gotestsum run pattern (main test + upgrade test)

8d. validation/Jenkinsfile_support_matrix_os (249 lines)

  • Apply resolvePipelineParams, Declarative syntax
  • Preserve: job dispatcher pattern, parameter construction for downstream jobs
  • Note: This and Jenkinsfile.multibranch.recurring could potentially be consolidated as "job dispatchers" but they have different parameter structures

8e. scripts/custodian/Jenkinsfile (134 lines)

  • Minimal refactor: apply Declarative syntax, standardDockerCleanup
  • Preserve: multi-cloud credential injection, custodian YAML execution
  • This pipeline is completely unrelated to test execution

Phase 9: Shell Script Refactor

Goal: Refactor the 17 shell scripts in validation/pipeline/scripts/ to use shared functions, standardize patterns, and replace inline tofu/ansible calls with qa-jenkins-library function calls where possible.

Current Issues

  • Inconsistent error handling: some use set -e, some set -ex, some set -euo, some none
  • Unquoted variables: $ELEMENTAL_TFVARS_FILE without quotes
  • Hardcoded paths: /root/go/src/github.com/rancher/qa-infra-automation
  • Duplicated tofu/ansible invocation patterns
  • No shared utility library

Approach

9a. Create shared shell library

New file: validation/pipeline/scripts/lib/common.sh

Provides:

  • tofu_cmd() — wrapper for tofu with -chdir pattern, alias for terraform compatibility
  • ansible_run() — wrapper for ansible-playbook with standard retry and error handling
  • go_cross_compile() — standard env GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build
  • cleanup_on_failure() — trap-based cleanup that runs tofu destroy if CLEANUP=true
  • resolve_qa_infra_path() — resolves QAINFRA_SCRIPT_PATH with fallback
  • s3_upload() / s3_download() — shell-level S3 operations (for scripts called from older pipelines)
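A possible skeleton for common.sh, showing three of the helpers above (the function names come from the list; the bodies are illustrative assumptions, not the final implementation):

```shell
#!/usr/bin/env bash
# validation/pipeline/scripts/lib/common.sh — sketch only.
set -euo pipefail

# tofu_cmd MODULE_PATH ARGS... : run tofu against a module directory,
# falling back to terraform when tofu is not installed.
tofu_cmd() {
    local module_path="$1"; shift
    local bin="tofu"
    command -v tofu >/dev/null 2>&1 || bin="terraform"
    "$bin" -chdir="$module_path" "$@"
}

# resolve_qa_infra_path : honor QAINFRA_SCRIPT_PATH, falling back to the
# legacy hardcoded path that current scripts embed inline.
resolve_qa_infra_path() {
    printf '%s\n' "${QAINFRA_SCRIPT_PATH:-/root/go/src/github.com/rancher/qa-infra-automation}"
}

# go_cross_compile OUTPUT PACKAGE... : the standard linux/amd64 static build.
go_cross_compile() {
    local out="$1"; shift
    env GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o "$out" "$@"
}
```

Scripts would then `source "$(dirname "$0")/lib/common.sh"` and call, e.g., `tofu_cmd "$TOFU_MODULE_PATH" apply -auto-approve` instead of repeating the `-chdir` invocation inline.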

9b. Refactor infrastructure scripts

  • build_qa_infra.sh — Use tofu_cmd(), ansible_run() with retry, cleanup_on_failure(), quote all variables
  • destroy_qa_infra.sh — Use tofu_cmd(), add proper error handling
  • upgrade_qa_infra.sh — Use ansible_run()
  • register_downstream_cluster.sh — Use tofu_cmd(), quote variables
  • build_elemental_qa_infra.sh — Use tofu_cmd(), ansible_run(), cleanup_on_failure()
  • build_elemental_harvester_qa_infra.sh — Use tofu_cmd(), ansible_run(), cleanup_on_failure()
  • destroy_elemental_qa_infra.sh — Use tofu_cmd()
  • destroy_elemental_harvester_qa_infra.sh — Use tofu_cmd()

9c. Refactor utility scripts

  • Standardize set -euo pipefail across all scripts
  • Replace hardcoded paths with resolve_qa_infra_path()
  • Replace inline env GOOS=linux GOARCH=amd64 with go_cross_compile()

Verification

  • All refactored scripts pass shellcheck
  • build_qa_infra.sh produces identical infrastructure output
  • destroy_qa_infra.sh cleanly removes all resources
  • Elemental scripts work for both GCP and Harvester paths
  • Qase reporter scripts build correctly

Phase 10: Cleanup and Documentation

Goal: Remove deprecated files, add documentation, and establish patterns for future pipeline development.

Tasks

10a. Delete deprecated files

  • Remove all original Jenkinsfiles that have been replaced (moved to deprecated/ first during migration, deleted after verification period)
  • Remove tfp/Jenkinsfile.airgap.tests (deprecated in Phase 1)

10b. Create pipeline documentation

  • validation/pipeline/README.md — Overview of the shared library structure, how to create a new pipeline, naming conventions, parameter standards
  • Update validation/pipeline/scripts/README.md — Document the shared shell library and script conventions
  • Add @param documentation to all vars/*.groovy functions

10c. Establish contribution guidelines

  • Document how to add a new shared function to vars/
  • Document how to propose additions to qa-jenkins-library
  • Document the naming convention: Jenkinsfile.<category>-<variant> (e.g., airgap-rke2-infra, validation, elemental-e2e)
  • Document parameter naming conventions

10d. Final inventory

Produce a summary of the final state:

  • Before: 37 Jenkinsfiles, 5,719 lines, 0 shared abstractions
  • After: ~15 Jenkinsfiles, ~3,400 lines, 7+ shared functions in vars/, 5+ new functions in qa-jenkins-library

Dependency Graph

```
Phase 0a (qa-jenkins-library PR) ──────────────────────┐
Phase 0b (local vars/) ────────────────────────────────┤
                                                        │
Phase 1 (Airgap) ──────────────────────────────────────┤
Phase 2 (Simple test runners) ─────────────────────────┤  ← Can run
Phase 3 (Recurring/upgrade.e2e) ───────────────────────┤    in parallel
Phase 4 (Release upgrade HA/Local) ────────────────────┤    after Phase 0
Phase 5 (Elemental) ───────────────────────────────────┤
Phase 6 (TFP) ─────────────────────────────────────────┤
Phase 7 (Neuvector) ───────────────────────────────────┤
Phase 8 (Standalone micro-phases) ─────────────────────┤
Phase 9 (Shell scripts) ───────────────────────────────┤
                                                        │
Phase 10 (Cleanup & docs) ← depends on Phases 1-9 ────┘
```

Phases 1-9 can proceed as independent branches once Phase 0 is merged. They can be reviewed and merged in batches. Phase 10 is the final cleanup after all migrations are verified.


Risk Matrix

| Phase | Risk Level | Why | Mitigation |
|-------|------------|-----|------------|
| 0 | Low | Additive only; no existing behavior changes | N/A |
| 1 | Medium | Airgap pipelines are actively used for QA | Parallel coexistence, 2+ verification runs |
| 2 | Low | Simple pipelines, easy to verify | Quick rollback by switching Jenkins job config |
| 3 | Medium | Recurring pipelines run on a schedule | Monitor first automated run after migration |
| 4 | High | Largest files, most complex logic, release testing | Extended verification period, pair review |
| 5 | Low | Near-identical files, simple consolidation | Quick rollback |
| 6 | Low | Paired setup/delete, well-understood pattern | Same pattern as Phase 1 |
| 7 | Very Low | Trivially simple files, no infrastructure | Can be done in an afternoon |
| 8 | Medium | Unique pipelines require careful preservation of logic | One file at a time; verify each independently |
| 9 | Medium | Shell scripts are called by multiple pipeline types | Test each script individually |
| 10 | Low | Cleanup and documentation only | N/A |

Success Metrics

  1. Pipeline creation time: New pipeline variant takes < 30 minutes (currently days)
  2. Line count reduction: ~40% reduction across all Jenkinsfiles (5,719 → ~3,400)
  3. File count reduction: 37 → ~15 Jenkinsfiles (60% reduction)
  4. Shared abstractions: 0 → 12+ reusable functions (7 local, 5+ in qa-jenkins-library)
  5. Consistent syntax: All pipelines use Declarative Pipeline syntax
  6. Harmonized parameters: Same parameter names for same concepts across all pipelines
  7. Standardized scripts: All shell scripts use set -euo pipefail, shared library, quoted variables
