Problem
The CI test infrastructure orchestration (test/infra/control/*.bash, test/tap/groups/*/pre-proxysql.bash, setup-infras.bash hooks) is built entirely in bash. This has proven unreliable and difficult to debug in practice:
Error suppression patterns
Scripts routinely suppress errors, making failures invisible:
# Actual code that was in run-tests-isolated.bash:
mysql -uradmin -pradmin -hproxysql -P6032 -e 'SELECT ...' 2>/dev/null || echo 'ERROR: Failed to query mysql_servers'
This hides the actual error (connection refused? auth failure? timeout?), prints a generic message, and continues execution as if nothing happened. Tests then fail later with confusing symptoms (e.g., "table doesn't exist" because the entire ProxySQL config was silently broken).
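For contrast, here is a minimal sketch of how the same kind of call could surface the real error in Python. `run_checked` and `CommandError` are hypothetical names for illustration, not existing project code; the only library call used is `subprocess.run`.

```python
import subprocess
import sys


class CommandError(RuntimeError):
    """Raised when an external command exits non-zero; keeps its stderr."""

    def __init__(self, argv, returncode, stderr):
        super().__init__(f"{argv[0]} exited with {returncode}: {stderr.strip()}")
        self.argv = argv
        self.returncode = returncode
        self.stderr = stderr


def run_checked(argv, timeout=30.0):
    """Run argv and return its stdout; raise CommandError (with stderr) on failure."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        # The caller sees the command's own message instead of a generic one.
        raise CommandError(argv, proc.returncode, proc.stderr)
    return proc.stdout


if __name__ == "__main__":
    # Stand-in for a failing admin query: the child writes to stderr and exits 1.
    try:
        run_checked([sys.executable, "-c",
                     "import sys; sys.stderr.write('connection refused'); sys.exit(1)"])
    except CommandError as exc:
        print(f"setup aborted: {exc}")
```

Because the exception is raised at the point of failure, execution stops there unless a caller deliberately catches it, and the original stderr text is preserved in the message.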
Fragile error handling
- set -e is a band-aid: it misses failures inside pipelines unless set -o pipefail is also set, and its behavior in subshells and command substitutions is easy to get wrong
- Functions can't return structured errors — only exit codes (0-255)
- No distinction between "expected failure" and "bug" — both are just non-zero exit codes
- Traps (trap ... EXIT) are the only cleanup mechanism, and they don't compose well
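As a point of comparison for the cleanup problem, a sketch using `contextlib.ExitStack` from the standard library is shown below. Each started component registers its own teardown, and teardowns run in reverse order even if a later step raises. The `run_with_cleanup` function and the `(name, start, stop)` step shape are assumptions made for this example.

```python
from contextlib import ExitStack


def run_with_cleanup(steps, body):
    """Start each step in order, run body, then stop started steps in reverse.

    steps is a list of (name, start, stop) tuples of callables (hypothetical shape).
    """
    with ExitStack() as stack:
        for name, start, stop in steps:
            start()
            # Registered only after a successful start, so a step that failed
            # to start is never "stopped".
            stack.callback(stop)
        return body()


if __name__ == "__main__":
    def fail():
        raise RuntimeError("runner failed to start")

    steps = [
        ("backend", lambda: print("start backend"), lambda: print("stop backend")),
        ("proxysql", lambda: print("start proxysql"), lambda: print("stop proxysql")),
        ("runner", fail, lambda: print("stop runner")),
    ]
    try:
        run_with_cleanup(steps, lambda: print("running tests"))
    except RuntimeError as exc:
        print(f"aborted: {exc}")
```

Unlike a single EXIT trap, each component owns its cleanup, so adding a new component does not require editing a shared trap handler.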
No structured logging
- echo to stdout mixes with command output
- No log levels, timestamps are inconsistent
- Debugging requires adding set -x and wading through hundreds of lines of shell expansion
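A minimal sketch of what the equivalent could look like with the standard `logging` module follows. The logger name `ci` and the format string are illustrative choices, not an agreed convention.

```python
import logging
import sys


def make_logger(name="ci", level=logging.INFO, stream=None):
    """Return a logger that writes timestamped, leveled lines to stream (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:
        handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)-7s %(name)s: %(message)s",
            datefmt="%Y-%m-%dT%H:%M:%S",
        ))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_logger(level=logging.DEBUG)
    log.debug("resolved control dir")
    log.info("starting group %s", "example-group")
    log.error("health check failed after %d attempts", 5)
```

Logging to stderr keeps orchestration messages separate from any command output captured on stdout, and raising the level to DEBUG replaces most uses of set -x.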
Hardcoded paths and assumptions
- Scripts assume specific directory structures relative to ${BASH_SOURCE[0]}
- REPO_ROOT derivation via ../../.. counting is error-prone (we had bugs with 3 vs 4 levels)
- Environment variable passing between scripts is fragile: export in one script doesn't reach a docker exec or a subprocess reliably
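One way to avoid counting `..` levels is to walk upward until a marker is found. The sketch below assumes the repository root can be recognized by a `.git` entry; `find_repo_root` is a hypothetical helper and the marker is a parameter so a different one can be used.

```python
from pathlib import Path


def find_repo_root(start, marker=".git"):
    """Return the nearest ancestor of start (or start itself) that contains marker."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found in {start} or any parent directory")


if __name__ == "__main__":
    try:
        print(find_repo_root(__file__))
    except FileNotFoundError as exc:
        print(f"not inside a repository: {exc}")
```

The result does not depend on how deep the calling script sits, so moving a script between directories cannot silently change what REPO_ROOT points to.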
Race conditions and timing
- sleep 10 as a synchronization primitive
- Health checks in while loops with arbitrary timeouts
- No proper readiness probes — just "can I connect?" checks
- Parallel execution (run-multi-group.bash) has no proper coordination
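A deadline-based readiness wait could replace fixed sleeps. In this sketch, `wait_until` is a hypothetical helper and the probe is any callable that returns True once the service is actually usable (for example, a query that checks expected rows exist rather than a bare connect).

```python
import time


def wait_until(probe, timeout=60.0, interval=0.5, what="service"):
    """Poll probe() until it returns True; raise TimeoutError after timeout seconds."""
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            if probe():
                return
        except Exception as exc:  # keep the most recent failure for the report
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{what} not ready after {timeout}s (last error: {last_error!r})"
            )
        time.sleep(interval)


if __name__ == "__main__":
    attempts = {"n": 0}

    def fake_probe():
        # Stand-in for a real readiness check: fails twice, then succeeds.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionRefusedError("not listening yet")
        return True

    wait_until(fake_probe, timeout=5, interval=0.1, what="proxysql admin")
    print(f"ready after {attempts['n']} attempts")
```

The wait returns as soon as the probe passes, and on timeout the error names the component and the last underlying failure instead of letting a later test fail with an unrelated symptom.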
Real bugs found during CI migration (March 2026)
- Silent config dump failures: ProxySQL admin queries failed silently due to 2>/dev/null, tests continued with broken config, failed with misleading "table doesn't exist" errors
- Hardcoded /home/rene/proxysql paths: 12 scripts had absolute paths that only worked on one machine
- Wrong directory traversal: ../../.. instead of ../../../.., which worked locally by accident and failed on CI runners
- WITHGCOV defaulting to 1: Caused PROXYSQL GCOV DUMP syntax errors on non-coverage builds, hidden by error suppression
- Missing feature flag propagation: Unit test Makefile relied on environment variables that weren't set, causing link failures
- Hostgroup monitor conflicts: Setup hooks silently failed to LOAD changes to runtime, monitor overwrote configuration
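Several of these bugs come down to implicit defaults. As a sketch only, configuration could be read once into a typed object that fails early on missing or malformed values. `CIConfig`, `load_config`, and `parse_flag` are hypothetical names; the variables `REPO_ROOT` and `WITHGCOV` are the ones mentioned above, and defaulting `WITHGCOV` to off is an assumption about the desired behavior.

```python
import os
from dataclasses import dataclass

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def parse_flag(name, raw):
    """Parse a boolean environment flag, rejecting values that are not recognized."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognized boolean flag")


@dataclass(frozen=True)
class CIConfig:
    repo_root: str
    with_gcov: bool


def load_config(env=None):
    """Build a CIConfig from env (os.environ by default), failing early on bad input."""
    env = os.environ if env is None else env
    if "REPO_ROOT" not in env:
        raise KeyError("REPO_ROOT must be set explicitly")
    return CIConfig(
        repo_root=env["REPO_ROOT"],
        # Assumption: coverage is opt-in, so an unset WITHGCOV means off.
        with_gcov=parse_flag("WITHGCOV", env.get("WITHGCOV", "0")),
    )
```

Passing the resulting object (or an explicit env mapping built from it) to each subprocess makes it visible which settings a step actually receives.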
Proposal
Migrate the orchestration layer to Python, keeping bash only for thin wrappers where shell is genuinely the right tool (e.g., Docker commands).
What to migrate
| Current (bash) | Purpose | Complexity |
|---|---|---|
| ensure-infras.bash | Start ProxySQL + backends for a TAP group | High: Docker orchestration, health checks, hook execution |
| run-tests-isolated.bash | Launch test runner container, env setup, coverage collection | High: Docker run, env propagation, coverage traps |
| run-multi-group.bash | Parallel group execution with timeouts | High: Process management, log aggregation, coverage combining |
| start-proxysql-isolated.bash | Start ProxySQL in Docker | Medium: Docker run, health check |
| stop-proxysql-isolated.bash | Cleanup ProxySQL | Low |
| destroy-infras.bash | Cleanup backends | Low |
| docker-compose-init.bash | Start Docker Compose infra | Medium: Volume prep, health checks, user provisioning |
| docker-*-post.bash | Post-startup provisioning | Medium: SQL provisioning, replication setup |
| env-isolated.bash | Environment variable setup | Low, but critical for correctness |
| */pre-proxysql.bash | Group-specific pre-test hooks | Medium: cluster setup, config manipulation |
| */setup-infras.bash | Group-specific post-infra hooks | Medium: hostgroup remapping, fixture setup |
Benefits of Python
- Proper exceptions: errors propagate by default and are only silenced when code explicitly catches them
- Structured logging: logging module with levels, timestamps, formatters
- Testable: orchestration logic can have unit tests
- Type hints: a type checker can catch many configuration errors before runtime
- subprocess management: subprocess.run(..., check=True) raises on a non-zero exit instead of continuing
- Docker SDK: docker Python package instead of parsing CLI output
- Parallel execution: concurrent.futures instead of bash background jobs
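To illustrate the last point, here is a sketch of parallel group execution with `concurrent.futures`. `run_groups` and the `run_one` callable are hypothetical; per-group timeouts and log aggregation are left out to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_groups(groups, run_one, max_workers=4):
    """Run run_one(group) for each group in parallel and collect every outcome.

    Returns {group: ("ok", result)} or {group: ("failed", exception)}.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_one, group): group for group in groups}
        for future in as_completed(futures):
            group = futures[future]
            try:
                results[group] = ("ok", future.result())
            except Exception as exc:
                # One failing group does not hide the results of the others.
                results[group] = ("failed", exc)
    return results


if __name__ == "__main__":
    def fake_run(group):
        # Stand-in for launching a real test group.
        if group == "broken-group":
            raise RuntimeError("infra did not start")
        return f"{group}: passed"

    for group, (status, detail) in sorted(run_groups(
            ["group-a", "broken-group", "group-b"], fake_run).items()):
        print(group, status, detail)
```

Threads are sufficient here on the assumption that each worker mostly waits on an external process; every group's exception is kept with its group name, which bash background jobs do not give for free.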
Migration approach
- Start with a Python CIOrchestrator class that wraps the current bash scripts
- Migrate one script at a time, keeping bash as fallback
- proxysql-tester.py already exists as a Python test runner; extend the pattern
- Keep bash only for: Docker Compose files, simple one-liner wrappers
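A rough sketch of the first two steps is below: a `CIOrchestrator` that shells out to the existing script for each step unless a Python implementation has been registered for it. The method names, the step-name-to-`<step>.bash` mapping, and the argument passing are assumptions for illustration, not a settled design.

```python
import subprocess
from pathlib import Path


class CIOrchestrator:
    """Runs orchestration steps, preferring Python implementations over bash."""

    def __init__(self, control_dir, runner=subprocess.run):
        self.control_dir = Path(control_dir)
        self.runner = runner      # injectable so the class can be unit tested
        self.native = {}          # step name -> Python callable

    def register(self, step, func):
        """Replace the bash script for step with a Python implementation."""
        self.native[step] = func

    def run_step(self, step, *args):
        if step in self.native:
            return self.native[step](*args)
        # Fallback: run the existing script; check=True raises on non-zero exit.
        script = self.control_dir / f"{step}.bash"
        return self.runner(["bash", str(script), *args], check=True)


if __name__ == "__main__":
    orch = CIOrchestrator("test/infra/control")
    # Once a step is ported, registering it switches callers over transparently.
    orch.register("stop-proxysql-isolated", lambda group: print(f"stopping {group}"))
    orch.run_step("stop-proxysql-isolated", "example-group")
```

Callers only ever use `run_step`, so each script can be migrated independently and rolled back by removing its `register` call.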
What NOT to change
- Docker Compose YAML files: these are fine as-is
- proxysql-tester.py: already Python, works well
- Makefile build system: separate concern
- GitHub Actions YAML: separate concern
Priority
Medium-term. The immediate bash fixes (removing 2>/dev/null, fixing paths) are done. This issue tracks the longer-term migration to make the infrastructure actually reliable.