Step-by-step guides for common orchestrator workflows.
This tutorial walks you through setting up orchestrator to manage an existing MySQL master-replica topology.
- A running MySQL master with one or more replicas (MySQL 5.7 or 8.0)
- Go 1.25+ installed
- Network access from the orchestrator host to all MySQL instances on port 3306
```bash
git clone https://github.com/proxysql/orchestrator.git
cd orchestrator
go build -o bin/orchestrator ./go/cmd/orchestrator
```

On your MySQL master (this will replicate to all replicas automatically):
```sql
CREATE USER 'orc_topology'@'orchestrator-host' IDENTIFIED BY 'a_secure_password';
GRANT SUPER, PROCESS, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'orc_topology'@'orchestrator-host';
```

Replace `orchestrator-host` with the hostname or IP of the machine running orchestrator. Use `%` to allow any host.
For production use, orchestrator should store its data in MySQL rather than SQLite. On a MySQL instance (can be the same master, or a separate server):
```sql
CREATE DATABASE orchestrator;
CREATE USER 'orc_server'@'localhost' IDENTIFIED BY 'another_secure_password';
GRANT ALL ON orchestrator.* TO 'orc_server'@'localhost';
```

Create `orchestrator.conf.json`:
```json
{
  "Debug": false,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orc_topology",
  "MySQLTopologyPassword": "a_secure_password",
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orc_server",
  "MySQLOrchestratorPassword": "another_secure_password",
  "DefaultInstancePort": 3306,
  "DiscoverByShowSlaveHosts": true,
  "InstancePollSeconds": 5,
  "ReasonableReplicationLagSeconds": 10,
  "RecoverMasterClusterFilters": ["*"],
  "RecoverIntermediateMasterClusterFilters": ["*"],
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "FailureDetectionPeriodBlockMinutes": 60,
  "RecoveryPeriodBlockSeconds": 3600
}
```

Start orchestrator:

```bash
bin/orchestrator -config orchestrator.conf.json http
```

Discover your master:

```bash
curl http://localhost:3000/api/discover/your-master-host/3306
```

Wait a few seconds for orchestrator to crawl the replicas, then verify:
```bash
curl -s http://localhost:3000/api/topology/your-master-host/3306
```

You should see your full replication tree printed as indented text.
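If discovery fails, a missing credential field in the config is a common cause. As a quick sanity check, here is a small Python sketch (an illustration, not part of orchestrator) that reports which connection fields are absent from a parsed config:

```python
import json

# Connection fields used in the orchestrator.conf.json above.
REQUIRED_KEYS = (
    "MySQLTopologyUser", "MySQLTopologyPassword",
    "MySQLOrchestratorHost", "MySQLOrchestratorDatabase",
    "MySQLOrchestratorUser", "MySQLOrchestratorPassword",
)

def missing_keys(conf: dict) -> list:
    """Return the required connection fields absent from a parsed config."""
    return [k for k in REQUIRED_KEYS if k not in conf]

# Usage:
#   conf = json.load(open("orchestrator.conf.json"))
#   print(missing_keys(conf) or "config looks complete")
```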
Open http://localhost:3000 in your browser. Click on Clusters in the navigation to see your topology visualized as a tree.
Explore the topology with the API, for example by listing the replicas of the master:

```bash
# List replicas of the master
curl -s http://localhost:3000/api/instance-replicas/your-master-host/3306
```

You now have a fully operational orchestrator instance managing your MySQL topology.
This tutorial sets up orchestrator to automatically update ProxySQL hostgroups during master failover, so your application traffic is rerouted without any custom scripts.
- A working orchestrator setup (see Tutorial 1)
- ProxySQL installed and running with the Admin interface accessible
- Your MySQL servers already configured as backends in ProxySQL
```bash
mysql -h 127.0.0.1 -P 6032 -u admin -padmin -e "SELECT * FROM runtime_mysql_servers;"
```

You should see your MySQL servers listed with their hostgroups.
Identify which hostgroup ID is used for writers and which for readers:
```bash
mysql -h 127.0.0.1 -P 6032 -u admin -padmin \
  -e "SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;"
```

For example, if writers are in hostgroup 10 and readers in hostgroup 20, you will use those values below.
Add these fields to your orchestrator.conf.json:
```json
{
  "ProxySQLAdminAddress": "127.0.0.1",
  "ProxySQLAdminPort": 6032,
  "ProxySQLAdminUser": "admin",
  "ProxySQLAdminPassword": "admin",
  "ProxySQLWriterHostgroup": 10,
  "ProxySQLReaderHostgroup": 20,
  "ProxySQLPreFailoverAction": "offline_soft"
}
```

| Field | Description |
|---|---|
| `ProxySQLWriterHostgroup` | The hostgroup ID where the current master lives. Must be > 0 to enable hooks. |
| `ProxySQLReaderHostgroup` | The hostgroup ID for read replicas. Optional but recommended. |
| `ProxySQLPreFailoverAction` | What to do with the old master before failover: `offline_soft` (drain connections), `weight_zero`, or `none`. |
```bash
# Stop the running instance (Ctrl+C), then:
bin/orchestrator -config orchestrator.conf.json http
```

Verify the ProxySQL connection:

```bash
curl -s http://localhost:3000/api/proxysql/servers | python3 -m json.tool
```

You should see your ProxySQL server list returned as JSON.
When orchestrator detects a dead master and performs recovery:
- Pre-failover: the old master is set to `OFFLINE_SOFT` in ProxySQL (no new connections)
- Topology recovery: orchestrator promotes a replica to be the new master
- Post-failover: the new master is added to the writer hostgroup; the old master is removed
- ProxySQL applies changes immediately via `LOAD MYSQL SERVERS TO RUNTIME`
ProxySQL hooks are non-blocking: if ProxySQL is unreachable, the MySQL failover still proceeds.
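The hostgroup change can be pictured with a short sketch. This is plain Python over invented data, not orchestrator's implementation (orchestrator issues the equivalent ProxySQL admin SQL); it assumes the old master leaves all hostgroups, which is a simplification:

```python
# Each server is (hostgroup_id, hostname); 10 = writers, 20 = readers,
# matching the example hostgroups above.
WRITER_HG, READER_HG = 10, 20

def apply_failover(servers, old_master, new_master):
    """Return the server list as it should look after promotion."""
    updated = []
    for hg, host in servers:
        if host == old_master:
            continue  # old master dropped (simplification: from all hostgroups)
        if host == new_master and hg == READER_HG:
            continue  # new master no longer serves reads
        updated.append((hg, host))
    updated.append((WRITER_HG, new_master))  # promote into the writer hostgroup
    return sorted(updated)

servers = [(10, "db1"), (20, "db2"), (20, "db3")]
print(apply_failover(servers, old_master="db1", new_master="db2"))
# → [(10, 'db2'), (20, 'db3')]
```

`db2` moves into the writer hostgroup, `db1` disappears, and `db3` keeps serving reads.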
To verify everything works without an actual failure, perform a graceful master takeover:
```bash
# Identify the current master
curl -s http://localhost:3000/api/clusters

# Perform a graceful takeover (promotes a replica, demotes the master)
curl -s http://localhost:3000/api/graceful-master-takeover/your-cluster-alias/your-new-master-host/3306
```

Check ProxySQL to confirm the hostgroups updated:
```bash
mysql -h 127.0.0.1 -P 6032 -u admin -padmin \
  -e "SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;"
```

For more details, see the full ProxySQL hooks documentation.
This tutorial sets up Prometheus to scrape orchestrator metrics and shows useful queries for alerting.
- A running orchestrator instance
- Prometheus installed (see prometheus.io/docs)
Prometheus metrics are enabled by default. Verify by adding this to your orchestrator.conf.json (or confirm it is not explicitly disabled):
```json
{
  "PrometheusEnabled": true
}
```

Restart orchestrator if you changed the config.
```bash
curl -s http://localhost:3000/metrics | head -20
```

You should see Prometheus-formatted metrics output.
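If you want to consume these metrics outside Prometheus, the text exposition format is easy to parse for simple gauges and counters. A minimal sketch (the sample payload is invented for illustration; it does not handle labeled series):

```python
def parse_metrics(text):
    """Parse Prometheus text exposition into {metric_name: value} for
    unlabeled gauges/counters; skips comment, HELP, and TYPE lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        values[name] = float(value)
    return values

sample = """\
# HELP orchestrator_instances_total Number of instances known to orchestrator
# TYPE orchestrator_instances_total gauge
orchestrator_instances_total 5
orchestrator_clusters_total 1
"""
print(parse_metrics(sample)["orchestrator_instances_total"])  # 5.0
```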
Add a scrape job to your prometheus.yml:
```yaml
scrape_configs:
  - job_name: orchestrator
    static_configs:
      - targets: ['orchestrator-host:3000']
    metrics_path: /metrics
    scrape_interval: 15s
```

Replace `orchestrator-host` with the actual hostname or IP. Reload Prometheus:
```bash
kill -HUP $(pgrep prometheus)
# or restart the Prometheus service
```

Open the Prometheus UI (typically http://prometheus-host:9090) and query:
```
orchestrator_instances_total
```
You should see the number of MySQL instances orchestrator is managing.
Total known instances and clusters:

```
orchestrator_instances_total
orchestrator_clusters_total
```

Discovery error rate (over the last 5 minutes):

```
rate(orchestrator_discovery_errors_total[5m])
```

Recovery operations by type:

```
sum by (type) (orchestrator_recoveries_total)
```

Recovery duration (p95 over the last hour):

```
histogram_quantile(0.95, rate(orchestrator_recovery_duration_seconds_bucket[1h]))
```
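For intuition, `histogram_quantile` estimates the quantile by linear interpolation inside the cumulative bucket that contains the target rank. A standalone sketch of that calculation (bucket data invented for illustration; real histograms end in a `+Inf` bucket):

```python
def histogram_quantile(q, buckets):
    """Approximate a quantile from cumulative (upper_bound, count) buckets,
    interpolating linearly inside the bucket containing the target rank.
    The count of the last bucket is treated as the total observation count."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            frac = (rank - prev_count) / (count - prev_count)
            return prev_bound + frac * (bound - prev_bound)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# 90 recoveries finished within 1s, 100 within 5s; the p95 lands in the 1-5s bucket.
print(histogram_quantile(0.95, [(1.0, 90), (5.0, 100)]))  # 3.0
```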
Create an alerting rule file (e.g., orchestrator-alerts.yml):
```yaml
groups:
  - name: orchestrator
    rules:
      - alert: OrchestratorHighDiscoveryErrors
        expr: rate(orchestrator_discovery_errors_total[5m]) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Orchestrator has a high discovery error rate"
          description: "More than 0.1 discovery errors/second for the last 10 minutes."
      - alert: OrchestratorRecoveryOccurred
        expr: increase(orchestrator_recoveries_total[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Orchestrator performed a recovery"
          description: "A failover or recovery event occurred in the last 5 minutes."
      - alert: OrchestratorDown
        expr: up{job="orchestrator"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Orchestrator is unreachable"
```

Reference this file in your prometheus.yml:

```yaml
rule_files:
  - orchestrator-alerts.yml
```

If running orchestrator in Kubernetes, use the built-in health check endpoints for liveness and readiness probes:
```yaml
livenessProbe:
  httpGet:
    path: /api/status
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /api/status
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
```

For the full list of metrics, see the Observability documentation.
This tutorial introduces the v2 REST API, which provides structured JSON responses and proper HTTP status codes.
- A running orchestrator instance with at least one discovered topology
All v2 endpoints return a consistent JSON envelope:
```json
{
  "status": "ok",
  "data": { ... }
}
```

On errors:

```json
{
  "status": "error",
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable description"
  }
}
```

HTTP status codes (200, 400, 404, 500, 503) are used correctly, unlike the v1 API, which always returns 200.
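Because the envelope is uniform, a client can handle every v2 endpoint with one helper. A minimal Python sketch (the helper name and sample payload are illustrative):

```python
import json

class ApiError(Exception):
    """Raised when a v2 envelope carries status 'error'."""
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code

def unwrap(body: str):
    """Parse a v2 envelope; return its 'data' on success, raise ApiError otherwise."""
    envelope = json.loads(body)
    if envelope.get("status") == "ok":
        return envelope.get("data")
    err = envelope.get("error", {})
    raise ApiError(err.get("code", "UNKNOWN"), err.get("message", ""))

print(unwrap('{"status": "ok", "data": [{"clusterAlias": "production"}]}'))
# → [{'clusterAlias': 'production'}]
```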
```bash
curl -s http://localhost:3000/api/v2/clusters | python3 -m json.tool
```

Example response:

```json
{
  "status": "ok",
  "data": [
    {
      "clusterName": "master.example.com:3306",
      "clusterAlias": "production",
      "instanceCount": 5
    }
  ]
}
```

Other useful endpoints:

```bash
# Cluster details
curl -s http://localhost:3000/api/v2/clusters/master.example.com:3306 | python3 -m json.tool

# Instances in a cluster
curl -s http://localhost:3000/api/v2/clusters/master.example.com:3306/instances | python3 -m json.tool

# A single instance
curl -s http://localhost:3000/api/v2/instances/replica1.example.com/3306 | python3 -m json.tool

# Topology tree
curl -s http://localhost:3000/api/v2/clusters/master.example.com:3306/topology | python3 -m json.tool

# Health check (status code only)
curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/v2/status
```

A 200 response means the node is healthy. A 500 response means it is not.
```bash
# All recent recoveries
curl -s http://localhost:3000/api/v2/recoveries | python3 -m json.tool

# Filter by cluster
curl -s "http://localhost:3000/api/v2/recoveries?cluster=master.example.com:3306" | python3 -m json.tool

# Active recoveries only
curl -s http://localhost:3000/api/v2/recoveries/active | python3 -m json.tool
```

If ProxySQL hooks are configured:

```bash
# All servers
curl -s http://localhost:3000/api/v2/proxysql/servers | python3 -m json.tool
```

If ProxySQL is not configured, you will receive a 503 status:
```json
{
  "status": "error",
  "error": {
    "code": "PROXYSQL_NOT_CONFIGURED",
    "message": "ProxySQL is not configured"
  }
}
```

The structured responses make scripting straightforward. Example: get all instance hostnames in a cluster using jq:
```bash
curl -s http://localhost:3000/api/v2/clusters/master.example.com:3306/instances \
  | jq -r '.data[].Key.Hostname'
```

Check whether any recoveries are currently active:
```bash
STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/v2/recoveries/active)
if [ "$STATUS" = "200" ]; then
  ACTIVE=$(curl -s http://localhost:3000/api/v2/recoveries/active | jq '.data | length')
  echo "Active recoveries: $ACTIVE"
fi
```

For the full endpoint reference, see the API v2 documentation. An OpenAPI 3.0 specification is also available for client generation.
This tutorial walks you through configuring orchestrator to manage a PostgreSQL streaming replication topology. Orchestrator discovers PostgreSQL primaries and standbys, monitors replication health, and can perform automated failover when a primary fails.
- PostgreSQL 12+ primary with one or more streaming replication standbys already configured
- Go 1.25+ installed (for building from source)
- Network access from the orchestrator host to all PostgreSQL instances on port 5432
```bash
git clone https://github.com/proxysql/orchestrator.git
cd orchestrator
go build -o bin/orchestrator ./go/cmd/orchestrator
```

On your PostgreSQL primary (this user must exist on all instances -- primary and standbys):
```sql
CREATE USER orchestrator WITH PASSWORD 'orch_pass';
GRANT pg_monitor TO orchestrator;
```

The `pg_monitor` role grants read access to `pg_stat_replication`, `pg_stat_wal_receiver`, and other monitoring views that orchestrator needs for discovery.
Note: If you are using PostgreSQL 9.6 (not recommended), you need to grant `SELECT` on the individual monitoring views instead of using `pg_monitor`.
On each PostgreSQL instance, ensure pg_hba.conf allows connections from the orchestrator host:
```
# TYPE  DATABASE  USER          ADDRESS               METHOD
host    all       orchestrator  orchestrator-host/32  md5
```

Reload PostgreSQL after editing:

```bash
psql -c "SELECT pg_reload_conf();"
```

Create `orchestrator.conf.json`:
```json
{
  "Debug": true,
  "ListenAddress": ":3000",
  "ProviderType": "postgresql",
  "PostgreSQLTopologyUser": "orchestrator",
  "PostgreSQLTopologyPassword": "orch_pass",
  "PostgreSQLSSLMode": "require",
  "BackendDB": "sqlite",
  "SQLite3DataFile": "/tmp/orchestrator.sqlite3",
  "DefaultInstancePort": 5432,
  "InstancePollSeconds": 5,
  "RecoverMasterClusterFilters": ["*"],
  "RecoverIntermediateMasterClusterFilters": ["*"],
  "FailureDetectionPeriodBlockMinutes": 60,
  "RecoveryPeriodBlockSeconds": 3600
}
```

Key fields explained:
| Field | Purpose |
|---|---|
| `ProviderType` | Set to `"postgresql"` to enable PostgreSQL mode. Default is `"mysql"`. |
| `PostgreSQLTopologyUser` / `Password` | Credentials orchestrator uses to connect to your PostgreSQL instances. |
| `PostgreSQLSSLMode` | SSL mode for PostgreSQL connections: `disable`, `require`, `verify-ca`, or `verify-full`. |
| `DefaultInstancePort` | Set to 5432 for PostgreSQL (the default is 3306 for MySQL). |
```bash
bin/orchestrator -config orchestrator.conf.json http
```

You should see output indicating the service has started and is listening on port 3000.

Tell orchestrator about your PostgreSQL primary. Replace `pg-primary` with the actual hostname or IP:

```bash
curl http://localhost:3000/api/discover/pg-primary/5432
```

Expected output:
```json
{
  "Key": {"Hostname": "pg-primary", "Port": 5432},
  "Uptime": 1,
  "FlavorName": "PostgreSQL",
  "Version": "16.2",
  "ReadOnly": false
}
```

Orchestrator connects to the primary, queries `pg_stat_replication` to discover connected standbys, and recursively probes each standby.
Open your browser to http://localhost:3000. You should see your PostgreSQL replication topology visualized as a tree:
- The primary node at the top (read-only: false)
- Standby nodes underneath (read-only: true)
- Replication lag displayed for each standby
List discovered clusters:
```bash
curl -s http://localhost:3000/api/clusters
```

View the topology:

```bash
curl -s http://localhost:3000/api/topology/pg-primary/5432
```

Check replication analysis (should show NoProblem if everything is healthy):

```bash
curl -s http://localhost:3000/api/replication-analysis
```

Example healthy output:

```json
[]
```

An empty array means no problems were detected.
To verify failover works, you can simulate a primary failure by stopping PostgreSQL on the primary:
```bash
# On the primary host:
pg_ctl stop -D /var/lib/postgresql/16/main -m fast
```

Within a few seconds, orchestrator will detect the DeadPrimary condition and, if automated recovery is enabled, will:
- Select the best standby for promotion (lowest lag, most up-to-date WAL position)
- Call `pg_promote()` on the selected standby
- Reconfigure the remaining standbys to replicate from the new primary via `ALTER SYSTEM SET primary_conninfo` and `pg_reload_conf()`
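Candidate selection boils down to comparing WAL positions. As an illustration (not orchestrator's actual code), PostgreSQL LSNs such as `0/3000060` can be compared by converting them to integers; the input mapping here is hypothetical, e.g. each standby's `pg_last_wal_replay_lsn()`:

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '0/3000060' into a comparable integer.
    The part before the slash is the high 32 bits, the part after is the low 32."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def pick_promotion_candidate(standbys: dict) -> str:
    """Pick the standby with the most advanced replayed LSN
    (hypothetical input: hostname -> last replayed LSN)."""
    return max(standbys, key=lambda host: lsn_to_int(standbys[host]))

standbys = {"standby1": "0/3000060", "standby2": "0/3000148"}
print(pick_promotion_candidate(standbys))  # standby2
```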
Monitor the recovery in the web UI or via the API:
```bash
curl -s http://localhost:3000/api/replication-analysis
```

After recovery completes:

```bash
curl -s http://localhost:3000/api/topology/new-primary-host/5432
```

You should see the new primary at the top with the remaining standbys underneath.
Check that all standbys are replicating from the new primary:
```bash
# On the new primary:
psql -c "SELECT client_hostname, state, sent_lsn, replay_lsn FROM pg_stat_replication;"
```

Check orchestrator's view:

```bash
curl -s http://localhost:3000/api/v2/clusters | python3 -m json.tool
```

When running in PostgreSQL mode, be aware of these differences:
- No intermediate masters. PostgreSQL streaming replication does not support cascading replication in the same way as MySQL. Orchestrator treats all standbys as direct replicas of the primary.
- No GTID/Pseudo-GTID. PostgreSQL uses WAL (Write-Ahead Log) positions (LSN) instead of GTIDs. Orchestrator maps LSN to its internal binlog coordinate system.
- Promotion uses `pg_promote()`. Instead of `STOP SLAVE` / `RESET SLAVE` / `CHANGE MASTER`, orchestrator calls `pg_promote()` on the standby to make it a primary.
- Standby reconfiguration uses `ALTER SYSTEM`. To repoint a standby to a new primary, orchestrator updates `primary_conninfo` via `ALTER SYSTEM` and reloads the configuration.
- ProxySQL integration is not supported in PostgreSQL mode. Use PgBouncer or another PostgreSQL-aware connection pooler.
- Database providers documentation -- architecture and provider details
- Configuration reference -- all PostgreSQL-related configuration fields
- User manual -- PostgreSQL sections in chapters 2-5