57 changes: 55 additions & 2 deletions docs/database-providers.md
@@ -6,8 +6,9 @@ Orchestrator supports a database provider abstraction layer that decouples core
orchestration logic from database-specific operations. This allows orchestrator
to manage different database engines through a common interface.

- MySQL is the default (and currently only) provider. The abstraction layer is
- designed to support future providers such as PostgreSQL.
+ MySQL is the default provider. PostgreSQL is also supported for streaming
+ replication topologies. The abstraction layer is designed to support additional
+ providers in the future.

## Architecture

@@ -82,6 +83,58 @@ preserved.
The MySQL provider is automatically registered at init time. No configuration
is needed to use it.

## PostgreSQL Provider

The PostgreSQL provider (`PostgreSQLProvider`) supports PostgreSQL streaming
replication topologies. It uses the `lib/pq` driver to connect to PostgreSQL
instances.

### Configuration

Add the following fields to your orchestrator configuration JSON:

```json
{
  "PostgreSQLTopologyUser": "orchestrator",
  "PostgreSQLTopologyPassword": "secret"
}
```

These credentials are used to connect to PostgreSQL topology instances for
discovery and replication management operations.
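
As a rough illustration of how these settings might be combined into a connection URL, the sketch below follows the shape of `openPostgreSQLTopology` from this change. The helper name `buildPostgresURL`, the `validSSLModes` allowlist, and the example host are assumptions made for illustration only. `PostgreSQLSSLMode` is the configuration field added in this change (default `"require"`); the sketch encodes the query parameters with `url.Values` instead of string formatting, which is a variation on the submitted code rather than a copy of it.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// validSSLModes mirrors the values named in the PostgreSQLSSLMode config
// comment. Enforcing an allowlist is an assumption of this sketch.
var validSSLModes = map[string]bool{
	"disable":     true,
	"require":     true,
	"verify-ca":   true,
	"verify-full": true,
}

// buildPostgresURL is a hypothetical helper that assembles a lib/pq style
// connection URL from topology credentials and an SSL mode.
func buildPostgresURL(user, password, host string, port int, sslMode string) (string, error) {
	if !validSSLModes[sslMode] {
		return "", fmt.Errorf("unsupported sslmode %q", sslMode)
	}
	q := url.Values{}
	q.Set("sslmode", sslMode)
	q.Set("connect_timeout", "5")

	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(user, password), // escapes special characters
		Host:     net.JoinHostPort(host, strconv.Itoa(port)),
		Path:     "/postgres", // maintenance database, as in the provider
		RawQuery: q.Encode(),
	}
	return u.String(), nil
}

func main() {
	dsn, err := buildPostgresURL("orchestrator", "secret", "db1.example.com", 5432, "require")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dsn)
}
```

Rejecting unknown `sslmode` values before building the URL keeps a typo in the configuration from surfacing later as a less obvious connection error.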

### Activating the Provider

To use PostgreSQL instead of MySQL, register the provider during startup:

```go
import "github.com/proxysql/orchestrator/go/inst"

inst.SetProvider(inst.NewPostgreSQLProvider())
```

### How It Works

| Operation | PostgreSQL Implementation |
|---------------------|---------------------------------------------------------------|
| GetReplicationStatus | Queries `pg_stat_wal_receiver` (standby) or `pg_current_wal_lsn()` (primary). Reports WAL LSN as position and `replay_lag` as lag. |
| IsReplicaRunning | Checks `pg_stat_wal_receiver` for an active WAL receiver with `status = 'streaming'`. |
| SetReadOnly | Runs `ALTER SYSTEM SET default_transaction_read_only = on/off` followed by `SELECT pg_reload_conf()`. |
| IsReadOnly | Queries `SHOW default_transaction_read_only`. |
| StartReplication | Calls `SELECT pg_wal_replay_resume()`. Streaming replication itself starts automatically when the standby connects. |
| StopReplication | Calls `SELECT pg_wal_replay_pause()` to pause WAL replay. The WAL receiver remains connected. |
Copilot AI commented on lines +118 to +125 (Mar 24, 2026):

The table claims `GetReplicationStatus` reports `replay_lag` as lag, but the current provider implementation runs on the standby and will generally not be able to read `replay_lag` (it comes from `pg_stat_replication` on the primary). Please adjust the documentation to match the actual lag source, or update the provider to compute/report lag in the documented way.
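
To make the lag source raised in the review comment above concrete, here is a minimal sketch of a standby-side lag calculation. The provider in this change derives lag from `now() - pg_last_xact_replay_timestamp()`; the `CASE` guard that reports zero when the received and replayed LSNs match is a commonly used refinement and is an assumption here, not part of the submitted code. The names `standbyLagQuery` and `lagSeconds` are hypothetical.

```go
package main

import (
	"database/sql"
	"fmt"
)

// standbyLagQuery is meant to run on a standby. Without the CASE guard, a
// time-based lag keeps growing while the primary is idle, even though the
// standby has replayed everything it received.
const standbyLagQuery = `
SELECT CASE
         WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
         ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
       END`

// lagSeconds converts a nullable lag value into whole seconds.
// It returns -1 when the lag is unknown (NULL) or negative.
func lagSeconds(seconds float64, valid bool) int64 {
	if !valid || seconds < 0 {
		return -1
	}
	return int64(seconds)
}

func main() {
	// In real use the value would come from
	// db.QueryRow(standbyLagQuery).Scan(&scanned).
	scanned := sql.NullFloat64{Float64: 3.9, Valid: true}
	fmt.Println(lagSeconds(scanned.Float64, scanned.Valid))

	var unknown sql.NullFloat64 // NULL: no transaction replayed yet
	fmt.Println(lagSeconds(unknown.Float64, unknown.Valid))
}
```

Either way, the value is measured on the standby itself, so the documentation would describe it as time since the last replayed transaction rather than as `replay_lag`.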

### Differences from MySQL

- **No separate IO/SQL threads.** PostgreSQL does not have the concept of
separate IO and SQL threads. The `IOThreadRunning` and `SQLThreadRunning`
fields in `ReplicationStatus` both mirror the WAL receiver state.
- **Streaming replication is automatic.** `StartReplication` resumes WAL replay
but cannot start the WAL receiver itself -- that is controlled by PostgreSQL's
`primary_conninfo` configuration.
- **StopReplication pauses replay only.** The WAL receiver continues to receive
WAL segments; only application (replay) is paused.
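
The first difference can be sketched as a small mapping function. The `replicationStatus` type below is a stand-in that copies only three field names from `ReplicationStatus`, and `statusFromWALReceiver` is a hypothetical helper; neither is the provider's actual code.

```go
package main

import "fmt"

// replicationStatus is a reduced stand-in for the ReplicationStatus struct,
// keeping only the fields relevant to this illustration.
type replicationStatus struct {
	ReplicaRunning   bool
	IOThreadRunning  bool
	SQLThreadRunning bool
}

// statusFromWALReceiver maps the WAL receiver state onto the MySQL-shaped
// fields. found reports whether pg_stat_wal_receiver returned a row; status
// is that row's status column.
func statusFromWALReceiver(found bool, status string) replicationStatus {
	running := found && status == "streaming"
	return replicationStatus{
		ReplicaRunning:   running,
		IOThreadRunning:  running,
		SQLThreadRunning: running, // no separate SQL thread; mirror the receiver
	}
}

func main() {
	fmt.Printf("%+v\n", statusFromWALReceiver(true, "streaming"))
	fmt.Printf("%+v\n", statusFromWALReceiver(true, "stopping"))
	fmt.Printf("%+v\n", statusFromWALReceiver(false, ""))
}
```

One consequence of mirroring is that a standby with paused replay but a connected WAL receiver still reports both "threads" as running, which callers written for MySQL semantics may not expect.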

## Implementing a New Provider

To add support for a new database engine:
1 change: 1 addition & 0 deletions go.mod
@@ -11,6 +11,7 @@ require (
github.com/hashicorp/consul/api v1.33.4
github.com/hashicorp/raft v1.7.3
github.com/howeyc/gopass v0.0.0-20210920133722-c8aef6fb66ef
github.com/lib/pq v1.12.0
github.com/mattn/go-sqlite3 v1.14.37
github.com/montanaflynn/stats v0.8.2
github.com/outbrain/zookeepercli v1.0.12
2 changes: 2 additions & 0 deletions go.sum
@@ -128,6 +128,8 @@ github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc=
github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
github.com/lib/pq v1.12.0 h1:mC1zeiNamwKBecjHarAr26c/+d8V5w/u4J0I/yASbJo=
github.com/lib/pq v1.12.0/go.mod h1:/p+8NSbOcwzAEI7wiMXFlgydTwcgTr3OSKMsD2BitpA=
github.com/mattn/go-colorable v0.1.9/go.mod h1:u6P/XSegPjTcexA+o6vUJrdnUu04hMope9wVRipJSqc=
github.com/mattn/go-colorable v0.1.12/go.mod h1:u5H1YNBxpqRaxsYJYSkiCWKzEfiAb1Gb520KVy5xxl4=
github.com/mattn/go-colorable v0.1.14 h1:9A9LHSqF/7dyVVX6g0U9cwm9pG3kP9gSzcuIPHPsaIE=
4 changes: 4 additions & 0 deletions go/config/config.go
@@ -96,6 +96,9 @@ type Configuration struct {
AgentsServerPort string // port orchestrator agents talk back to
MySQLTopologyUser string
MySQLTopologyPassword string
PostgreSQLTopologyUser string // Username for connecting to PostgreSQL topology instances
PostgreSQLTopologyPassword string // Password for connecting to PostgreSQL topology instances
PostgreSQLSSLMode string // SSL mode for PostgreSQL connections: disable, require, verify-ca, verify-full. Default: "require"
MySQLTopologyCredentialsConfigFile string // my.cnf style configuration file from where to pick credentials. Expecting `user`, `password` under `[client]` section
MySQLTopologySSLPrivateKeyFile string // Private key file used to authenticate with a Topology mysql instance with TLS
MySQLTopologySSLCertFile string // Certificate PEM file used to authenticate with a Topology mysql instance with TLS
@@ -334,6 +337,7 @@ func newConfiguration() *Configuration {
MySQLOrchestratorPort: 3306,
MySQLTopologyUseMutualTLS: false,
MySQLTopologyUseMixedTLS: true,
PostgreSQLSSLMode: "require",
MySQLTopologyMaxAllowedPacket: -1,
MySQLOrchestratorUseMutualTLS: false,
MySQLConnectTimeoutSeconds: 2,
259 changes: 259 additions & 0 deletions go/inst/provider_postgresql.go
@@ -0,0 +1,259 @@
/*
Copyright 2024 Orchestrator Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package inst

import (
"database/sql"
"fmt"
"net/url"

_ "github.com/lib/pq"
"github.com/proxysql/golib/log"
"github.com/proxysql/orchestrator/go/config"
)

// PostgreSQLProvider implements DatabaseProvider for PostgreSQL streaming
// replication topologies.
type PostgreSQLProvider struct{}

// NewPostgreSQLProvider creates a new PostgreSQL database provider.
func NewPostgreSQLProvider() *PostgreSQLProvider {
return &PostgreSQLProvider{}
}

// ProviderName returns "postgresql".
func (p *PostgreSQLProvider) ProviderName() string {
return "postgresql"
}

// openPostgreSQLTopology opens a connection to a PostgreSQL instance using
// credentials from the orchestrator configuration.
func openPostgreSQLTopology(key InstanceKey) (*sql.DB, error) {
u := &url.URL{
Scheme: "postgres",
User: url.UserPassword(config.Config.PostgreSQLTopologyUser, config.Config.PostgreSQLTopologyPassword),
Host: fmt.Sprintf("%s:%d", key.Hostname, key.Port),
Path: "postgres",
RawQuery: fmt.Sprintf("sslmode=%s&connect_timeout=5", config.Config.PostgreSQLSSLMode),
}
db, err := sql.Open("postgres", u.String())
if err != nil {
return nil, err
}
db.SetMaxOpenConns(3)
db.SetMaxIdleConns(1)
return db, nil
}

// GetReplicationStatus retrieves the replication state for a PostgreSQL instance.
// On a standby it queries pg_stat_wal_receiver; on a primary it queries
// pg_current_wal_lsn().
func (p *PostgreSQLProvider) GetReplicationStatus(key InstanceKey) (*ReplicationStatus, error) {
db, err := openPostgreSQLTopology(key)
if err != nil {
return nil, log.Errore(err)
}
defer db.Close()

Check failure on line 70 in go/inst/provider_postgresql.go (GitHub Actions / lint): Error return value of `db.Close` is not checked (errcheck)

// Check whether this instance is in recovery (i.e. is a standby).
var inRecovery bool
if err := db.QueryRow("SELECT pg_is_in_recovery()").Scan(&inRecovery); err != nil {
return nil, log.Errore(err)
}

if inRecovery {
return p.getStandbyReplicationStatus(db)
}
return p.getPrimaryReplicationStatus(db)
}

// getStandbyReplicationStatus reads replication state from a PostgreSQL standby
// via pg_stat_wal_receiver and pg_last_wal_replay_lsn().
func (p *PostgreSQLProvider) getStandbyReplicationStatus(db *sql.DB) (*ReplicationStatus, error) {
var status, lsn sql.NullString
var lagSeconds sql.NullFloat64

err := db.QueryRow(`
SELECT
COALESCE(r.status, ''),
pg_last_wal_replay_lsn()::text,
COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()), -1)
FROM (SELECT status FROM pg_stat_wal_receiver LIMIT 1) r
`).Scan(&status, &lsn, &lagSeconds)

if err == sql.ErrNoRows {
// No WAL receiver row means replication is not running.
return &ReplicationStatus{
ReplicaRunning: false,
IOThreadRunning: false,
SQLThreadRunning: false,
Position: "",
Lag: -1,
}, nil
}
if err != nil {
return nil, log.Errore(err)
}

ioRunning := status.Valid && status.String == "streaming"
lag := int64(-1)
if lagSeconds.Valid {
lag = int64(lagSeconds.Float64)
}

position := ""
if lsn.Valid {
position = lsn.String
}

return &ReplicationStatus{
ReplicaRunning: ioRunning,
IOThreadRunning: ioRunning,
SQLThreadRunning: ioRunning, // PG does not separate IO/SQL threads; mirror IO state
Position: position,
Lag: lag,
}, nil
}

// getPrimaryReplicationStatus returns a ReplicationStatus for a primary server.
// A primary is not itself a replica, so ReplicaRunning is false, and we report
// the current WAL write position (pg_current_wal_lsn()).
func (p *PostgreSQLProvider) getPrimaryReplicationStatus(db *sql.DB) (*ReplicationStatus, error) {
var lsn string
if err := db.QueryRow("SELECT pg_current_wal_lsn()::text").Scan(&lsn); err != nil {
return nil, log.Errore(err)
}
return &ReplicationStatus{
ReplicaRunning: false,
IOThreadRunning: false,
SQLThreadRunning: false,
Position: lsn,
Lag: 0,
}, nil
}

// IsReplicaRunning checks whether the WAL receiver is active on a PostgreSQL
// standby instance.
func (p *PostgreSQLProvider) IsReplicaRunning(key InstanceKey) (bool, error) {
db, err := openPostgreSQLTopology(key)
if err != nil {
return false, log.Errore(err)
}
defer db.Close()

Check failure on line 156 in go/inst/provider_postgresql.go (GitHub Actions / lint): Error return value of `db.Close` is not checked (errcheck)

var status sql.NullString
err = db.QueryRow("SELECT status FROM pg_stat_wal_receiver LIMIT 1").Scan(&status)
if err == sql.ErrNoRows {
return false, nil
}
if err != nil {
return false, log.Errore(err)
}
return status.Valid && status.String == "streaming", nil
}

// SetReadOnly sets or clears the default_transaction_read_only parameter on
// a PostgreSQL instance and reloads the configuration.
func (p *PostgreSQLProvider) SetReadOnly(key InstanceKey, readOnly bool) error {
db, err := openPostgreSQLTopology(key)
if err != nil {
return log.Errore(err)
}
defer db.Close()

Check failure on line 176 in go/inst/provider_postgresql.go (GitHub Actions / lint): Error return value of `db.Close` is not checked (errcheck)

value := "off"
if readOnly {
value = "on"
}
if _, err := db.Exec(fmt.Sprintf("ALTER SYSTEM SET default_transaction_read_only = %s", value)); err != nil {
return log.Errore(err)
}
if _, err := db.Exec("SELECT pg_reload_conf()"); err != nil {
return log.Errore(err)
}
return nil
}

// IsReadOnly checks whether default_transaction_read_only is enabled on a
// PostgreSQL instance.
func (p *PostgreSQLProvider) IsReadOnly(key InstanceKey) (bool, error) {
db, err := openPostgreSQLTopology(key)
if err != nil {
return false, log.Errore(err)
}
defer db.Close()

var value string
if err := db.QueryRow("SHOW default_transaction_read_only").Scan(&value); err != nil {
return false, log.Errore(err)
}
return value == "on", nil
}

// StartReplication resumes WAL replay on a PostgreSQL standby if it was
// previously paused. It does not start the WAL receiver: streaming replication
// starts automatically when a standby connects to its primary. On a primary
// the call is a no-op.
func (p *PostgreSQLProvider) StartReplication(key InstanceKey) error {
log.Infof("PostgreSQL streaming replication on %s:%d starts automatically; resuming WAL replay if paused", key.Hostname, key.Port)

db, err := openPostgreSQLTopology(key)
if err != nil {
return log.Errore(err)
}
defer db.Close()

Copilot AI commented (Mar 24, 2026):

`StartReplication` always calls `pg_wal_replay_resume()`. On a primary (not in recovery) this function errors, so callers that accidentally invoke `StartReplication` on a primary will get a failure. Consider checking `pg_is_in_recovery()` first (or reusing the same `inRecovery` check as `GetReplicationStatus`) and either no-op or return a clearer error when not a standby.

Suggested change:
// Only standbys (in recovery) support WAL replay control functions.
var inRecovery bool
if err := db.QueryRow("SELECT pg_is_in_recovery()").Scan(&inRecovery); err != nil {
return log.Errore(err)
}
if !inRecovery {
log.Infof("StartReplication called on primary %s:%d; instance is not in recovery, so WAL replay resume is not applicable", key.Hostname, key.Port)
return nil
}

var inRecovery bool
if err := db.QueryRow("SELECT pg_is_in_recovery()").Scan(&inRecovery); err != nil {
return log.Errore(err)
}
if !inRecovery {
log.Infof("StartReplication: %s:%d is a primary, WAL replay resume not applicable", key.Hostname, key.Port)
return nil
}

if _, err := db.Exec("SELECT pg_wal_replay_resume()"); err != nil {
return log.Errore(err)
}
return nil
}

// StopReplication pauses WAL replay on a PostgreSQL standby. This is the
// closest equivalent to stopping replication in MySQL. Note that the WAL
// receiver (IO thread equivalent) remains connected; only replay is paused.
func (p *PostgreSQLProvider) StopReplication(key InstanceKey) error {
db, err := openPostgreSQLTopology(key)
if err != nil {
return log.Errore(err)
}
defer db.Close()

Copilot AI commented (Mar 24, 2026):

`StopReplication` always calls `pg_wal_replay_pause()`, which will error on primaries (not in recovery). Similar to `StartReplication`, consider guarding with `pg_is_in_recovery()` and no-op / return a clearer error when invoked on a primary.

Suggested change:
var inRecovery bool
if err := db.QueryRow("SELECT pg_is_in_recovery()").Scan(&inRecovery); err != nil {
return log.Errore(err)
}
if !inRecovery {
return log.Errore(fmt.Errorf("cannot pause WAL replay on primary %s:%d (not in recovery)", key.Hostname, key.Port))
}

var inRecovery bool
if err := db.QueryRow("SELECT pg_is_in_recovery()").Scan(&inRecovery); err != nil {
return log.Errore(err)
}
if !inRecovery {
return fmt.Errorf("StopReplication: %s:%d is a primary, WAL replay pause not applicable", key.Hostname, key.Port)
}

if _, err := db.Exec("SELECT pg_wal_replay_pause()"); err != nil {
return log.Errore(err)
}
return nil
}

// Compile-time check that PostgreSQLProvider implements DatabaseProvider.
var _ DatabaseProvider = (*PostgreSQLProvider)(nil)
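
The `errcheck` lint annotations above flag the bare `defer db.Close()` calls. One possible way to satisfy the linter, sketched here under assumptions, is to route the close through a small helper that reports the error. `closeAndLog` and `failingCloser` are hypothetical names, and the logging callback stands in for whichever logger the project would actually use.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// closeAndLog closes c and passes any error to logf instead of discarding it.
func closeAndLog(c io.Closer, logf func(format string, args ...any)) {
	if err := c.Close(); err != nil {
		logf("closing connection: %v", err)
	}
}

// failingCloser simulates a resource whose Close call fails.
type failingCloser struct{}

func (failingCloser) Close() error { return errors.New("boom") }

func main() {
	logf := func(format string, args ...any) {
		fmt.Printf(format+"\n", args...)
	}

	func() {
		// *sql.DB satisfies io.Closer, so a provider method could write:
		//   defer closeAndLog(db, logf)
		defer closeAndLog(failingCloser{}, logf)
	}()
}
```

An alternative that also passes `errcheck` is `defer func() { _ = db.Close() }()`, which makes the decision to ignore the error explicit.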