39 changes: 39 additions & 0 deletions exporter/.golangci.yml
@@ -0,0 +1,39 @@
run:
  timeout: 5m
  modules-download-mode: readonly

linters:
  enable:
    - gofmt
    - goimports
    - govet
    - ineffassign
    - misspell
    - revive
    - staticcheck
    - unused
  disable:
    - errcheck
    - gosec
    - gosimple

linters-settings:
  revive:
    rules:
      - name: unused-parameter
        disabled: true
      - name: exported
        disabled: true

issues:
  exclude-use-default: false
  max-issues-per-linter: 0
  max-same-issues: 0

  exclude-rules:
    - text: "should have comment"
      linters:
        - revive
    - text: "comment on exported"
      linters:
        - revive
251 changes: 251 additions & 0 deletions exporter/README.md
@@ -0,0 +1,251 @@
# WAL-G Prometheus Exporter

A Prometheus exporter that exposes WAL-G backup and WAL archiving metrics for PostgreSQL databases.

## Features

- **Backup lag monitoring**: Track time since last backup-push (full and delta backups)
- **WAL lag monitoring**: Monitor time since last wal-push operation
- **LSN delta lag**: Calculate LSN lag in bytes between current and archived WAL
- **PITR window**: Monitor point-in-time recovery window size
- **Error monitoring**: Track WAL-G operation errors
- **WAL integrity**: Monitor WAL segment integrity status per timeline

## Metrics

The exporter provides the following metrics:

### Backup Metrics
- `walg_backup_lag_seconds{backup_type}` - Time since last backup-push in seconds

For all timestamps, let's specify clearly whether it's the timestamp of the beginning of the process or the end of it.

- `walg_backup_count{backup_type}` - Number of backups (full/delta)

successful attempts only or all of them?

- `walg_backup_timestamp{backup_type}` - Timestamp of last backup

### WAL Metrics
- `walg_wal_lag_seconds{timeline}` - Time since last wal-push in seconds

Suggested change (original line first, proposed replacement second):
- `walg_wal_lag_seconds{timeline}` - Time since last wal-push in seconds
- `walg_wal_lag_seconds{timeline}` - Time since last successful wal-push in seconds

Another question: "time since" is a derived metric. Isn't it better to export timestamps and let monitoring decide what to show to users/AI, raw timestamps or lag values (or both)?

- `walg_lsn_lag_bytes{timeline}` - LSN delta lag in bytes
- `walg_wal_integrity_status{timeline}` - WAL integrity status (1 = OK, 0 = ERROR)

### PITR Metrics
- `walg_pitr_window_seconds` - Point-in-time recovery window size in seconds

what if we have gaps / multiple windows?


### Error Metrics
- `walg_errors_total{operation,error_type}` - Total number of WAL-G errors

### Exporter Metrics
- `walg_scrape_duration_seconds` - Duration of the last scrape
- `walg_scrape_errors_total` - Total number of scrape errors

## Installation

### From Source

```bash
cd exporter
go build -o walg-exporter .
```

### Using Docker

```bash
docker build -t walg-exporter .
docker run -p 9351:9351 walg-exporter
```

## Usage

### Command Line Options

```bash
./walg-exporter [flags]
```

**Flags:**
- `--web.listen-address` - Address to listen on (default: `:9351`)
- `--web.telemetry-path` - Path for metrics endpoint (default: `/metrics`)
- `--walg.path` - Path to wal-g binary (default: `wal-g`)
- `--scrape.interval` - Scrape interval (default: `60s`)
- `--log.level` - Log level (default: `info`)

### Example

```bash
# Basic usage
./walg-exporter

# Custom configuration
./walg-exporter \
  --web.listen-address=":8080" \
  --walg.path="/usr/local/bin/wal-g" \
  --scrape.interval="30s"
```

## Configuration

The exporter requires WAL-G to be properly configured and accessible. Ensure that:

1. WAL-G is installed and in PATH (or specify with `--walg.path`)
2. WAL-G configuration is properly set up (environment variables, config file)
3. The exporter has access to execute WAL-G commands

### WAL-G Commands Used

The exporter executes the following WAL-G commands:
- `wal-g backup-list --detail --json` - Get backup information
- `wal-g wal-show --detailed-json` - Get WAL segment information
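
As a rough sketch of how the first command could be invoked and decoded: the `backupInfo` type and the helper names below are hypothetical, and the JSON field names are taken from the mock output in the Mock Testing section rather than from the real WAL-G schema, so treat them as assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

// backupInfo holds only the fields used in this sketch. The field names
// follow the README's mock output and are an assumption about the real JSON.
type backupInfo struct {
	Name   string    `json:"backup_name"`
	Time   time.Time `json:"time"`
	IsFull bool      `json:"is_full"`
}

// parseBackupList decodes the JSON array printed by
// `wal-g backup-list --detail --json`.
func parseBackupList(out []byte) ([]backupInfo, error) {
	var backups []backupInfo
	if err := json.Unmarshal(out, &backups); err != nil {
		return nil, fmt.Errorf("decode backup-list output: %w", err)
	}
	return backups, nil
}

// listBackups runs the wal-g binary at walgPath and parses its output.
func listBackups(walgPath string) ([]backupInfo, error) {
	out, err := exec.Command(walgPath, "backup-list", "--detail", "--json").Output()
	if err != nil {
		return nil, fmt.Errorf("run %s backup-list: %w", walgPath, err)
	}
	return parseBackupList(out)
}

func main() {
	backups, err := listBackups("wal-g")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, b := range backups {
		age := time.Since(b.Time).Round(time.Second)
		fmt.Printf("%s full=%t age=%s\n", b.Name, b.IsFull, age)
	}
}
```

Keeping the parsing step separate from the process invocation makes it possible to test the decoder against captured output without a working WAL-G installation.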

## Prometheus Configuration

Add the following to your Prometheus configuration:

```yaml
scrape_configs:
  - job_name: 'walg-exporter'
    static_configs:
      - targets: ['localhost:9351']
    scrape_interval: 60s
    metrics_path: /metrics
```

## Grafana Dashboard

Example Grafana queries:

### Backup Lag
```promql
walg_backup_lag_seconds{backup_type="full"}
```

### WAL Lag
```promql
walg_wal_lag_seconds
```

### PITR Window (in hours)
```promql
walg_pitr_window_seconds / 3600
```

### Error Rate
```promql
rate(walg_errors_total[5m])
```

## Development

### Running Tests

```bash
go test -v ./...
```

### Running Benchmarks

```bash
go test -bench=. -benchmem ./...
```

### Mock Testing

The exporter includes comprehensive tests with mock WAL-G commands:

```bash
# Create a mock wal-g script
cat > mock-wal-g << 'EOF'
#!/bin/bash
case "$1" in
  "backup-list")
    echo '[{"backup_name":"test","time":"2024-01-01T12:00:00Z","is_full":true}]'
    ;;
  "wal-show")
    echo '{"integrity":{"status":"OK","details":[{"timeline_id":1,"status":"FOUND"}]}}'
    ;;
esac
EOF
chmod +x mock-wal-g

# Test with mock
./walg-exporter --walg.path=./mock-wal-g
```
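
To illustrate how the mock `wal-show` output above might translate into `walg_wal_integrity_status` values, here is a minimal sketch. The `walShow` type and `integrityByTimeline` function are hypothetical, and the rule that a timeline counts as OK only when every detail entry is `FOUND` is an assumption, not confirmed exporter behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// walShow mirrors only the fields present in the mock wal-show output.
type walShow struct {
	Integrity struct {
		Status  string `json:"status"`
		Details []struct {
			TimelineID int    `json:"timeline_id"`
			Status     string `json:"status"`
		} `json:"details"`
	} `json:"integrity"`
}

// integrityByTimeline maps each timeline to 1 (OK) or 0 (ERROR).
// Assumption: a timeline is OK only if every detail entry for it is "FOUND".
func integrityByTimeline(out []byte) (map[int]float64, error) {
	var ws walShow
	if err := json.Unmarshal(out, &ws); err != nil {
		return nil, fmt.Errorf("decode wal-show output: %w", err)
	}
	status := make(map[int]float64)
	for _, d := range ws.Integrity.Details {
		if _, seen := status[d.TimelineID]; !seen {
			status[d.TimelineID] = 1
		}
		if d.Status != "FOUND" {
			status[d.TimelineID] = 0
		}
	}
	return status, nil
}

func main() {
	mock := []byte(`{"integrity":{"status":"OK","details":[{"timeline_id":1,"status":"FOUND"}]}}`)
	status, err := integrityByTimeline(mock)
	if err != nil {
		panic(err)
	}
	fmt.Println(status) // map[1:1]
}
```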

## Architecture

The exporter consists of several components:

- **main.go**: HTTP server and command-line interface
- **exporter.go**: Core Prometheus collector implementation
- **wal_lag.go**: LSN parsing and WAL lag calculation
- **pitr.go**: PITR window calculation logic

### LSN Parsing

The exporter includes an LSN parser that handles the PostgreSQL LSN format:

```go
lsn, err := ParseLSN("0/1A2B3C4D")
if err != nil {
	log.Fatal(err) // malformed LSN
}
fmt.Println(lsn.String()) // "0/1A2B3C4D"
fmt.Println(lsn.Bytes())  // 439041101
```
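
For readers unfamiliar with the format, the sketch below shows one way such a parser could work: split the `hi/lo` hexadecimal form and combine the two 32-bit halves into a 64-bit byte position. The lowercase `parseLSN` and its `uint64` return type are assumptions for illustration and may not match the exporter's actual `ParseLSN`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLSN converts a PostgreSQL LSN such as "0/1A2B3C4D" into a byte position.
// Hypothetical helper: the exporter's real ParseLSN may differ.
func parseLSN(s string) (uint64, error) {
	hi, lo, ok := strings.Cut(s, "/")
	if !ok {
		return 0, fmt.Errorf("invalid LSN %q: missing '/'", s)
	}
	// Each half is a hexadecimal number of at most 32 bits.
	h, err := strconv.ParseUint(hi, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	l, err := strconv.ParseUint(lo, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	return h<<32 | l, nil
}

func main() {
	pos, err := parseLSN("0/1A2B3C4D")
	if err != nil {
		panic(err)
	}
	fmt.Println(pos) // 439041101
}
```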

### Lag Calculation

WAL and LSN lag calculations:

```go
// Time-based lag
walLag := calculateWalLag(lastWalTime)

// LSN-based lag in bytes
lsnLag := calculateLSNLag(currentLSN, lastArchivedLSN)
```

## Troubleshooting

### Common Issues

1. **WAL-G command not found**
   - Ensure WAL-G is in PATH or specify with `--walg.path`
   - Check that WAL-G is executable

2. **Permission denied**
   - Ensure the exporter has permission to execute WAL-G
   - Check WAL-G configuration file permissions

3. **No metrics**
   - Verify WAL-G commands work manually
   - Check exporter logs for errors
   - Ensure WAL-G is properly configured

### Debug Mode

Enable debug logging:

```bash
./walg-exporter --log.level=debug
```

### Health Check

The exporter provides a health endpoint:

```bash
curl http://localhost:9351/
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

## License

This project is licensed under the same license as WAL-G.

## Support

For issues and questions:
- Check the [WAL-G documentation](https://github.com/wal-g/wal-g)
- File an issue in the WAL-G repository
- Join the WAL-G community discussions