diff --git a/advocacy_docs/supported-open-source/warehousepg/index.mdx b/advocacy_docs/supported-open-source/warehousepg/index.mdx index 2057ed6850..156f8a2df4 100644 --- a/advocacy_docs/supported-open-source/warehousepg/index.mdx +++ b/advocacy_docs/supported-open-source/warehousepg/index.mdx @@ -7,6 +7,7 @@ navigation: - observability - flowserver - whpg-copy +- wem directoryDefaults: iconName: BigData --- diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/get-started.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/get-started.mdx new file mode 100644 index 0000000000..7dc4327070 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/get-started.mdx @@ -0,0 +1,115 @@ +--- +title: Getting started with WarehousePG Enterprise Manager +navTitle: Getting started +description: Learn how to access the console, understand user roles, and navigate the primary dashboard components. +--- + +Once WarehousePG Enterprise Manager (WEM) is installed and the backend services are running, you can begin observing system health and tuning performance. + +!!! Important + The host running WEM must remain online and the application must be active to ensure continuous data gathering. Closing the application will stop the collection of SQL and cluster-level metrics. + +## Accessing the console + +To begin managing your cluster: + +1. Navigate to your WEM server URL (e.g., `http://your-server:8080`). +1. Enter your administrative credentials on the login screen. +1. Select **Sign in**. + +For security, sessions automatically expire after a period of inactivity; if a timeout occurs, the system will display a "Session Expired" message and redirect you to the login screen. You can also manually terminate your session at any time by selecting the **Logout** icon located in the sidebar footer. + + +## Navigating the interface structure + +The WEM interface is organized into the following functional areas: +1. 
**Panels (left sidebar):** The primary functional modules (e.g., **Cluster**, **Monitoring**, **Access Management**). +1. **Header (top):** View the current page title, system time, and global controls (filters and refresh triggers). +1. **Tabs (top of content):** Specific tools or sub-views within a selected panel. +1. **Main content (center):** Interact with data tables, performance charts, and configuration tools. + +**Navigation directory** + +Use the following guide to locate specific tools and their corresponding documentation: + +| Panel (sidebar) | Key actions | +| --------------- | ------------ | +| Cluster | [Verifying the cluster health](monitoring/cluster-overview/) | +| Query Monitor | [Monitoring and evaluating queries](performance/query-monitor/) | +| Data Analysis | [Analyzing data distribution](performance/data-analysis/) | +| Storage | [Planning storage capacity](performance/storage/) | +| Access Management | [Defining access policies](system-access/access-management/) | +| System Metrics | [Visualizing hardware performance](monitoring/system-metrics/) | +| Resource Management | [Managing system resources](performance/managing-resources/) | +| Backups | [Securing backups](performance/backups/) | +| Logs | [Auditing system logs](monitoring/logs/) | +| Monitoring | [Validating database responsiveness](monitoring/monitoring/) | +| Alerts | [Managing alerts](monitoring/alerts/) | +| Management | [Provisioning user accounts](system-access/management/) + + +## Understanding user roles and permissions + +WEM utilizes Role-Based Access Control (RBAC). After the initial bootstrap, all user management and password updates are handled exclusively via the **Management** panel in the UI. A user's assigned role determines which panels and actions are visible. + +| Role | Description | Access scope | +| :--- | :--- | :--- | +| **Admin** | Full system administration | All panels, user management, and system configuration. 
| +| **Operator** | Operational management | Query monitoring, data analysis, and backup operations. | +| **Viewer** | Read-only observation | High-level dashboards, cluster status, and system metrics. | + +Refer to the [Role permissions matrix](reference#role-permissions-matrix) for details. + +## Monitoring cluster health via the Dashboard + +The **Dashboard** is your landing page, providing a real-time snapshot of cluster health. + +**Use the global controls** +- **Node Filter:** Scope metrics to a specific node or view an aggregate of the entire cluster. +- **Refresh:** Manually update all data points on the page. + +**Check primary metrics** + +Monitor high-level indicators for an immediate status check, such as **Uptime**, **Connections**, and **Last Sync** time. + +**Review status and resources:** +- View healthy vs. unhealthy segments. +- Monitor storage utilization for the coordinator and segments. + +**Analyze performance charts** + +WEM streams live data into three primary charts: +- **Active Queries:** Track running, queued, and blocked queries. +- **CPU Usage:** Visualize system and user utilization. +- **Memory Usage:** Monitor total memory usage percentage. + +**Audit recent alerts** + +Review the **Recent WHPG Log Alerts** card for WHPG log events. Open the **Logs** panel for a full audit trail. + +## Configuring WEM settings + +Once you have installed WEM, you can fine-tune how WEM connects to your cluster or configure other external services (Prometheus and Alertmanager) using two methods: + +1. **Method 1: Use the WEM settings tab** + + Administrators can modify most operational parameters directly through the browser: + 1. Navigate to **Management** > **Settings**. + 1. Update fields such as **Prometheus URL** or **Backup History Database Path**. + 1. Save to apply changes immediately. + +!!! Note + Some system parameters are only accessible via the configuration file. + +2. 
**Method 2: Edit the configuration file manually** + + For system parameters not exposed in the WEM console, or for automated deployments, edit the WEM configuration file directly on the host server: + 1. Stop the service: `systemctl stop wem` on the WEM host. + 1. Edit the file `/etc/wem/wem.conf` and modify the desired parameter. + 1. Restart the service: `systemctl start wem`. + + + + + + diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/images/architecture.svg b/advocacy_docs/supported-open-source/warehousepg/wem/images/architecture.svg new file mode 100644 index 0000000000..4bce8e40fb --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/images/architecture.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/index.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/index.mdx new file mode 100644 index 0000000000..4409195e68 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/index.mdx @@ -0,0 +1,57 @@ +--- +title: WarehousePG Enterprise Manager +description: Use WarehousePG Enterprise Manager as a centralized hub for monitoring, managing, and optimizing WarehousePG clusters. +navigation: +- release_notes +- overview +- installing +- get-started +- monitoring +- performance +- system-access +- troubleshooting +- reference +navRootedTo: /supported-open-source/warehousepg/ +--- + +WarehousePG Enterprise Manager (WEM) is a comprehensive management, monitoring, and administration platform designed specifically for WarehousePG (WHPG) clusters. By integrating real-time telemetry, AI-assisted development, and interactive configuration tools into a single interface, WEM transforms complex distributed database operations into a streamlined, visual experience. Whether you are auditing cluster health, tuning SQL performance, or managing security, WEM provides the single pane of glass necessary to maintain a high-performance distributed environment. 
+ +## Why WEM? + +Managing a distributed database manually across multiple segment nodes and coordinators can be resource-intensive and error-prone. WEM solves these challenges by providing: +- **Unified visibility:** Move beyond per-node SSH sessions. WEM aggregates host metrics, SQL statistics, and system logs into one centralized location. + +- **Reduced operational risk:** With features like the interactive HBA editor and automatic configuration backups, WEM provides a safety net for critical administrative changes. + +- **AI-driven optimization:** Leverage built-in AI assistance to explain execution plans and suggest query optimizations, lowering the barrier to entry for managing complex distributed workloads. + +- **Proactive resilience:** Integrated Canary checks and automated alerting ensure you are notified of potential bottlenecks or connectivity issues before they impact your users. + + +### Key capabilities + +WEM introduces a robust suite of tools for the modern database administrator and developer: + +- **Observability & health** + - **Real-time & historical metrics:** Monitor the live cluster pulse via the integrated Exporter or analyze long-term trends using Prometheus data. + + - **Cluster overview:** A dedicated dashboard for the immediate health status of your coordinator and segment architecture. + + - **Log management:** Searchable, aggregated log streams powered by Loki for rapid root-cause analysis. + +- **Intelligence & analysis** + - **Query monitor & editor:** A full-featured SQL editor with an AI Assistant to help write, optimize, and explain distributed queries. + + - **Data skew analysis:** Deep-dive tools to analyze table distribution and storage efficiency across the cluster segments. + +- **Cluster administration** + - **Access management:** An interactive HBA editor for `pg_hba.conf` and a centralized user interface for user/password management.
+ + - **Resource management:** Real-time tracking of CPU, Memory, and I/O consumption across the entire cluster. + + - **System settings audit:** A searchable interface to verify and audit cluster-level GUC parameters. + +- **Data protection** + - **Backup & recovery:** Centralized monitoring of backup schedules, recovery points, and success rates. + + - **Canary checks:** Automated "heartbeat" tests that proactively verify cluster connectivity and query responsiveness. \ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/installing/collector.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/installing/collector.mdx new file mode 100644 index 0000000000..62d97e5048 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/installing/collector.mdx @@ -0,0 +1,97 @@ +--- +title: Installing the Collector +navTitle: Installing the Collector +description: Learn how to install the WarehousePG Collector on your WarehousePG cluster. +--- + +Install the WarehousePG (WHPG) Collector on your WHPG cluster coordinator. + +## Downloading and installing WHPG Collector + +1. On the coordinator, download the packages from the EDB repository: + + ```bash + export EDB_SUBSCRIPTION_TOKEN= + export EDB_REPO=gpsupp + curl -1sSLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/$EDB_REPO/setup.rpm.sh" | sudo -E bash + sudo dnf download edb-whpg-observability-collector + ``` + +1. On the coordinator, create a file `all_hosts` which lists all hosts in the WHPG cluster. For example: + + ```ini + cdw + scdw + sdw1 + sdw2 + sdw3 + ``` + +1. 
From the coordinator, transfer and install the Collector package on all hosts in the WHPG cluster: + + + + + ```bash + gpssh -f all_hosts -u gpadmin -e "scp gpadmin@$(hostname):edb-whpg-observability-collector*.rpm /tmp/ && sudo dnf install -y /tmp/edb-whpg-observability-collector*.rpm" + ``` + + + + + ```bash + gpssh -f all_hosts -u gpadmin -e "scp gpadmin@$(hostname):edb-whpg-observability-collector*.rpm /tmp/ && sudo yum install -y /tmp/edb-whpg-observability-collector*.rpm" + + ``` + + + + +## Configuring the Collector + +Once the Collector packages are installed, edit the file `/var/lib/whpg-observability-collector/collector.conf` on the coordinator and configure the following parameters: + +- `WHPG_OBS_DSN`: Specify your WHPG cluster connection details. For example: + + ```ini + WHPG_OBS_DSN="host=whpg-coordinator-host port=5432 dbname=postgres user=gpadmin password=postgres sslmode=disable" + ``` + + !!! Note + You can specify any database on your WHPG cluster. However, the user must hold the superuser role. + +- `LOKI_ENDPOINT`: Point to your configured Loki endpoint for log files. For example: + + ```ini + LOKI_ENDPOINT="http://loki.hostname:3100/loki/api/v1/push" + ``` + +- `PROMETHEUS_ENDPOINT`: Point to your configured Prometheus endpoint for host-level metrics. For example: + + ```ini + PROMETHEUS_ENDPOINT="http://prometheus.hostname:9090/api/v1/write" + ``` + + +## Starting the Collector service + +On the coordinator, run the following commands to deploy the configuration and start the service on every host in the WHPG cluster: + + ```bash + cd /var/lib/whpg-observability-collector + ./deploy-observability + ``` + +The Collector now runs in the background on each host as the `alloy` service. You can manage this service using `systemctl` commands. 
For example: + +To check the service status: + +```bash +sudo systemctl status alloy +``` + +To enable on boot: + +```bash +sudo systemctl enable alloy +``` diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/installing/index.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/installing/index.mdx new file mode 100644 index 0000000000..6518c5a95d --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/installing/index.mdx @@ -0,0 +1,20 @@ +--- +title: Installing WarehousePG Enterprise Manager +navTitle: Installing +description: Learn how to install WarehousePG Enterprise Manager. +navigation: +- prerequisites +- collector +- wem +--- + + +!!! Note +This guide assumes you have installed Prometheus and Loki. You can deploy dedicated instances for WEM or integrate with your existing enterprise monitoring stack. +!!! + +The installation process consists of the following steps: + +1. Verify the [Prerequisites](prerequisites). +1. [Install the WHPG Collector](collector) on your WarehousePG (WHPG) cluster. +1. [Install and configure WEM](wem) on your dedicated host. \ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/installing/prerequisites.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/installing/prerequisites.mdx new file mode 100644 index 0000000000..0ca341b871 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/installing/prerequisites.mdx @@ -0,0 +1,36 @@ +--- +title: WHPG Observability prerequisites +navTitle: Prerequisites +description: Understand the prerequisites before installing the WarehousePG Observability components. +--- + +!!! Note +This guide assumes you have installed Prometheus and Loki. You can deploy dedicated instances for WEM or integrate with your existing enterprise monitoring stack. +!!! + +## Prerequisites + +- WarehousePG (WHPG) version 6.x running on RHEL 7 or RHEL 8. +- WHPG version 7.x running on RHEL 8 or RHEL 9. 
+- A separate host for WarehousePG Enterprise Manager (WEM), running RHEL 8 or RHEL 9. +- [Loki](https://grafana.com/docs/loki/latest/setup/install/) 3.5 or later. You can deploy a dedicated instance for WEM or integrate with your existing enterprise monitoring stack. +- [Prometheus](https://prometheus.io/docs/prometheus/latest/installation/) 3.5.0 or later. You can deploy a dedicated instance for WEM or integrate with your existing enterprise monitoring stack. +- A database user that holds the superuser role and is able to connect to your WHPG cluster from the WEM host. +- Optional: An active Anthropic account and a valid API key, required only to enable the AI Assistant for query writing and optimization. +- Optional: An active installation of [Prometheus Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) version 0.28.1 or later, required only to enable centralized alert handling and notifications within WEM. + +!!! Warning + WEM is not supported for installation on Red Hat Enterprise Linux 7 (RHEL 7). This limitation applies only to where the WEM application itself is deployed; WEM is fully capable of monitoring WHPG clusters that are running on RHEL 7 nodes. + +## Network requirements + +The following table lists the connection requirements among the different components.
The ports listed are default values; you can customize them for your environment: + +| Source | Destination | Use | +| ------ | ----------- | --- | +| WHPG Coordinator | Prometheus:9090 | Push host-level metrics | +| WHPG Coordinator | Loki:3100 | Push log files | +| WEM | Prometheus:9090 | Retrieve host-level metrics | +| WEM | Loki:3100 | Retrieve log metrics | +| End user | WEM:8080 | Access to WEM console | + diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/installing/wem.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/installing/wem.mdx new file mode 100644 index 0000000000..084a85a6eb --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/installing/wem.mdx @@ -0,0 +1,143 @@ +--- +title: Installing WEM +navTitle: Installing WEM +description: Learn how to install the WarehousePG Enterprise Manager service. +deepToC: true +--- + +Install the WarehousePG Enterprise Manager (WEM) package on your designated host. + +## Downloading and installing WEM + +1. Download the package from the EDB repository: + + ```bash + export EDB_SUBSCRIPTION_TOKEN= + export EDB_REPO=gpsupp + curl -1sSLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/$EDB_REPO/setup.rpm.sh" | sudo -E bash + sudo dnf download whpg-enterprise-manager + ``` + +1. Install WEM on your designated host: + + ```bash + sudo dnf install -y whpg-enterprise-manager + ``` + +## Configuring WEM + +Edit the configuration file `/etc/wem/wem.conf` and configure the following parameters: + + +1. Set the values of `WHPG_HOST`, `WHPG_PORT`, `WHPG_DATABASE`, `WHPG_USER`, and `WHPG_PASSWORD` to point to your WHPG cluster. For example: + + ```ini + WHPG_HOST=whpg-coordinator-host + WHPG_PORT=5432 + WHPG_DATABASE=postgres + WHPG_USER=gpadmin + WHPG_PASSWORD=postgres + ``` + + !!! Note + You can specify any database on your WHPG cluster. However, the user must hold the superuser role. + +1. Set the values of the Prometheus and Loki endpoints.
For example: + + ```ini + PROMETHEUS_URL="http://prometheus.hostname:9090" + LOKI_URL="http://loki.hostname:3100" + ``` + +1. To enable the AI Assistant for query writing and optimization, configure the `ANTHROPIC_API_KEY` parameter. Note that this requires an active Anthropic account. + +1. To enable centralized alert handling and notifications with Alertmanager, configure the `ALERTMANAGER_URL` parameter to point to your Alertmanager endpoint. + +1. Optionally, configure advanced settings to fine-tune logging and data transmission: + + - `WEM_EXPORTER_LOG_LEVEL`: Sets the verbosity of the Exporter logs (`debug`, `info`, `warn`, or `error`) to assist with troubleshooting. + + - `WEM_EXPORTER_REMOTE_WRITE_INTERVAL`: Defines how frequently (e.g., 15s, 1m, 5m) the Exporter pushes collected metrics to the remote storage. + + - `WEM_EXPORTER_REMOTE_WRITE_TIMEOUT`: Specifies the maximum time (e.g., 30s, 1m, 2m) allowed for a data push to complete before the attempt is considered a failure. + +### Configuring WEM portal access + +Define how you access the WEM portal by configuring the following parameters in the WEM configuration file: + +```ini +WEM_HOST=wem-hostname +WEM_USER=admin +WEM_PORT=8080 +``` + +To establish the initial administrative credentials, choose one of the three following methods: + +!!! Note + The methods outlined below are strictly for the **initial setup** of the administrator password. Once the WEM portal is initialized and you have logged in for the first time, all subsequent user management—including adding new users, modifying roles, and updating passwords—must be handled directly through the WEM user interface. + +1. **Option 1: Interactive setup** This is the simplest method for manual installations. You do not need to modify the configuration file manually; the system will prompt you for the password and handle the internal setup. 
+ + Run the `wem setup` command and follow the prompt to enter your desired password: + + ```bash + wem setup --interactive + ``` + +1. **Option 2: Static configuration** Use this method to define your administrative password directly within the WEM configuration file. + + ```ini + WEM_ADMIN_PASSWORD=your-secure-password + ``` + +1. **Option 3: File-based configuration** For enhanced security, manage the password via an external file. This is ideal for automated deployments or keeping secrets out of configuration files. + + 1. Create a file containing your password: + + ```bash + echo "your-secure-password" > /etc/wem/admin.pw + ``` + + 1. Set the following parameter in the WEM configuration file: + + ```ini + WEM_ADMIN_PASSWORD_FILE=/etc/wem/admin.pw + ``` + +## Starting the WEM service + +Run the following commands to enable and start the WEM service on your dedicated host: + + ```bash + systemctl enable wem + systemctl start wem + ``` + +Verify that the service is running and active: + +```bash +sudo systemctl status wem +``` + +## Verifying the installation + +After starting the service, use these diagnostic tools to ensure all WEM components are communicating correctly and the environment is healthy: + +- Run `wem setup` to test WEM configuration settings: + + ```bash + wem setup --verify + ``` + See [wem setup command reference](../reference/commands#wem-setup) for details. + +- Perform a comprehensive "doctor" check to identify potential configuration errors, missing dependencies, or connectivity issues: + + ```bash + wem doctor + ``` + + See [wem doctor command reference](../reference/commands#wem-doctor) for details. + + +!!! Note Post-installation changes: + If you need to modify your environment after the initial setup, refer to the [Configuring WEM settings](../get-started#configuring-wem-settings) section for the appropriate procedures and requirements.
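Before starting the service or running `wem setup --verify`, it can help to confirm that the connection and endpoint keys shown in the configuration examples are not empty. The following is a minimal sketch, assuming the file uses plain `KEY=VALUE` lines as in those examples. The helper name `missing_wem_keys` and the choice of which keys to treat as required are assumptions for illustration; this is not part of the WEM CLI.

```shell
# Hypothetical pre-flight helper (an assumption, not a WEM command).
# Prints each listed key that is absent or has an empty value in a
# KEY=VALUE configuration file passed as the first argument.
missing_wem_keys() {
  conf_file="$1"
  for key in WHPG_HOST WHPG_PORT WHPG_DATABASE WHPG_USER WHPG_PASSWORD PROMETHEUS_URL LOKI_URL; do
    # A key counts as set only if "KEY=" is followed by at least one character.
    if ! grep -Eq "^${key}=.+" "$conf_file"; then
      echo "$key"
    fi
  done
}
```

Under these assumptions, `missing_wem_keys /etc/wem/wem.conf` prints nothing when all seven keys have values, and prints one key name per line otherwise.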
\ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/alerts.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/alerts.mdx new file mode 100644 index 0000000000..3f5a93c7f2 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/alerts.mdx @@ -0,0 +1,36 @@ +--- +title: Managing alerts +navTitle: Managing alerts +description: Use the Alerts panel to integrate with Prometheus Alertmanager and govern the incident lifecycle through real-time notifications. +deepToC: true +--- + +The **Alerts** panel on the left sidebar serves as the central nervous system for your cluster, aggregating health signals from across your infrastructure. This panel integrates directly with Prometheus Alertmanager to provide a unified interface for incident response and rule management. + +!!! Warning "Alertmanager required" + If the **Alerts** panel displays `Alertmanager Not Configured`, you must set the `ALERTMANAGER_URL` in your system environment. See [Configuring WEM](../installing/wem#configuring-wem) and [Configuring WEM settings post-installation](../get-started#configuring-wem-settings) for details. + +### Identifying alert sources + +Alerts are automatically generated from several monitoring vectors: +- **Canary check failures:** Triggered when automated SQL probes fail or exceed latency thresholds. +- **Segment down events:** Triggered if a segment becomes unreachable or enters a recovery state. +- **Resource threshold breaches:** Fired when CPU, Memory, or Disk Usage cross predefined limits. +- **System errors:** Critical database engine events captured from the WHPG log stream. +- **WEM outages**: If Prometheus is unable to reach the WEM service, it triggers an alert. + +### Understanding severity levels + +WEM displays severity levels to help you prioritize your operational workflow: +- **Critical:** Indicates a severe failure or a total loss of service. 
These require immediate attention. +- **Warning:** Highlights performance degradation or resource pressure. These must be investigated to prevent escalation. +- **Info:** Routine informational notices regarding system changes or successful task completions. + + +### Managing the incident lifecycle +Use the specialized tabs to move through the stages of alert detection, suppression, and resolution. +- **Respond to current threats:** Use the **Active Alerts** tab to identify and prioritize immediate issues. Filter by severity to address critical failures first, ensuring that total service outages are resolved before investigating warning or info events. +- **Suppress noise during maintenance:** Use the **Silences** tab to temporarily mute specific alerts. This is essential during scheduled maintenance or segment recovery windows to prevent alert fatigue and ensure that your notification channels remain focused on unexpected issues. +- **Audit dispatch history:** Review the **Notifications** tab to see exactly when and where alerts were sent (e.g., Slack, Email, or PagerDuty). Use this to verify that the correct stakeholders were notified during an incident. +- **Evaluate detection logic:** Browse the **Alert Rules** tab to inspect the active triggers defined in your Prometheus configuration. This view allows you to verify the technical conditions (thresholds, durations, and labels) that govern how WEM identifies system degradation. +- **Perform retrospective analysis:** Use the **Alert History** tab to identify recurring patterns. By auditing resolved alerts, you can isolate intermittent hardware failures or recurring resource pressure that might require long-term capacity planning. 
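The triage order suggested for the **Active Alerts** tab (critical first, then warning, then info) can be sketched as a small shell filter. This is an illustration only: the helper name `triage_alerts`, the lowercase severity labels, and the `<severity> <alert name>` input format are all assumptions, not a WEM or Alertmanager export format.

```shell
# Hypothetical triage helper (an assumption, not a WEM feature).
# Reads lines shaped like "<severity> <alert name>" on standard input and
# prints them with critical entries first, then warning, then info.
triage_alerts() {
  awk '{
    rank = 3                                  # unknown severities sort last
    if ($1 == "critical") rank = 0
    else if ($1 == "warning") rank = 1
    else if ($1 == "info") rank = 2
    print rank "\t" $0                        # prefix each line with its rank
  }' | sort -s -n -k1,1 | cut -f2-            # stable sort by rank, then drop it
}
```

The stable sort keeps alerts of the same severity in their original order, so the sketch only reorders across severity levels.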
\ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/cluster-overview.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/cluster-overview.mdx new file mode 100644 index 0000000000..e6624f6087 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/cluster-overview.mdx @@ -0,0 +1,39 @@ +--- +title: Verifying the cluster health +navTitle: Verifying the cluster health +description: Use the Cluster Overview panel to monitor real-time WarehousePG cluster health, verify node availability, and track critical connectivity metrics to ensure high availability +deepToC: true +--- + +The **Cluster** panel on the left sidebar provides a high-level summary of the WarehousePG (WHPG) cluster configuration and real-time health metrics. This panel is the primary starting point for verifying cluster availability and resource utilization. + +### Confirming core cluster availability + +Use the top-level summary cards to obtain an overview of the state of your cluster. Focus on these three metrics to ensure basic service delivery: + +- **Check operational status:** Verify the overall status shows as "Healthy." If the state is "Degraded," it indicates that one or more segments have failed or that synchronization is lagging. +- **Track segment uptime:** Ensure the count of "Up" segments matches your total segment count. Any "Down" segments represent a loss of data redundancy or processing power. +- **Monitor connection headroom:** Compare current active connections against the maximum limit. If connections are near the ceiling, new application requests will be rejected. + +### Validating coordinator and standby sync + +The coordinator is the entry point for all queries. Use this section to ensure the control plane is resilient: + +- **Verify the coordinator state:** Confirm the primary coordinator host is up. If it is down, application traffic cannot reach the database. 
+- **Monitor the replication mode:** Check that the standby coordinator is "Synchronized." If the mode shows as "Not Synced", a failover event could result in data loss or extended downtime. + + +### Auditing segment and mirror configuration + +WarehousePG relies on a distributed architecture where primary segments handle the work and mirrors provide safety. Use the segment table to perform these actions: + +- **Identify failed primary nodes:** Search the table for any primary segments with a "Down" status. In a healthy cluster, every primary should be active. +- **Verify failover readiness:** Ensure all mirror segments are "Up" and in their correct roles. If a mirror is down, the associated primary segment is running without a safety net. +- **Localize network issues:** Review the Hostname and Port columns to determine if a specific physical host is responsible for multiple segment failures. + +### Analyzing database utilization + +Identify which specific databases are consuming cluster resources to prevent one tenant from impacting others: + +- **Compare database sizes:** Identify rapidly growing databases that might require storage expansion or data vacuuming. +- **Track session distribution:** Monitor connection counts per database to identify unauthorized access or application connection leaks. \ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/index.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/index.mdx new file mode 100644 index 0000000000..31bc2fa4e7 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/index.mdx @@ -0,0 +1,24 @@ +--- +title: Observing system events +navTitle: Observing system events +description: Use WarehousePG Enterprise Manager as a centralized hub for monitoring, managing, and optimizing WarehousePG clusters. 
+navigation: +- cluster-overview +- system-metrics +- monitoring +- logs +- alerts +--- + +Verify cluster health, maintain operational awareness, and respond to system events in real time through the following core actions: + +- [Verifying the cluster health:](cluster-overview) Maintain a real-time view of the cluster’s topology and health via the **Cluster** panel. Use this to ensure that coordinator, standby, and segment nodes are online and correctly configured. + +- [Visualizing hardware performance:](system-metrics) Use the **System Metrics** panel to track the physical health of your infrastructure. Use these charts to identify OS-level bottlenecks, such as CPU spikes, memory exhaustion, or network latency across specific hosts. + +- [Validating database responsiveness:](monitoring) Ensure the database engine is actively processing requests. Use the **Monitoring** panel to review automated Canary checks, which are synthetic SQL probes that verify connectivity and execution speed. + +- [Auditing system logs:](logs) The **Logs** panel allows you to investigate the unified stream of system and database telemetry. Search through coordinator and segment logs to pinpoint the root cause of query failures or administrative changes. + +- [Managing alerts:](alerts) Use the **Alerts** panel to integrate with Prometheus Alertmanager and govern the incident lifecycle through real-time notifications. + diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/logs.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/logs.mdx new file mode 100644 index 0000000000..ae33c55779 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/logs.mdx @@ -0,0 +1,35 @@ +--- +title: Auditing system logs +navTitle: Auditing system logs +description: Use the Logs panel to access, filter, and analyze system and database telemetry through integrated log viewers.
+deepToC: true +--- + +The **Logs** panel on the left sidebar serves as a centralized diagnostic hub. By consolidating internal database metrics with external log aggregation, it allows you to correlate system-level events with specific query failures. + +### Understanding log levels + +WarehousePG Enterprise Manager (WEM) displays severity levels generated directly by the underlying WarehousePG (WHPG) engine. These levels categorize every log entry based on its impact on database operations: +- `DEBUG`: Contains granular technical details used primarily for deep-dive troubleshooting and development analysis. +- `INFO`: Provides standard informational messages regarding routine system operations. +- `LOG`: Reports standard engine-level events and process completions. +- `WARNING`: Highlights events that are not fatal but could indicate potential configuration issues or approaching resource limits. +- `ERROR`: Reports a problem that prevented a specific command or query from completing successfully. +- `FATAL`: Indicates an error that caused a specific session to be terminated, though the rest of the database remains operational. +- `PANIC`: Indicates a critical error that caused all database sessions to be disconnected; the system will usually attempt a restart after a `PANIC`. + +### Performing structured database analysis + +Use the **WHPG Log Tables** tab to query internal database records for specific historical events. This is the primary method for investigating SQL errors and session-level failures. +- **Filter by severity impact:** Narrow your search to critical event levels like `ERROR`, `FATAL`, or `PANIC` to bypass routine system noise. Use these logs to identify commands that failed or sessions that were terminated prematurely. +- **Isolate specific actors and environments:** Filter results by user, database, or session ID. This allows you to determine if a performance issue is widespread or isolated to a single application service or developer account. 
+- **Investigate technical error context:** Select the **Details** button in the actions column to view the full technical trace. Use the session ID, PID, and the specific source code file/line reference to pinpoint exactly where a query failed. +- **Debug prepared statements:** Review the detail field in the log details modal to see bound parameters. This is essential for reproducing errors that only occur with specific input data. +- **Facilitate technical support and archiving:** Select the **Export CSV** button to download your filtered results. This file is the primary resource to provide to technical support for deeper investigation. It is also ideal for long-term compliance archiving or performing bulk analysis in external tools. + +### Tracking live system logs + +Use the **Loki Logs** tab for high-speed, full-text searching across the entire cluster infrastructure in real time. +- **Stream live system events:** Watch logs as they are generated to observe the immediate impact of configuration changes or application deployments. +- **Navigate to specific incidents:** Use the visual time-picker to jump to a specific moment in time when a system alert was triggered. This helps you see exactly what was happening across the cluster during a hardware spike or network interruption. +- **Search across the infrastructure:** Utilize Loki’s optimized search engine to perform broad keyword searches (like "timeout" or "refused") across all nodes simultaneously, rather than checking individual host tables. 
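The severity and actor filters described above can also be applied offline to a file downloaded with **Export CSV**. The following is a minimal sketch, not the WEM export format: the column names `severity`, `user`, `database`, and `message` are hypothetical, so adjust them to the headers in your actual export.

```python
import csv
import io
from collections import Counter

# Severity levels that indicate failed commands or terminated sessions.
CRITICAL_LEVELS = {"ERROR", "FATAL", "PANIC"}


def critical_entries(csv_text, severity_column="severity"):
    """Return the rows whose severity is ERROR, FATAL, or PANIC.

    The column name is an assumption; pass the real header name
    from your exported file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if row.get(severity_column, "").strip().upper() in CRITICAL_LEVELS
    ]


def count_by(rows, column):
    """Count rows per value of a column, for example per user or database."""
    return Counter(row.get(column, "") for row in rows)


# Hypothetical sample data standing in for an exported file.
SAMPLE = """severity,user,database,message
LOG,etl_svc,sales,checkpoint complete
ERROR,etl_svc,sales,relation staging does not exist
WARNING,analyst,sales,skipping vacuum
FATAL,analyst,reports,terminating connection
ERROR,etl_svc,sales,division by zero
"""

rows = critical_entries(SAMPLE)
per_user = count_by(rows, "user")
```

Grouping the critical rows by user or database is a quick way to judge whether failures are widespread or isolated to a single service account.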
diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/monitoring.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/monitoring.mdx new file mode 100644 index 0000000000..149e06374c --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/monitoring.mdx @@ -0,0 +1,20 @@ +--- +title: Validating database responsiveness +navTitle: Validating database responsiveness +description: Use the Monitoring panel to track proactive health indicators and automated canary check results to ensure database availability. +deepToC: true +--- + +The **Monitoring** panel on the left sidebar provides proactive verification of cluster health through automated Canary checks. Unlike passive metrics, these checks execute active tasks to ensure the database engine is responding correctly and meeting performance baselines. + +### Performing proactive health checks: Canary checks + +Canary checks are recurring, automated scripts that simulate real-world operations to verify the end-to-end integrity of the system. You can configure these tests on the [Management panel](../system-access/management). + +Use the **Canary Checks** tab to verify that the database can successfully execute core operations. +- **Assess overall probe health:** Review the header metrics to get an instant snapshot of system integrity. Compare the **Successful** count against the **Total Checks** count to determine whether a specific subset of your monitoring is failing. +- **Verify scheduler activity:** Check the status of the scheduler to ensure it is running. This confirms that the WarehousePG Enterprise Manager (WEM) engine is actively triggering your background probes. If the scheduler is stopped, your health data will become stale and you will lose proactive visibility. +- **Run health checks on demand:** Trigger an immediate execution of any check in the list to verify a fix or test real-time connectivity.
While you must go to the [Management panel](../system-access/management) to create or edit a check, the **Monitoring** panel allows you to run and stop them at any time to get an instant status update. +- **Benchmark execution speed:** Monitor the **Average Duration** metric to establish a baseline for expected responsiveness. A sudden spike in this metric, even if checks are still successful, serves as an early warning of resource saturation or network latency. +- **Investigate specific check failures:** Audit the **Health Checks** table to isolate the root cause of a failure. By checking which specific probe is non-passing, you can determine if the issue is a total service outage or a localized subsystem failure. + diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/system-metrics.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/system-metrics.mdx new file mode 100644 index 0000000000..231914a60c --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/monitoring/system-metrics.mdx @@ -0,0 +1,42 @@ +--- +title: Visualizing hardware performance +navTitle: Visualizing hardware performance +description: Use the System Metrics panel to track physical host metrics, identifying resource bottlenecks, and correlating hardware spikes with database activity. +deepToC: true +--- + +The **System Metrics** panel on the left sidebar provides a comprehensive view of the hardware and database performance telemetry across the WarehousePG (WHPG) cluster. This data is essential for identifying resource saturation, diagnosing performance degradation, and performing capacity planning. + +### Monitoring real-time host health + +Use the **System Metrics** tab to identify immediate hardware saturation that could be impacting query response times. + +- **Spot processing bottlenecks:** Check the **CPU Usage % Over Time** to see if specific nodes are hitting 100% utilization. 
If one node is consistently higher than others, you could have data skew issues. +- **Assess memory pressure:** Monitor **Available Memory** vs. **Cached Memory**. If the available memory is low and cached memory is also shrinking, the OS is under pressure and could start swapping, which significantly slows down database operations. +- **Validate storage and network throughput:** Review **Disk I/O** and **Network Traffic** graphs. High disk read rates during unexpected times might indicate inefficient queries that are forcing full table scans instead of using indexes. +- **Evaluate system load averages:** Observe the 1-minute, 5-minute, and 15-minute load averages. If the 15-minute load consistently exceeds the number of available CPU cores, the host is overloaded and tasks are queuing at the OS level. + +### Analyzing historical trends and capacity + +Use the **Historical Trends** tab to move beyond immediate troubleshooting and look for long-term patterns in your hardware utilization. + +- **Forecast hardware upgrades:** Compare average and peak usage for CPU and memory over the last 30 days. If your peak usage is steadily climbing toward your total capacity, it is time to plan for node expansion. +- **Identify hardware outliers:** Review the per-host statistics table. Use the standard deviation (**Std Dev**) metric to find nodes that behave differently from the rest of the cluster, which could indicate failing hardware or localized configuration issues. +- **Correlate combined I/O activity:** Use the **Combined I/O Activity** graph to see if spikes in network traffic happen at the same time as disk writes. This often points to massive data redistributions or heavy background maintenance tasks. + +### Correlating database activity with hardware load + +Use the **Database Metrics** tab to bridge the gap between SQL execution and physical resource consumption. + +- **Manage connection health:** Review the status bar for **Idle in Txn** and **Blocked** sessions.
Sessions that stay **Idle in Txn** prevent the database from cleaning up old data, leading to table bloat and wasted disk space. +- **Optimize memory efficiency:** Check the **Cache Hit %** for each database. If this number drops significantly below 90%, it means your data "working set" is too large for the current memory allocation, forcing the system to read from disk. +- **Identify resource-intensive databases:** Compare database sizes and query activity trends. If a small database is generating a disproportionately high number of temporary files, its queries likely need better indexing or more memory for sorting. +- **Investigate transaction failures:** Monitor the ratio of rollbacks and deadlocks in the **Database Statistics** table. A sudden spike in rollbacks often indicates application-level errors or network instability between the application and the cluster. + +### Responding to low storage alerts + +If disk utilization metrics or historical trends indicate that storage is running low, work through the following steps to restore headroom: +- **Review large tables:** Navigate to the **Data Analysis** panel to identify which specific tables are consuming the most space. Focus on those with the highest growth rates. +- **Check for database bloat:** Investigate whether tables or indexes have accumulated excessive bloat. Dead tuples that haven't been reclaimed by vacuuming can consume significant storage without holding actual data. +- **Archive legacy data:** Consider moving older, less frequently accessed data to cold storage or an archival schema to free up primary disk space. +- **Initiate capacity planning:** Contact your DBA to discuss hardware expansion or volume resizing if the current data growth exceeds the physical limits of the existing nodes.
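Three of the rules of thumb on this page (15-minute load versus CPU cores, the 90% cache hit guideline, and standard-deviation outliers) can be expressed as simple checks. The sketch below is illustrative only: the function names, host names, and sample values are hypothetical, and the cache hit formula assumes the common hits / (hits + reads) definition rather than a confirmed description of how WEM computes **Cache Hit %**.

```python
from statistics import mean, pstdev


def is_overloaded(load_15m, cpu_cores):
    """True when the 15-minute load average exceeds the CPU core count."""
    return load_15m > cpu_cores


def cache_hit_percent(blocks_hit, blocks_read):
    """Cache hit ratio as a percentage (assumed formula).

    Returns None when there was no block activity at all.
    """
    total = blocks_hit + blocks_read
    if total == 0:
        return None
    return 100.0 * blocks_hit / total


def outlier_hosts(metric_by_host, threshold=1.5):
    """Hosts whose metric is more than `threshold` standard deviations
    away from the cluster mean. The threshold is an arbitrary choice."""
    values = list(metric_by_host.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every host reports the same value
    return sorted(
        host
        for host, value in metric_by_host.items()
        if abs(value - mu) / sigma > threshold
    )


# Hypothetical average CPU usage (%) per segment host.
cpu_usage = {"sdw1": 40, "sdw2": 42, "sdw3": 41, "sdw4": 43, "sdw5": 95}
suspects = outlier_hosts(cpu_usage)
```

With the sample values, only `sdw5` deviates strongly from the cluster mean, which is the kind of host worth checking for data skew or failing hardware.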
\ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/overview/architecture.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/overview/architecture.mdx new file mode 100644 index 0000000000..1a8b83173f --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/overview/architecture.mdx @@ -0,0 +1,55 @@ +--- +title: WarehousePG Enterprise Manager architecture +navTitle: Architecture +description: Overview of the WarehousePG Enterprise Manager architecture and its core components. +deepToC: true +--- + + +The WarehousePG Enterprise Manager (WEM) architecture is built upon a streamlined telemetry pipeline: an internal data collection layer (Collector), dedicated storage services (Prometheus and Loki), and the unified WEM application, which includes a built-in Exporter engine. + +![architecture](../images/architecture.svg) + +## Components + +### Collector + +The Collector is a service based on Grafana Alloy that runs on the WarehousePG (WHPG) coordinator, standby, and segments. Each Collector service on the WHPG cluster collects host metrics and log files, and sends them to the Collector service on the coordinator. The Collector service on the coordinator temporarily stores this data in memory, then pushes the host metrics to Prometheus and the log files to Loki. + +### WEM + +WEM is the central management and visualization service. It includes the Exporter as a native service, eliminating the need for a separate installation. + +WEM performs the following core functions: + +- **Database extraction:** The integrated Exporter engine runs SQL queries against heap and catalog tables to capture deep database metrics. +- **Data routing:** Pushes captured SQL and cluster metrics to Prometheus for historical analysis. +- **Unified visualization:** Aggregates live data directly from the internal Exporter engine and historical data from Prometheus and Loki into a single dashboard.
+ +### Storage services: Prometheus and Loki + +WEM leverages industry-standard storage engines to handle high-velocity telemetry. You can deploy dedicated instances for WEM or integrate with your existing enterprise monitoring stack: + +- **Prometheus:** The time-series database for all numerical data. It receives host metrics from the Collector and SQL/Cluster metrics from the internal WEM Exporter. +- **Loki:** The log aggregation engine. It receives high-volume log streams directly from the Collector on the coordinator. + + +!!! Note + While WEM can function as a standalone tool without Prometheus and Loki, its capabilities will be limited to real-time cluster status and SQL execution data; historical trends, host-level metrics, and log aggregation require the external storage services. + +## WEM operational workflow + +The system processes telemetry across four distinct phases: + +1. **Collection:** Collector agents on every node harvest raw OS metrics and log entries, and the data is tunneled to the coordinator node. + +2. **Export & routing:** + - **System & logs:** The Collector on the coordinator pushes system metrics to Prometheus and logs to Loki. + - **Database metrics:** The internal WEM Exporter engine probes the WHPG engine to capture the state of queries, transactions, and resource usage. + +3. **Storage:** Data is indexed and stored externally within Prometheus and Loki. This ensures that even if the database cluster faces downtime, its historical metrics and logs remain available for root-cause analysis. + +4. **Visualization:** WEM assembles the operational picture by pulling from three sources: + - **Prometheus:** For hardware performance and historical SQL trends. + - **Loki:** For searchable log files across the cluster. + - **Exporter:** For the real-time view of active sessions and current cluster status. 
\ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/overview/index.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/overview/index.mdx new file mode 100644 index 0000000000..be2b94d171 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/overview/index.mdx @@ -0,0 +1,32 @@ +--- +title: Overview of WarehousePG Enterprise Manager +navTitle: Overview +description: Learn about the architecture, supported platforms, and operational considerations for WarehousePG Enterprise Manager. +navigation: +- architecture +- supported_platforms +- known_issues +--- + +WarehousePG Enterprise Manager (WEM) is the centralized management and monitoring hub for your WarehousePG (WHPG) clusters. It replaces fragmented command-line tools with a unified web interface that combines real-time observability, AI-assisted development, and interactive cluster administration. + +Whether you are a DBA tuning performance or a developer writing complex distributed queries, WEM provides the insights and safety nets required to run WarehousePG at scale. + +## Why WEM? + +Managing a distributed architecture across multiple segment nodes is inherently complex. WEM simplifies this by providing: + +- **Consolidated visibility:** Stop jumping between nodes. View host metrics, SQL statistics, and system logs in one place. + +- **Built-in intelligence:** Use the AI Assistant to explain execution plans and optimize distributed queries. + +- **Proactive health:** Identify data skew, monitor resource bottlenecks, and run automated Canary checks. + +## Explore WEM + +To get started with WEM, familiarize yourself with the following core sections: + +- [Architecture:](architecture) Understand how WEM interacts with your WHPG cluster. This section covers the three-tier model: the Collector, the storage layer, and WEM. + +- [Supported platforms:](supported_platforms) Check the compatibility matrix for WEM. 
This includes supported Linux distributions and compatible WarehousePG versions. +- [Known issues & limitations:](known_issues) Review the current status of the latest release. This section tracks current limitations and provides any available workarounds. diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/overview/known_issues.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/overview/known_issues.mdx new file mode 100644 index 0000000000..b5ca778246 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/overview/known_issues.mdx @@ -0,0 +1,34 @@ +--- +title: Known Issues +navTitle: Known Issues +description: Learn about known issues in WarehousePG Enterprise Manager version 0.5. +--- + +These are the currently known issues and limitations identified in the WarehousePG Enterprise Manager (WEM) release. Where applicable, we have included workarounds to help you mitigate the impact of these issues. These issues are actively tracked and are planned for resolution in a future release. + +### Deployment & infrastructure + +- Each WEM deployment is designed to manage and monitor only one WarehousePG (WHPG) cluster. Managing multiple clusters requires separate, dedicated WEM installations for each. +- WEM cannot be installed on Red Hat Enterprise Linux 7. Note that this restriction applies only to the host running the WEM application; WEM is fully capable of monitoring WHPG clusters running on RHEL 7 nodes. +[PTT-703]: # +- Running WHPG and WEM on the same host using the same database is not supported. WEM writes its application state into the `observability` database; sharing this environment can lead to configuration conflicts and upgrade complexity. +[PTT-714]: # +- The maximum Postgres connection pool size is currently hardcoded to 5. There is currently no mechanism to scale or configure this limit via environment variables. 
+ + +### Backup management + +[PTT-723]: # +- The **Backups** panel is only supported when WEM is installed directly on the WHPG coordinator node. +[PTT-682]: # +- The **Find Table** search tab within the **Backups** panel is currently non-functional. + +### Query editor + +[PTT-712]: # +- The **Query Editor** can become unresponsive when executing multiple concurrent queries. This issue is under active investigation. + +### Data analysis + +[PTT-701]: # +- Tables in the **Data Analysis** panel are subject to a hardcoded minimum size filter. Tables falling below this threshold will not be displayed in the interface. diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/overview/supported_platforms.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/overview/supported_platforms.mdx new file mode 100644 index 0000000000..db2aa15720 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/overview/supported_platforms.mdx @@ -0,0 +1,17 @@ +--- +title: Supported platforms +navTitle: Supported platforms +description: Provides information for determining the platform support for WarehousePG Enterprise Manager. +--- + +## Supported WarehousePG versions + +WarehousePG Enterprise Manager (WEM) 0.5 is compatible with the following versions of WarehousePG (WHPG): + +- WHPG version 6.x running on RHEL 7 or RHEL 8. +- WHPG version 7.x running on RHEL 8 or RHEL 9. + + +## Supported platforms + +- The host running WEM must run RHEL 8 or RHEL 9. This limitation applies only to where the WEM application itself is deployed; WEM is fully capable of monitoring WHPG clusters that are running on RHEL 7 nodes.
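The compatibility rules above can be encoded as a small pre-installation check. This is only a sketch of the matrix as stated on this page for WEM 0.5; the function and constant names are hypothetical and are not part of any WEM tooling.

```python
# RHEL major versions on which the WEM application itself can run.
SUPPORTED_WEM_HOST_RHEL = {8, 9}

# WHPG major version -> RHEL major versions supported for the cluster nodes.
SUPPORTED_CLUSTER_RHEL = {
    6: {7, 8},
    7: {8, 9},
}


def check_platform(wem_host_rhel, whpg_major, cluster_rhel):
    """Return a list of compatibility problems; an empty list means OK."""
    problems = []
    if wem_host_rhel not in SUPPORTED_WEM_HOST_RHEL:
        problems.append(
            f"WEM host runs RHEL {wem_host_rhel}; RHEL 8 or 9 is required."
        )
    supported = SUPPORTED_CLUSTER_RHEL.get(whpg_major)
    if supported is None:
        problems.append(f"WHPG {whpg_major}.x is not a listed version.")
    elif cluster_rhel not in supported:
        problems.append(
            f"WHPG {whpg_major}.x on RHEL {cluster_rhel} is not a listed combination."
        )
    return problems


# A WEM host on RHEL 8 monitoring a WHPG 6 cluster on RHEL 7 is supported.
ok = check_platform(wem_host_rhel=8, whpg_major=6, cluster_rhel=7)
```

Keeping the WEM host check separate from the cluster check mirrors the note above: the RHEL 7 restriction applies to the WEM host only, not to the monitored nodes.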
\ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/performance/backups.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/performance/backups.mdx new file mode 100644 index 0000000000..4fd167935c --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/performance/backups.mdx @@ -0,0 +1,34 @@ +--- +title: Securing backups +navTitle: Securing backups +description: Use the Backups panel to monitor data protection lifecycles, audit snapshot health, and verify recovery point objectives. +deepToC: true +--- + +The **Backups** panel on the left sidebar serves as a centralized monitoring hub for your data protection lifecycle. Use these actions to verify the integrity of your snapshots and ensure that your recovery point objectives (RPO) are being met across the WarehousePG cluster. + +!!! Note + This panel requires a functional `gpbackup_manager` utility configuration to display data. + + This interface is for monitoring and auditing only; it cannot be used to start, schedule, delete, or restore backups. For these operations, please contact your system administrator to perform them via the server-side cluster tools. + +### Auditing backup history and health + +Use the **Backup List** tab to maintain a chronological record of all data protection operations performed via the `gpbackup_manager` utility. +- **Verify recovery point objectives:** Review the timestamp and database columns to ensure that backups are occurring at the required intervals. Frequent successful snapshots ensure that you can restore data to a recent point in time in the event of a failure. +- **Monitor backup performance:** Observe the duration and size metrics. A sudden increase in backup duration or a significant drop in backup size could indicate network congestion, storage bottlenecks, or that specific large tables were excluded from the run. 
+- **Interpret operational status:** Regularly check the status column to identify gaps in protection. If a backup is marked as "Failed," immediately transition to the auditing system logs page to identify which segment or network path caused the interruption. + +### Inspecting backup metadata and scope + +When you select a specific entry from the backup history, use **Metadata** to validate exactly what data was secured and how it was stored. +- **Confirm object coverage:** Review the object scope to see which schemas and tables were included or excluded. This is critical for ensuring that new production tables haven't been accidentally omitted from the backup routine. +- **Review storage and compression settings:** Verify the storage targets and compression levels. Higher compression reduces storage costs but could extend the recovery time during a restoration process. + +### Navigating advanced backup tools + +Use the following tabs to locate specific data within your backup archive and analyze long-term protection trends. +- **Find specific tables in archives:** Use the **Find Table** tab to search through historical snapshots for a specific object. This allows you to identify exactly which backup ID contains the version of a table you need to recover. +- **Analyze protection statistics:** Review the **Statistics** tab to see aggregate data on backup success rates and storage consumption over time. Use these trends to forecast when your backup storage destination will require more capacity. +- **Review backup reports:** Access the **Backup Report** tab for a summarized view of the most recent operations, including detailed exit codes and summary logs generated by the `gpbackup_manager` utility. 
+ diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/performance/data-analysis.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/performance/data-analysis.mdx new file mode 100644 index 0000000000..daec717c20 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/performance/data-analysis.mdx @@ -0,0 +1,37 @@ +--- +title: Analyzing data distribution +navTitle: Analyzing data distribution +description: Use the Data Analysis panel to explore database structures, monitor storage health, and optimize performance metrics. +deepToC: true +--- + +The **Data Analysis** panel on the left sidebar is the primary interface for auditing database structures and diagnosing storage-related performance issues. Use these actions to maintain schema health and ensure data is distributed efficiently across the cluster. + +### Auditing table and object inventory + +Use the **Tables** tab to monitor the scale of your relational data and verify that storage settings are optimized for analytical workloads. +- **Validate compression efficiency:** Observe the **Compression** and **Level** columns. If the compression ratio is low for a large table, consider adjusting the algorithm (e.g., switching to `zstandard`) to reclaim disk space. +- **Check metadata freshness:** Monitor the **Last Analyze** timestamp. If a table has not been analyzed recently, the query planner might use stale statistics, leading to inefficient execution plans. +- **Manage external data services:** Use the **External Tables** tab to oversee data residing in S3 or HDFS. + + +### Optimizing performance through indexing and partitioning + +Use the **Indexes** and **Partitions** tabs to ensure your data structures support rapid query execution and simplified data lifecycles. +- **Update stale statistics:** Navigate to the **Missing Stats** tab to identify tables that haven't been analyzed in over seven days. + +!!! Tip + Statistics help the query planner make optimal decisions. 
Manually run `ANALYZE` after any operation that modifies more than 10% of a table's data to ensure optimal query plans. + +- **Reclaim wasted disk space:** Review the **Bloat** tab to find tables with a high dead count. If bloat exceeds 20%, consider running a manual `VACUUM` to mark dead space for reuse. +- **Resolve data distribution hot spots:** Use the **Data Skew** tab to find tables where data is unevenly spread across segments. A high skew % indicates that a single segment is doing more work than others, slowing down the entire cluster. +- **Address table skew:** If a large table shows significant skew, investigate the distribution key. Consider using `ALTER TABLE ... SET DISTRIBUTED BY` to choose a column with higher cardinality (more unique values) or fewer nulls. + +### Visualizing storage strategy + +Use the **Charts** tab to get a high-level view of your database’s physical composition and identify long-term storage trends. +- **Prioritize archival candidates:** Review the **Top 50 Tables by Size** bar chart to see which objects could be candidates for partitioning or data archiving. +- **Audit storage formats:** Check the **Storage Format Distribution** pie chart. If a majority of your data is in Heap format, plan a migration to append-only storage to optimize for high-volume analytical reads. + + + diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/performance/index.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/performance/index.mdx new file mode 100644 index 0000000000..c3341cda44 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/performance/index.mdx @@ -0,0 +1,23 @@ +--- +title: Optimizing system performance +navTitle: Optimizing system performance +description: Use WarehousePG Enterprise Manager to tune workloads, manage resource allocation, and ensure data durability across WarehousePG clusters. 
+navigation: +- query-monitor +- managing-resources +- data-analysis +- storage +- backups +--- + +Tune the workload and infrastructure to ensure consistent, high-speed execution and system continuity. + +- [Monitoring and evaluating queries:](query-monitor) Use the **Query Monitor** panel to track real-time SQL execution and identify bottlenecks, and use the editor to test query logic or performance improvements. + +- [Managing system resources:](managing-resources) The **Resource Management** panel allows you to control CPU and memory boundaries via resource groups and resource queues to prevent contention. + +- [Analyzing data distribution:](data-analysis) Investigate table statistics, data skew, and bloat to improve query execution plans via the **Data Analysis** panel. + +- [Planning storage capacity:](storage) Use the **Storage** panel to monitor disk utilization and historical growth trends across all hosts to prevent "disk full" events. + +- [Securing backups:](backups) Monitor the status and history of backup operations to ensure reliable recovery points with the **Backups** panel. \ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/performance/managing-resources.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/performance/managing-resources.mdx new file mode 100644 index 0000000000..d3bbcc568b --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/performance/managing-resources.mdx @@ -0,0 +1,58 @@ +--- +title: Managing system resources +navTitle: Managing system resources +description: Use the Resource Management panel to configure and monitor CPU, memory, and concurrency limits to optimize workload performance for resource groups and resource queues. +--- + +The **Resource Management** panel on the left sidebar allows you to govern cluster performance by controlling how system resources are distributed across different workloads.
+ + +WarehousePG (WHPG) provides two primary strategies for resource management: resource groups and resource queues. They are controlled by the `gp_resource_manager` server configuration parameter. See [Managing resources](/supported-open-source/warehousepg/warehousepg/admin_guide/wlmgmt/) from the WHPG documentation for details about how WHPG provides resource management. + +The interface automatically adapts based on two factors: + +1. The value of `gp_resource_manager`: + - `group`: Displays the Resource Groups interface. + - `queue`: Displays the Resource Queues interface. + - `none` (WHPG 7+): Disables resource management. In this mode, the panel will display a status message indicating that no governance is active. + +2. Your WHPG major version: +The configuration options differ between WHPG 6 and WHPG 7 to reflect changes in the underlying engine's capabilities. + +## Governing workloads with resource groups + +Use this section when `gp_resource_manager` is set to `group`. Resource groups use Linux Control Groups (cgroups) to provide hard limits on CPU and memory. WEM automatically adjusts the **Configuration** tab to match the capabilities of your WHPG version. + +- **Configure group limits:** + - **WHPG 7 settings:** + - **CPU Max %:** Sets a hard limit on CPU usage. The group cannot exceed this even if the cluster is idle. + - **CPU Weight:** Determines the relative share of CPU the group receives when there is contention. + - **Memory Quota:** Sets the specific amount of RAM allocated to the group. + - **Concurrency:** Limits the number of simultaneous active queries. + - **Min Cost:** Defines the minimum query cost required before resource group limits are applied. + - **WHPG 6 settings:** + - **CPU Rate %:** Sets the percentage of CPU resources allocated to the group. + - **Memory %:** Defines the percentage of total available memory dedicated to the group. 
+ - **Shared Quota %:** Sets the percentage of memory that can be shared among multiple transactions in the group. + - **Spill Ratio:** Controls the threshold at which memory-intensive operations (like sorts or joins) begin spilling data to disk. + - **Concurrency:** Limits the number of simultaneous active queries. + +- **Monitor group status:** Review the **Status** tab to see real-time consumption. Identify groups with a high queue depth, which indicates that queries are waiting for an available execution slot because the group's concurrency or memory limits have been reached. +- **Map roles to groups:** Use the **Role Assignments** tab to link specific database users to their appropriate resource groups. This ensures that high-priority production ETL processes remain isolated from ad-hoc user queries. +- **Visualize resource distribution:** Check the **Distribution** charts for a high-level view of your allocation. Use the **CPU Max %** (WHPG 7) or **CPU Rate %** (WHPG 6) distribution pie chart to verify that your most critical groups are granted the intended share of the cluster's processing power. + + +### Governing workloads with resource queues + +Use this section when `gp_resource_manager` is set to `queue`. This strategy is available on both WHPG 6 and 7, but uses a different management logic centered on query "cost" rather than OS-level cgroups. + +- **Track active and queued queries:** Use the **Status** tab to monitor the count of **Active** vs. **Queued** queries. If the queued count is high, your resource queues are likely too restrictive or your cluster lacks the processing power to handle the current concurrent load. +- **Analyze cost-based limits:** Review the cost limits assigned to each queue. Unlike groups, queues rely on the estimated "cost" of a query (from the optimizer) to determine if it should be allowed to run immediately or be placed in a wait state. +- **Identify lock contention:** Check for queries in a "Waiting" state. 
In a queue-based system, this often points to long-running transactions holding resources and blocking other users in the same queue. + +### Troubleshooting waiting queries + +Regardless of the management strategy or version, use these steps to resolve resource-based delays: + +- **Identify limit breaches:** Determine if the delay is caused by CPU caps, memory shortages, or concurrency ceilings. If a group or queue is consistently hitting its limits, contact an administrator to adjust the values in the **Configuration** tab. +- **Evaluate query cost:** Check the **Min Cost** setting (or the **Cost thresholds** in queues). Queries with a cost lower than this threshold will bypass limits, ensuring small, administrative tasks always run instantly. diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/performance/query-monitor.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/performance/query-monitor.mdx new file mode 100644 index 0000000000..f3d2968efb --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/performance/query-monitor.mdx @@ -0,0 +1,75 @@ +--- +title: Monitoring and evaluating queries +navTitle: Monitoring and evaluating queries +description: Use the Query Monitor panel to monitor active workloads, execute SQL, and leverage AI-driven optimization tools. +deepToC: true +--- + +The **Query Monitor** panel on the left sidebar provides a real-time view of every session connected to the WarehousePG (WHPG) cluster. By combining live monitoring with an integrated SQL workbench and AI-assisted diagnostic tools, it allows you to maintain peak cluster performance. + +!!! Note + Access to the **Query Monitor** and its administrative functions is governed by user roles, ensuring that session management capabilities are restricted to authorized personnel. + - Admin/Operator: Full visibility, plus the ability to **Cancel Query** or **Terminate Session**. 
+ - Viewer: Read-only access to all monitoring tabs; cannot stop or modify sessions. + + Refer to the [Role permissions matrix](../reference#role-permissions-matrix) for details. + +### Monitoring active cluster workloads + +Use the **Active Queries** tab to track execution progress and identify resource-heavy statements that may be impacting system stability. +- **Check connection distribution:** Observe the status bar to see how connections are distributed. Focus on sessions that are **Idle in transaction** or **Waiting**, as these typically indicate application-level leaks or locking contention that requires intervention. +- **Identify memory spill activity:** Review the **Spill Activity** section to identify queries forced to use disk space for processing. High **Spill Size** or **Temp Files** counts indicate that specific queries are exceeding their memory limits and need optimization. +- **Search and filter queries:** Use the **Advanced Search** to isolate queries based on duration, specific users, or databases. This is the fastest way to find queries that have been executing for an unexpectedly long time. +- **Manage runaway queries:** If you have Admin or Operator privileges, use the **Cancel** query tool to gracefully stop a statement that is consuming excessive resources. Use the **Export** feature to save a snapshot of current activity for later performance audits. + +**Understanding query statuses** + +The **Active Queries** tab features a status bar that provides a real-time count for each connection state. These metrics offer an immediate snapshot of cluster health and help you quickly identify if the system is under heavy load or if connections are piling up in a specific state: +- **Active:** Queries currently being processed by the CPU. Track the execution time to ensure it aligns with expected performance baselines. +- **Idle:** Established connections waiting for the next command.
No immediate action is required; this is standard behavior for established connections. +- **Idle in Transaction:** Open transactions waiting for input; these must be monitored as they can prevent vacuuming and cause table bloat. +- **Transaction (abort):** Transactions that have encountered an error and are currently in an aborted state. The session needs to be rolled back or terminated to release system locks and resources. +- **Fastpath:** Sessions executing internal fast-path function calls. +- **Disabled:** Connections that have been administratively disabled or are currently restricted from executing new database operations. +- **Waiting:** Queries blocked while waiting for locks or system resources. + +### Auditing database sessions + +Use the **Sessions** tab to identify connection leaks and manage dormant processes that are consuming system slots. +- **Pinpoint dormant connections:** Sort the list by **Idle time** to find connections that have been open for long periods without activity. These dormant sessions can prevent system maintenance tasks like vacuuming. +- **Identify client application sources:** Review the **Application** column to see which tools (such as psql, pgAdmin, or ETL drivers) are initiating connections. This helps you identify which specific service might be responsible for a connection spike. +- **Terminate problematic sessions:** If a session is unresponsive or holding critical locks, use the **Terminate** action to forcefully close the connection and release all associated system resources. + +### Testing and optimizing queries + +Use the **Query Editor** tab as an interactive workbench to safely explore data and analyze execution plans in a read-only environment. +- **Generate execution plans:** Use the **EXPLAIN** and **ANALYZE** buttons to visualize how the engine intends to process a query. Review the motion analysis to see how data is redistributed, broadcast, or gathered across segments.
+- **Identify performance bottlenecks:** Check the warnings provided in the execution plan. Focus on "sequential scans" on large tables or "missing statistics," which are common causes of slow performance. +- **Format and refine:** Use the **Format SQL** button to clean up raw SQL for better readability, and use the example library to quickly pull templates for common administrative queries. +- **Export CSV:** Once a query successfully executes, you can download the entire result set as a CSV file. This allows for easy data portability into spreadsheets or external reporting tools. + + +!!! Important + To ensure system safety, all queries executed through the **Query Editor** run in read-only mode. Use dedicated database tools for write operations if needed. + + +### Leveraging the AI Assistant + +Use the integrated AI Assistant to accelerate SQL authoring and simplify the debugging of complex performance issues. + +!!! Note + This is an optional feature. An administrator must configure an `ANTHROPIC_API_KEY` for the assistant to be active. See [Configuring WEM](../installing/wem#configuring-wem) and [Configuring WEM settings post-installation](../get-started#configuring-wem-settings) for details. + +- **Generate queries from natural language:** Press `Ctrl + K` to ask the assistant to write a query for you using plain English. Because the assistant is schema-aware, it will reference your actual table and column names accurately. +- **Optimize slow-running statements:** Paste a slow query into the assistant and ask for optimization suggestions. The assistant will analyze join efficiency, index usage, and cluster resource utilization to recommend a more efficient version of your code. +- **Debug database errors:** When a query fails, provide the PostgreSQL error message to the assistant. It will explain the failure in plain language and suggest the specific SQL corrections needed to resolve the error. + +!!!
Note + While the AI Assistant can suggest any SQL command (including DDL/DML), execution is strictly governed by your role. For example, Viewer roles are restricted to executing `SELECT` statements only. + +### Reviewing execution history + +Use the **Results** subtab within the **Query Editor** tab to manage your recent activity and retrieve data without re-executing heavy statements. +- **Retrieve recent data:** View results from queries run in the last 24 hours to re-examine data grids and success indicators without placing a new load on the cluster. +- **Refine past queries:** Use the **Open in Editor** action to reload a previous statement for further tuning. This is ideal for iterative development of complex analytical queries. + diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/performance/storage.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/performance/storage.mdx new file mode 100644 index 0000000000..4cab74cb20 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/performance/storage.mdx @@ -0,0 +1,36 @@ +--- +title: Planning storage capacity +navTitle: Planning storage capacity +description: Use the Storage panel to monitor disk capacity, usage distribution, and historical growth across the WarehousePG cluster. +deepToC: true +--- + +The **Storage** panel on the left sidebar provides real-time and historical visibility into disk utilization across the WarehousePG (WHPG) cluster. Use these actions to monitor current capacity, isolate hardware-specific imbalances, and forecast future storage requirements. + +### Monitoring cluster-wide capacity + +Use the **Overview** tab to perform a high-level audit of your total storage footprint and ensure data is spread evenly across the cluster infrastructure. +- **Evaluate total headroom:** Check the capacity metrics for total **Used** vs. **Free** space. If the average usage percentage across the cluster exceeds 70%, begin identifying data archival candidates. 
+- **Identify storage imbalances:** Review the **Free Space Distribution** visualization. If certain nodes have significantly less free space than others, it usually indicates data skew at the database level that requires re-distribution. +- **Audit node-level partitions:** Use the **Storage Details** table to inspect specific mount points (e.g., `/data` vs. `/var/log`). This helps you determine if storage pressure is being caused by database files or by expanding system logs and temporary files. + +### Isolating host and mount point issues + +Use the specialized **By Host** and **By Mount** tabs to differentiate between database growth and underlying operating system constraints. +- **Detect hardware-specific bottlenecks:** Use the **By Host** view to isolate individual nodes. If a single host is nearing capacity while others are empty, investigate the physical health of that node's drives or specific local file ingestion. +- **Compare identical storage paths:** Use the **By Mount** view to compare usage across all `/data` directories in the cluster. This allows you to verify if the database is consuming space symmetrically across your storage tier. + +### Forecasting and growth analysis + +Use the **Historical** tab to move from reactive monitoring to proactive capacity planning. +- **Analyze consumption trends:** Review the **Historical Storage Trends** graphs to determine your daily or weekly ingest rate. Use this trend to predict exactly when your current storage volume will reach a critical state. +- **Identify seasonal growth patterns:** Look for analytical spikes in the growth patterns view. Correlating these spikes with specific batch jobs or ETL cycles helps you schedule maintenance or expansion before peak load periods. + +### Responding to storage pressure recommendations + +Follow these protocols when storage indicators reach the warning (> 70%) or critical (> 90%) thresholds to prevent database write failures. 
+- **Analyze table and index volume:** If storage is running low, navigate to the **Data Analysis** panel to identify the largest tables and indexes. Focus on high-growth objects that are the primary drivers of disk consumption. See [Analyzing data distribution](data-analysis) for details. +- **Monitor for database bloat:** Check for high levels of dead tuples. Reclaiming space from bloated tables through vacuuming can often postpone the need for physical hardware expansion. +- **Initiate data archiving:** Move historical or low-access data to cold storage or external archives. This reduces the primary disk footprint while keeping the data available for long-term compliance or infrequent queries. +- **Execute capacity planning:** If growth is consistent with expected usage and cannot be mitigated by archiving or vacuuming, consult your administrator. Use the historical growth trends to justify adding additional disk space or scaling out with more cluster nodes. + diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/reference/commands.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/reference/commands.mdx new file mode 100644 index 0000000000..0fd97adec6 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/reference/commands.mdx @@ -0,0 +1,106 @@ +--- +title: Commands +navTitle: WEM command reference +description: Command reference for WarehousePG Enterprise Manager. +--- + +WarehousePG Enterprise Manager (WEM) includes a powerful CLI tool for initialization, diagnostics, and service management. + +## wem setup + +The `wem setup` command is the primary utility for initializing and verifying the WarehousePG (WHPG) monitoring environment. It orchestrates the creation of the monitoring database, installs authentication schemas, and deploys the necessary views across the cluster to enable WEM functionality. + +### Usage + +```bash +wem setup [options] [global-options] +``` + +### Options + +- `--check`: Performs a silent check. 
Exits with code 0 if setup is complete, and a non-zero code otherwise. +- `--debug`: Enables verbose output, displaying all SQL queries executed during the setup process. +- `--non-interactive`: Runs the setup without user prompts, relying on default values or environment variables. +- `--update-views`: Refreshes monitoring views across the cluster without re-running the full setup. +- `--verify`: Validates the current installation against the expected schema without modifying any data. +- `--wem-admin-password-file`: Path to the file containing the admin password (or where an auto-generated one will be saved). + +### Global options + +| Flag | Env Variable | Default | Description | +| ---- | ------ | ------- | ----- | +| `--host` | `WHPG_HOST` | localhost | The hostname of the WHPG coordinator. | +| `--port` | `WHPG_PORT` | 5432 | The port for the WHPG coordinator. | +| `--user` | `WHPG_USER` | gpadmin | The superuser for the cluster. | +| `--password` | `WHPG_PASSWORD` | - | The password for the WHPG user. | +| `--database` | `WHPG_DATABASE` | wem | The target database for monitoring. | +| `--sslmode` | `PGSSLMODE` | prefer | SSL mode for cluster connection. | + +### Examples + +- Start a guided installation with prompts: + + ```bash + wem setup + ``` + +- Check if an installation is complete without making changes: + + ```bash + wem setup --check + ``` + +- Point setup to a specific host and port: + + ```bash + wem setup --host whpg-coord-01 --port 5433 + ``` + +## wem doctor + +The `wem doctor` command runs a comprehensive suite of checks against your environment. It is designed to identify misconfigurations in database connectivity, web settings, and the observability stack (Prometheus, Loki, and Alloy) before or after the service is started. + +### Usage + +```bash +wem doctor [options] [global options] +``` + +### Options + +- `--prometheus-url PROMETHEUS_URL`: The URL of the Prometheus server for metrics. 
+- `--loki-url LOKI_URL`: The URL of the Loki server for log aggregation. +- `--alloy-url ALLOY_URL`: The URL of the Alloy collector. +- `-h, --help`: Display help text for the doctor command. + + +### Global options + +| Flag | Default | Description | +| ---- | ------- | ----- | +| `--host` | localhost | The hostname of the WHPG coordinator. | +| `--port` | 5432 | The port for the WHPG coordinator. | +| `--user` | gpadmin | WHPG database superuser. | +| `--database` | wem | The name of the WHPG database to check. | +| `--sslmode` | prefer | SSL mode for cluster connection. | + + +### Examples + +- Check the core database and web configurations: + + ```bash + wem doctor + ``` + +- Validate connection to the Prometheus metrics server: + + ```bash + wem doctor --prometheus-url http://prometheus.internal:9090 + ``` + +- Check every component of the WEM environment in one command: + + ```bash + wem doctor --prometheus-url http://prom:9090 --loki-url http://loki:3100 --alloy-url http://alloy:12345 + ``` \ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/reference/index.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/reference/index.mdx new file mode 100644 index 0000000000..8960ff2b0e --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/reference/index.mdx @@ -0,0 +1,56 @@ +--- +title: Reference +navTitle: Reference +description: Comprehensive technical reference for WarehousePG Enterprise Manager. +--- + +This section provides the technical specifications, definitions, and permission structures governing WarehousePG Enterprise Manager (WEM). + +## WEM command reference + +WEM includes a powerful CLI tool for initialization, diagnostics, and service management. + +- [wem setup:](commands#wem-setup) Initialize the monitoring database, install schemas, and set the admin password. 
+- [wem doctor:](commands#wem-doctor) Run pre-flight checks to validate database connectivity and observability stack reachability. + +## Monitoring views + +WEM installs [specialized views](views) across your cluster to provide deep visibility into distributed operations. + +## Role permissions matrix + +WEM utilizes Role-Based Access Control (RBAC). Use this matrix to understand the default capabilities of each user role. + +| Permission | Admin | Operator | Viewer | +|------------|-------|----------|--------| +| View dashboard | Yes | Yes | Yes | +| View cluster status | Yes | Yes | Yes | +| View active queries | Yes | Yes | Yes | +| Cancel queries | Yes | Yes | No | +| Terminate sessions | Yes | Yes | No | +| Execute DDL (`CREATE`/`DROP`) | Yes | No | No | +| Execute DML (`INSERT`/`UPDATE`/`DELETE`) | Yes | Limited | No | +| Execute `SELECT` | Yes | Yes | Yes | +| View data analysis | Yes | Yes | No | +| Run `ANALYZE` | Yes | No | No | +| View storage | Yes | Yes | Yes | +| View metrics | Yes | Yes | Yes | +| View logs | Yes | Yes | Yes | +| View backups | Yes | Yes | No | +| Manage users | Yes | No | No | +| Configure permissions | Yes | No | No | +| View audit log | Yes | No | No | + +!!! Note + This matrix represents the default system settings. Administrators can customize these permissions for each role via the **Users** panel under the **Permissions** tab. 
+ +## Keyboard shortcuts + +Speed up your workflow within the WEM interface using the following shortcuts: + +| Shortcut | Action | Context | +|----------|--------|---------| +| `Ctrl+Enter` | Execute query | Query Editor | +| `Ctrl+K` | Open AI Assistant | Query Editor | +| `Ctrl+Space` | Auto-complete | Query Editor | +| `Escape` | Close modal/dialog | Anywhere | \ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/reference/views.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/reference/views.mdx new file mode 100644 index 0000000000..03e529a736 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/reference/views.mdx @@ -0,0 +1,28 @@ +--- +title: Views +navTitle: WEM views +description: A complete list of views installed with WarehousePG Enterprise Manager, including their descriptions and use cases. +--- + +WEM installs specialized views across your cluster to provide deep visibility into distributed operations. + +## Monitoring views + +### Core views (available in all databases) + +| View Name | Description | +|-----------|-------------| +| `v_activity` | Enhanced session monitoring. This view extends the standard `pg_stat_activity` by adding WarehousePG-specific columns to help identify distributed query bottlenecks. | +| `v_table_sizes_summary` | Comprehensive storage analysis. Displays table sizes while accounting for different storage formats (Heap/Append-Only) and compression ratios. | +| `v_check_data_skew` | Distribution health check. Analyzes how data is spread across segments to identify "hot" segments that might be slowing down parallel processing. | +| `v_bloat_tables` | Maintenance indicator. Detects table bloat (fragmented space) caused by high volumes of `UPDATE` or `DELETE` operations. | +| `v_locks` | Concurrency monitoring. Provides a clear view of current database locks and identifies exactly which queries are blocking others. | +| `v_size_files` | Low-level storage view. 
Provides file-level size information directly from the underlying file system of the cluster. | +| `v_indexes_size` | Index audit tool. Lists all index definitions along with their current storage footprint to help identify redundant or oversized indexes. | + +### Dashboard-specific views (WarehousePG database only) + +| View Name | Description | +|-----------|-------------| +| `v_database_sizes` | Historical storage tracking. Provides total database sizes alongside trend data to help with capacity planning. | +| `v_query_stats` | Performance analytics. Aggregates query performance statistics to identify the most resource-intensive or frequently executed queries. | diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/release_notes/0.5_rel_notes.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/release_notes/0.5_rel_notes.mdx new file mode 100644 index 0000000000..5103697e6f --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/release_notes/0.5_rel_notes.mdx @@ -0,0 +1,31 @@ +--- +title: WarehousePG Enterprise Manager 0.5 release notes +navTitle: Version 0.5 (Technical preview) +description: Release notes for version 0.5 of WarehousePG Enterprise Manager. +--- + +**Release Date:** March 2, 2026 + +**Status:** Technical preview + +We are excited to announce the first release of WarehousePG Enterprise Manager (WEM). WEM is a unified management and monitoring platform designed to provide deep visibility into WarehousePG clusters, simplify administrative tasks, and streamline host-based authentication. + +### Key highlights + +- **Deep observability** + - **Real-time & historical metrics:** Monitor your cluster’s pulse with a dual-layer approach. View live SQL statistics directly from the integrated Exporter or analyze long-term trends using historical data stored in Prometheus. 
+ - **Cluster health overview:** Gain an immediate, high-level status of the coordinator and all segment nodes, ensuring high availability and identifying bottlenecks at a glance. + - **Centralized log analysis:** Access and search high-volume log streams aggregated via Loki, reducing the time spent SSH-ing into individual nodes for troubleshooting. + +- **Intelligence & development** + - **Query monitor & AI assistant:** Identify slow-running queries in real time. Use the integrated AI Assistant within the **SQL Editor** to help write, optimize, and explain complex distributed queries. + - **Table & data analysis:** Explore your schemas, analyze table distribution across segments, and perform deep-dives into data skew and storage efficiency. + +- **Administrative control** + - **Resource management:** Monitor and tune the resource consumption of your WarehousePG cluster to ensure peak performance under heavy workloads. + - **User & access management:** Simplify security with a centralized user interface for managing database users, roles, and permissions. Use the interactive **HBA Editor** to update `pg_hba.conf` with automatic backups and safe reloads. + - **System settings audit:** Search and verify all cluster-level configuration parameters (GUCs) through a dedicated, searchable interface. + +- **Resilience & reliability** + - **Backups & data protection:** Manage and monitor your cluster backup schedules and recovery points from a single location. + - **Alerting & Canary checks:** Stay ahead of failures with configurable alerts and automated Canary checks that proactively test cluster connectivity and performance.
diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/release_notes/index.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/release_notes/index.mdx new file mode 100644 index 0000000000..a1121bac8a --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/release_notes/index.mdx @@ -0,0 +1,13 @@ +--- +title: WarehousePG Enterprise Manager release notes +navTitle: Release notes +description: Release notes provide information on what is new in each release of WarehousePG Enterprise Manager. +navigation: + - 0.5_rel_notes +--- + +The EDB WarehousePG Enterprise Manager documentation describes the latest version of EDB WarehousePG Enterprise Manager, including minor releases and patches. The release notes provide information on what was new in each release. For new functionality introduced in a minor or patch release, the content also indicates the release that introduced the feature. + +| Version | Release date | +|-----------------------------|--------------| +| [0.5](0.5_rel_notes) (Technical preview) | 2 Mar 2026 | diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/system-access/access-management.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/system-access/access-management.mdx new file mode 100644 index 0000000000..37d9fa3d04 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/system-access/access-management.mdx @@ -0,0 +1,52 @@ +--- +title: Defining access policies +navTitle: Defining access policies +description: Use the Access Management panel to manage client authentication rules, database roles, and global system configurations in WarehousePG. +deepToC: true +--- + +The **Access Management** panel on the left sidebar provides an administrative suite for governing how users connect to the cluster and what global privileges they hold. This interface centralizes the security configurations typically managed via command-line configuration files. + +!!! 
Important + Access to this panel is restricted to users with the **Admin** role privilege. + + +### Auditing authentication firewall + +Use the **pg_hba.conf** tab to monitor the active rules that determine who can connect to your database and from where. + +- **Identify security vulnerabilities:** Review the **Trust Rules** count in the header. A high number indicates rules that allow passwordless access. WarehousePG Enterprise Manager (WEM) will display a warning if this count is excessive, signaling a need to transition those rules to `scram-sha-256` or `md5` authentication. +- **Verify connection pathways:** Audit the `pg_hba.conf Entries` table to ensure that only authorized CIDR address ranges are permitted. Look for explicit reject rules that you have implemented to block known unauthorized subnets. +- **Reload configurations:** If you have made changes to the configuration files, use the **Reload Config** button in the header. This sends a `SIGHUP` signal to the database engine, applying the rules immediately without interrupting active user sessions. + + +### Auditing cluster identities and privileges + +Use the **Roles** tab to monitor the security posture of your user landscape and enforce the "principle of least privilege". + +- **Minimize superuser counts:** Check the **Superusers** metric in the header. This count must be kept to an absolute minimum. If it increases unexpectedly, audit the database roles table to identify which accounts were granted unrestricted access. +- **Manage login capabilities:** Compare the **Total** roles to **Login** roles. Roles without login privileges are typically group roles used for permission inheritance. Ensure that individual human users are the only ones with active login attributes. +- **Review global attributes:** Inspect the **Attributes** column in the **Database Roles** table to verify who can perform sensitive actions like `CREATEDB` (creating databases) or `CREATEROLE` (modifying other users). 
Monitor the connection limit to prevent any single role from exhausting the cluster's session pool. + + +### Auditing system settings + +Use the **System Settings** tab to audit the current operational thresholds and performance tunings of your cluster. + +- **Search for performance thresholds:** Use the searchable interface to find specific parameters. Review categories like **Resource Usage** or **Memory** to verify that your tunings match the current workload requirements. +- **Identify current values and units:** Check the **Value** and **Unit** columns to ensure that settings like `statement_timeout` are configured correctly to prevent runaway queries from impacting the system. + +!!! Note + This interface is read-only for auditing purposes. To modify a setting, use the `gpconfig` utility. For example: `gpconfig -c statement_timeout -v 10000`. After making changes, apply them by reloading the configuration: `gpstop -u`. + +### Modifying authentication rules + +Use the **HBA Editor** tab to update your connection rules without leaving the management console. +- **Add or reorder rules:** Use the interactive interface to define new type, database, user, and method combinations. Remember that `pg_hba.conf` is parsed sequentially; ensure your more specific rules are placed above more general ones. +- **Commit changes safely:** After making edits, use the **Save Changes** button to store them in the WEM interface. To make them live, select **Reload Config**—this will back up the existing file on the coordinator before overwriting it with your new configuration. +- **Revert edits:** If you make a mistake before committing, use the **Reload File** button to discard your current edits and pull the live version of the file back into the editor. 
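
Because `pg_hba.conf` is evaluated top to bottom and the first matching rule is applied, it can help to review the active rules in evaluation order before saving. The following is a minimal sketch, not a WEM feature: the sample file, its addresses, and the `list_hba_rules` helper are hypothetical and used only for illustration.

```bash
# Hypothetical sample rules; the subnets are placeholders, not recommendations.
hba_sample="$(mktemp)"
cat > "$hba_sample" <<'EOF'
# TYPE  DATABASE  USER  ADDRESS       METHOD
# Specific rule first: reject one subnet outright.
host    all       all   10.0.5.0/24   reject
# General rule second: the wider network must authenticate.
host    all       all   10.0.0.0/16   scram-sha-256
EOF

# Print only the active rules (no comments or blank lines), numbered in
# the order the server evaluates them.
list_hba_rules() {
  grep -Ev '^[[:space:]]*(#|$)' "$1" | nl -w2 -s': '
}

list_hba_rules "$hba_sample"
```

If the two rules were swapped, a client in `10.0.5.0/24` would match the broader `10.0.0.0/16` rule first and the `reject` rule would never apply.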
+ + + + + diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/system-access/index.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/system-access/index.mdx new file mode 100644 index 0000000000..a8abbf791e --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/system-access/index.mdx @@ -0,0 +1,15 @@ +--- +title: Managing system access +navTitle: Managing system access +description: Use WarehousePG Enterprise Manager to establish safety protocols, manage user credentials, and enforce role-based access controls. +navigation: +- management +- access-management +--- + +Establish safety protocols and manage access to sensitive data to protect the integrity of the cluster. + + +- [Provisioning user accounts:](management) The **Management** panel allows you to create database roles and manage credentials through a centralized interface. + +- [Defining access policies:](access-management) Use the **Access Management** panel to enforce role-based permissions and security controls at the schema and table level. \ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/system-access/management.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/system-access/management.mdx new file mode 100644 index 0000000000..5a32479dd6 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/system-access/management.mdx @@ -0,0 +1,58 @@ +--- +title: Provisioning user accounts +navTitle: Provisioning user accounts +description: Use the Management panel to manage user accounts, define role-based access controls, and configure system-wide settings for WarehousePG Enterprise Manager. +deepToC: true +--- + +The **Management** panel on the left sidebar is the central administrative hub for WarehousePG Enterprise Manager (WEM). Use these actions to oversee account security, define permission tiers, and configure system-wide integration settings. + +!!! 
Important + Access to this panel is restricted to users with the **Admin** role privilege. + +### Managing the user lifecycle + +Use the **Users** tab to monitor account status and provision new access credentials. +- **Monitor account security:** Check the header metrics for locked accounts. A non-zero value indicates that users have exceeded the failed login threshold, requiring you to investigate potential security incidents or assist users with password resets. +- **Provision new users:** Select the **Add User** button to create a new identity. Use the **Map to Existing PG User** feature to either link the account to an existing database role (like `gpadmin`) or automatically provision a new database identity that matches the WEM username. +- **Enforce security policies:** When editing a user, use the **Active** toggle bar to revoke access immediately without deleting the account's history. +- **Audit administrative overhead:** Regularly review the **Admin Users** count in the header. Keeping the number of highly privileged accounts to a minimum is a core security best practice. + +### Defining role-based access + +Use the **Roles** and **Permissions** tabs to control what your team can see and do within the platform. + +- **Leverage system roles:** Assign users to one of the three built-in tiers: + - **Admin:** Full system control. + - **Operator:** Operational dashboard access, including query management and cancellation. + - **Viewer:** Read-only access to metrics and logs. + Refer to the [Role permissions matrix](../reference/#role-permissions-matrix) for details. + +- **Customize module visibility:** Use the **Permissions** tab to toggle the visibility of specific tabs for each role. This allows you to simplify the interface for viewers or restrict sensitive configuration pages to admins only.
+- **Restore factory defaults:** If permissions become misconfigured, use the reset defaults button to instantly revert the RBAC matrix to the factory-recommended secure state for all roles. + +### Auditing and security forensics + +Use the **Audit Log** tab to maintain a chronological record of every administrative action performed in the system. + +- **Investigate authentication patterns:** Filter by **Failed Login** to identify potential brute-force attempts. Use the **Login** and **Logout** events to verify user activity during specific incident windows. +- **Track configuration changes:** Review the **Update User** and **Create User** actions to see who modified roles or security flags. This provides accountability for all changes made to your access control layer. + +### Configuring system settings + +Use the **Settings** tab to perform configuration changes to your existing WEM installation. +- **Establish the database backbone:** Configure the WHPG database connection with your coordinator host and credentials. + +!!! Note + Changes to database connection settings or the application port require a restart of the WEM service to take effect. +!!! + +- **Integrate observability tools:** Input your Prometheus and Loki URLs to enable real-time metric graphs and integrated log streaming. If these fields are left empty, the corresponding tabs in the dashboard will remain disabled. +- **Tune query telemetry:** Adjust the **Log Min Duration Statement** to define what constitutes a slow query in milliseconds. This setting directly controls which queries are captured for performance analysis. + +### Scheduling proactive health checks + +Use the **Canary Checks** tab to define automated probes that verify the end-to-end integrity of your cluster. +- **Set up connectivity probes:** Create a **Connectivity** check to verify that the database is accepting new session requests. 
+- **Measure custom SQL performance:** Use the **Query** type to run specific SQL statements (e.g., `SELECT count(*) FROM sales;`) at set intervals. Define warning and critical latency thresholds to trigger alerts if performance degrades. +- **Verify infrastructure health:** Configure **Segment Health** and **Replication** checks to monitor for hardware failures or synchronization lag between primary and mirror segments. \ No newline at end of file diff --git a/advocacy_docs/supported-open-source/warehousepg/wem/troubleshooting.mdx b/advocacy_docs/supported-open-source/warehousepg/wem/troubleshooting.mdx new file mode 100644 index 0000000000..68118b3bd2 --- /dev/null +++ b/advocacy_docs/supported-open-source/warehousepg/wem/troubleshooting.mdx @@ -0,0 +1,96 @@ +--- +title: Troubleshooting common issues +navTitle: Troubleshooting +description: Solutions for connectivity, authentication, and permission issues within WarehousePG Enterprise Manager. +--- + +This guide provides solutions for the most common issues encountered when using WarehousePG Enterprise Manager (WEM). If your issue persists after following these steps, please contact your system administrator. + +## Connectivity + +### Issue: Cannot connect to the database + +If WEM is unable to reach the WarehousePG cluster, follow these diagnostic steps: + +1. Ensure the database is active and accepting local connections: + + ```bash + psql -d postgres -c "SELECT version();" + ``` + +2. Verify that the WEM connection strings are correctly set in the environment: + + ```bash + env | grep WHPG + ``` + +3. Use the built-in WEM tool from the WEM host to validate the current configuration: + + ```bash + ./wem setup --verify + ``` + +4. Test the credentials directly via the CLI using the same parameters defined in the WEM **Settings** tab within the **Users** panel: 
+ + ```bash + PGHOST=localhost PGUSER=gpadmin psql -d postgres -c "SELECT current_database();" + ``` + +## Authentication and access + +### Message: "Session expired" + +**Cause:** Your security token has timed out due to a period of inactivity. + +**Solution:** Select **Log In** to return to the authentication screen and re-enter your credentials. + +Note that any unsaved changes in forms or the query editor will be lost upon session expiration. +To prevent this issue, save your configuration changes frequently and avoid long periods of idle time with the browser tab open. + +### Error: "Permission denied" + +**Cause:** Your assigned role does not have the authorization required to perform the requested action. + +**Solution:** + +1. Verify your current role in the sidebar footer. +2. Review the [Role permissions matrix](reference#role-permissions-matrix) to confirm if the action is permitted for your tier. +3. If you require elevated access, contact your WEM Administrator to request a role change. + + +## Query editor restrictions + +### Issue: Query is blocked + +Symptoms: +- "Query blocked" error messages. +- Inability to execute `INSERT`, `UPDATE`, or `DELETE` statements. +- DDL commands (e.g., `CREATE`, `DROP`) are rejected. + +**Cause:** WEM enforces role-based SQL restrictions to prevent accidental data loss or unauthorized schema changes. + +## Observability and metrics + +### Issue: Charts are not displaying + +Some tabs display the error `Prometheus not configured. Set PROMETHEUS_URL to enable metrics charts (e.g., http://localhost:9090)`. + +**Cause:** The connection to the Prometheus metrics server is either not correctly configured or down. + +**Solution:** +1. Verify that Prometheus is running and reachable via the URL defined in the **Settings** tab within the **Users** panel. +2. Check network connectivity and firewall rules between the WEM server and the Prometheus endpoint. 
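The two checks above can be sketched from the WEM host with `curl`. This is a minimal sketch, not part of WEM: the function names `prometheus_health_url` and `check_prometheus` are hypothetical, and it assumes your Prometheus server exposes its standard `/-/healthy` management endpoint at the configured URL.

```bash
# Hypothetical helpers for illustration; they are not shipped with WEM.

# Build the Prometheus health endpoint from a base URL,
# dropping one trailing slash so the path is not doubled.
prometheus_health_url() {
  local base="${1%/}"
  printf '%s/-/healthy\n' "$base"
}

# Probe the health endpoint with a short timeout.
# Returns 0 when Prometheus answers with a success status.
check_prometheus() {
  local url
  url="$(prometheus_health_url "$1")"
  if curl -fsS --max-time 5 -o /dev/null "$url"; then
    echo "Prometheus reachable at $url"
  else
    echo "Prometheus NOT reachable at $url" >&2
    return 1
  fi
}

# Usage (the fallback URL is the example from the WEM error message):
#   check_prometheus "${PROMETHEUS_URL:-http://localhost:9090}"
```

If the probe fails from the WEM host but succeeds on the Prometheus host itself, a firewall rule or routing problem between the two machines is the more likely cause.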
+ +### Issue: Logs are not loading + +The **Loki Logs** tab within the **Logs** panel reports the error: `Loki integration is not configured`. + +**Cause:** The Loki log aggregation service is unavailable or the URL is incorrect. + +**Solution:** + +1. Ensure the Loki service is active. +2. Verify the Loki URL in the **Settings** tab within the **Users** panel. +3. Check server-side logs for "Connection Refused" errors. + +
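A quick way to tell a stopped Loki service from a wrong URL is to probe it with `curl` and interpret the exit code. The following is a sketch under assumptions: `explain_curl_exit` and `check_loki` are hypothetical names, the URL you pass should be the one entered in WEM, and `http://localhost:3100` in the usage comment is only Loki's usual default port. It relies on Loki's standard `/ready` endpoint.

```bash
# Hypothetical helpers for illustration; they are not shipped with WEM.

# Translate common curl exit codes into a short diagnosis.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect (for example, connection refused)" ;;
    22) echo "HTTP error status returned" ;;
    28) echo "timed out" ;;
    *)  echo "curl exit code $1" ;;
  esac
}

# Probe Loki's readiness endpoint and report the result.
# The return value is curl's own exit code.
check_loki() {
  local base="${1%/}" rc=0
  curl -fsS --max-time 5 -o /dev/null "$base/ready" || rc=$?
  echo "Loki at $base: $(explain_curl_exit "$rc")"
  return "$rc"
}

# Usage (assumed default Loki port; substitute the URL configured in WEM):
#   check_loki "http://localhost:3100"
```

A "failed to connect" result points to a stopped service or a blocked port, while "could not resolve host" suggests a typo in the configured URL.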