|
| 1 | +(monitoring-prometheus-grafana)= |
| 2 | +# Monitoring a CrateDB cluster with Prometheus and Grafana |
| 3 | + |
| 4 | +:::{div} sd-text-muted |
| 5 | +::: |
| 6 | + |
| 7 | +:::{rubric} Introduction |
| 8 | +::: |
| 9 | + |
| 10 | +We recommend [^standalone] pairing two standard observability tools: |
| 11 | +[Prometheus] to collect and store metrics, |
| 12 | +and [Grafana] to build dashboards. |
| 13 | + |
| 14 | +This guide describes how to set up a Grafana dashboard that allows you |
| 15 | +to check live and historical data around performance and capacity |
| 16 | +metrics in your CrateDB cluster. It uses instructions suitable for |
| 17 | +Debian or Ubuntu Linux, but can be adapted for other Linux distributions. |
| 18 | + |
| 19 | +[^standalone]: {ref}`Containerized <install-container>` and [CrateDB Cloud] setups differ. |
| 20 | + This tutorial targets standalone and on‑premises installations. |
| 21 | + |
| 22 | +:::{rubric} Overview |
| 23 | +::: |
| 24 | + |
| 25 | +For a CrateDB environment, you are interested in CrateDB-specific metrics, |
| 26 | +such as the number of shards or number of failed queries, and OS metrics, |
| 27 | +such as available disk space, memory usage, or CPU usage. |
| 28 | +Based on Prometheus, the monitoring stack uses the following exporters |
| 29 | +to fulfill those requirements. |
| 30 | + |
| 31 | +:Node Exporter: |
| 32 | + |
| 33 | + Exposes a wide variety of hardware and kernel-related metrics. |
| 34 | + |
| 35 | +:JMX Exporter: |
| 36 | + |
| 37 | + Consumes metrics information from CrateDB's |
| 38 | + JMX collectors and exposes them via HTTP so they can be scraped by Prometheus. |
| 39 | + |
| 40 | +:SQL Exporter: |
| 41 | + |
| 42 | + Allows running arbitrary SQL |
| 43 | + statements against a CrateDB cluster to retrieve additional |
| 44 | + information from CrateDB's system tables. |
| 45 | + |
| 46 | +## Set up CrateDB cluster |
| 47 | + |
| 48 | +First, you need a running CrateDB cluster. |
| 49 | +The {ref}`multi-node setup instructions <multi-node-setup-example>` provide |
| 50 | +a quick walkthrough for Ubuntu Linux. |
| 51 | + |
| 52 | +## Set up Prometheus Exporters |
| 53 | + |
| 54 | +The Node Exporter and the JMX Exporter need to be installed on all |
| 55 | +machines that are running CrateDB nodes. |
| 56 | + |
| 57 | +1. Install the [Prometheus Node Exporter]. |
| 58 | +   ```shell |
| 59 | +   apt install prometheus-node-exporter |
| 60 | +   ``` |
| 61 | + |
| 62 | +2. Install the {ref}`prometheus-jmx-exporter`. |
| 63 | + |
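
Before moving on, it can help to confirm that both exporters respond on each CrateDB node. The following is a minimal sketch, not part of the original setup: `check_exporter` is a hypothetical helper name, it assumes `curl` is installed, and it assumes the JMX Exporter listens on port 8080, the port used in the Prometheus configuration in this guide.

```shell
# Hypothetical helper: report whether an exporter serves /metrics.
check_exporter() {
  host="$1"
  port="$2"
  if curl --silent --fail --max-time 5 "http://${host}:${port}/metrics" >/dev/null 2>&1; then
    echo "OK ${host}:${port}"
  else
    echo "FAIL ${host}:${port}"
    return 1
  fi
}

# Example calls, to be run on a CrateDB node:
#   check_exporter localhost 9100   # Node Exporter
#   check_exporter localhost 8080   # JMX Exporter (assumed port)
```

A `FAIL` line usually means the service is not running or listens on a different port.
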
| 64 | +## Set up Prometheus |
| 65 | + |
| 66 | +Prometheus typically runs on a machine that is not part of the |
| 67 | +CrateDB cluster. |
| 68 | +The {ref}`prometheus-sql-exporter` does not need to be installed on every |
| 69 | +node either; a single instance next to Prometheus is sufficient. |
| 70 | + |
| 71 | +```shell |
| 72 | +apt install prometheus prometheus-sql-exporter --no-install-recommends |
| 73 | +``` |
| 74 | + |
| 75 | +For advanced configuration options, see {ref}`prometheus-auth` and |
| 76 | +{ref}`prometheus-storage`. |
| 77 | + |
| 78 | +Now, configure Prometheus to scrape metrics from the Node Exporters and |
| 79 | +JMX Exporters on all CrateDB nodes, and from the SQL Exporter, by adding |
| 80 | +the jobs below to the `scrape_configs` section of the configuration file. |
| 81 | +```shell |
| 82 | +nano /etc/prometheus/prometheus.yml |
| 83 | +``` |
| 84 | + |
| 85 | +:Node Exporter: Port 9100 |
| 86 | +:JMX Exporter: Port 8080 |
| 87 | +:SQL Exporter: Port 9237 |
| 88 | + |
| 89 | +```yaml |
| 90 | +- job_name: 'node' |
| 91 | +  static_configs: |
| 92 | +    - targets: ['ubuntuvm1:9100', 'ubuntuvm2:9100'] |
| 93 | + |
| 94 | +- job_name: 'cratedb_jmx' |
| 95 | +  static_configs: |
| 96 | +    - targets: ['ubuntuvm1:8080', 'ubuntuvm2:8080'] |
| 97 | + |
| 98 | +- job_name: 'sql_exporter' |
| 99 | +  static_configs: |
| 100 | +    - targets: ['localhost:9237'] |
| 101 | +``` |
| 102 | + |
| 103 | +Restart the Prometheus daemon if it was already started. |
| 104 | +```shell |
| 105 | +systemctl restart prometheus |
| 106 | +``` |
| 107 | + |
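
After the restart, you may want to check that Prometheus accepted the configuration and reaches every target. The sketch below is an illustration under assumptions: `summarize_health` is a hypothetical helper name, and it assumes Prometheus listens on `localhost:9090` without authentication. It counts targets per health state in the JSON returned by the targets endpoint of the Prometheus HTTP API.

```shell
# Hypothetical helper: count scrape targets per health state.
# Reads the JSON response of /api/v1/targets from stdin.
summarize_health() {
  grep -o '"health":"[a-z]*"' | sort | uniq -c
}
```

On the Prometheus host, one possible sequence is to validate the file with `promtool check config /etc/prometheus/prometheus.yml` (`promtool` typically ships with Prometheus), and then run `curl --silent http://localhost:9090/api/v1/targets | summarize_health`. Every configured target should be counted under `"health":"up"`.
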
| 108 | +## Set up Grafana |
| 109 | + |
| 110 | +Install Grafana on the same machine where you installed Prometheus. |
| 111 | +On a Debian or Ubuntu machine, run the following: |
| 112 | +```shell |
| 113 | +apt install --yes wget gpg |
| 114 | +wget -q -O - https://packages.grafana.com/gpg.key | gpg --dearmor | tee /usr/share/keyrings/grafana.gpg >/dev/null |
| 115 | +echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://packages.grafana.com/oss/deb stable main" | tee /etc/apt/sources.list.d/grafana.list |
| 116 | +apt update |
| 117 | +apt install --yes grafana |
| 118 | +``` |
| 119 | +Then, start Grafana. |
| 120 | +```shell |
| 121 | +systemctl start grafana-server |
| 122 | +``` |
| 123 | +For other systems, see the [Grafana installation documentation][grafana-debian]. |
| 124 | + |
| 125 | +:::{rubric} Data source |
| 126 | +::: |
| 127 | + |
| 128 | +Navigate to `http://<grafana-host>:3000/` to access the Grafana login screen. |
| 129 | +The default credentials are `admin`/`admin`; change the password immediately. |
| 130 | +Navigate to "Add your first data source", then select "Prometheus" and set the |
| 131 | +URL to `http://<prometheus-host>:9090/`. |
| 132 | +If you configured basic authentication for Prometheus, this is where you |
| 133 | +would need to enter the credentials. |
| 134 | +Confirm using "Save & test". |
| 135 | + |
| 136 | +:::{rubric} Dashboard |
| 137 | +::: |
| 138 | + |
| 139 | +An example dashboard based on the discussed setup is available for easy importing |
| 140 | +from [Grafana » CrateDB Monitoring Dashboard]. |
| 141 | +In your Grafana installation, on the left-hand side, hover over the “Dashboards” |
| 142 | +icon and select “Import”. Specify the dashboard ID **17174** and load the dashboard. |
| 143 | +On the next screen, finalize the setup by selecting the previously created |
| 144 | +Prometheus data source. |
| 145 | + |
| 147 | + |
| 148 | +## Alternative implementations |
| 149 | + |
| 150 | +You can also build your own dashboard, or use an entirely different monitoring |
| 151 | +approach, while still covering the metrics discussed in this article. |
| 152 | +The list below is a good starting point for troubleshooting most operational issues. |
| 153 | + |
| 154 | +* CrateDB metrics (with example Prometheus queries based on the Crate JMX HTTP Exporter) |
| 155 | +  * Thread pool rejection rate: `sum(rate(crate_threadpools{property="rejected"}[5m])) by (name)` |
| 156 | +  * Thread pool queue size: `sum(crate_threadpools{property="queueSize"}) by (name)` |
| 157 | +  * Thread pools active: `sum(crate_threadpools{property="active"}) by (name)` |
| 158 | +  * Queries per second: `sum(rate(crate_query_total_count[5m])) by (query)` |
| 159 | +  * Query error rate: `sum(rate(crate_query_failed_count[5m])) by (query)` |
| 160 | +  * Average query duration over the last 5 minutes: `sum(rate(crate_query_sum_of_durations_millis[5m])) by (query) / sum(rate(crate_query_total_count[5m])) by (query)` |
| 161 | +  * Circuit breaker memory in use: `sum(crate_circuitbreakers{property="used"}) by (name)` |
| 162 | +  * Number of shards: `crate_node{name="shard_stats",property="total"}` |
| 163 | +  * Garbage collection rates: `sum(rate(jvm_gc_collection_seconds_count[5m])) by (gc)` |
| 164 | +  * Thread pool rejected operations: `crate_threadpools{property="rejected"}` |
| 165 | +* Operating system metrics |
| 166 | +  * CPU utilization |
| 167 | +  * Memory usage |
| 168 | +  * Open file descriptors |
| 169 | +  * Disk usage |
| 170 | +  * Disk read/write operations and throughput |
| 171 | +  * Received and transmitted network traffic |
| 172 | + |
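
The queries listed above can also be tried outside of Grafana, which is useful when building your own dashboard or alerting. The following is a minimal sketch using the instant query endpoint of the Prometheus HTTP API. The `promql` helper and the `PROMETHEUS_URL` variable are hypothetical names introduced for illustration, and the sketch assumes Prometheus is reachable without authentication.

```shell
# Assumed location of Prometheus; override PROMETHEUS_URL for other hosts.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://localhost:9090}"

# Hypothetical helper: run an instant PromQL query and print the JSON result.
promql() {
  curl --silent --get "${PROMETHEUS_URL}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Example calls:
#   promql 'sum(crate_threadpools{property="active"}) by (name)'
#   promql 'node_memory_MemAvailable_bytes'
#   promql 'node_filesystem_avail_bytes{mountpoint="/"}'
```

The last two examples use Node Exporter metric names for available memory and disk space. Which metrics exist depends on the exporter version and enabled collectors, so treat them as starting points.
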
| 173 | +## Appendix |
| 174 | + |
| 175 | +(prometheus-auth)= |
| 176 | +:::{rubric} Prometheus authentication |
| 177 | +::: |
| 178 | + |
| 179 | +By default, Prometheus binds to port 9090 without authentication. Prevent |
| 180 | +auto-start during install (e.g., with `policy-rcd-declarative`), then |
| 181 | +configure web auth using a YAML file. |
| 182 | + |
| 183 | +Create `/etc/prometheus/web.yml`: |
| 184 | +```yaml |
| 185 | +basic_auth_users: |
| 186 | + admin: <bcrypt hash> |
| 187 | +``` |
| 188 | + |
| 189 | +Point Prometheus at it, for example by setting `ARGS` in `/etc/default/prometheus`: |
| 190 | + |
| 191 | +```shell |
| 192 | +ARGS="--web.config.file=/etc/prometheus/web.yml --web.enable-lifecycle" |
| 193 | +``` |
| 194 | + |
| 195 | +Restart Prometheus after setting ownership and 0640 permissions on `web.yml`. |
| 196 | + |
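
The ownership and permission step could look like the sketch below. `secure_web_config` is a hypothetical helper name, and the default owner `root:prometheus` is an assumption based on the group that the Debian and Ubuntu packages usually create; adjust it to the account that runs Prometheus on your system.

```shell
# Hypothetical helper: restrict access to the web config file.
# $1: path to the file, $2: owner as user:group (assumed default below).
secure_web_config() {
  file="$1"
  owner="${2:-root:prometheus}"
  chown "$owner" "$file" && chmod 0640 "$file"
}

# Example call, to be run as root on the Prometheus host:
#   secure_web_config /etc/prometheus/web.yml
```

One way to produce the bcrypt hash is `htpasswd -nBC 10 "" | tr -d ':\n'`, which prompts for the password; `htpasswd` comes from the `apache2-utils` package on Debian and Ubuntu. Recent Prometheus versions can also validate the file with `promtool check web-config /etc/prometheus/web.yml` before the restart.
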
| 197 | +(prometheus-storage)= |
| 198 | +:::{rubric} CrateDB as Prometheus storage |
| 199 | +::: |
| 200 | + |
| 201 | +For a large deployment where you also use Prometheus to monitor other systems, |
| 202 | +you may want to use a CrateDB cluster as the storage for all Prometheus |
| 203 | +metrics. The {ref}`CrateDB Prometheus Adapter <prometheus>` achieves that. |
| 204 | + |
| 205 | + |
| 206 | +[CrateDB Cloud]: https://cratedb.com/products/cratedb-cloud |
| 207 | +[Grafana]: https://grafana.com/ |
| 208 | +[grafana-debian]: https://grafana.com/docs/grafana/latest/setup-grafana/installation/debian/ |
| 209 | +[Grafana » CrateDB Monitoring Dashboard]: https://grafana.com/grafana/dashboards/17174-cratedb-monitoring/ |
| 210 | +[Prometheus]: https://prometheus.io/ |
| 211 | +[Prometheus Node Exporter]: https://prometheus.io/docs/guides/node-exporter/ |