Commit 4f59beb

Admin/Monitoring: Implement suggestions by CodeRabbit
1 parent 1b99a6c commit 4f59beb

1 file changed

+50
-24
lines changed

docs/admin/monitoring/prometheus-grafana.md

Lines changed: 50 additions & 24 deletions
@@ -3,32 +3,33 @@
33

44
## Introduction
55

6-
If you are running CrateDB in a production environment, you have probably wondered what would be the best way to monitor the servers to identify issues before they become problematic and to collect statistics that you can use for capacity planning.
6+
In production, monitor CrateDB proactively to catch issues early and
7+
collect statistics for capacity planning.
78

8-
We recommend pairing two well-known OSS solutions, [Prometheus](https://prometheus.io/) which is a system that collects and stores performance metrics, and [Grafana](https://grafana.com/) which is a system to create dashboards.
9+
Pair two OSS tools: use [Prometheus] to collect and store metrics,
10+
and [Grafana] to build dashboards.
911

1012
For a CrateDB environment, we are interested in:
1113
* CrateDB-specific metrics, such as the number of shards or number of failed queries
1214
* and OS metrics, such as available disk space, memory usage, or CPU usage
1315

1416
For what concerns CrateDB-specific metrics we recommend making these available to Prometheus by using the [Crate JMX HTTP Exporter](https://cratedb.com/docs/crate/reference/en/5.1/admin/monitoring.html#exposing-jmx-via-http) and [Prometheus SQL Exporter](https://github.com/justwatchcom/sql_exporter). For what concerns OS metrics, in Linux environments, we recommend using the [Prometheus Node Exporter](https://prometheus.io/docs/guides/node-exporter/).
1517

16-
Things are a bit different of course if you are using containers, or if you are using the fully-managed cloud-hosted [CrateDB Cloud](https://cratedb.com/products/cratedb-cloud), but let’s see how all this works on an on-premises installation by setting all this up together.
18+
Containerized and [CrateDB Cloud] setups differ. This tutorial targets
19+
standalone and on‑premises installations.
1720

1821
## First we need a CrateDB cluster
1922

2023
First things first, we will need a CrateDB cluster, you may have one already and that is great, but if you do not we can get one up quickly.
2124

2225
You can review the installation documentation at {ref}`install` and {ref}`multi_node_setup`.
2326

24-
In my case, I am using Ubuntu and I did it like this, first I ssh to the first machine and run:
25-
26-
```
27+
On Ubuntu, start on the first node and run:
28+
```shell
2729
nano /etc/default/crate
2830
```
2931

30-
This is a configuration file that will be used by CrateDB, we only need one line to configure memory settings here (this is a required step otherwise we will fail bootstrap checks):
31-
32+
This configuration file sets the JVM heap. Configure it to satisfy bootstrap checks:
3233
```
3334
CRATE_HEAP_SIZE=4G
3435
```
@@ -89,9 +90,10 @@ And this requires both nodes to be available for the cluster to operate in this
8990
Now let’s install CrateDB:
9091

9192
```bash
92-
wget https://cdn.crate.io/downloads/deb/DEB-GPG-KEY-crate
93-
apt-key add DEB-GPG-KEY-crate
94-
add-apt-repository "deb https://cdn.crate.io/downloads/deb/stable/ $(lsb_release -cs) main"
93+
apt update
94+
apt install --yes gpg lsb-release wget
95+
wget -O- https://cdn.crate.io/downloads/deb/DEB-GPG-KEY-crate | gpg --dearmor | tee /usr/share/keyrings/crate.gpg >/dev/null
96+
echo "deb [signed-by=/usr/share/keyrings/crate.gpg] https://cdn.crate.io/downloads/deb/stable/ $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/crate.list
9597
apt update
9698
apt install crate -o Dpkg::Options::="--force-confold"
9799
```
@@ -105,14 +107,15 @@ This is very simple, on each node run the following:
105107

106108
```shell
107109
cd /usr/share/crate/lib
108-
wget https://repo1.maven.org/maven2/io/crate/crate-jmx-exporter/1.0.0/crate-jmx-exporter-1.0.0.jar
110+
wget https://repo1.maven.org/maven2/io/crate/crate-jmx-exporter/1.2.0/crate-jmx-exporter-1.2.0.jar
109111
nano /etc/default/crate
110112
```
111113

112114
then uncomment the `CRATE_JAVA_OPTS` line and change its value to:
113115

114-
```
115-
CRATE_JAVA_OPTS="-javaagent:/usr/share/crate/lib/crate-jmx-exporter-1.0.0.jar=8080"
116+
```shell
117+
# Append to existing options (preserve other flags).
118+
CRATE_JAVA_OPTS="${CRATE_JAVA_OPTS:-} -javaagent:/usr/share/crate/lib/crate-jmx-exporter-1.2.0.jar=8080"
116119
```
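The append pattern in the added line relies on POSIX parameter expansion. A minimal sketch of how it behaves, assuming the file is evaluated by a POSIX shell (`-Xss1m` is only an illustrative pre-existing flag, not something from this commit):

```shell
agent_flag="-javaagent:/usr/share/crate/lib/crate-jmx-exporter-1.2.0.jar=8080"

# Case 1: variable unset. "${CRATE_JAVA_OPTS:-}" expands to an empty string,
# so the result is just the agent flag with a leading space.
unset CRATE_JAVA_OPTS
CRATE_JAVA_OPTS="${CRATE_JAVA_OPTS:-} $agent_flag"
unset_result="$CRATE_JAVA_OPTS"

# Case 2: variable already holds a flag (illustrative value); it is preserved
# and the agent flag is appended after it.
CRATE_JAVA_OPTS="-Xss1m"
CRATE_JAVA_OPTS="${CRATE_JAVA_OPTS:-} $agent_flag"
preset_result="$CRATE_JAVA_OPTS"

printf '[%s]\n[%s]\n' "$unset_result" "$preset_result"
```

If `/etc/default/crate` is instead loaded through a systemd `EnvironmentFile=` directive, no shell expansion takes place and the `${...}` text would be passed through literally, so it may be worth checking how the service reads this file before relying on the pattern.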
117120

118121
and restart the crate daemon:
@@ -223,29 +226,46 @@ WHERE "state" = 'SUCCESS';
223226

224227
You would run this on a machine that is not part of the CrateDB cluster and it can be installed with:
225228

226-
```
229+
```shell
227230
apt install prometheus --no-install-recommends
228231
```
229232

230-
Please note that by default this will right away become available on port 9090 without authentication requirements, you can use `policy-rcd-declarative` to prevent the service from starting immediately after installation and you can define a YAML web config file with `basic_auth_users` and then refer to that file in `/etc/default/prometheus`.
233+
By default, Prometheus binds to :9090 without authentication. Prevent
234+
auto-start during install (e.g., with `policy-rcd-declarative`), then
235+
configure web auth using a YAML file.
231236

232-
For a large deployment where you also use Prometheus to monitor other systems, you may also want to use a CrateDB cluster as the storage for all Prometheus metrics, you can read more about this at [CrateDB Prometheus Adapter](https://github.com/crate/cratedb-prometheus-adapter).
237+
Create `/etc/prometheus/web.yml`:
233238

234-
Now we will configure Prometheus to scrape metrics from the node explorer from the CrateDB machines and also metrics from our Crate JMX HTTP Exporter:
239+
basic_auth_users:
240+
  admin: <bcrypt hash>
235241

236-
```
242+
Point Prometheus at it (e.g., `/etc/default/prometheus`):
243+
244+
ARGS="--web.config.file=/etc/prometheus/web.yml --web.enable-lifecycle"
245+
246+
Restart Prometheus after setting ownership and 0640 permissions on `web.yml`.
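The file layout and permission step can be sketched as follows. This is only an illustration under stated assumptions: it writes to a temporary directory so it runs without root, the `<bcrypt hash>` placeholder is left as-is, and the `root:prometheus` owner shown in the comment is an assumed service account, not something this commit specifies.

```shell
# Sketch only: use a temporary directory instead of /etc/prometheus.
conf_dir="$(mktemp -d)"
web_yml="$conf_dir/web.yml"

# The user entry is nested under basic_auth_users, so it is indented.
# "<bcrypt hash>" remains a placeholder for a real bcrypt hash.
cat > "$web_yml" <<'EOF'
basic_auth_users:
  admin: <bcrypt hash>
EOF

# Owner read/write, group read, no access for others.
chmod 0640 "$web_yml"

# On a real host the file would also be handed to the service account, e.g.:
#   chown root:prometheus /etc/prometheus/web.yml   # owner/group are assumptions

# GNU stat: print the octal mode to confirm the result.
perms="$(stat -c '%a' "$web_yml")"
echo "$web_yml has mode $perms"
```

Keeping the file unreadable to other users matters because it holds password hashes for the web interface.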
247+
248+
For a large deployment where you also use Prometheus to monitor other systems,
249+
you may also want to use a CrateDB cluster as the storage for all Prometheus
250+
metrics, you can read more about this at
251+
[CrateDB Prometheus Adapter](https://github.com/crate/cratedb-prometheus-adapter).
252+
253+
Now we will configure Prometheus to scrape metrics from the node explorer from
254+
the CrateDB machines and also metrics from our Crate JMX HTTP Exporter:
255+
```shell
237256
nano /etc/prometheus/prometheus.yml
238257
```
239258

240259
Where it says:
241-
242260
```yaml
243261
- job_name: 'node'
244262
  static_configs:
245263
    - targets: ['localhost:9100']
246264
```
247265

248-
We replace this with the below configuration, which reflects port 8080 (Crate JMX Exporter), port 9100 (Prometheus Node Exporter), port 9237 (Prometheus SQL Exporter), as well as port 9100 (Prometheus Node Exporter).
266+
Replace it with the following jobs: port 9100 (Node Exporter),
267+
port 8080 (Crate JMX Exporter), and port 9237 (SQL Exporter),
268+
like outlined below.
249269
```yaml
250270
- job_name: 'node'
251271
  static_configs:
@@ -272,9 +292,11 @@ apt install grafana
272292
systemctl start grafana-server
273293
```
274294

275-
If you now point your browser to *http://\<Grafana host>:3000* you will be welcomed by the Grafana login screen, the first time you can log in with admin as both the username and password, make sure to change this password right away.
295+
Open `http://<grafana-host>:3000` to access the Grafana login screen.
296+
The default credentials are `admin`/`admin`; change the password immediately.
276297

277-
Click on "Add your first data source", then click on "Prometheus", and enter the URL *http://\<Prometheus host>:9090*.
298+
Click on "Add your first data source", then click "Prometheus" and set the
299+
URL to `http://<prometheus-host>:9090`.
278300

279301
If you had configured basic authentication for Prometheus this is where you would need to enter the credentials.
280302

@@ -298,7 +320,6 @@ If you decide to build your own dashboard or use an entirely different monitorin
298320
* Circuit breaker memory in use: `sum(crate_circuitbreakers{property="used"}) by (name)`
299321
* Number of shards: `crate_node{name="shard_stats",property="total"}`
300322
* Garbage Collector rates: `sum(rate(jvm_gc_collection_seconds_count[5m])) by (gc)`
301-
* Thread pool queue size: `crate_threadpools{property="queueSize"}`
302323
* Thread pool rejected operations: `crate_threadpools{property="rejected"}`
303324
* Operating system metrics
304325
* CPU utilization
@@ -311,3 +332,8 @@ If you decide to build your own dashboard or use an entirely different monitorin
311332
## Wrapping up
312333

313334
We got a Grafana dashboard that allows us to check live and historical data around performance and capacity metrics in our CrateDB cluster, this illustrates one possible setup. You could use different tools depending on your environment and preferences. Still, we recommend you use the interface of the Crate JMX HTTP Exporter to collect CrateDB-specific metrics and that you always also monitor the health of the environment at the OS level as we have done here with the Prometheus Node Exporter.
335+
336+
337+
[CrateDB Cloud]: https://cratedb.com/products/cratedb-cloud
338+
[Grafana]: https://grafana.com/
339+
[Prometheus]: https://prometheus.io/
