155 changes: 88 additions & 67 deletions clickhouse/README.md
@@ -1,114 +1,135 @@
# Agent Check: ClickHouse
# ClickHouse Integration

## Overview

This check monitors [ClickHouse][1] through the Datadog Agent.
The ClickHouse integration provides health and performance metrics for your ClickHouse database in near real-time. Visualize these metrics with the provided dashboard and create monitors to alert your team on ClickHouse states.

**Minimum Agent version:** 7.16.0
Enable Database Monitoring (DBM) for enhanced insights into query performance and database health. In addition to the standard integration, Datadog DBM provides query-level metrics, live and historical query snapshots, and query explain plans.

## Setup
**Minimum Agent version:** 7.50.0

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the [Autodiscovery Integration Templates][2] for guidance on applying these instructions.
## Setup

### Installation

The ClickHouse check is included in the [Datadog Agent][3] package. No additional installation is needed on your server.
The ClickHouse check is packaged with the Agent. To start gathering your ClickHouse metrics and logs, [install the Agent](https://docs.datadoghq.com/agent/).

### Configuration

<!-- xxx tabs xxx -->
<!-- xxx tab "Host" xxx -->
#### Prepare ClickHouse

#### Host
To get started with the ClickHouse integration, create a `datadog` user with proper access to your ClickHouse server.

To configure this check for an Agent running on a host:
```sql
CREATE USER datadog IDENTIFIED BY '<PASSWORD>';
GRANT SELECT ON system.* TO datadog;
GRANT SELECT ON information_schema.* TO datadog;
GRANT SHOW DATABASES ON *.* TO datadog;
GRANT SHOW TABLES ON *.* TO datadog;
GRANT SHOW COLUMNS ON *.* TO datadog;
```

#### Metric collection
#### Configure the Agent

1. To start collecting your ClickHouse performance data, edit the `clickhouse.d/conf.yaml` file in the `conf.d/` folder at the root of your Agent's configuration directory. See the [sample clickhouse.d/conf.yaml][4] for all available configuration options.
To start collecting your ClickHouse performance data, edit the `clickhouse.d/conf.yaml` file in the `conf.d/` folder at the root of your Agent's configuration directory. See the [sample clickhouse.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/clickhouse/datadog_checks/clickhouse/data/conf.yaml.example) for all available configuration options.

*Note*: This integration uses the official `clickhouse-connect` client to connect over HTTP.
```yaml
init_config:

2. [Restart the Agent][5].
instances:
  - server: localhost
    port: 8123
    username: datadog
    password: <PASSWORD>

##### Log collection
    # Enable Database Monitoring
    dbm: true

1. Collecting logs is disabled by default in the Datadog Agent, enable it in your `datadog.yaml` file:
    # Query Metrics Configuration
    query_metrics:
      enabled: true
      collection_interval: 60

```yaml
logs_enabled: true
```
    # Query Samples Configuration
    query_samples:
      enabled: true
      collection_interval: 10

2. Add the log files you are interested in to your `clickhouse.d/conf.yaml` file to start collecting your ClickHouse logs:
    # Activity snapshot configuration
    activity_enabled: true
    activity_collection_interval: 10
    activity_max_rows: 1000
```

```yaml
logs:
- type: file
path: /var/log/clickhouse-server/clickhouse-server.log
source: clickhouse
service: "<SERVICE_NAME>"
```
#### Enable query_log

Change the `path` and `service` parameter values and configure them for your environment. See the [sample clickhouse.d/conf.yaml][4] for all available configuration options.
Database Monitoring features require ClickHouse's `query_log`. To enable it, add the following to your ClickHouse server configuration:

3. [Restart the Agent][5].
```xml
<clickhouse>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>
</clickhouse>
```
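
Before restarting anything, it can help to confirm that a configuration file really contains the `<query_log>` block shown above. The following is a minimal sketch using only the Python standard library. The helper name `query_log_settings` is hypothetical and not part of the integration, and the sketch inspects a single XML document only; it does not account for settings that ClickHouse merges from other configuration files.

```python
import xml.etree.ElementTree as ET


def query_log_settings(config_xml: str) -> dict:
    """Return the child settings of <query_log>, or {} if the block is absent.

    Hypothetical helper for illustration: it parses one XML document and
    looks for a <query_log> element directly under the root element.
    """
    root = ET.fromstring(config_xml)
    node = root.find("query_log")
    if node is None:
        return {}
    # Map each child tag (database, table, ...) to its trimmed text value.
    return {child.tag: (child.text or "").strip() for child in node}
```

Passing the snippet above to `query_log_settings` should yield a dictionary with the `database`, `table`, and `flush_interval_milliseconds` keys; an empty dictionary suggests the block is missing from that file.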

<!-- xxz tab xxx -->
<!-- xxx tab "Containerized" xxx -->
[Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands/#start-stop-and-restart-the-agent) to start sending ClickHouse metrics to Datadog.

#### Containerized
### Validation

For containerized environments, see the [Autodiscovery Integration Templates][2] for guidance on applying the parameters below.
[Run the Agent's status subcommand](https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information) and look for `clickhouse` under the Checks section.

#### Metric collection
## Data Collected

| Parameter | Value |
|----------------------|------------------------------------------------------------|
| `<INTEGRATION_NAME>` | `clickhouse` |
| `<INIT_CONFIG>` | blank or `{}` |
| `<INSTANCE_CONFIG>` | `{"server": "%%host%%", "port": "%%port%%", "username": "<USER>", "password": "<PASSWORD>"}` |
### Metrics

##### Log collection
The ClickHouse integration collects a wide range of metrics from ClickHouse system tables. See [metadata.csv](https://github.com/DataDog/integrations-core/blob/master/clickhouse/metadata.csv) for a list of metrics provided by this integration.

Collecting logs is disabled by default in the Datadog Agent. To enable it, see [Kubernetes log collection][6].
### Database Monitoring

| Parameter | Value |
|----------------|-------------------------------------------|
| `<LOG_CONFIG>` | `{"source": "clickhouse", "service": "<SERVICE_NAME>"}` |
When Database Monitoring is enabled, the integration collects:

<!-- xxz tab xxx -->
<!-- xxz tabs xxx -->
- **Query Metrics**: Aggregated query performance metrics from `system.query_log`
- **Query Samples**: Execution plans for currently running queries from `system.processes`
- **Activity Snapshots**: Real-time view of active sessions and connections
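
To illustrate what "aggregated query performance metrics" can mean in practice, here is a minimal sketch that groups rows shaped like `system.query_log` entries by normalized query. It is not the integration's implementation: the function name `aggregate_query_rows` and the output keys are assumptions made for illustration, and the rows are plain dictionaries rather than results fetched from a server. The `normalized_query_hash` and `query_duration_ms` names follow the columns of `system.query_log`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def aggregate_query_rows(rows: Iterable[Mapping]) -> Dict[str, dict]:
    """Sum execution count and total duration per normalized query hash.

    Hypothetical helper: each row is expected to carry the
    `normalized_query_hash` and `query_duration_ms` fields.
    """
    totals: Dict[str, dict] = defaultdict(
        lambda: {"count": 0, "total_duration_ms": 0}
    )
    for row in rows:
        entry = totals[row["normalized_query_hash"]]
        entry["count"] += 1
        entry["total_duration_ms"] += row["query_duration_ms"]
    return dict(totals)
```

Grouping by the normalized hash means that queries differing only in literal values are counted together, which keeps the number of distinct series manageable.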

### Validation
### Events

[Run the Agent's status subcommand][7] and look for `clickhouse` under the **Checks** section.
The ClickHouse check does not include any events.

## Data Collected
### Service Checks

### Metrics
**clickhouse.can_connect**:
Returns `CRITICAL` if the Agent cannot connect to ClickHouse, otherwise returns `OK`.

See [metadata.csv][8] for a list of metrics provided by this integration.
## Troubleshooting

### Events
### Connection Issues

The ClickHouse check does not include any events.
If you encounter connection errors:

### Service Checks
1. Verify that ClickHouse is running and accessible on the configured host and port
2. Use port `8123` (HTTP interface) for the Agent connection
3. Ensure the `datadog` user has the required permissions
4. Check that firewall rules allow connections from the Agent
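
The first two checks above can be sketched with the Python standard library alone. This minimal example assumes the server exposes ClickHouse's built-in `/ping` endpoint on its HTTP interface; `build_ping_url` and `can_reach_clickhouse` are hypothetical helper names, not part of the Agent or the integration.

```python
import http.client
import urllib.error
import urllib.request


def build_ping_url(host: str, port: int = 8123, secure: bool = False) -> str:
    """Build the URL of the ClickHouse HTTP health endpoint."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}:{port}/ping"


def can_reach_clickhouse(host: str, port: int = 8123, timeout: float = 3.0) -> bool:
    """Return True if the HTTP interface answers the ping with status 200."""
    try:
        with urllib.request.urlopen(build_ping_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts, DNS failures, and malformed
        # responses are all treated as "not reachable".
        return False
```

Run `can_reach_clickhouse("localhost")` from the host where the Agent runs. A `False` result points at a network, port, or firewall problem rather than at user permissions, because the ping endpoint does not require credentials.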

See [service_checks.json][9] for a list of service checks provided by this integration.
### Database Monitoring Not Collecting Data

## Troubleshooting
If DBM features are not working:

1. Verify that `dbm: true` is set in the instance configuration
2. Ensure `query_log` is enabled in the ClickHouse server configuration
3. Check that the `datadog` user has SELECT permissions on `system.query_log` and `system.processes`
4. Review the Agent logs for any errors
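
The configuration part of this checklist can be expressed as a small check over an already-parsed instance configuration. The sketch below is illustrative only: `find_dbm_config_problems` is a hypothetical helper, and it assumes that `query_metrics` and `query_samples` count as misconfigured only when `enabled` is explicitly set to `false`, since the defaults are not stated here.

```python
from typing import List


def find_dbm_config_problems(instance: dict) -> List[str]:
    """List likely DBM misconfigurations in one parsed instance block.

    Hypothetical helper: `instance` is the dictionary for a single entry
    under `instances:` in clickhouse.d/conf.yaml.
    """
    problems = []
    if instance.get("dbm") is not True:
        problems.append("dbm is not set to true")
    for section in ("query_metrics", "query_samples"):
        settings = instance.get(section) or {}
        # Only an explicit `enabled: false` is flagged (assumption).
        if settings.get("enabled") is False:
            problems.append(f"{section}.enabled is false")
    return problems
```

An empty list means the instance block passes these basic checks; permissions and `query_log` availability still need to be verified on the ClickHouse side.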

For more troubleshooting help, contact [Datadog support](https://docs.datadoghq.com/help/).

## Further Reading

Need help? Contact [Datadog support][10].
Additional helpful documentation, links, and articles:

- [Monitor ClickHouse with Datadog](https://www.datadoghq.com/blog/monitor-clickhouse/)
- [Database Monitoring](https://docs.datadoghq.com/database_monitoring/)

[1]: https://clickhouse.yandex
[2]: https://docs.datadoghq.com/agent/kubernetes/integrations/
[3]: /account/settings/agent/latest
[4]: https://github.com/DataDog/integrations-core/blob/master/clickhouse/datadog_checks/clickhouse/data/conf.yaml.example
[5]: https://docs.datadoghq.com/agent/guide/agent-commands/#start-stop-and-restart-the-agent
[6]: https://docs.datadoghq.com/agent/kubernetes/log/
[7]: https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information
[8]: https://github.com/DataDog/integrations-core/blob/master/clickhouse/metadata.csv
[9]: https://github.com/DataDog/integrations-core/blob/master/clickhouse/assets/service_checks.json
[10]: https://docs.datadoghq.com/help/
148 changes: 146 additions & 2 deletions clickhouse/assets/configuration/spec.yaml
@@ -14,7 +14,6 @@ files:
- template: instances
options:
- name: server
required: true
description: The hostname used to connect to the system.
value:
type: string
@@ -51,7 +50,7 @@ files:
description: |
The compression algorithm to use. The default is no compression.
If br is specified, the brotli library must be installed separately.

Valid values are:
- lz4
- zstd
@@ -73,6 +72,151 @@ files:
value:
type: boolean
example: True
- name: dbm
description: |
Enable Database Monitoring (DBM) to collect query samples and execution plans.
This feature provides deep observability into query performance.
value:
type: boolean
example: false
- name: query_samples
description: |
Configuration for collecting query samples when Database Monitoring (DBM) is enabled.
Query samples provide insights into the queries being executed on your ClickHouse instance.
options:
- name: enabled
description: Enable collection of query samples.
value:
type: boolean
example: true
- name: collection_interval
description: |
The interval in seconds between query sample collections.
Lower values provide more granular data but increase overhead.
value:
type: number
example: 10
- name: samples_per_hour_per_query
description: |
The maximum number of samples to collect per unique query signature per hour.
This helps limit the volume of data collected while still providing useful insights.
value:
type: number
example: 15
- name: seen_samples_cache_maxsize
description: |
The maximum size of the cache used to track which query samples have been collected.
A larger cache can help avoid collecting duplicate samples.
value:
type: number
example: 10000
- name: run_sync
description: |
Whether to run query sample collection synchronously in the check run.
Set to false (default) to run asynchronously in a separate thread.
value:
type: boolean
example: false
- name: activity_enabled
description: |
Enable collection of database activity snapshots.
Activity snapshots capture currently executing queries and active connections.
value:
type: boolean
example: true
- name: activity_collection_interval
description: |
The interval in seconds between activity snapshot collections.
Lower values capture more activity data but increase overhead.
For fast ClickHouse queries, consider using 1-5 seconds.
value:
type: number
example: 10
- name: activity_max_rows
description: |
The maximum number of active sessions to include in each activity snapshot.
value:
type: number
example: 1000
- name: database_instance_collection_interval
hidden: true
description: |
Set the database instance collection interval (in seconds). The database instance collection sends
basic information about the database instance along with a signal that it still exists.
This collection does not involve any additional queries to the database.
value:
type: number
default: 300
- name: aws
description: |
This block defines the configuration for AWS RDS and Aurora instances.

Complete this section if you have installed the Datadog AWS Integration to enrich instances
with ClickHouse integration telemetry.
options:
- name: instance_endpoint
description: |
Equal to the Endpoint.Address of the instance the agent is connecting to.
This value is optional if the value of `server` is already configured to the instance endpoint.

For more information on instance endpoints,
see the AWS docs https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_Endpoint.html
value:
type: string
example: mydb.cfxgae8cilcf.us-east-1.rds.amazonaws.com
- name: gcp
description: |
This block defines the configuration for Google Cloud SQL instances.

Complete this section if you have installed the Datadog GCP Integration to enrich instances
with ClickHouse integration telemetry.
options:
- name: project_id
description: |
Equal to the GCP resource's project ID.

For more information on project IDs,
see the GCP docs https://cloud.google.com/resource-manager/docs/creating-managing-projects
value:
type: string
example: foo-project
- name: instance_id
description: |
Equal to the GCP resource's instance ID.

For more information on instance IDs,
see the GCP docs https://cloud.google.com/sql/docs/mysql/instance-settings#instance-id-2ndgen
value:
type: string
example: foo-database
- name: azure
description: |
This block defines the configuration for Azure Database for ClickHouse.

Complete this section if you have installed the Datadog Azure Integration to enrich instances
with ClickHouse integration telemetry.
options:
- name: deployment_type
description: |
Equal to the deployment type for the managed database.

For Azure, this is typically 'flexible_server' or 'single_server'.
value:
type: string
example: flexible_server
- name: fully_qualified_domain_name
description: |
Equal to the fully qualified domain name of the Azure database.

This value is optional if the value of `server` is already configured to the fully qualified domain name.
value:
type: string
example: my-clickhouse.database.windows.net
- name: database_name
description: |
The database name for the Azure instance.
value:
type: string
- template: instances/db
overrides:
custom_queries.value.example:
1 change: 1 addition & 0 deletions clickhouse/changelog.d/21773.added
@@ -0,0 +1 @@
Add Database Monitoring (DBM) support with query sample collection from system.query_log