diff --git a/docs/integrate/kafka/docker-python.md b/docs/integrate/kafka/docker-python.md
index e9d251c2..d909438c 100644
--- a/docs/integrate/kafka/docker-python.md
+++ b/docs/integrate/kafka/docker-python.md
@@ -50,7 +50,7 @@ docker compose up -d
 * CrateDB Admin UI: `http://localhost:4200`
 * Kafka broker (inside-compose hostname): kafka:9092
 
-### Create a demo table in CrateDB
+### Create a CrateDB table
 
 The easiest way to do this is through the CrateDB Admin UI at
 `http://localhost:4200` and execute this using the console:
@@ -69,26 +69,30 @@
 But this can also be done using `curl`:
 ```
 curl -sS -H 'Content-Type: application/json' -X POST http://localhost:4200/_sql -d '{"stmt":"CREATE TABLE IF NOT EXISTS sensor_readings (device_id TEXT, ts TIMESTAMPTZ, temperature DOUBLE PRECISION, humidity DOUBLE PRECISION, PRIMARY KEY (device_id, ts))"}'
 ```
 
-### Create a Kafka topic and send a couple of messages
+### Create a Kafka topic
 
 Creating a Kafka topic can be done in several ways, we are selecting to use
 `docker exec` in this way:
-
 ```bash
 docker exec -it kafka kafka-topics.sh --create --topic sensors --bootstrap-server kafka:9092 --partitions 3 --replication-factor 1
+```
+
+## Process events
+### Submit events to Kafka
+```bash
 docker exec -it kafka kafka-console-producer.sh --bootstrap-server kafka:9092 --topic sensors <<'EOF'
 {"device_id":"alpha","ts":"2025-08-19T12:00:00Z","temperature":21.4,"humidity":48.0}
 {"device_id":"alpha","ts":"2025-08-19T12:01:00Z","temperature":21.5,"humidity":47.6}
 {"device_id":"beta","ts":"2025-08-19T12:00:00Z","temperature":19.8,"humidity":55.1}
 EOF
 ```
+Events (messages) are newline-delimited JSON for simplicity.
-Messages are newline-delimited JSON for simplicity.
-
-## Data loading
+### Consume events into CrateDB
 
-Create a simple consumer using Python.
+Create a simple consumer application using Python. It consumes events from the
+Kafka topic and inserts them into the CrateDB database table.
 ```python
 # quick_consumer.py
diff --git a/docs/integrate/node-red/mqtt-tutorial.md b/docs/integrate/node-red/mqtt-tutorial.md
index d250ee95..f6134921 100644
--- a/docs/integrate/node-red/mqtt-tutorial.md
+++ b/docs/integrate/node-red/mqtt-tutorial.md
@@ -1,5 +1,5 @@
 (node-red-tutorial)=
-# Ingesting MQTT messages into CrateDB using Node-RED
+# Load MQTT messages into CrateDB using Node-RED
 
 :::{article-info}
 ---
@@ -23,19 +23,7 @@ You need:
 2. The [node-red-contrib-postgresql](https://github.com/alexandrainst/node-red-contrib-postgresql) module installed.
 3. A running MQTT broker. This tutorial uses [HiveMQ Cloud](https://www.hivemq.com/).
 
-## Producing data
-
-First, generate data to populate the MQTT topic with Node-RED. If you already
-have an MQTT topic with regular messages, you can skip this part.
-![Screenshot 2021-09-13 at 14.58.42|690x134, 50%](https://us1.discourse-cdn.com/flex020/uploads/crate/original/1X/5722946039148ca6ce69702d963f9f842c4f972c.png){width=480px}
-
-The `inject` node creates a JSON payload with three attributes:
-![Screenshot 2021-09-13 at 14.56.42|690x293, 50%](https://us1.discourse-cdn.com/flex020/uploads/crate/original/1X/8084a53e544d681e79f85d780c621a340a7d0d30.png){width=480px}
-
-In this example, two fields are static; only the timestamp changes.
-Download the full workflow definition: [flows-producer.json](https://community.cratedb.com/uploads/short-url/eOvAk3XzDkRbNZjcZV0pZ0SnGu4.json) (1.3 KB)
-
-## Consuming and ingesting data
+## Provision CrateDB
 
 First of all, we create the target table in CrateDB:
 ```sql
@@ -49,6 +37,20 @@
 Store the payload as CrateDB’s {ref}`OBJECT data type ` to accommodate an
 evolving schema. For production, also consider the
 {ref}`partitioning and sharding guide `.
 
+## Publish messages to MQTT
+
+First, generate data to populate the MQTT topic with Node-RED. If you already
+have an MQTT topic with regular messages, you can skip this part.
+![Screenshot 2021-09-13 at 14.58.42|690x134, 50%](https://us1.discourse-cdn.com/flex020/uploads/crate/original/1X/5722946039148ca6ce69702d963f9f842c4f972c.png){width=480px}
+
+The `inject` node creates a JSON payload with three attributes:
+![Screenshot 2021-09-13 at 14.56.42|690x293, 50%](https://us1.discourse-cdn.com/flex020/uploads/crate/original/1X/8084a53e544d681e79f85d780c621a340a7d0d30.png){width=480px}
+
+In this example, two fields are static; only the timestamp changes.
+Download the full workflow definition: [flows-producer.json](https://community.cratedb.com/uploads/short-url/eOvAk3XzDkRbNZjcZV0pZ0SnGu4.json) (1.3 KB)
+
+## Consume messages into CrateDB
+
 To ingest efficiently, group messages into batches and use
 {ref}`multi-value INSERT statements ` to avoid generating one INSERT per message:
diff --git a/docs/performance/inserts/tuning.md b/docs/performance/inserts/tuning.md
index d24e6d5e..7dcf5df4 100644
--- a/docs/performance/inserts/tuning.md
+++ b/docs/performance/inserts/tuning.md
@@ -117,7 +117,7 @@ value.
 
 ### Calculating statistics
 
-After loading larger amounts of data into new or existing tables, it is
+After inserting larger amounts of data into new or existing tables, it is
 recommended to re-calculate the statistics by executing the `ANALYZE` command.
 The statistics will be used by the query optimizer to generate better execution
 plans.
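Note for reviewers: the `quick_consumer.py` referenced in the first patch is truncated in this diff. To make the "consume events into CrateDB" step concrete, here is a minimal sketch of what such a consumer could look like. It is illustrative only, not the actual file: the `kafka-python` and `crate` client packages, the host-side ports (`localhost:9092`, `localhost:4200`), and the batch size are assumptions; only the topic name `sensors` and the `sensor_readings` schema come from the patch.

```python
# consumer_sketch.py -- illustrative only, NOT the quick_consumer.py from the
# patch; package choices, host ports, and batch size are assumptions.
import json


def to_row(event: dict) -> tuple:
    """Map one decoded JSON event to an INSERT parameter tuple,
    matching the sensor_readings table created earlier."""
    return (
        event["device_id"],
        event["ts"],
        event["temperature"],
        event["humidity"],
    )


def run(bootstrap: str = "localhost:9092", batch_size: int = 100) -> None:
    # Third-party imports kept local so to_row() is importable without them.
    from kafka import KafkaConsumer  # pip install kafka-python
    from crate import client         # pip install crate

    consumer = KafkaConsumer(
        "sensors",
        bootstrap_servers=bootstrap,
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    cursor = client.connect("http://localhost:4200").cursor()

    batch = []
    for message in consumer:
        batch.append(to_row(message.value))
        if len(batch) >= batch_size:
            # One multi-value INSERT per batch instead of one per message,
            # in line with the batching advice in the Node-RED patch.
            cursor.executemany(
                "INSERT INTO sensor_readings"
                " (device_id, ts, temperature, humidity)"
                " VALUES (?, ?, ?, ?)",
                batch,
            )
            batch.clear()


if __name__ == "__main__":
    run()
```

Batching here also pairs naturally with the third patch's advice: after a large bulk insert, run `ANALYZE` so the optimizer works with fresh statistics.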