From 520c90b3d362dbc155dc90a9e7985cf7a3a34189 Mon Sep 17 00:00:00 2001
From: Andreas Motl
Date: Thu, 30 Oct 2025 10:48:03 +0100
Subject: [PATCH] Integrate: Improve guidance for data import / ingest / load

---
 docs/howto/index.md                       |  2 +-
 docs/integrate/dask/usage.md              |  4 ++--
 docs/integrate/kafka/docker-python.md     | 10 +++++-----
 docs/integrate/pandas/efficient-ingest.md |  4 ++--
 docs/integrate/pandas/index.md            |  4 ++--
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/howto/index.md b/docs/howto/index.md
index 44c58f19..1a1ac12e 100644
--- a/docs/howto/index.md
+++ b/docs/howto/index.md
@@ -41,7 +41,7 @@ Instructions how to get tasks done with CrateDB.
 - {ref}`Using JMeter with CrateDB `
 - {ref}`langchain-usage`
 - {ref}`metabase-usage`
-- {ref}`pandas-efficient-ingest`
+- {ref}`pandas-bulk-import`
 - {ref}`PyCaret and CrateDB `
 - {ref}`rill-usage`
 - {ref}`marquez-usage`
diff --git a/docs/integrate/dask/usage.md b/docs/integrate/dask/usage.md
index a30a1037..8340e712 100644
--- a/docs/integrate/dask/usage.md
+++ b/docs/integrate/dask/usage.md
@@ -1,6 +1,6 @@
 (dask-usage)=
-(dask-efficient-ingest)=
-# Efficient data ingestion with Dask and CrateDB
+(dask-bulk-import)=
+# Efficient bulk imports with Dask
 
 ## Introduction
 Dask is a parallel computing library that enables distributed computing for tasks such as data processing and machine learning.
diff --git a/docs/integrate/kafka/docker-python.md b/docs/integrate/kafka/docker-python.md
index ab0cf124..e9d251c2 100644
--- a/docs/integrate/kafka/docker-python.md
+++ b/docs/integrate/kafka/docker-python.md
@@ -4,7 +4,7 @@
 This walkthrough demonstrates how to load data from a Kafka topic into
 a CrateDB table, using a Python consumer and CrateDB's HTTP interface.
 
-## Starting services
+## Start services
 Start Kafka and CrateDB using Docker Compose.
 ```yaml
@@ -45,7 +45,7 @@ networks:
 docker compose up -d
 ```
 
-## Provisioning CrateDB and Kafka
+## Provision CrateDB and Kafka
 
 * CrateDB Admin UI: `http://localhost:4200`
 * Kafka broker (inside-compose hostname): kafka:9092
@@ -86,9 +86,9 @@ EOF
 
 Messages are newline-delimited JSON for simplicity.
 
-## Loading data
+## Data loading
 
-### Create a simple consumer using Python
+Create a simple consumer using Python.
 ```python
 # quick_consumer.py
@@ -146,7 +146,7 @@ python quick_consumer.py
 This shows the custom client path: transform/filter as you like, do
 idempotent upserts on (device_id, ts), and batch writes for speed.
 :::
-## Verifying the data
+## Explore data
 Use `curl` to submit a `SELECT` statement that verifies data has been
 stored in CrateDB.
 ```bash
diff --git a/docs/integrate/pandas/efficient-ingest.md b/docs/integrate/pandas/efficient-ingest.md
index ce22afeb..b73fbf12 100644
--- a/docs/integrate/pandas/efficient-ingest.md
+++ b/docs/integrate/pandas/efficient-ingest.md
@@ -1,5 +1,5 @@
-(pandas-efficient-ingest)=
-# Guide to efficient data ingestion to CrateDB with pandas
+(pandas-bulk-import)=
+# Efficient bulk imports with pandas
 
 ## Introduction
 Bulk insert is a technique for efficiently inserting large amounts of data into a database by submitting multiple rows of data in a single database transaction. Instead of executing multiple SQL `INSERT` statements for each individual row of data, the bulk insert allows the database to process and store a batch of data at once. This approach can significantly improve the performance of data insertion, especially when dealing with large datasets.
diff --git a/docs/integrate/pandas/index.md b/docs/integrate/pandas/index.md
index e1cb6c1c..b3c28ab5 100644
--- a/docs/integrate/pandas/index.md
+++ b/docs/integrate/pandas/index.md
@@ -37,8 +37,8 @@ data structures and operations for manipulating numerical tables and time series
 - {ref}`pandas-tutorial-start`
 - {ref}`pandas-tutorial-jupyter`
 - {ref}`arrow-import-parquet`
-- {ref}`pandas-efficient-ingest`
-- See also: {ref}`dask-efficient-ingest`
+- {ref}`pandas-bulk-import`
+- See also: {ref}`dask-bulk-import`
 - See also: [Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy]
 
 :::{rubric} Code examples
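The Kafka walkthrough above mentions "idempotent upserts on (device_id, ts), and batch writes for speed", but `quick_consumer.py` itself is not part of the excerpted hunks. The following is a minimal sketch of that pattern, not the walkthrough's actual consumer: it assumes a `readings` topic, a `readings` table with `(device_id, ts)` as primary key, the `kafka-python` client, and CrateDB's HTTP `bulk_args` interface, with the broker's port 9092 mapped to the host.

```python
# Hedged sketch of a batching, idempotent Kafka consumer for CrateDB.
# Topic, table, and column names are illustrative assumptions.
import json

import requests
from kafka import KafkaConsumer  # kafka-python package

CRATEDB_SQL = "http://localhost:4200/_sql"

# Upserting on the (device_id, ts) primary key makes re-deliveries harmless:
# replaying the same Kafka message just rewrites the same row.
UPSERT = (
    "INSERT INTO readings (device_id, ts, reading) VALUES (?, ?, ?) "
    "ON CONFLICT (device_id, ts) DO UPDATE SET reading = excluded.reading"
)

consumer = KafkaConsumer(
    "readings",
    bootstrap_servers="localhost:9092",  # kafka:9092 from inside the Compose network
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

batch = []
for message in consumer:
    record = message.value
    batch.append([record["device_id"], record["ts"], record["reading"]])
    if len(batch) >= 100:
        # One HTTP request per 100 rows instead of one request per row.
        response = requests.post(
            CRATEDB_SQL, json={"stmt": UPSERT, "bulk_args": batch}, timeout=10
        )
        response.raise_for_status()
        batch.clear()
```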
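The introduction of the pandas guide describes bulk inserts as submitting many rows in a single statement rather than one `INSERT` per row. A minimal sketch of that technique with plain pandas and SQLAlchemy follows, assuming the CrateDB SQLAlchemy dialect is installed and provides the `crate://` URL scheme, and a CrateDB instance listens on `localhost:4200`; the table name is illustrative.

```python
# Hedged sketch of a pandas bulk insert into CrateDB.
import pandas as pd
import sqlalchemy as sa

# Assumes the CrateDB SQLAlchemy dialect registers the crate:// scheme.
engine = sa.create_engine("crate://localhost:4200")

df = pd.DataFrame(
    {
        "device_id": ["device-1", "device-2"],
        "ts": ["2025-10-30T10:00:00", "2025-10-30T10:00:00"],
        "reading": [21.5, 19.8],
    }
)

# method="multi" renders one multi-row INSERT per chunk, and chunksize caps
# the rows per statement: many rows per round trip instead of one INSERT each.
df.to_sql(
    "readings",
    engine,
    if_exists="append",
    index=False,
    method="multi",
    chunksize=5_000,
)
```

Larger chunk sizes amortize per-request overhead but increase per-statement memory use on both client and server, so the sweet spot depends on row width and cluster sizing.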