From e1bf05aa38ed366ec588e23b8354dda6a88de519 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Tue, 28 Oct 2025 22:39:57 +0100 Subject: [PATCH 01/28] Storage: Improve guidance --- docs/feature/storage/index.md | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 6282d91c..3b80dcc1 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -9,11 +9,7 @@ The CrateDB storage layer is based on Lucene. ::: -By default, all fields are indexed, -nested or not, but the indexing can be turned off selectively. - -This page enumerates some concepts of Lucene. The article {ref}`indexing-and-storage` -goes into more details by exploring its internal workings. +This page enumerates important concepts and implementations of Lucene used by CrateDB. ## Lucene @@ -52,6 +48,9 @@ Elasticsearch are building upon the same technologies. CrateDB uses three main data structures of Lucene: Inverted indexes for text values, BKD trees for numeric values, and doc values. +By default, all fields are indexed, nested or not, but the indexing can be turned +off selectively. + - **Inverted index** The Lucene indexing strategy for text fields relies on a data structure called inverted @@ -89,15 +88,20 @@ Inverted indexes for text values, BKD trees for numeric values, and doc values. all field values that are not analyzed as strings in a compact column, making it more effective for sorting and aggregations. -## See also - -- {ref}`indexing-and-storage` +::::{todo} +Enable after merging [GH-434: Indexing and storage](https://github.com/crate/cratedb-guide/pull/434). +```md +## Related sections +{ref}`indexing-and-storage` explores the internal workings and data structures +of CrateDB's storage layer in more detail. :::{toctree} :hidden: indexing-and-storage ::: +``` +:::: [column-based store]: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/storage.html From 3b9ccd6475380e645a60c48ff18f10d15e9b6e77 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Tue, 28 Oct 2025 22:40:57 +0100 Subject: [PATCH 02/28] Storage: Extend segments with information about merge policy --- docs/feature/storage/index.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 3b80dcc1..5081436d 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -32,6 +32,11 @@ Elasticsearch are building upon the same technologies. some segments and discard the freed ones. This way, adding a new document does not require rebuilding the whole index structure completely. + CrateDB uses Lucene's default TieredMergePolicy. It merges segments of roughly equal size + and controls the number of segments per "tier" to balance search performance with merge + overhead. Lucene's [TieredMergePolicy] documentation explains in detail how CrateDB's + underlying merge policy decides when to combine segments. + - **Column store** For text values, other than storing the row data as-is (and indexing each value by default), @@ -105,3 +110,4 @@ indexing-and-storage [column-based store]: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/storage.html +[TieredMergePolicy]: https://lucene.apache.org/core/9_12_1/core/org/apache/lucene/index/TieredMergePolicy.html From 290f76dbb283d0a66e3bb034199a068a430df023 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Tue, 28 Oct 2025 23:47:37 +0100 Subject: [PATCH 03/28] Storage: Reorganize sections --- docs/feature/storage/index.md | 92 +++++++++++++++++++---------------- 1 file changed, 49 insertions(+), 43 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 5081436d..c4ab4da3 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -9,54 +9,28 @@ The CrateDB storage layer is based on Lucene. ::: +Lucene offers scalable and high-performance indexing, which enables efficient search and +aggregations over documents and rapid updates to the existing documents. Solr and +Elasticsearch are building upon the same technologies. This page enumerates important concepts and implementations of Lucene used by CrateDB. -## Lucene +## Data structures -Lucene offers scalable and high-performance indexing which enables efficient search and -aggregations over documents and rapid updates to the existing documents. Solr and -Elasticsearch are building upon the same technologies. +A single record in Lucene is called "document". -- **Documents** +:Document: - A single record in Lucene is called "document", which is a unit of information for search + A document is a unit of information for search and indexing that contains a set of fields, where each field has a name and value. A Lucene index can store an arbitrary number of documents, with an arbitrary number of different fields. + By default, all fields are indexed, nested or not, but the indexing can be turned + off selectively. -- **Append-only segments** - - A Lucene index is composed of one or more sub-indexes. A sub-index is called a segment, - it is immutable, and built from a set of documents. When new documents are added to the - existing index, they are added to the next segment, while previous segments are never - modified. If the number of segments becomes too large, the system may decide to merge - some segments and discard the freed ones. This way, adding a new document does not require - rebuilding the whole index structure completely. - - CrateDB uses Lucene's default TieredMergePolicy. It merges segments of roughly equal size - and controls the number of segments per "tier" to balance search performance with merge - overhead. Lucene's [TieredMergePolicy] documentation explains in detail how CrateDB's - underlying merge policy decides when to combine segments. - -- **Column store** - - For text values, other than storing the row data as-is (and indexing each value by default), - each value term is stored into a [column-based store] by default, which offers performance - improvements for global aggregations and groupings, and enables efficient ordering, because - the data for one column is packed at one place. - - In CrateDB, the column store is enabled by default and can be disabled only for text fields, - not for other primitive types. Furthermore, CrateDB does not support storing values for - container and geospatial types in the column store. - -## Data structures - -CrateDB uses three main data structures of Lucene: -Inverted indexes for text values, BKD trees for numeric values, and doc values. - -By default, all fields are indexed, nested or not, but the indexing can be turned -off selectively. +CrateDB uses three main data structures of Lucene: Inverted indexes for text values, +BKD trees for numeric values, and doc values. On top of doc values, CrateDB implements +a column store for fast sorting and aggregations. -- **Inverted index** +:Inverted index: The Lucene indexing strategy for text fields relies on a data structure called inverted index, which is defined as a "data structure storing a mapping from content, such as @@ -70,7 +44,7 @@ off selectively. The inverted index enables a very efficient search over textual data. -- **BKD tree** +:BKD tree: To optimize numeric range queries, Lucene uses an implementation of the Block KD (BKD) tree data structure. The BKD tree index structure is suitable for indexing large @@ -83,7 +57,7 @@ off selectively. including fields defined as `TIMESTAMP` types, supporting performant date range queries. -- **Doc values** +:Doc values: Because Lucene's inverted index data structure implementation is not optimal for finding field values by given document identifier, and for performing column-oriented @@ -93,12 +67,45 @@ off selectively. all field values that are not analyzed as strings in a compact column, making it more effective for sorting and aggregations. +:Column store: + + CrateDB implements a {ref}`column store ` + based on doc values in Lucene. + For text values, other than storing the row data as-is (and indexing each value by default), + each value term is stored into a column-based store by default. + + This storage layout improves the performance of sorting, grouping, and aggregations, + by keeping field data for one column packed at one place rather than scattered across documents. + The column store is enabled by default in CrateDB and can be disabled only for text fields. + It does not support container or geographic data types. + +## Storage process + +How CrateDB stores data using Lucene. + +:Append-only segments: + + A Lucene index is composed of one or more sub-indexes. A sub-index is called a segment, + it is immutable, and built from a set of documents. + + When new documents are added to the + existing index, they are added to the next segment, while previous segments are never + modified. If the number of segments becomes too large, the system may decide to merge + some segments and discard the freed ones. This way, adding a new document does not require + rebuilding the whole index structure completely. + + CrateDB uses Lucene's default TieredMergePolicy. It merges segments of roughly equal size + and controls the number of segments per "tier" to balance search performance with merge + overhead. Lucene's [TieredMergePolicy] documentation explains in detail how CrateDB's + underlying merge policy decides when to combine segments. + + ::::{todo} Enable after merging [GH-434: Indexing and storage](https://github.com/crate/cratedb-guide/pull/434). ```md ## Related sections -{ref}`indexing-and-storage` explores the internal workings and data structures +{ref}`indexing-and-storage` illustrates the internal workings and data structures of CrateDB's storage layer in more detail. :::{toctree} @@ -109,5 +116,4 @@ indexing-and-storage :::: -[column-based store]: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/storage.html [TieredMergePolicy]: https://lucene.apache.org/core/9_12_1/core/org/apache/lucene/index/TieredMergePolicy.html From 3c1db13d26f23d39deb8528b1adada1c95019a33 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Wed, 29 Oct 2025 00:12:11 +0100 Subject: [PATCH 04/28] Storage: Add information about segment merges and table refreshes --- docs/feature/storage/index.md | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index c4ab4da3..d2350e94 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -83,6 +83,10 @@ a column store for fast sorting and aggregations. How CrateDB stores data using Lucene. +tldr; CrateDB never needs explicit VACUUMs, manual compactions, or +reindexing. The system maintains itself dynamically, which is a key advantage +for always-on analytics environments where data never stops flowing in. + :Append-only segments: A Lucene index is composed of one or more sub-indexes. A sub-index is called a segment, @@ -94,11 +98,34 @@ How CrateDB stores data using Lucene. some segments and discard the freed ones. This way, adding a new document does not require rebuilding the whole index structure completely. +:Segment merges: + + When new data is inserted into CrateDB, it is written into small, immutable + segments on disk. Over time, these segments are merged into larger ones by + background tasks, balancing I/O load with query performance. + + This process, known as segment merging, achieves three critical optimizations: + - Space compaction: Merging removes deleted or superseded records, freeing disk + space automatically. + - Faster queries: Larger segments reduce index overhead and improve cache efficiency. + - No downtime: Merging occurs transparently, allowing continuous ingestion and querying. + CrateDB uses Lucene's default TieredMergePolicy. It merges segments of roughly equal size and controls the number of segments per "tier" to balance search performance with merge overhead. Lucene's [TieredMergePolicy] documentation explains in detail how CrateDB's underlying merge policy decides when to combine segments. +:Table refreshes: + + CrateDB's refresh mechanism controls how often newly ingested data becomes visible + for querying. Instead of committing every write immediately, which would degrade + throughput, CrateDB batches writes in memory and periodically refreshes data + segments, typically once per second by default. + + This approach strikes a balance between low-latency visibility and high ingestion + performance, allowing users to query the most recent data almost instantly while + maintaining efficient bulk ingestion without overwhelming the storage layer + or exhausting other cluster resources. ::::{todo} Enable after merging [GH-434: Indexing and storage](https://github.com/crate/cratedb-guide/pull/434). From 8d0ae60da71549f997c0c8c7dd8ce301f55cfc09 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Wed, 29 Oct 2025 00:38:44 +0100 Subject: [PATCH 05/28] Storage: Implement suggestions by CodeRabbit --- docs/feature/storage/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index d2350e94..c3ebe6b9 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -106,7 +106,7 @@ for always-on analytics environments where data never stops flowing in. This process, known as segment merging, achieves three critical optimizations: - Space compaction: Merging removes deleted or superseded records, freeing disk - space automatically. + space automatically. - Faster queries: Larger segments reduce index overhead and improve cache efficiency. - No downtime: Merging occurs transparently, allowing continuous ingestion and querying. From f65b37e15381b790464d724e3cb399f17c44a086 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Wed, 29 Oct 2025 22:59:53 +0100 Subject: [PATCH 06/28] Storage: s/On top of doc values/Based on doc values/ --- docs/feature/storage/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index c3ebe6b9..3baf5798 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -27,7 +27,7 @@ A single record in Lucene is called "document". off selectively. CrateDB uses three main data structures of Lucene: Inverted indexes for text values, -BKD trees for numeric values, and doc values. On top of doc values, CrateDB implements +BKD trees for numeric values, and doc values. Based on doc values, CrateDB implements a column store for fast sorting and aggregations. :Inverted index: From 1bd8e5c4fd66c139b72ff4ca8620f6c32f89ae07 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Wed, 29 Oct 2025 23:00:19 +0100 Subject: [PATCH 07/28] Storage: Enable footer cross linking --- docs/feature/storage/index.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 3baf5798..299d0a6a 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -127,9 +127,7 @@ for always-on analytics environments where data never stops flowing in. maintaining efficient bulk ingestion without overwhelming the storage layer or exhausting other cluster resources. -::::{todo} -Enable after merging [GH-434: Indexing and storage](https://github.com/crate/cratedb-guide/pull/434). -```md + ## Related sections {ref}`indexing-and-storage` illustrates the internal workings and data structures @@ -139,8 +137,6 @@ of CrateDB's storage layer in more detail. :hidden: indexing-and-storage ::: -``` -:::: [TieredMergePolicy]: https://lucene.apache.org/core/9_12_1/core/org/apache/lucene/index/TieredMergePolicy.html From 6e615492e154659c8e61847b23c1af655056c9df Mon Sep 17 00:00:00 2001 From: Sebastian Utz Date: Wed, 29 Oct 2025 23:01:24 +0100 Subject: [PATCH 08/28] Storage: Add note about the "recreate tables" topic --- docs/feature/storage/index.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 299d0a6a..80ca4400 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -83,9 +83,13 @@ a column store for fast sorting and aggregations. How CrateDB stores data using Lucene. -tldr; CrateDB never needs explicit VACUUMs, manual compactions, or -reindexing. The system maintains itself dynamically, which is a key advantage -for always-on analytics environments where data never stops flowing in. +tldr; In daily operations, CrateDB never needs explicit VACUUMs, manual +compactions, or reindexing. [^recreate-tables] +The system maintains itself dynamically, which is a key advantage for +always-on analytics environments where data never stops flowing in. + +[^recreate-tables]: While CrateDB is maintenance-free in daily operations, + you will need to [recreate tables] on major version upgrades. :Append-only segments: @@ -139,4 +143,5 @@ indexing-and-storage ::: +[recreate tables]: https://cratedb.com/docs/crate/reference/en/latest/admin/system-information.html#tables-need-to-be-recreated [TieredMergePolicy]: https://lucene.apache.org/core/9_12_1/core/org/apache/lucene/index/TieredMergePolicy.html From a3633f88e27b70162407e33ce0af31dd61d5abd6 Mon Sep 17 00:00:00 2001 From: Sebastian Utz Date: Wed, 29 Oct 2025 23:02:27 +0100 Subject: [PATCH 09/28] Storage: Improve wording "When data is written" --- docs/feature/storage/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 80ca4400..dcab5149 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -104,7 +104,7 @@ always-on analytics environments where data never stops flowing in. :Segment merges: - When new data is inserted into CrateDB, it is written into small, immutable + When data is written into CrateDB, it is written into small, immutable segments on disk. Over time, these segments are merged into larger ones by background tasks, balancing I/O load with query performance. From c1f39fe6238703179b703b836bf76ef169223d69 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Wed, 29 Oct 2025 23:04:53 +0100 Subject: [PATCH 10/28] Storage: s/small segments/subsequent segments/ --- docs/feature/storage/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index dcab5149..6845cb71 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -104,7 +104,7 @@ always-on analytics environments where data never stops flowing in. :Segment merges: - When data is written into CrateDB, it is written into small, immutable + When data is written to CrateDB, it is written into subsequent immutable segments on disk. Over time, these segments are merged into larger ones by background tasks, balancing I/O load with query performance. From 6fbef6455187d0c37931bd20f095a591ff9d9a9a Mon Sep 17 00:00:00 2001 From: Sebastian Utz Date: Wed, 29 Oct 2025 23:05:57 +0100 Subject: [PATCH 11/28] Storage: Clarify that refreshes won't happen periodically on idling shards --- docs/feature/storage/index.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 6845cb71..d66cd928 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -123,8 +123,9 @@ always-on analytics environments where data never stops flowing in. CrateDB's refresh mechanism controls how often newly ingested data becomes visible for querying. Instead of committing every write immediately, which would degrade - throughput, CrateDB batches writes in memory and periodically refreshes data - segments, typically once per second by default. + throughput, CrateDB batches writes in memory and refreshes data + segments when needed. For performance reasons, refreshes won't happen on shards + which aren't queried for some time (idling). This approach strikes a balance between low-latency visibility and high ingestion performance, allowing users to query the most recent data almost instantly while From cf8b1548b720e2b3a9903ae0d2d739cc5cf1ef24 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Wed, 29 Oct 2025 23:17:41 +0100 Subject: [PATCH 12/28] Storage: Mention `OPTIMIZE TABLE` within section about segment merges --- docs/feature/storage/index.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index d66cd928..910eda3f 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -114,11 +114,17 @@ always-on analytics environments where data never stops flowing in. - Faster queries: Larger segments reduce index overhead and improve cache efficiency. - No downtime: Merging occurs transparently, allowing continuous ingestion and querying. - CrateDB uses Lucene's default TieredMergePolicy. It merges segments of roughly equal size + CrateDB uses Lucene's default TieredMergePolicy for automatically merging segments + in the background. It merges segments of roughly equal size and controls the number of segments per "tier" to balance search performance with merge overhead. Lucene's [TieredMergePolicy] documentation explains in detail how CrateDB's underlying merge policy decides when to combine segments. + You can invoke segment merges manually by using the + {ref}`OPTIMIZE TABLE ` SQL command. + This achieves the best optimization, especially after heavy insert operations. + For example, after initially loading table data from another system. + :Table refreshes: CrateDB's refresh mechanism controls how often newly ingested data becomes visible From c814a33723bb3c40bbb87bed885cf39c39229854 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Wed, 29 Oct 2025 23:17:56 +0100 Subject: [PATCH 13/28] Storage: Improve first sentence about segment merges once more --- docs/feature/storage/index.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 910eda3f..44c9b91e 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -105,8 +105,9 @@ always-on analytics environments where data never stops flowing in. :Segment merges: When data is written to CrateDB, it is written into subsequent immutable - segments on disk. Over time, these segments are merged into larger ones by - background tasks, balancing I/O load with query performance. + segments on disk. Over time, to reduce their number, these segments are + merged into larger ones by background tasks, balancing I/O load with + query performance. This process, known as segment merging, achieves three critical optimizations: - Space compaction: Merging removes deleted or superseded records, freeing disk From 0976922921268971ebaccd0aebca72cd948b0b67 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Wed, 29 Oct 2025 23:44:53 +0100 Subject: [PATCH 14/28] Storage: "table refreshes" plus `refresh_interval` and `REFRESH TABLE` --- docs/feature/storage/index.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 44c9b91e..646350b1 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -139,6 +139,11 @@ always-on analytics environments where data never stops flowing in. maintaining efficient bulk ingestion without overwhelming the storage layer or exhausting other cluster resources. + CrateDB refreshes tables once per second by default, but this can be configured + on a per-table level by using the {ref}`crate-reference:sql-create-table-refresh-interval` + table parameter. + You can also force writes manually by using the + {ref}`REFRESH TABLE ` SQL command. ## Related sections From 759d8ceef6186bd4f16c438f4f6debaccf9060c1 Mon Sep 17 00:00:00 2001 From: Sebastian Utz Date: Thu, 30 Oct 2025 11:36:58 +0100 Subject: [PATCH 15/28] Storage: Fix and reorganize "column store" section --- docs/feature/storage/index.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 646350b1..fd388331 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -71,13 +71,14 @@ a column store for fast sorting and aggregations. CrateDB implements a {ref}`column store ` based on doc values in Lucene. - For text values, other than storing the row data as-is (and indexing each value by default), - each value term is stored into a column-based store by default. - This storage layout improves the performance of sorting, grouping, and aggregations, by keeping field data for one column packed at one place rather than scattered across documents. - The column store is enabled by default in CrateDB and can be disabled only for text fields. - It does not support container or geographic data types. + + The column store is enabled by default in CrateDB and can optionally be disabled + on a per-field level. It does not support container or geographic data types. + + For all supported value types, other than storing the row data as-is, and indexing + each value by default, each value term is stored into a column-based store by default. ## Storage process From 75100eace3bea6fea0924b212d7fb824e9b86f3b Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Thu, 30 Oct 2025 11:50:31 +0100 Subject: [PATCH 16/28] Storage: Implement suggestions by CodeRabbit --- docs/feature/storage/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index fd388331..1bcdb168 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -77,8 +77,8 @@ a column store for fast sorting and aggregations. The column store is enabled by default in CrateDB and can optionally be disabled on a per-field level. It does not support container or geographic data types. - For all supported value types, other than storing the row data as-is, and indexing - each value by default, each value term is stored into a column-based store by default. + For all supported value types, field values are indexed and automatically stored + in the column-based store. ## Storage process From 996a1638e39cb35bbe11714064d233f941cecf4c Mon Sep 17 00:00:00 2001 From: Marios Trivyzas <5058131+matriv@users.noreply.github.com> Date: Thu, 30 Oct 2025 13:04:22 +0100 Subject: [PATCH 17/28] Storage: Improve and reorganize "column store" section once more --- docs/feature/storage/index.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 1bcdb168..56b1ce14 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -74,11 +74,13 @@ a column store for fast sorting and aggregations. This storage layout improves the performance of sorting, grouping, and aggregations, by keeping field data for one column packed at one place rather than scattered across documents. - The column store is enabled by default in CrateDB and can optionally be disabled - on a per-field level. It does not support container or geographic data types. - For all supported value types, field values are indexed and automatically stored - in the column-based store. + in the column-based store. It does not support container or geographic data types. + + The column store is enabled by default in CrateDB and can optionally be disabled + on a per-field level. The purpose of disabling is to reduce storage requirements + and achieve better write performance, when the columnar store is not needed for + those columns. ## Storage process From 20edea4c1771dcd191f04824c0c0aec1278c7fdd Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Thu, 30 Oct 2025 14:18:34 +0100 Subject: [PATCH 18/28] Storage: More updates from code review --- docs/feature/storage/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 56b1ce14..c72508a9 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -142,16 +142,16 @@ always-on analytics environments where data never stops flowing in. maintaining efficient bulk ingestion without overwhelming the storage layer or exhausting other cluster resources. - CrateDB refreshes tables once per second by default, but this can be configured + CrateDB refreshes tables once per second by default, however this can be configured on a per-table level by using the {ref}`crate-reference:sql-create-table-refresh-interval` table parameter. - You can also force writes manually by using the + You can also "force a refresh" manually by using the {ref}`REFRESH TABLE ` SQL command. ## Related sections {ref}`indexing-and-storage` illustrates the internal workings and data structures -of CrateDB's storage layer in more detail. +of Lucene in more detail, and how CrateDB's storage layer uses them. :::{toctree} :hidden: From 2f57f352fcea667e4de6ba962d4ccb42a46275e5 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Thu, 30 Oct 2025 17:39:43 +0100 Subject: [PATCH 19/28] Storage: Link to reference documentation's "optimization" page -- https://cratedb.com/docs/crate/reference/en/latest/admin/optimization.html --- docs/feature/storage/index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index c72508a9..d9dbd86b 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -111,6 +111,7 @@ always-on analytics environments where data never stops flowing in. segments on disk. Over time, to reduce their number, these segments are merged into larger ones by background tasks, balancing I/O load with query performance. + This process is called {ref}`optimization `. This process, known as segment merging, achieves three critical optimizations: - Space compaction: Merging removes deleted or superseded records, freeing disk From 97302ad76e38f260fa83483b16d797c9d1ee78c6 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Thu, 30 Oct 2025 17:39:50 +0100 Subject: [PATCH 20/28] Storage: This and that --- docs/feature/storage/index.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index d9dbd86b..9f1c0590 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -96,6 +96,9 @@ always-on analytics environments where data never stops flowing in. :Append-only segments: + Lucene only appends data to segment files, which means that data written + to the disc will never be mutated. + A Lucene index is composed of one or more sub-indexes. A sub-index is called a segment, it is immutable, and built from a set of documents. @@ -127,8 +130,8 @@ always-on analytics environments where data never stops flowing in. You can invoke segment merges manually by using the {ref}`OPTIMIZE TABLE ` SQL command. - This achieves the best optimization, especially after heavy insert operations. - For example, after initially loading table data from another system. + This achieves the best optimization, especially after heavy insert operations, + for example, after initially loading table data from another system. :Table refreshes: From 99b2c371b68feabe824b4ad8a126d15d7092c46c Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Fri, 31 Oct 2025 17:50:36 +0100 Subject: [PATCH 21/28] Storage: Define term "sharded storage" and link to reference manual --- docs/feature/storage/index.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 9f1c0590..7a96ea37 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -94,6 +94,16 @@ always-on analytics environments where data never stops flowing in. [^recreate-tables]: While CrateDB is maintenance-free in daily operations, you will need to [recreate tables] on major version upgrades. +:Sharded storage: + + Every table in CrateDB is sharded, which means that tables are divided + and distributed across the nodes of a cluster. Each shard in CrateDB is + a Lucene index broken down into segments getting stored on the filesystem. + + {ref}`crate-reference:concept-storage-consistency` shares more details + about how storage operations work in sharded and optionally replicated + cluster environments. + :Append-only segments: Lucene only appends data to segment files, which means that data written From 3be466bbd459fa8102c2ac4cd46575a90bcaac50 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Fri, 31 Oct 2025 18:25:30 +0100 Subject: [PATCH 22/28] fixup! Storage: This and that --- docs/feature/storage/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 7a96ea37..42401a5b 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -164,14 +164,14 @@ always-on analytics environments where data never stops flowing in. ## Related sections -{ref}`indexing-and-storage` illustrates the internal workings and data structures -of Lucene in more detail, and how CrateDB's storage layer uses them. - :::{toctree} :hidden: indexing-and-storage ::: +{ref}`indexing-and-storage` illustrates the internal workings and data structures +of Lucene in more detail, and how CrateDB's storage layer uses them. + [recreate tables]: https://cratedb.com/docs/crate/reference/en/latest/admin/system-information.html#tables-need-to-be-recreated [TieredMergePolicy]: https://lucene.apache.org/core/9_12_1/core/org/apache/lucene/index/TieredMergePolicy.html From 55229be437473eccad3bd361a5824a4f764b1633 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Fri, 31 Oct 2025 18:26:17 +0100 Subject: [PATCH 23/28] Storage: Link to reference documentation's "storage and consistency" -- https://cratedb.com/docs/crate/reference/en/latest/concepts/resiliency.html#storage-and-consistency --- docs/feature/storage/index.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 42401a5b..9fe6725d 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -172,6 +172,11 @@ indexing-and-storage {ref}`indexing-and-storage` illustrates the internal workings and data structures of Lucene in more detail, and how CrateDB's storage layer uses them. +{ref}`crate-reference:concept-resiliency-consistency` describes the positive +high-availability and performance effects of the eventual consistency model +implemented by CrateDB's storage and cluster subsystems, and also +what this means for application developers. + [recreate tables]: https://cratedb.com/docs/crate/reference/en/latest/admin/system-information.html#tables-need-to-be-recreated [TieredMergePolicy]: https://lucene.apache.org/core/9_12_1/core/org/apache/lucene/index/TieredMergePolicy.html From 9d8a808cf018e6f09c0ce9c9a7ef929cd3382578 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Fri, 31 Oct 2025 20:52:30 +0100 Subject: [PATCH 24/28] Storage: Compress "sharded storage" and "eventual consistency" Suggestions by CodeRabbit. --- docs/feature/storage/index.md | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 9fe6725d..a397db19 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -96,13 +96,11 @@ always-on analytics environments where data never stops flowing in. :Sharded storage: - Every table in CrateDB is sharded, which means that tables are divided - and distributed across the nodes of a cluster. Each shard in CrateDB is - a Lucene index broken down into segments getting stored on the filesystem. + CrateDB shards every table, dividing and distributing it across cluster nodes. + Each shard is a Lucene index composed of segments stored on the filesystem. - {ref}`crate-reference:concept-storage-consistency` shares more details - about how storage operations work in sharded and optionally replicated - cluster environments. + {ref}`crate-reference:concept-storage-consistency` explains storage operations + in sharded and replicated cluster environments. :Append-only segments: @@ -172,10 +170,9 @@ indexing-and-storage {ref}`indexing-and-storage` illustrates the internal workings and data structures of Lucene in more detail, and how CrateDB's storage layer uses them. -{ref}`crate-reference:concept-resiliency-consistency` describes the positive -high-availability and performance effects of the eventual consistency model -implemented by CrateDB's storage and cluster subsystems, and also -what this means for application developers. +{ref}`crate-reference:concept-resiliency-consistency` explains how eventual consistency +in CrateDB's storage and cluster subsystems delivers high availability and performance, +and what this means for application developers. [recreate tables]: https://cratedb.com/docs/crate/reference/en/latest/admin/system-information.html#tables-need-to-be-recreated From 2755fb6a499ba816cab1c4c5c23774ea39236966 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Fri, 31 Oct 2025 21:29:21 +0100 Subject: [PATCH 25/28] Storage: Compress "segment merges", based on suggestions by CodeRabbit --- docs/feature/storage/index.md | 36 +++++++++++++++-------------------- 1 file changed, 15 insertions(+), 21 deletions(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index a397db19..c2da20ff 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -119,27 +119,21 @@ always-on analytics environments where data never stops flowing in. :Segment merges: When data is written to CrateDB, it is written into subsequent immutable - segments on disk. Over time, to reduce their number, these segments are - merged into larger ones by background tasks, balancing I/O load with - query performance. - This process is called {ref}`optimization `. - - This process, known as segment merging, achieves three critical optimizations: - - Space compaction: Merging removes deleted or superseded records, freeing disk - space automatically. - - Faster queries: Larger segments reduce index overhead and improve cache efficiency. - - No downtime: Merging occurs transparently, allowing continuous ingestion and querying. - - CrateDB uses Lucene's default TieredMergePolicy for automatically merging segments - in the background. It merges segments of roughly equal size - and controls the number of segments per "tier" to balance search performance with merge - overhead. Lucene's [TieredMergePolicy] documentation explains in detail how CrateDB's - underlying merge policy decides when to combine segments. - - You can invoke segment merges manually by using the - {ref}`OPTIMIZE TABLE ` SQL command. - This achieves the best optimization, especially after heavy insert operations, - for example, after initially loading table data from another system. + segments on disk. + Background tasks merge immutable segments into larger ones over time + to reduce their number, which reduces index overhead and improves cache + efficiency. + While merging, the process also eliminates deleted records, effectively + freeing disk space. + + The merge process occurs transparently, using Lucene's TieredMergePolicy + to merge segments of roughly equal sizes without interrupting ingestion + or queries, while balancing query performance with merge I/O overhead. + See Lucene's [TieredMergePolicy] documentation for details. + + You can invoke merges manually using {ref}`OPTIMIZE TABLE `, + to achieve {ref}`optimization ` especially after + heavy insert operations. :Table refreshes: From cb62aacc22cb6293a7f56578db0adbc92ef45b9f Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Fri, 31 Oct 2025 21:45:57 +0100 Subject: [PATCH 26/28] fixup! Storage: This and that --- docs/feature/storage/index.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index c2da20ff..717ed9e6 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -84,7 +84,9 @@ a column store for fast sorting and aggregations. ## Storage process -How CrateDB stores data using Lucene. +The storage techniques used in CrateDB have been the foundation of big data +architectures for over a decade, powering search engines, social +networks, and analytics platforms at a massive scale. tldr; In daily operations, CrateDB never needs explicit VACUUMs, manual compactions, or reindexing. [^recreate-tables] From 16b83fe7595a394b55af7612778ac0b5b71bf127 Mon Sep 17 00:00:00 2001 From: Andreas Motl Date: Fri, 31 Oct 2025 21:46:45 +0100 Subject: [PATCH 27/28] Storage: Expand "sharded storage", based on suggestions by CodeRabbit --- docs/feature/storage/index.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 717ed9e6..2b4cec1a 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -98,6 +98,10 @@ always-on analytics environments where data never stops flowing in. :Sharded storage: + Sharding distributes data horizontally across multiple nodes, enabling + systems to handle datasets far larger than any single machine can store + or process. + CrateDB shards every table, dividing and distributing it across cluster nodes. Each shard is a Lucene index composed of segments stored on the filesystem. From 0077a6e02d5a38ae6ac685431dcb30adc1960be3 Mon Sep 17 00:00:00 2001 From: Marios Trivyzas <5058131+matriv@users.noreply.github.com> Date: Mon, 3 Nov 2025 12:11:41 +0100 Subject: [PATCH 28/28] Storage: Mention relationship between Lucene document and CrateDB row --- docs/feature/storage/index.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/feature/storage/index.md b/docs/feature/storage/index.md index 2b4cec1a..3443e72e 100644 --- a/docs/feature/storage/index.md +++ b/docs/feature/storage/index.md @@ -16,7 +16,8 @@ This page enumerates important concepts and implementations of Lucene used by Cr ## Data structures -A single record in Lucene is called "document". +A single record in Lucene is called "document", +which is used to store a single table row in CrateDB. :Document: