Merged
28 commits
e1bf05a
Storage: Improve guidance
amotl Oct 28, 2025
3b9ccd6
Storage: Extend segments with information about merge policy
amotl Oct 28, 2025
290f76d
Storage: Reorganize sections
amotl Oct 28, 2025
3c1db13
Storage: Add information about segment merges and table refreshes
amotl Oct 28, 2025
8d0ae60
Storage: Implement suggestions by CodeRabbit
amotl Oct 28, 2025
f65b37e
Storage: s/On top of doc values/Based on doc values/
amotl Oct 29, 2025
1bd8e5c
Storage: Enable footer cross linking
amotl Oct 29, 2025
6e61549
Storage: Add note about the "recreate tables" topic
seut Oct 29, 2025
a3633f8
Storage: Improve wording "When data is written"
seut Oct 29, 2025
c1f39fe
Storage: s/small segments/subsequent segments/
amotl Oct 29, 2025
6fbef64
Storage: Clarify that refreshes won't happen periodically on idling s…
seut Oct 29, 2025
cf8b154
Storage: Mention `OPTIMIZE TABLE` within section about segment merges
amotl Oct 29, 2025
c814a33
Storage: Improve first sentence about segment merges once more
amotl Oct 29, 2025
0976922
Storage: "table refreshes" plus `refresh_interval` and `REFRESH TABLE`
amotl Oct 29, 2025
759d8ce
Storage: Fix and reorganize "column store" section
seut Oct 30, 2025
75100ea
Storage: Implement suggestions by CodeRabbit
amotl Oct 30, 2025
996a163
Storage: Improve and reorganize "column store" section once more
matriv Oct 30, 2025
20edea4
Storage: More updates from code review
amotl Oct 30, 2025
2f57f35
Storage: Link to reference documentation's "optimization" page
amotl Oct 30, 2025
97302ad
Storage: This and that
amotl Oct 30, 2025
99b2c37
Storage: Define term "sharded storage" and link to reference manual
amotl Oct 31, 2025
3be466b
fixup! Storage: This and that
amotl Oct 31, 2025
55229be
Storage: Link to reference documentation's "storage and consistency"
amotl Oct 31, 2025
9d8a808
Storage: Compress "sharded storage" and "eventual consistency"
amotl Oct 31, 2025
2755fb6
Storage: Compress "segment merges", based on suggestions by CodeRabbit
amotl Oct 31, 2025
cb62aac
fixup! Storage: This and that
amotl Oct 31, 2025
16b83fe
Storage: Expand "sharded storage", based on suggestions by CodeRabbit
amotl Oct 31, 2025
0077a6e
Storage: Mention relationship between Lucene document and CrateDB row
matriv Nov 3, 2025
159 changes: 118 additions & 41 deletions docs/feature/storage/index.md
@@ -9,50 +9,29 @@
The CrateDB storage layer is based on Lucene.
:::

By default, all fields are indexed,
nested or not, but the indexing can be turned off selectively.

This page enumerates some concepts of Lucene. The article {ref}`indexing-and-storage`
goes into more details by exploring its internal workings.

## Lucene

Lucene offers scalable and high-performance indexing which enables efficient search and
Lucene offers scalable and high-performance indexing, which enables efficient search and
aggregations over documents and rapid updates to the existing documents. Solr and
Elasticsearch are building upon the same technologies.
This page enumerates important concepts and implementations of Lucene used by CrateDB.

- **Documents**

A single record in Lucene is called "document", which is a unit of information for search
and indexing that contains a set of fields, where each field has a name and value. A Lucene
index can store an arbitrary number of documents, with an arbitrary number of different fields.

- **Append-only segments**

A Lucene index is composed of one or more sub-indexes. A sub-index is called a segment,
it is immutable, and built from a set of documents. When new documents are added to the
existing index, they are added to the next segment, while previous segments are never
modified. If the number of segments becomes too large, the system may decide to merge
some segments and discard the freed ones. This way, adding a new document does not require
rebuilding the whole index structure completely.

- **Column store**
## Data structures

For text values, other than storing the row data as-is (and indexing each value by default),
each value term is stored into a [column-based store] by default, which offers performance
improvements for global aggregations and groupings, and enables efficient ordering, because
the data for one column is packed at one place.
A single record in Lucene is called "document",
which is used to store a single table row in CrateDB.

In CrateDB, the column store is enabled by default and can be disabled only for text fields,
not for other primitive types. Furthermore, CrateDB does not support storing values for
container and geospatial types in the column store.
:Document:

## Data structures
A document is a unit of information for search
and indexing that contains a set of fields, where each field has a name and value. A Lucene
index can store an arbitrary number of documents, with an arbitrary number of different fields.
By default, all fields are indexed, nested or not, but the indexing can be turned
off selectively.

CrateDB uses three main data structures of Lucene:
Inverted indexes for text values, BKD trees for numeric values, and doc values.
CrateDB uses three main data structures of Lucene: Inverted indexes for text values,
BKD trees for numeric values, and doc values. Based on doc values, CrateDB implements
a column store for fast sorting and aggregations.

- **Inverted index**
:Inverted index:

The Lucene indexing strategy for text fields relies on a data structure called inverted
index, which is defined as a "data structure storing a mapping from content, such as
@@ -66,7 +45,7 @@ Inverted indexes for text values, BKD trees for numeric values, and doc values.

The inverted index enables a very efficient search over textual data.

- **BKD tree**
:BKD tree:

To optimize numeric range queries, Lucene uses an implementation of the Block KD (BKD)
tree data structure. The BKD tree index structure is suitable for indexing large
@@ -79,7 +58,7 @@ Inverted indexes for text values, BKD trees for numeric values, and doc values.
including fields defined as `TIMESTAMP` types, supporting performant date range
queries.

- **Doc values**
:Doc values:

Because Lucene's inverted index data structure implementation is not optimal for
finding field values by given document identifier, and for performing column-oriented
@@ -89,15 +68,113 @@ Inverted indexes for text values, BKD trees for numeric values, and doc values.
all field values that are not analyzed as strings in a compact column, making it more
effective for sorting and aggregations.
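
The difference between the inverted index (term to document ids) and doc values
(document id to value, stored per column) can be illustrated with a minimal Python
sketch. This is a toy model and not Lucene's implementation; the sample rows, the
field names `title` and `price`, and both helper functions are assumptions made up
for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; field names are illustrative only.
documents = {
    0: {"title": "fast columnar storage", "price": 12},
    1: {"title": "fast search engine", "price": 7},
    2: {"title": "columnar search", "price": 30},
}


def build_inverted_index(docs, field):
    """Map each term of a text field to the ids of the documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id][field].split())):
            index[term].append(doc_id)
    return dict(index)


def build_doc_values(docs, field):
    """Keep the values of one field packed together, addressed by document id."""
    return [docs[doc_id][field] for doc_id in sorted(docs)]


inverted = build_inverted_index(documents, "title")
prices = build_doc_values(documents, "price")

# Search uses the inverted index: which documents contain the term "columnar"?
matches = inverted["columnar"]  # [0, 2]

# Aggregation uses the column: look up values by document id and sum them.
total = sum(prices[doc_id] for doc_id in matches)  # 42
```

The search step never touches the `price` values, and the aggregation step never
touches the terms, which is why the two structures complement each other.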

## See also
:Column store:

CrateDB implements a {ref}`column store <crate-reference:ddl-storage-columnstore>`
based on doc values in Lucene.
This storage layout improves the performance of sorting, grouping, and aggregations,
by keeping field data for one column packed at one place rather than scattered across documents.

For all supported value types, field values are indexed and automatically stored
in the column-based store. It does not support container or geographic data types.

The column store is enabled by default in CrateDB and can optionally be disabled
on a per-field level. The purpose of disabling is to reduce storage requirements
and achieve better write performance, when the columnar store is not needed for
those columns.

## Storage process

The storage techniques used in CrateDB have been the foundation of big data
architectures for over a decade, powering search engines, social
networks, and analytics platforms at a massive scale.

TL;DR: In daily operations, CrateDB never needs explicit VACUUMs, manual
compactions, or reindexing. [^recreate-tables]
The system maintains itself dynamically, which is a key advantage for
always-on analytics environments where data never stops flowing in.

- {ref}`indexing-and-storage`
[^recreate-tables]: While CrateDB is maintenance-free in daily operations,
you will need to [recreate tables] on major version upgrades.

:Sharded storage:

Sharding distributes data horizontally across multiple nodes, enabling
systems to handle datasets far larger than any single machine can store
or process.

CrateDB shards every table, dividing and distributing it across cluster nodes.
Each shard is a Lucene index composed of segments stored on the filesystem.

{ref}`crate-reference:concept-storage-consistency` explains storage operations
in sharded and replicated cluster environments.
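
How rows end up in different shards can be sketched in Python. The hash-modulo
routing below is an assumption for illustration only; this page does not describe
the actual routing function, and the names `shard_for` and `distribute` are
hypothetical.

```python
import zlib


def shard_for(routing_key: str, num_shards: int) -> int:
    """Pick a shard by hashing the routing key (assumed scheme, not CrateDB's code)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def distribute(keys, num_shards):
    """Group routing keys by the shard they would be assigned to."""
    shards = {shard_id: [] for shard_id in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Hypothetical routing keys, spread over four shards.
placement = distribute([f"sensor-{n}" for n in range(20)], num_shards=4)
```

Because the routing is deterministic, the same key always maps to the same shard,
so a lookup by key only needs to consult one shard.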

:Append-only segments:

Lucene only appends data to segment files, which means that data written
to disk will never be mutated.

A Lucene index is composed of one or more sub-indexes. A sub-index is called a segment;
it is immutable and built from a set of documents.

When new documents are added to the
existing index, they are added to the next segment, while previous segments are never
modified. If the number of segments becomes too large, the system may decide to merge
some segments and discard the freed ones. This way, adding a new document does not require
rebuilding the whole index structure completely.

:Segment merges:

When data is written to CrateDB, it is written into subsequent immutable
segments on disk.
Background tasks merge immutable segments into larger ones over time
to reduce their number, which reduces index overhead and improves cache
efficiency.
While merging, the process also eliminates deleted records, effectively
freeing disk space.

The merge process occurs transparently, using Lucene's TieredMergePolicy
to merge segments of roughly equal sizes without interrupting ingestion
or queries, while balancing query performance with merge I/O overhead.
See Lucene's [TieredMergePolicy] documentation for details.

You can invoke merges manually using {ref}`OPTIMIZE TABLE <crate-reference:sql-optimize>`,
to achieve {ref}`optimization <crate-reference:optimize>` especially after
heavy insert operations.
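
The interplay of immutable segments, deletions, and merges can be sketched as a
toy model in Python. This is not Lucene's code: the `Segment` and `ToyIndex`
classes are hypothetical, deletions are tracked as a plain set of ids, and
`merge_all` simply merges every segment into one instead of following
TieredMergePolicy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """An immutable batch of (doc_id, value) pairs."""
    docs: tuple


class ToyIndex:
    def __init__(self):
        self.segments = []    # existing segments are never modified
        self.deleted = set()  # deletions are recorded beside the segments

    def add_segment(self, docs):
        """New documents always go into a new segment."""
        self.segments.append(Segment(tuple(docs)))

    def delete(self, doc_id):
        """Mark a document as deleted without rewriting its segment."""
        self.deleted.add(doc_id)

    def live_docs(self):
        """Return all documents that are not marked as deleted."""
        return {
            doc_id: value
            for segment in self.segments
            for doc_id, value in segment.docs
            if doc_id not in self.deleted
        }

    def merge_all(self):
        """Rewrite all segments into one, dropping deleted documents."""
        merged = tuple(
            (doc_id, value)
            for segment in self.segments
            for doc_id, value in segment.docs
            if doc_id not in self.deleted
        )
        self.segments = [Segment(merged)]
        self.deleted.clear()


index = ToyIndex()
index.add_segment([(1, "a"), (2, "b")])
index.add_segment([(3, "c")])
index.delete(2)

before = index.live_docs()
index.merge_all()
after = index.live_docs()
```

Queries see the same documents before and after the merge; only the number of
segments and the space taken by deleted documents change.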

:Table refreshes:

CrateDB's refresh mechanism controls how often newly ingested data becomes visible
for querying. Instead of committing every write immediately, which would degrade
throughput, CrateDB batches writes in memory and refreshes data
segments when needed. For performance reasons, refreshes do not happen on idle shards,
that is, shards that have not been queried for some time.

This approach strikes a balance between low-latency visibility and high ingestion
performance, allowing users to query the most recent data almost instantly while
maintaining efficient bulk ingestion without overwhelming the storage layer
or exhausting other cluster resources.

CrateDB refreshes tables once per second by default; however, this can be configured
on a per-table level by using the {ref}`crate-reference:sql-create-table-refresh-interval`
table parameter.
You can also "force a refresh" manually by using the
{ref}`REFRESH TABLE <crate-reference:sql-refresh>` SQL command.
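
The visibility effect of a refresh can be sketched as a toy model in Python. The
`ToyTable` class and its methods are hypothetical and only mimic the behavior
described above: writes are buffered first and become visible to queries after a
refresh.

```python
class ToyTable:
    def __init__(self):
        self._buffer = []      # written, but not yet visible to queries
        self._searchable = []  # visible to queries

    def insert(self, row):
        """Buffer a write; it does not show up in queries yet."""
        self._buffer.append(row)

    def refresh(self):
        """Make all buffered writes visible, like a periodic or forced refresh."""
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def select_all(self):
        """Return only the rows that a query would currently see."""
        return list(self._searchable)


table = ToyTable()
table.insert({"id": 1})
visible_before = table.select_all()  # the new row is not visible yet

table.refresh()
visible_after = table.select_all()   # the new row is visible now
```

In this model, calling `refresh()` after each insert would make every row visible
immediately, at the cost of doing the refresh work once per write instead of once
per batch.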

## Related sections

:::{toctree}
:hidden:
indexing-and-storage
:::

{ref}`indexing-and-storage` illustrates the internal workings and data structures
of Lucene in more detail, and how CrateDB's storage layer uses them.

{ref}`crate-reference:concept-resiliency-consistency` explains how eventual consistency
in CrateDB's storage and cluster subsystems delivers high availability and performance,
and what this means for application developers.


[column-based store]: https://cratedb.com/docs/crate/reference/en/latest/general/ddl/storage.html
[recreate tables]: https://cratedb.com/docs/crate/reference/en/latest/admin/system-information.html#tables-need-to-be-recreated
[TieredMergePolicy]: https://lucene.apache.org/core/9_12_1/core/org/apache/lucene/index/TieredMergePolicy.html