@@ -6,24 +6,197 @@ orphan: true
66
77# Clustering
88
9- :::{todo} Implement.
109
11- About scalability through partitioning, sharding, and replication.
12- Also about cross cluster replication.
10+ :::{include} /_include/links.md
1311:::
12+ :::{include} /_include/styles.html
13+ :::
14+
15+ <style>
16+ .field-list dd {
17+ margin-bottom: 1em !important;
18+ }
19+ .field-list p {
20+ margin-bottom: 0.5em;
21+ }
22+ </style>
23+
24+
25+ :::::{grid}
26+ :padding: 0
27+
28+ ::::{grid-item}
29+ :class: rubric-slim
30+ :columns: auto 9 9 9
1431
32+ **CrateDB provides scalability through partitioning, sharding, and replication.**
1533
16- :::{seealso}
17- **Domains:**
18- [](#metrics-store) •
19- [](#analytics) •
20- [](#industrial) •
21- [](#timeseries) •
22- [](#machine-learning)
34+ :::{rubric} Overview
35+ :::
36+ CrateDB uses a shared-nothing architecture to form highly available, resilient
37+ database clusters with minimal configuration effort, effectively implementing
38+ a distributed SQL database.
39+
40+ :::{rubric} About
41+ :::
42+ CrateDB relies on Lucene for storage and inherits components from
43+ Elasticsearch/OpenSearch for cluster consensus. Its fundamental concepts are
44+ familiar to Elasticsearch users, because the underlying implementation is the same.
2345
24- **Product:**
25- [Relational Database]
46+ :::{rubric} Details
2647:::
2748
49+ Sharding and partitioning are techniques used to distribute data evenly across
50+ multiple nodes in a cluster, ensuring data scalability, availability, and
51+ performance.
52+
53+ Replication can be applied to increase redundancy, which reduces the chance of
54+ data loss, and to improve read performance.
55+
56+ :Sharding:
57+
58+ In CrateDB, tables are split into a configured number of shards. Then, the
59+ shards are distributed across multiple nodes of the database cluster.
60+ Each shard in CrateDB is stored in a dedicated Lucene index.
61+
62+ You can think of a shard as a self-contained part of a table that includes
63+ both a subset of records and the corresponding indexing structures.
64+
65+ Figuring out how many shards to use for your tables requires you to think about
66+ the type of data you are processing, the types of queries you are running, and
67+ the type of hardware you are using.
68+
69+ :Partitioning:
70+
71+ CrateDB also supports splitting up data across another dimension with
72+ partitioning.
73+ Tables can be partitioned by defining partition columns.
74+ You can think of a partition as a set of shards.
75+
76+ - Partitioned tables optimize access efficiency when querying data, because only
77+ a subset of data needs to be addressed and acquired.
78+ - Each partition can be backed up and restored individually, for efficient operations.
79+ - You can change the number of shards used for future partitions even after the
80+ table has been created. This enables you to start out with few shards per partition,
81+ and scale up the number of shards for later partitions once traffic
82+ and ingest rates increase over the lifetime of your application or system.
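
As a minimal sketch of the last point, assuming a partitioned table hypothetically named `timeseries_table`, the shard count for future partitions could be raised as follows. Partitions that already exist are expected to keep their current number of shards.

```sql
-- Hypothetical table name; the new shard count is meant to apply
-- to partitions created after this statement runs.
ALTER TABLE timeseries_table SET (number_of_shards = 12);
```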
83+
84+ :Replication:
85+
86+ You can configure CrateDB to replicate tables. When you configure replication,
87+ CrateDB will ensure that every table shard has one or more copies available
88+ at all times.
89+
90+ Replication can also improve read performance, because additional shard copies
91+ distributed across a cluster increase the
92+ opportunities for CrateDB to parallelize query execution across multiple nodes.
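
For illustration, the replica count can also be adjusted on an existing table. This sketch assumes a table hypothetically named `timeseries_table`; `number_of_replicas` is the table parameter that controls how many copies are kept in addition to each primary shard.

```sql
-- Hypothetical table name; keeps one replica per primary shard,
-- i.e. two copies of the data in total.
ALTER TABLE timeseries_table SET (number_of_replicas = 1);
```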
93+
94+ ::::
95+
96+ ::::{grid-item}
97+ :class: rubric-slim
98+ :columns: auto 3 3 3
99+
100+ :::{rubric} Concepts
101+ :::
102+ - {ref}`crate-reference:concept-clustering`
103+ - {ref}`crate-reference:concept-storage-consistency`
104+ - {ref}`crate-reference:concept-resiliency`
105+
106+ :::{rubric} Reference Manual
107+ :::
108+ - {ref}`crate-reference:ddl-sharding`
109+ - {ref}`crate-reference:partitioned-tables`
110+ - {ref}`Partition columns <gloss-partition-column>`
111+ - {ref}`crate-reference:ddl-replication`
112+
113+ {tags-primary}`Clustering`
114+ {tags-primary}`Sharding`
115+ {tags-primary}`Partitioning`
116+ {tags-primary}`Replication`
117+ ::::
118+
119+ :::::
120+
121+
122+ ## Synopsis
123+ If you ingest 300 GB of data per month, partition your table by month,
124+ and use six shards per partition, each shard will hold about 50 GB of data, which is
125+ within the recommended size range (5 - 100 GB).
126+
127+ With two replicas configured, the table stores three copies of your data
128+ (one primary and two replicas of each shard), which reduces the chance of permanent data loss.
129+ ```sql
130+ CREATE TABLE timeseries_table (
131+ ts TIMESTAMP,
132+ val DOUBLE PRECISION,
133+ part GENERATED ALWAYS AS date_trunc('month', ts)
134+ )
135+ CLUSTERED INTO 6 SHARDS
136+ PARTITIONED BY (part)
137+ WITH (number_of_replicas = 2);
138+ ```
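
To check whether shard sizes actually land within the recommended range, a query along these lines could be run against the `sys.shards` system table. It is a sketch that assumes the `timeseries_table` defined above; the `size` column is reported in bytes, so it is converted to GB here.

```sql
-- List every shard copy of the table with its approximate size in GB.
-- "primary" is quoted because it is a reserved keyword.
SELECT partition_ident,
       id,
       "primary",
       size / 1024.0 / 1024.0 / 1024.0 AS size_gb
FROM sys.shards
WHERE table_name = 'timeseries_table'
ORDER BY partition_ident, id;
```

Shards that drift far outside the 5 - 100 GB range are a hint to revisit the number of shards for future partitions.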
139+
140+
141+ ## Learn
142+ Data with different characteristics and shapes needs different sharding and
143+ partitioning strategies. Learn about the details of shard allocation to
144+ choose the right strategy for your data and your most
145+ prominent types of workloads.
146+
147+ ::::{grid} 2 2 2 2
148+ :padding: 0
149+
150+ :::{grid-item-card}
151+ :link: sharding-partitioning
152+ :link-type: ref
153+ :link-alt: Sharding and Partitioning
154+ :padding: 3
155+ :class-header: sd-text-center sd-fs-5 sd-align-minor-center sd-font-weight-bold
156+ :class-body: sd-text-center2 sd-fs2-5
157+ :class-footer: text-smaller
158+ Sharding and Partitioning
159+ ^^^
160+ - Introduction to the concepts of sharding and partitioning.
161+ - Learn how to choose a strategy that fits your needs.
162+ +++
163+ {material-outlined}`lightbulb;1.8em`
164+ An in-depth guide on how to configure sharding and partitioning,
165+ presenting best practices and examples.
166+ :::
167+
168+ :::{grid-item-card}
169+ :link: sharding-performance
170+ :link-type: ref
171+ :link-alt: Sharding Performance Guide
172+ :padding: 3
173+ :class-header: sd-text-center sd-fs-5 sd-align-minor-center sd-font-weight-bold
174+ :class-body: sd-text-center2 sd-fs2-5
175+ :class-footer: text-smaller
176+ Sharding Performance Guide
177+ ^^^
178+ - Optimising for query performance.
179+ - Optimising for ingestion performance.
180+ +++
181+ {material-outlined}`lightbulb;1.8em`
182+ Guidelines about balancing your strategy to yield the best performance for your workloads.
183+ :::
184+
185+ :::{grid-item-card}
186+ :link: https://community.cratedb.com/t/sharding-and-partitioning-guide-for-time-series-data/737
187+ :link-alt: Sharding and partitioning guide for time-series data
188+ :padding: 3
189+ :class-header: sd-text-center sd-fs-5 sd-align-minor-center sd-font-weight-bold
190+ :class-body: sd-text-center2 sd-fs2-5
191+ :class-footer: text-smaller
192+ Sharding and partitioning guide for time-series data
193+ ^^^
194+ A hands-on walkthrough to support you with building a sharding and partitioning
195+ strategy for your time series data.
196+ +++
197+ {material-outlined}`lightbulb;1.8em`
198+ Includes details about partitioning, sharding, and replication. Gives valuable
199+ advice on related topics.
200+ :::
28201
29- [Relational Database]: https://cratedb.com/solutions/relational-database
202+ ::::