Open
Labels
maintenance (C'est la vie.)
Description
The sharding performance guide contains conflicting guidance regarding the shard-per-CPU ratio that needs to be reconciled:
Line 63 (in the "Shard size vs. number of shards" section):
"In large clusters, this often means fewer shards than total CPU cores, as larger shards can still be processed efficiently by multiple CPU cores during query execution."
Lines 137-140 (in the "Avoid under-allocation" section):
"To increase the chances that a query can be parallelized and distributed maximally, there should be at least as many shards for a table than there are CPUs in the cluster."
These statements conflict and leave operators without a clear rule.
Proposed Solution
Reconcile the sections by:
- Distinguishing the "starting point" baseline (aim for #shards >= #CPUs per table) from scenarios where running fewer shards is acceptable
- Clarifying when large clusters with larger shard sizes (5-50 GB) can use fewer shards than total CPU cores
- Adding bridging text that explains the conditions and trade-offs that justify running fewer, larger shards
- Ensuring both paragraphs tell a consistent story
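To illustrate where the two rules agree and where they diverge, here is a minimal sketch that compares the CPU baseline with the shard counts permitted by the 5-50 GB shard size range mentioned above. The function name, its parameters, and the example table sizes and CPU counts are hypothetical and not taken from the guide; the sketch only does the arithmetic and makes no claim about actual query performance.

```python
import math


def shard_count_options(table_size_gb, total_cpus,
                        min_shard_gb=5.0, max_shard_gb=50.0):
    """Compare the CPU baseline with the shard counts allowed by shard size.

    All names and defaults are illustrative assumptions; only the
    5-50 GB range comes from the issue text.
    """
    # Fewest shards that keep each shard at or below max_shard_gb.
    fewest = max(1, math.ceil(table_size_gb / max_shard_gb))
    # Most shards that keep each shard at or above min_shard_gb.
    most = max(1, math.floor(table_size_gb / min_shard_gb))
    # Starting-point rule: one shard per CPU in the cluster.
    baseline = total_cpus

    if baseline > most:
        # One shard per CPU would push shards below the minimum size.
        verdict = "fewer shards than CPUs"
    elif baseline < fewest:
        # One shard per CPU would push shards above the maximum size.
        verdict = "more shards than CPUs"
    else:
        verdict = "baseline fits"
    return {"fewest": fewest, "most": most,
            "baseline": baseline, "verdict": verdict}


if __name__ == "__main__":
    # Hypothetical clusters: (table size in GB, total CPUs).
    for size_gb, cpus in [(400, 96), (1000, 64), (2000, 32)]:
        print(size_gb, cpus, shard_count_options(size_gb, cpus))
```

Under these assumptions, a 400 GB table on 96 CPUs is a case where following the baseline would produce shards smaller than 5 GB, which is the kind of condition the bridging text could name explicitly.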
References
- PR: Sharding strategy and recommendations #355
- Comment: Sharding strategy and recommendations #355 (comment)
- File: docs/performance/sharding.md
- Requested by: @amotl