Skip to content

Sharding: Clarify conflicting shard-per-CPU guidance in sharding documentation #380

@coderabbitai

Description

@coderabbitai

Description

The sharding performance guide contains conflicting guidance regarding the shard-per-CPU ratio that needs to be reconciled:

Line 63 (in "Shard size vs. number of shards" section):

In large clusters, this often means fewer shards than total CPU cores, as larger shards can still be processed efficiently by multiple CPU cores during query execution.

Lines 137-140 (in "Avoid under-allocation" section):

To increase the chances that a query can be parallelized and distributed maximally, there should be at least as many shards for a table than there are CPUs in the cluster.

These statements conflict and leave operators without a clear rule.

Proposed Solution

Reconcile the sections by:

  • Distinguishing the "starting point" baseline (aim for #shards >= #CPUs per table) from scenarios where running fewer shards is acceptable
  • Clarifying when large clusters with larger shard sizes (5-50 GB) can use fewer shards than total CPU cores
  • Adding bridging text that explains the conditions and trade-offs that justify running fewer, larger shards
  • Ensuring both paragraphs tell a consistent story

References

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions