We are unable to drive more throughput with larger worker pools. Our current peak yield comes from 24x32 worker pools. However, taskbroker is not saturating disk, CPU, or network under this workload. My theory is that we're limited by contention on sqlite, and that if we had additional sqlite databases we could unlock additional broker throughput.
- Refactor/extract a trait from `InflightActivationStore`. This trait can be used as a type by the consumer, upkeep, and grpc activities.
- Implement a sharded store that spreads inflight task storage across multiple sqlite databases, based on a configuration value for the number of databases.
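A minimal sketch of the trait extraction and sharded store, using std-only stand-ins: the trait name and methods here are hypothetical simplifications, and a `Mutex<HashMap>` stands in for each sqlite database.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical, narrowed-down version of the store trait; the real
// InflightActivationStore would expose async, sqlx-backed methods.
trait ActivationStore {
    fn store(&self, activation_id: u64, payload: String);
    fn count(&self) -> usize;
}

// One shard: in the real implementation this would wrap a single
// sqlite database; a Mutex<HashMap> stands in for it here.
struct Shard {
    rows: Mutex<HashMap<u64, String>>,
}

// Sharded store spreading activations across a configured shard count.
struct ShardedStore {
    shards: Vec<Shard>,
}

impl ShardedStore {
    fn new(num_dbs: usize) -> Self {
        let shards = (0..num_dbs)
            .map(|_| Shard { rows: Mutex::new(HashMap::new()) })
            .collect();
        ShardedStore { shards }
    }

    // Route an activation to a shard by id modulo the shard count.
    fn shard_for(&self, activation_id: u64) -> &Shard {
        &self.shards[(activation_id % self.shards.len() as u64) as usize]
    }
}

impl ActivationStore for ShardedStore {
    fn store(&self, activation_id: u64, payload: String) {
        self.shard_for(activation_id)
            .rows
            .lock()
            .unwrap()
            .insert(activation_id, payload);
    }

    // Aggregate the inflight count across every shard.
    fn count(&self) -> usize {
        self.shards
            .iter()
            .map(|s| s.rows.lock().unwrap().len())
            .sum()
    }
}

fn main() {
    let store = ShardedStore::new(4);
    for id in 0..10 {
        store.store(id, format!("task-{id}"));
    }
    println!("total inflight: {}", store.count()); // 10, spread over 4 shards
}
```

Because the consumer, upkeep, and grpc code would depend only on the trait, the single-database store and the sharded store stay interchangeable.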
- During startup, taskbroker can create the required number of connections and databases, and apply migrations to each shard.
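The startup step could derive one database path per shard from the configured count; the base path, filename pattern, and `num_dbs` config value below are all assumptions, and the migration runner is elided.

```rust
// Derive a sqlite path per shard from a hypothetical config value.
// In the real broker, each path would get a connection pool and have
// migrations applied before the consumer starts.
fn shard_paths(base_dir: &str, num_dbs: usize) -> Vec<String> {
    (0..num_dbs)
        .map(|i| format!("{base_dir}/inflight-{i}.sqlite"))
        .collect()
}

fn main() {
    for path in shard_paths("/var/lib/taskbroker", 3) {
        // placeholder for: connect, then apply migrations to this shard
        println!("provisioning {path}");
    }
}
```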
Changes to RPC
- `SetTaskStatus` can use the activation_id to locate the correct shard to update.
- `GetTask` will need to round-robin between the shards to ensure even consumption.
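The round-robin part of `GetTask` can be a simple atomic cursor over the shard list; this is a sketch under the assumption that an incrementing counter modulo the shard count is an acceptable selection policy.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical round-robin cursor the GetTask handler could hold.
struct RoundRobin {
    next: AtomicUsize,
    num_shards: usize,
}

impl RoundRobin {
    fn new(num_shards: usize) -> Self {
        RoundRobin { next: AtomicUsize::new(0), num_shards }
    }

    // Each call returns the next shard index, wrapping around, so
    // consecutive GetTask calls drain all shards evenly.
    fn next_shard(&self) -> usize {
        self.next.fetch_add(1, Ordering::Relaxed) % self.num_shards
    }
}

fn main() {
    let rr = RoundRobin::new(3);
    let order: Vec<usize> = (0..6).map(|_| rr.next_shard()).collect();
    println!("{order:?}"); // [0, 1, 2, 0, 1, 2]
}
```

A real handler would also need to skip past empty shards so a worker's `GetTask` call doesn't come back empty while other shards still hold pending activations.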
Changes to consumer
- As activations are consumed from Kafka, we can choose which shard an activation is stored in by `activation.id % num_dbs`.
- Free space calculation will have to take all shards into account.
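The free space calculation becomes a sum over per-shard database sizes. A minimal sketch, assuming a per-broker byte budget; the sizes and limit are illustrative, and the real broker would read actual sqlite file or page-count sizes.

```rust
// Sum on-disk usage across all shard databases.
fn total_db_bytes(shard_sizes: &[u64]) -> u64 {
    shard_sizes.iter().sum()
}

// Accept new activations only while the combined shard usage is
// under the (hypothetical) configured budget.
fn has_capacity(shard_sizes: &[u64], max_total_bytes: u64) -> bool {
    total_db_bytes(shard_sizes) < max_total_bytes
}

fn main() {
    let sizes = [400, 300, 350, 250]; // bytes used per shard database
    println!("{}", has_capacity(&sizes, 2_000)); // 1300 < 2000 -> true
}
```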
Changes to metrics
Having multiple databases will impact several of the metrics we collect from taskbroker.
- Upkeep summary metrics will need to be aggregated across each shard.
- Status count queries and metrics will need to be aggregated across each shard.
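The aggregation for status counts could look like the following sketch; the status labels and per-shard maps are hypothetical stand-ins for the per-shard `COUNT(*) ... GROUP BY status` results.

```rust
use std::collections::HashMap;

// Merge per-shard status counts into broker-wide totals, so the
// emitted metrics look the same as with a single database.
fn aggregate_status_counts(
    per_shard: &[HashMap<&'static str, u64>],
) -> HashMap<&'static str, u64> {
    let mut totals = HashMap::new();
    for shard in per_shard {
        for (status, count) in shard {
            *totals.entry(*status).or_insert(0) += count;
        }
    }
    totals
}

fn main() {
    let shard_a = HashMap::from([("pending", 10u64), ("complete", 5)]);
    let shard_b = HashMap::from([("pending", 7), ("failure", 1)]);
    let totals = aggregate_status_counts(&[shard_a, shard_b]);
    println!("pending total: {:?}", totals.get("pending")); // Some(17)
}
```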
Desired result
Ideally, with additional sqlite shards we can get more throughput from a single broker and support larger worker pools per broker. This should also improve broker CPU utilization.