[Suggestion] Granular disk alarms #14590
Replies: 3 comments
-
The finest granularity would be to block publishing per queue. That would make it possible to block publishing to all queues residing on a node (including queues with replicas on that node) if that node individually runs out of space. Or, in a large cluster, we could block publishing to the queues on three of seven nodes if, for example, a three-replica stream is taking all of the space on those nodes. This fine granularity is good for publishing availability, but it is tricky to implement. Connections would need to track every queue they have published to, and an alarm would need to list all affected queues when a disk falls below its limit. If many queues are affected by a disk filling up, this could be very expensive. It could also add some cost to the connection when declaring a new queue, since the queue would need to be checked against every active alarm.
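A minimal sketch of the per-queue bookkeeping described above, in Python rather than Erlang. Everything here is hypothetical (the `PerQueueAlarms` class and its methods are illustrative, not RabbitMQ code): each queue maps to the set of nodes holding one of its replicas, each connection records every queue it has published to, and a connection is blocked if any of those queues has a replica on an alarmed node. `raise_alarm` shows the cost mentioned above: listing the affected queues means scanning every queue.

```python
from dataclasses import dataclass, field


@dataclass
class PerQueueAlarms:
    """Hypothetical per-queue disk-alarm bookkeeping (not RabbitMQ code)."""

    # queue name -> nodes holding a replica (leader or follower) of the queue
    replicas: dict[str, set[str]] = field(default_factory=dict)
    # connection id -> queues the connection has published to
    published: dict[str, set[str]] = field(default_factory=dict)
    # nodes whose disk is currently below the limit
    alarmed_nodes: set[str] = field(default_factory=set)

    def declare_queue(self, queue: str, nodes: set[str]) -> bool:
        # A newly declared queue has to be checked against active alarms.
        self.replicas[queue] = set(nodes)
        return self.queue_blocked(queue)

    def record_publish(self, conn: str, queue: str) -> None:
        # Every connection must remember each queue it publishes to.
        self.published.setdefault(conn, set()).add(queue)

    def raise_alarm(self, node: str) -> set[str]:
        # Listing the affected queues scans every queue: O(number of queues).
        self.alarmed_nodes.add(node)
        return {q for q, nodes in self.replicas.items() if node in nodes}

    def clear_alarm(self, node: str) -> None:
        self.alarmed_nodes.discard(node)

    def queue_blocked(self, queue: str) -> bool:
        # Blocked if any replica of the queue lives on an alarmed node.
        return bool(self.replicas.get(queue, set()) & self.alarmed_nodes)

    def is_blocked(self, conn: str) -> bool:
        return any(self.queue_blocked(q) for q in self.published.get(conn, set()))
```

With queues replicated on disjoint node sets, an alarm on one node blocks only the connections that published to queues with a replica there; every other publisher keeps going.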
-
I think the most useful level of granularity would be per queue type. This would strike a good balance between improving publishing availability and the complexity of determining which connections to block. Implementation-wise, we could use the disk monitoring mechanism from …
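For comparison, a sketch of per-queue-type blocking, assuming an alarm is raised for a queue type (for example `quorum` or `stream`) on some node and blocks publishing to that queue type cluster-wide. The class and method names are hypothetical. A connection only has to remember the few queue types it has published to, so deciding whether to block it is a small set intersection rather than a walk over every queue.

```python
from dataclasses import dataclass, field


@dataclass
class QueueTypeAlarms:
    """Hypothetical per-queue-type disk alarms (not RabbitMQ code)."""

    # queue name -> queue type, e.g. "classic", "quorum", "stream"
    queue_types: dict[str, str] = field(default_factory=dict)
    # connection id -> queue types the connection has published to
    used_types: dict[str, set[str]] = field(default_factory=dict)
    # (node, queue type) pairs whose data directory is below its limit
    active: set[tuple[str, str]] = field(default_factory=set)

    def declare_queue(self, queue: str, qtype: str) -> None:
        self.queue_types[queue] = qtype

    def record_publish(self, conn: str, queue: str) -> None:
        # Only the queue type is tracked, so this set stays small.
        self.used_types.setdefault(conn, set()).add(self.queue_types[queue])

    def raise_alarm(self, node: str, qtype: str) -> None:
        self.active.add((node, qtype))

    def clear_alarm(self, node: str, qtype: str) -> None:
        self.active.discard((node, qtype))

    def alarmed_types(self) -> set[str]:
        # Assumption: an alarm on any node blocks that queue type cluster-wide.
        return {qtype for _, qtype in self.active}

    def is_blocked(self, conn: str) -> bool:
        return bool(self.used_types.get(conn, set()) & self.alarmed_types())
```

A stream alarm on one node then blocks stream publishers everywhere while quorum queue publishers are unaffected, which is coarser than the per-queue model but much cheaper to evaluate.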
-
Selective blocking of publishers is a massive can of worms. We can consider making this particular resource alarm node-local. Unfortunately, particularly in the case of streams, it is not uncommon to see one node run out of disk space while publishers are connected to other nodes. So the idea of making this more "volume aware", and making it possible to move QQ and stream data to separate volumes (something IIRC we have made possible for Ra/QQs), is the most practical option. I have a feeling there will be pushback on the idea of making queue types responsible for this. The queue type API is already non-trivial in scope.
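A sketch of the "volume aware" idea on a single node, assuming each queue type has its own data directory (the `data_dirs` mapping and both functions are hypothetical, not RabbitMQ configuration). Directories are grouped by the device they live on using `os.stat(...).st_dev`, free space is read with `shutil.disk_usage`, and only the queue types stored on a volume that is below the limit are reported.

```python
import os
import shutil


def group_by_volume(data_dirs: dict[str, str]) -> dict[int, set[str]]:
    """Group queue types whose data directories sit on the same device."""
    volumes: dict[int, set[str]] = {}
    for qtype, path in data_dirs.items():
        volumes.setdefault(os.stat(path).st_dev, set()).add(qtype)
    return volumes


def alarmed_queue_types(data_dirs: dict[str, str], limit_bytes: int) -> set[str]:
    """Queue types to block because their volume has less than limit_bytes free."""
    blocked: set[str] = set()
    for qtypes in group_by_volume(data_dirs).values():
        # Any directory on the same volume reports the same free space.
        sample_path = data_dirs[next(iter(qtypes))]
        if shutil.disk_usage(sample_path).free < limit_bytes:
            # Every queue type sharing the full volume is affected together.
            blocked |= qtypes
    return blocked
```

If quorum queue and stream data share one volume they are always alarmed together; placed on separate volumes, a stream volume filling up would leave quorum queues publishable.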
-
RabbitMQ series
4.1.x
Operating system (distribution) used
macOS
How is RabbitMQ deployed?
Other
What would you like to suggest for a future version of RabbitMQ?
Currently a disk alarm fires when the available space on the logical disk holding the directory returned by rabbit:data_dir/0 drops below the configured limit. Falling below the limit on any node of the cluster blocks all publishing on all nodes. It would be useful to have a more granular disk alarm that could block publishing to a subset of queues, so that one logical disk filling up would not block all publishing. I tried out a prototype for this in #14086, which added limits per queue type.
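To make the current rule concrete, a minimal Python sketch (function names are hypothetical; RabbitMQ implements this in Erlang, with the limit set by `disk_free_limit`). Each node compares the free space on the volume holding its data directory against the limit, and a single alarmed node is enough to block publishing everywhere.

```python
import shutil


def node_disk_alarm(data_dir: str, limit_bytes: int) -> bool:
    """True if the volume holding data_dir has less than limit_bytes free."""
    return shutil.disk_usage(data_dir).free < limit_bytes


def publishing_blocked(node_alarms: dict[str, bool]) -> bool:
    """Current behaviour: one alarmed node blocks publishing on every node."""
    return any(node_alarms.values())
```

The cluster-wide `any` is what the proposal wants to narrow, so that an alarm affects only the queues (or queue types) stored on the full disk.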