[Suggestion] Granular disk alarms #14590

the-mikedavis · 2025-09-23T14:41:07Z

the-mikedavis
Sep 23, 2025
Maintainer

RabbitMQ series

4.1.x

Operating system (distribution) used

macOS

How is RabbitMQ deployed?

Other

What would you like to suggest for a future version of RabbitMQ?

Currently, a disk alarm fires when the available space on the logical disk holding rabbit:data_dir/0 drops below the configured limit. When any node in the cluster falls below the limit, publishing is blocked on all nodes. A more granular disk alarm that blocks publishing to only a subset of queues would be useful, so that one logical disk filling up does not block all publishing.

I tried out a prototype for this in #14086 which added limits per queue type.

the-mikedavis · 2025-09-23T14:41:27Z

the-mikedavis
Sep 23, 2025
Maintainer Author

The finest possible granularity would be to block publishing per queue. That would make it possible to block publishing to all queues residing on a node (including queues with replicas on that node) if that node alone runs out of space. Or, in a large cluster, we could block publishing only to the queues hosted on three of seven nodes if, for example, a three-replica stream is taking all of the space on those nodes.

This fine granularity is good for publishing availability but tricky to implement. Connections would need to track every queue they have published to, and an alarm would need to list all affected queues when a disk drops below its limit. If a very large number of queues are affected by a disk filling up, this could be very expensive. It could also impose some cost on the connection when declaring a new queue, since the new queue would need to be checked against any active alarms.
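To make the bookkeeping cost concrete, here is a minimal sketch in Python of the per-queue tracking described above. It is only a model of the idea, not RabbitMQ code: `Connection`, `publish`, and `blocked_connections` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical publisher connection that remembers every queue it published to."""
    name: str
    published_queues: set = field(default_factory=set)

    def publish(self, queue: str) -> None:
        # Per-queue granularity means this set grows with every distinct target queue.
        self.published_queues.add(queue)


def blocked_connections(connections, alarmed_queues):
    """Return the names of connections that have published to any alarmed queue.

    `alarmed_queues` stands in for the list of affected queues that an alarm
    would have to carry; it can be very large when a full disk hosts many queues.
    """
    alarmed = set(alarmed_queues)
    return [c.name for c in connections if c.published_queues & alarmed]
```

Both the per-connection set and the alarm payload scale with the number of queues, which is the expense the comment above is worried about.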

the-mikedavis · 2025-09-23T14:42:06Z

the-mikedavis
Sep 23, 2025
Maintainer Author

I think the most useful level of granularity would be per queue type. This would strike a good balance between improving publishing availability and the complexity of determining which connections to block.

Implementation-wise, we could use the disk monitoring mechanism from rabbit_disk_monitor / disksup (from OTP), which basically shells out to df on Unix. You would mount different devices or logical partitions for osiris:data_dir/0 and ra_env:data_dir/0, for example, and then configure a separate limit for each mount point. Usage would be queried periodically for all disks with disksup:get_disk_info/0. When the free space on one logical disk falls below its limit, a rabbit_queue_type callback would determine which queue types are affected (probably by using disksup:get_disk_info/1 on the queue type's data directory or directories), and the disk monitor would set an alarm for all affected queue types. Connections would only need to track which queue types they have published to in order to know whether an alarm applies.
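A rough sketch of that per-queue-type check, again in Python and only as a model of the logic. The real implementation would live in Erlang and use disksup; here `shutil.disk_usage` stands in for the disk query, and `DiskLimit`, `alarmed_queue_types`, and `is_blocked` are hypothetical names, as are the example paths and limits.

```python
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskLimit:
    """Hypothetical per-queue-type limit on the volume holding its data directory."""
    queue_type: str      # e.g. "stream" or "quorum"
    data_dir: str        # assumed to be mounted on its own device or partition
    min_free_bytes: int


def _free_bytes(path):
    # Stand-in for the disksup query: free bytes on the filesystem holding `path`.
    return shutil.disk_usage(path).free


def alarmed_queue_types(limits, free_bytes=_free_bytes):
    """Return the set of queue types whose data directory is below its limit."""
    return {l.queue_type for l in limits if free_bytes(l.data_dir) < l.min_free_bytes}


def is_blocked(published_types, alarmed_types):
    """A connection is blocked only if it has published to an alarmed queue type."""
    return bool(set(published_types) & set(alarmed_types))
```

For example, with a stream volume below its limit and a quorum queue volume above it, `alarmed_queue_types` returns only `{"stream"}`, so a connection that has published only to quorum queues stays unblocked. The connection state is a small set of queue types rather than a set of queue names.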

michaelklishin · 2025-09-23T16:12:54Z

michaelklishin
Sep 23, 2025
Maintainer

Selective blocking of publishers is a massive can of worms. We can consider making this particular resource alarm node-local.

Unfortunately, particularly with streams, it is not uncommon for one node to run out of disk space while the publishers are connected to other nodes.

So the idea of making this more "volume aware" and making it possible to move QQ and stream data onto separate volumes (something that, IIRC, we have already made possible for Ra/QQs) is the most practical option.

I have a feeling there will be pushback on the idea of making queue types responsible for this. The queue type API is already non-trivial in scope.
