[Questions] Messages stuck in unack after consumers scaled down, TPS dropped to 0 #14611
Replies: 4 comments
-
Are you confident all consuming connections have been closed? There may be clues in your broker logs or in the management UI.
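One way to check this from a script, as a rough sketch: the management plugin's `GET /api/queues` endpoint returns, for each queue, `messages_unacknowledged` and `consumers` counters. The helper names `fetch_queues` and `orphaned_unacked` below are hypothetical, and the base URL and credentials are placeholders for your own. The counters are refreshed periodically, so a reading taken right after a scale-down can be slightly stale.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url, user, password):
    """GET /api/queues from the management plugin and return the parsed JSON list."""
    request = urllib.request.Request(f"{base_url}/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def orphaned_unacked(queues):
    """Return (vhost, name, unacked) for queues with unacked messages but no consumers."""
    return [
        (q.get("vhost"), q.get("name"), q.get("messages_unacknowledged", 0))
        for q in queues
        if q.get("messages_unacknowledged", 0) > 0 and q.get("consumers", 0) == 0
    ]
```

For example, `orphaned_unacked(fetch_queues("http://localhost:15672", "guest", "guest"))` lists the queues that report unacknowledged messages while having zero consumers. If that list is empty while messages sit in unack, some consumer is still attached from the broker's point of view.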
-
That's not evidence of a bug in RabbitMQ. We see consumers that do not acknowledge deliveries all the time, and a few years ago we had to introduce a protection mechanism because, for quorum queues at the moment, unacknowledged messages become a problem beyond a certain number. It can also be a matter of consumer connections not being detected as closed immediately. Automatic requeueing should eventually take care of it.
-
RabbitMQ |
-
This sounds like you have a bug in your consumers. I've seen cases exactly like this where users think there must be something wrong with RabbitMQ when in fact their consumers process a message, hit an exception, and then just stop, because they don't handle the exception correctly.
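To illustrate that failure mode, here is a minimal sketch of a consumer callback that always settles the delivery, even when processing throws. It uses the `(channel, method, properties, body)` callback signature and the `basic_ack` / `basic_nack` channel methods from the pika client. The assumption that your consumers use pika is mine, and `process` is a hypothetical stand-in for your business logic.

```python
import logging

logger = logging.getLogger("consumer")


def process(body):
    """Hypothetical business logic; raises on bad input."""
    if not body:
        raise ValueError("empty message body")
    return body.decode("utf-8").upper()


def on_message(channel, method, properties, body):
    """Callback with pika's (channel, method, properties, body) signature."""
    try:
        process(body)
    except Exception:
        logger.exception("processing failed for delivery %s", method.delivery_tag)
        # Reject explicitly so the delivery does not stay unacknowledged.
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
    else:
        # Acknowledge only after processing succeeded.
        channel.basic_ack(delivery_tag=method.delivery_tag)
```

With `requeue=False` a failed message is dropped or dead-lettered, depending on queue configuration. Using `requeue=True` instead puts it back in the queue, at the risk of a redelivery loop for messages that can never be processed.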
I seriously doubt that you had 0 consumers. RabbitMQ's behavior since day one is to re-enqueue unacked messages when consumers disappear. As @michaelklishin said, the version of RabbitMQ you're using isn't eligible for free community support. If you'd like to receive free support from the RabbitMQ maintainers, you must do the following -
-
Community Support Policy
RabbitMQ version used
4.0.7
Erlang version used
27.3.x
Operating system (distribution) used
RHEL 9
How is RabbitMQ deployed?
RPM package
rabbitmq-diagnostics status output
See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics
Logs from node 1 (with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 2 (if applicable, with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 3 (if applicable, with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
rabbitmq.conf
See https://www.rabbitmq.com/docs/configure#config-location to learn how to find rabbitmq.conf file location
Steps to deploy RabbitMQ cluster
We run RabbitMQ as a 3-node bare-metal cluster on RHEL 9 virtual machines, fronted by a load balancer. Each VM has 24 CPUs, 48 GiB RAM, and 1 TB storage. The load balancer exposes ports 5672, 15672, and 15692.
Steps to reproduce the behavior in question
The issue is intermittent, but the observed pattern is:
advanced.config
See https://www.rabbitmq.com/docs/configure#config-location to learn how to find advanced.config file location
Application code
Kubernetes deployment file
What problem are you trying to solve?
During message processing in RabbitMQ, the TPS for delivery/ack suddenly drops to 0 while incoming messages continue. This leads to a backlog. When consumers are scaled down, some messages remain stuck in unack instead of moving back to ready. Even with no active consumers, RabbitMQ did not release those messages. Only after restarting RabbitMQ services one by one did the messages resume flowing, but database write queues still had pending backlog.
What could be the underlying issue that causes messages to remain stuck in unack after consumers are scaled down (no consumers present), and the TPS (delivery/ack rate) to drop to 0 while incoming messages continue?