[Questions] Messages stuck in unack after consumers scaled down, TPS dropped to 0 #14611

VinayakSomvanshi · 2025-09-25T11:08:31Z

Community Support Policy

I have read RabbitMQ's Community Support Policy
I run RabbitMQ 4.x, the only series currently covered by community support
I promise to provide all relevant information (versions, logs from all nodes, rabbitmq-diagnostics output, detailed reproduction steps)

RabbitMQ version used

4.0.7

Erlang version used

27.3.x

Operating system (distribution) used

RHEL 9

How is RabbitMQ deployed?

RPM package

rabbitmq-diagnostics status output

See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics

Runtime
-------
OS PID: 2895258
OS: Linux
Uptime (seconds): 46347
Is under maintenance?: false
RabbitMQ version: 4.0.7
Node name: rabbit@pnb-rabbitmq-oepay-2
Erlang configuration: Erlang/OTP 27 [erts-15.2.3] [source] [64-bit] [smp:24:24] [ds:24:24:10] [async-threads:1] [jit:ns]
Crypto library: OpenSSL 3.2.4 6 Jun 2024
Erlang processes: 7591 used, 1048576 limit
Scheduler run queue: 8
Cluster heartbeat timeout (net_ticktime): 60

Plugins
-------
Enabled plugin file: /etc/rabbitmq/enabled_plugins
Enabled plugins:
  rabbitmq_prometheus
  rabbitmq_shovel_management
  rabbitmq_top
  rabbitmq_consistent_hash_exchange
  accept
  rabbitmq_shovel
  amqp10_client
  prometheus
  rabbitmq_management
  rabbitmq_management_agent
  rabbitmq_web_dispatch
  amqp_client
  cowboy
  cowlib
  oauth2_client
  jose

Data Directory
--------------
Node data directory: /appsmcon/rabbitmq/rabbit@pnb-rabbitmq-oepay-2
Raft data directory: /appsmcon/rabbitmq/rabbit@pnb-rabbitmq-oepay-2/quorum/rabbit@pnb-rabbitmq-oepay-2

Config Files
------------
/etc/rabbitmq/rabbitmq.conf

Log Files
---------
/appsmcon/rabbitmq/log/rabbit@pnb-rabbitmq-oepay-2.log
<stdout>

Alarms
------
(none)

Tags
----
(none)

Memory
------
Total memory used: 4.3926 GB
Calculation strategy: rss
Memory high watermark setting: 0.8 of available memory, computed to: 41.9098 GB

Breakdown:
  Allocated unused: 1.5873 GB (34.51%)
  Quorum queue procs: 1.5834 GB (34.48%)
  Binary: 0.912 GB (18.83%)
  Plugins: 0.1158 GB (2.52%)
  Connection channels: 0.1027 GB (2.23%)
  Other_system: 0.0883 GB (1.91%)
  Mgt_db: 0.075 GB (1.63%)
  Other procs: 0.0649 GB (1.48%)
  Code: 0.0236 GB (0.52%)
  Other_ets: 0.0196 GB (0.43%)
  Metrics: 0.0125 GB (0.28%)
  Connection_other: 0.0059 GB (0.22%)
  Quorum_ets: 0.0057 GB (0.12%)
  Connection_readers: 0.0054 GB (0.12%)
  Mnesia: 0.0046 GB (0.1%)
  Connection_writers: 0.0042 GB (0.1%)
  Queue_procs: 0.0022 GB (0.05%)
  Msg_index: 0.0013 GB (0.03%)
  Metadata_store: 0.0001 GB (0%)
  Metadata_store_ets: 0.0 GB (0%)
  Quorum_queue_dlx_procs: 0.0 GB (0%)
  Stream_queue_procs: 0.0 GB (0%)
  Stream_queue_replica_reader_procs: 0.0 GB (0%)
  Stream_queue_coordinator_procs: 0.0 GB (0%)
  Reserved_unallocated: 0.0 GB (0.0%)

File Descriptors
----------------
Total: 0
Limit: 999503

Free Disk Space
---------------
Low free disk space watermark: 0.05 GB
Free disk space: 1088.3577 GB

Totals
------
Connection count: 138
Queue count: 3329
Virtual host count: 1

Listeners
---------
Interface: [::], port: 15672, protocol: http, purpose: HTTP API
Interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Logs from node 1, node 2, and node 3 (with sensitive values edited out)

See https://www.rabbitmq.com/docs/logging to learn how to collect logs

(not provided)

rabbitmq.conf

See https://www.rabbitmq.com/docs/configure#config-location to learn how to find rabbitmq.conf file location

loopback_users = none  
vm_memory_high_watermark.relative = 0.8  
queue_leader_locator = min-master

Steps to deploy RabbitMQ cluster

We run RabbitMQ as a 3-node bare-metal cluster on RHEL 9 virtual machines, fronted by a load balancer. Each VM has 24 CPUs, 48 GiB RAM, and 1 TB storage. The load balancer exposes ports 5672, 15672, and 15692.

Steps to reproduce the behavior in question

The issue is intermittent, but the observed pattern is:

Message throughput (TPS for delivery + ack) suddenly dropped to 0 while the incoming rate stayed steady.
Messages started to back up in queues.
We scaled down the consumer pods, expecting unacknowledged messages to move back to ready, but some messages remained stuck in unack even though no consumers were connected.
We waited for the unacked messages to clear, but they did not.
We restarted the RabbitMQ nodes one by one (maintaining quorum).
After the restart, inbound traffic resumed and messages cleared, but the database-writes queue is still working through its backlog.

advanced.config

See https://www.rabbitmq.com/docs/configure#config-location to learn how to find advanced.config file location

Not Applicable

Application code

Not Applicable

Kubernetes deployment file

Not Applicable

What problem are you trying to solve?

During message processing in RabbitMQ, the TPS for delivery/ack suddenly drops to 0 while incoming messages continue. This leads to a backlog. When consumers are scaled down, some messages remain stuck in unack instead of moving back to ready. Even with no active consumers, RabbitMQ did not release those messages. Only after restarting RabbitMQ services one by one did the messages resume flowing, but database write queues still had pending backlog.

What could be the underlying issue that causes messages to remain stuck in unack after consumers are scaled down (no consumers present)? And what could cause TPS (delivery/ack rate) to drop to 0 while incoming continues?

Answered by lukebakken

Sep 25, 2025

kjnilsson · 2025-09-25T14:57:39Z

Maintainer

> Even with no active consumers,

Are you confident all consuming connections have been closed? There may be clues in your broker logs or in the management UI.
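One way to check this from the management side can be sketched as follows. It assumes the management plugin's HTTP API is reachable (port 15672 in the listener output above) and that the queue objects returned by `GET /api/queues` carry the `consumers` and `messages_unacknowledged` fields. The function names, base URL, and credentials are hypothetical and only for illustration.

```python
import base64
import json
import urllib.request


def find_suspect_queues(queues):
    """Return (vhost, name, unacked) for queues that report unacked
    messages while reporting zero consumers."""
    suspects = []
    for q in queues:
        # Fields may be missing or null for queues without stats yet.
        unacked = q.get("messages_unacknowledged") or 0
        consumers = q.get("consumers") or 0
        if unacked > 0 and consumers == 0:
            suspects.append((q.get("vhost"), q.get("name"), unacked))
    return suspects


def fetch_queues(base_url, user, password):
    """Fetch the queue list from the management HTTP API (GET /api/queues)."""
    req = urllib.request.Request(base_url.rstrip("/") + "/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


# Hypothetical usage (host and credentials are placeholders):
# queues = fetch_queues("http://localhost:15672", "monitoring", "secret")
# for vhost, name, unacked in find_suspect_queues(queues):
#     print(vhost, name, unacked)
```

If this returns nothing while the UI still shows unacked messages, some consumers are most likely still registered on those queues, and the consumer and connection listings are the next place to look.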

michaelklishin · 2025-09-25T15:29:52Z

Maintainer

> During message processing in RabbitMQ, the TPS for delivery/ack suddenly drops to 0 while incoming messages continue.
> This leads to a backlog

That is not evidence of a bug in RabbitMQ. We see consumers that fail to acknowledge deliveries all the time, and we had to introduce a protection mechanism a few years ago because, for quorum queues at the moment, unacknowledged messages become a problem once their number grows beyond a certain point.

It can be a matter of consumer connections not being detected as closed immediately, too.

Automatic requeueing should eventually take care of it.

michaelklishin · 2025-09-25T15:30:16Z

Maintainer

RabbitMQ 4.0.x is out of community support. Please upgrade to 4.1.4. For now, that's all the help you will get on 4.0.x.

lukebakken · 2025-09-25T15:49:20Z

Maintainer

> TPS (delivery/ack rate) to drop to 0 while incoming continues?

This sounds like you have a bug in your consumers. I've seen cases exactly like this where users think there must be something wrong with RabbitMQ when in fact their consumers process a message, hit an exception, and then just stop, because they don't handle the exception correctly.
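A minimal sketch of a consumer callback that avoids this failure mode, assuming a channel object with pika-style `basic_ack` and `basic_nack` methods. The names `handle_delivery`, `process`, and `requeue_on_error` are hypothetical, and the right rejection policy depends on the application.

```python
import logging

logger = logging.getLogger("consumer")


def handle_delivery(channel, delivery_tag, body, process, requeue_on_error=False):
    """Process one delivery and always settle it, even when processing fails.

    Returns True if the message was acked, False if it was nacked.
    """
    try:
        process(body)
    except Exception:
        # Log and reject instead of leaving the delivery unacknowledged.
        # With requeue=False the message is dead-lettered if the queue has
        # a dead-letter exchange configured, and dropped otherwise.
        logger.exception("processing failed for delivery %s", delivery_tag)
        channel.basic_nack(
            delivery_tag=delivery_tag, multiple=False, requeue=requeue_on_error
        )
        return False
    channel.basic_ack(delivery_tag=delivery_tag)
    return True
```

The point of the design is that every delivery ends in either an ack or a nack, so a processing error cannot leave messages sitting in unack while the consumer stays connected.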

> What could be the underlying issue that causes messages to remain stuck in unack after consumers are scaled down (no consumers present)?

I seriously doubt that you had 0 consumers. RabbitMQ's behavior since day one is to re-enqueue unacked messages when consumers disappear.

As @michaelklishin said, the version of RabbitMQ you're using isn't eligible for free community support. If you'd like to receive free support from the RabbitMQ maintainers, you must do the following:

Reproduce the issue using the latest released version of RabbitMQ (4.1.4).
Provide a script or some other easy means for the maintainers to reproduce this issue.
If a reproduction can't be automated, you must provide explicit steps to reproduce the issue that work every time.

Paid support for RabbitMQ is available as well.
