-
**Describe the bug**

Whether it's the publisher or the consumer, the rate drops to zero at fixed intervals. Is this because the RabbitMQ server pauses for garbage collection? In case observing a single queue is not enough to draw a conclusion about GC pauses, note that the same pattern appears globally: the entire RabbitMQ instance seems to pause approximately every 30 seconds. Why is this happening? Is RabbitMQ under excessive load? How can I analyze and troubleshoot whether these pauses are caused by high load or by something else?

**Reproduction steps**

When using RabbitMQ 3.12.14 with a sufficiently large number of queues and consumers, the issue is 100% reproducible.

**Expected behavior**

I believe both publishing and consuming should be smooth. Why does the consumption curve show the metrics dropping to zero at fixed intervals? I want to understand the cause, and if this is an abnormal phenomenon, I want to know how to prevent it.

**Additional context**

I deployed RabbitMQ on Debian 10 with Docker. The machine is an Alibaba Cloud ECS instance of type ecs.hfc6.6xlarge with 24 vCPUs, 48 GiB of memory, and top-tier cloud SSD storage. Two disks are present: a 40 GB system disk rated at 2280 IOPS and a 200 GB data disk rated at 11800 IOPS. I mention this to show that the machine I'm using is sufficiently powerful. No anomalies were observed in the monitoring panel of the Alibaba Cloud ECS instance.

Information about my Linux operating system:

```
root@szbq-rabbitmq-52:/opt/rabbitmq-server# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
```

My Docker version information:

```
root@szbq-rabbitmq-52:/opt/rabbitmq-server# docker version
Client: Docker Engine - Community
Version: 20.10.20
API version: 1.41
Go version: go1.18.7
Git commit: 9fdeb9c
Built: Tue Oct 18 18:20:36 2022
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.20
API version: 1.41 (minimum version 1.12)
Go version: go1.18.7
Git commit: 03df974
Built: Tue Oct 18 18:18:26 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.8
GitCommit: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
```

My Docker Compose configuration:

```yaml
services:
  rabbitmq3-management:
    restart: always
    container_name: rabbitmq3-management
    image: rabbitmq:3.12.14-management
    hostname: rabbitmq3-management-standalone
    logging:
      driver: json-file
      options:
        max-size: "100m"
        max-file: "1"
    environment:
      - RABBITMQ_DEFAULT_USER=xxxx
      - RABBITMQ_DEFAULT_PASS=xxxx
    volumes:
      - "/xxxdata/rabbitmq:/var/lib/rabbitmq"
      - "./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf"
    ports:
      - "5672:5672"
      - "15672:15672"
      - "15692:15692"
```

My CPU usage is as follows:

```
root@szbq-rabbitmq-52:/opt/rabbitmq-server# sar 1 10
Linux 4.19.0-17-amd64 (szbq-rabbitmq-52) 09/23/2025 _x86_64_ (24 CPU)
06:02:46 PM CPU %user %nice %system %iowait %steal %idle
06:02:47 PM all 49.41 0.00 11.42 2.63 0.00 36.54
06:02:48 PM all 57.86 0.00 14.21 3.26 0.00 24.67
06:02:49 PM all 50.62 0.00 14.37 3.81 0.00 31.21
06:02:50 PM all 46.65 0.00 13.36 3.35 0.00 36.64
06:02:51 PM all 50.82 0.00 12.78 3.21 0.00 33.19
06:02:52 PM all 48.58 0.00 14.05 3.55 0.00 33.82
06:02:53 PM all 51.76 0.00 13.25 3.09 0.00 31.91
06:02:54 PM all 51.82 0.00 15.52 4.16 0.00 28.50
06:02:55 PM all 46.00 0.00 15.28 4.50 0.00 34.22
06:02:56 PM all 51.12 0.00 14.02 3.56 0.00 31.30
Average: all 50.47 0.00 13.83 3.51 0.00 32.19
```

Disk utilization is as follows. Due to the large number of read and write operations, disk read/write activity remains quite intensive.

```
root@szbq-rabbitmq-52:~# iostat -x -d vdb 1
Linux 4.19.0-17-amd64 (szbq-rabbitmq-52) 09/23/2025 _x86_64_ (24 CPU)
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 31.14 12269.90 1938.31 134936.44 0.00 16144.08 0.00 56.82 1.19 0.27 3.25 62.25 11.00 0.07 88.24
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 0.00 13075.00 0.00 122144.00 0.00 14471.00 0.00 52.53 0.00 0.19 2.25 0.00 9.34 0.06 73.60
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 0.00 11924.00 0.00 123160.00 0.00 16937.00 0.00 58.68 0.00 0.19 2.42 0.00 10.33 0.07 81.60
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 0.00 11042.00 0.00 118188.00 0.00 13752.00 0.00 55.47 0.00 0.24 2.58 0.00 10.70 0.07 80.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 0.00 11594.00 0.00 115864.00 0.00 15493.00 0.00 57.20 0.00 0.19 2.05 0.00 9.99 0.07 80.40
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 0.00 12204.00 0.00 163384.00 0.00 18747.00 0.00 60.57 0.00 0.31 3.60 0.00 13.39 0.07 80.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 0.00 13239.00 0.00 135356.00 0.00 19292.00 0.00 59.30 0.00 0.20 2.26 0.00 10.22 0.06 78.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 0.00 11757.00 0.00 122884.00 0.00 17103.00 0.00 59.26 0.00 0.20 2.23 0.00 10.45 0.07 84.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 0.00 13287.00 0.00 129608.00 0.00 16788.00 0.00 55.82 0.00 0.19 2.38 0.00 9.75 0.06 77.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 0.00 12638.00 0.00 127536.00 0.00 17965.00 0.00 58.70 0.00 0.19 2.53 0.00 10.09 0.07 85.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 0.00 13874.00 0.00 169020.00 0.00 20173.00 0.00 59.25 0.00 0.29 4.07 0.00 12.18 0.06 81.60
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 0.00 13910.00 0.00 150768.00 0.00 18611.00 0.00 57.23 0.00 0.22 2.98 0.00 10.84 0.06 83.60
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vdb 1.00 18049.00 4.00 175260.00 0.00 23895.00 0.00 56.97 0.00 0.19 3.12 4.00 9.71 0.05 91.20
```

Although this doesn't affect my use of RabbitMQ, these jagged edges have piqued my curiosity. If I want to troubleshoot this issue, where should I start? I'd appreciate some guidance. Thank you. Is there any more detailed information I can provide?

It wasn't intentional on my part to stay on an outdated version. Around June 2025 I upgraded RabbitMQ from 3.12.14 to 3.13.7 and hit the "Restarting crashed queue" issue, which directly caused a production incident. When I then attempted to upgrade from 3.13.7 to 4.0.2, I still encountered the "Restarting crashed queue" issue (not an in-place upgrade; it happened on a newly created queue). This has made me hesitant to upgrade beyond 3.12.14 for now. Of course, the "Restarting crashed queue" issue is a separate topic; if you're interested, I can start a new discussion specifically about it.

@kjnilsson I use classic queues exclusively.
-
3.12.14 is out of community support; you need to upgrade to 4.2. That said, the graph may just be a side effect of how the metrics are calculated. Look at the throughput rate your consumers actually observe and see whether it matches what you see in the management UI. Upgrade to 4.2 and see if it still occurs.
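To make that comparison concrete, here is a minimal sketch of a consumer-side rate logger, assuming a Python consumer built on the `pika` client (the thread does not say which client library is actually in use; host, credentials, prefetch value and queue name are placeholders). If the rate printed by the client stays smooth while the management UI graph dips to zero, the dips are likely an artifact of how the UI samples rates; if the client-side rate stalls as well, deliveries really are pausing.

```python
# Hypothetical consumer-side rate logger; connection details and the queue
# name are placeholders, not values taken from this discussion.
import threading
import time

import pika

received = 0
lock = threading.Lock()


def on_message(channel, method, properties, body):
    global received
    with lock:
        received += 1
    channel.basic_ack(delivery_tag=method.delivery_tag)


def report_rate():
    # Once per second, print how many deliveries this consumer actually saw,
    # independently of what the management UI graph shows.
    global received
    while True:
        time.sleep(1)
        with lock:
            count, received = received, 0
        print(f"{time.strftime('%H:%M:%S')} consumed {count} msg/s")


params = pika.ConnectionParameters(
    host="localhost",
    port=5672,
    credentials=pika.PlainCredentials("guest", "guest"),
)
connection = pika.BlockingConnection(params)
channel = connection.channel()
channel.basic_qos(prefetch_count=200)  # arbitrary example value

threading.Thread(target=report_rate, daemon=True).start()
channel.basic_consume(queue="my-queue", on_message_callback=on_message)
channel.start_consuming()
```

The same idea applied on the publisher side (counting confirmed publishes per second) tells you whether publishing itself stalls or only the graph does.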
-
@ponponon - what do you expect the RabbitMQ maintainers to do with what little information you provide, exactly? Do you expect them to rush to set up an environment, try to GUESS how you're using RabbitMQ, and report back to you, all for free? You're not even using a supported version of RabbitMQ.

If you want free support for your issue, I suggest you provide enough information to reproduce what you report. First, reproduce your issue in your own environment using the latest versions of RabbitMQ and Erlang. If you still see the same behavior, provide a git repository with the complete source code for producers and consumers that mimic your workload and reproduce what you observe.
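For reference, a reproduction harness for this kind of report can be quite small. The sketch below is a hypothetical starting point in Python with `pika` (the queue count, payload size, credentials and host are invented placeholders, not the reporter's actual workload): it declares a handful of durable classic queues and runs one publisher thread and one consumer thread per queue.

```python
# Hypothetical repro skeleton: N durable classic queues, one publisher thread
# and one consumer thread per queue. All names and sizes are placeholders.
import threading

import pika

HOST = "localhost"
CREDS = pika.PlainCredentials("guest", "guest")
QUEUES = [f"repro-queue-{i}" for i in range(10)]
BODY = b"x" * 1024  # 1 KiB payload; adjust to match the real workload


def publish(queue_name):
    conn = pika.BlockingConnection(pika.ConnectionParameters(host=HOST, credentials=CREDS))
    ch = conn.channel()
    ch.queue_declare(queue=queue_name, durable=True)  # classic queue by default
    props = pika.BasicProperties(delivery_mode=2)     # persistent messages
    while True:
        ch.basic_publish(exchange="", routing_key=queue_name, body=BODY, properties=props)


def consume(queue_name):
    conn = pika.BlockingConnection(pika.ConnectionParameters(host=HOST, credentials=CREDS))
    ch = conn.channel()
    ch.queue_declare(queue=queue_name, durable=True)
    ch.basic_qos(prefetch_count=100)
    ch.basic_consume(
        queue=queue_name,
        on_message_callback=lambda c, m, p, b: c.basic_ack(delivery_tag=m.delivery_tag),
    )
    ch.start_consuming()


for q in QUEUES:
    threading.Thread(target=publish, args=(q,), daemon=True).start()
    threading.Thread(target=consume, args=(q,), daemon=True).start()

threading.Event().wait()  # keep the main thread alive
```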
-
@ponponon do you expect us to guess what your consumers do or do not do (such as not acknowledging deliveries in a timely manner, or not using a suitable prefetch value)? I'm afraid our small team cannot afford guessing; it is a very time-consuming approach to troubleshooting distributed infrastructure.
The Erlang runtime does not suffer from "stop the world" GC pauses because there is no global GC: every Erlang process (a connection, a channel or session, a queue or stream replica) has an independent heap, and their garbage collections do not affect other processes. Yes, there is a shared reference-counted heap for larger binaries, but its GC is not "stop the world" for the entire system.

As any heavy PerfTest user can confirm, when a stop-the-world Java GC happens in a consumer or producer process, you can usually tell by a drop in the publishing or delivery/delivery-acknowledgement metrics, even though RabbitMQ itself was not paused for GC.

One scenario where RabbitMQ is guaranteed to stop deliveries is when a consumer has been delivered as many messages as its channel's prefetch allows, which by definition means that RabbitMQ must not deliver any more until some outstanding deliveries are acknowledged.
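To illustrate that prefetch scenario: once a consumer has as many unacknowledged deliveries as its channel's prefetch allows, the broker stops delivering to it until acknowledgements arrive, which looks like a pause on a rate graph. A minimal sketch with the Python `pika` client (the client choice, connection details and the prefetch value of 100 are assumptions for illustration, not something stated in this thread):

```python
# Sketch of a consumer that keeps deliveries flowing by using a bounded
# prefetch and acknowledging promptly. Connection details are placeholders.
import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(
        host="localhost",
        credentials=pika.PlainCredentials("guest", "guest"),
    )
)
channel = connection.channel()

# At most 100 deliveries may be outstanding (unacknowledged) on this channel.
# If the callback below never acknowledged, deliveries would stop entirely
# once that limit is reached.
channel.basic_qos(prefetch_count=100)


def handle(ch, method, properties, body):
    # ... do the actual work here ...
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack promptly


channel.basic_consume(queue="my-queue", on_message_callback=handle)
channel.start_consuming()
```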
-
By using monitoring data, ideally with the full set of Grafana dashboards (it can be inter-node connection congestion if the messages are large), and by asking the node how it spends its CPU/scheduler time. If a node has 1 CPU core, then a surge of activity in any part of the system (e.g. on a particular connection) can inevitably take CPU scheduler time away from queues or channels (which serialize the deliveries to be sent).

With an installation this old (it has reached EOL, without any exceptions), I also cannot rule out that the periodic background GC settings that were relevant for some workloads years ago are enabled. They force a minor GC run for every single process in the system.
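One way to collect that kind of evidence without a full Grafana setup is to sample the management HTTP API over time (port 15672 is exposed in the compose file above; the Prometheus endpoint on 15692 plus the stock Grafana dashboards is the more complete option), and to ask the node how its schedulers spend their time with `rabbitmq-diagnostics runtime_thread_stats`. A rough sketch of the sampling side in Python, with host, credentials and the polling interval as placeholders:

```python
# Poll the RabbitMQ management API and log aggregate publish/deliver rates,
# so dips can be correlated with other monitoring data. Placeholder values.
import time

import requests

URL = "http://localhost:15672/api/queues"
AUTH = ("guest", "guest")

while True:
    queues = requests.get(URL, auth=AUTH, timeout=10).json()
    publish_rate = sum(
        q.get("message_stats", {}).get("publish_details", {}).get("rate", 0.0)
        for q in queues
    )
    deliver_rate = sum(
        q.get("message_stats", {}).get("deliver_get_details", {}).get("rate", 0.0)
        for q in queues
    )
    stamp = time.strftime("%H:%M:%S")
    print(f"{stamp} publish={publish_rate:.0f}/s deliver={deliver_rate:.0f}/s")
    time.sleep(5)
```

Logging these samples alongside the management UI graph makes it easier to tell whether the dips are real or an artifact of how the graph is rendered.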