Topology recovery after switching from one node to another: channel error, queue not found #14564

hyfara · 2025-09-17T14:42:01Z


Describe the bug

Hi,

I am getting an error from topology recovery after a network failure when using multiple nodes.

I'm trying to get the connection to recover successfully after switching from one node to another when the first node is shut down.

RabbitMQ version: 4.1.4
Erlang version 27.3.4.3
Client: Java client (Java 17.0.16)
Client library version: amqp-client 5.25.0

Reproduction steps

Open a connection from client A to node 1
Shut down node 1
Client A automatically tries to recover and connects to node 2
The following exception is thrown:

WARN :c.s.s.a.l.AMQPShutdownListener:AMQP Connection XXX.XX.XXX.XXX:5673: AMQP Channel has shutdown: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - queue 'queue-11987d44-d9a9-4a76-b189-a515210a79b7' in vhost '/' process is stopped by supervisor, class-id=50, method-id=10)
com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - queue 'queue-11987d44-d9a9-4a76-b189-a515210a79b7' in vhost '/' process is stopped by supervisor, class-id=50, method-id=10)
        at com.rabbitmq.client.impl.ChannelN.asyncShutdown(ChannelN.java:528)
        at com.rabbitmq.client.impl.ChannelN.processAsync(ChannelN.java:349)
        at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:193)
        at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:125)
        at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:761)
        at com.rabbitmq.client.impl.AMQConnection.access$400(AMQConnection.java:48)
        at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:688)
        at java.base/java.lang.Thread.run(Thread.java:840)

After that exception, the library will still try to recover the topology and fail with the following exceptions:

ERROR:c.r.c.i.ForgivingExceptionHandler:AMQP Connection XXX.XX.XXX.XXX:5673: Caught an exception when recovering topology Caught an exception while recovering queue queue-11987d44-d9a9-4a76-b189-a515210a79b7: null
com.rabbitmq.client.TopologyRecoveryException: Caught an exception while recovering queue queue-11987d44-d9a9-4a76-b189-a515210a79b7: null
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverQueue(AutorecoveringConnection.java:787)
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverTopology(AutorecoveringConnection.java:726)
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.beginAutomaticRecovery(AutorecoveringConnection.java:602)
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.lambda$addAutomaticRecoveryListener$3(AutorecoveringConnection.java:524)
        at com.rabbitmq.client.impl.AMQConnection.notifyRecoveryCanBeginListeners(AMQConnection.java:839)
        at com.rabbitmq.client.impl.AMQConnection.doFinalShutdown(AMQConnection.java:816)
        at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:700)
        at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.io.IOException: null
        at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:140)
        at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:136)
        at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:158)
        at com.rabbitmq.client.impl.ChannelN.queueDeclare(ChannelN.java:989)
        at com.rabbitmq.client.impl.ChannelN.queueDeclare(ChannelN.java:47)
        at com.rabbitmq.client.impl.recovery.RecordedQueue.recover(RecordedQueue.java:60)
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.lambda$internalRecoverQueue$13(AutorecoveringConnection.java:808)
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.wrapRetryIfNecessary(AutorecoveringConnection.java:914)
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.internalRecoverQueue(AutorecoveringConnection.java:807)
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverQueue(AutorecoveringConnection.java:783)
        ... 7 common frames omitted
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - queue 'queue-11987d44-d9a9-4a76-b189-a515210a79b7' in vhost '/' process is stopped by supervisor, class-id=50, method-id=10)
        at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:66)
        at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36)
        at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:552)
        at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:316)
        at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:152)
        ... 14 common frames omitted
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - queue 'queue-11987d44-d9a9-4a76-b189-a515210a79b7' in vhost '/' process is stopped by supervisor, class-id=50, method-id=10)
        at com.rabbitmq.client.impl.ChannelN.asyncShutdown(ChannelN.java:528)
        at com.rabbitmq.client.impl.ChannelN.processAsync(ChannelN.java:349)
        at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:193)
        at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:125)
        at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:761)
        at com.rabbitmq.client.impl.AMQConnection.access$400(AMQConnection.java:48)
        at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:688)
        ... 1 common frames omitted

Expected behavior

Behavior wanted: when client A is disconnected from node 1, it automatically reconnects to node 2 and then recovers the topology.

What happens: when client A tries to recover the channel, it fails with a channel error because the queue is not found.

Additional context

I have the following questions:

Is there a bug in the automatic recovery?
How should I solve my current issue?

Regards.

michaelklishin · 2025-09-17T16:02:14Z

Maintainer

@hyfara this is an age-old problem with client connection recovery and non-replicated queues being moved around when nodes stop.

The behavior will vary depending on queue properties, e.g. exclusive queues will be deleted concurrently with clients trying to re-declare them on different nodes. Exclusive server-named queues won't have any issues.

The solution is to not use non-replicated queues with well-known names, so either use a replicated queue type or use transient exclusive server-named queues.

RabbitMQ cannot know what clients will do next, so this is a fundamental conflict between anything that RabbitMQ does and what clients do at the same time. But replicated queues (plus channels and sessions) handle this scenario, and in the case of exclusive server-named queues, the uniqueness of the name means that there won't be any conflicts.

Oh, and please never file issues for questions in the repos that have discussions enabled.
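
The two options named in the answer can be sketched as queue declaration parameters. This is a minimal illustration, not the reporter's code: the class `QueueDeclarationSketch`, the holder `QueueSpec` and the queue name `orders` are hypothetical, and a quorum queue is assumed as the replicated queue type. The fields mirror the arguments of the Java client's `Channel#queueDeclare(queue, durable, exclusive, autoDelete, arguments)`.

```java
import java.util.Map;

public class QueueDeclarationSketch {

    /** Hypothetical holder mirroring the arguments of Channel#queueDeclare. */
    public static final class QueueSpec {
        public final String name;
        public final boolean durable;
        public final boolean exclusive;
        public final boolean autoDelete;
        public final Map<String, Object> arguments;

        QueueSpec(String name, boolean durable, boolean exclusive,
                  boolean autoDelete, Map<String, Object> arguments) {
            this.name = name;
            this.durable = durable;
            this.exclusive = exclusive;
            this.autoDelete = autoDelete;
            this.arguments = arguments;
        }
    }

    /**
     * Option 1: a replicated queue with a well-known name.
     * Assumes a quorum queue, which must be durable, non-exclusive
     * and not auto-delete.
     */
    public static QueueSpec replicated(String name) {
        return new QueueSpec(name, true, false, false,
                Map.of("x-queue-type", "quorum"));
    }

    /**
     * Option 2: a transient exclusive server-named queue.
     * An empty name asks the broker to generate a unique one.
     */
    public static QueueSpec serverNamedExclusive() {
        return new QueueSpec("", false, true, true, Map.of());
    }

    public static void main(String[] args) {
        QueueSpec q = replicated("orders"); // "orders" is a placeholder name
        System.out.println(q.name + " " + q.arguments);
        System.out.println(serverNamedExclusive().exclusive);
    }
}
```

With the Java client, a spec like this would be passed as `channel.queueDeclare(spec.name, spec.durable, spec.exclusive, spec.autoDelete, spec.arguments)`. The no-argument `channel.queueDeclare()` already declares a server-named, exclusive, auto-delete, non-durable queue, so option 2 needs no custom parameters.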
