From 05eefcb2b4b3424faba7bfd71d9704dfb28bd7ff Mon Sep 17 00:00:00 2001
From: Ian Driver
Date: Thu, 2 Oct 2025 04:17:23 +0100
Subject: [PATCH] Fix #4396: Add timeout to establish_connection to prevent infinite loop (#5104)

**Which issue(s) this PR fixes**:
Fixes #4396

**What this PR does / why we need it**:
Adds a timeout mechanism to the `establish_connection` method to prevent an infinite loop when the handshake protocol gets stuck.

In unstable network environments with proxy components, if the connection drops during the handshake after TLS establishment, Fluentd gets stuck in an infinite loop, causing logs to stop being flushed.

This fix uses the existing `hard_timeout` configuration to break the loop, disable problematic nodes, and maintain log flow through healthy nodes.

**Docs Changes**:
None required - uses the existing `hard_timeout` configuration parameter.

**Release Note**:
Fix an infinite loop in the out_forward handshake protocol that could cause logs to stop being flushed in unstable network environments.

Signed-off-by: Ian Driver
Co-authored-by: Ian Driver
Signed-off-by: Shizuo Fujita
---
 lib/fluent/plugin/out_forward.rb | 10 ++++++++++
 test/plugin/test_out_forward.rb  | 23 +++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/lib/fluent/plugin/out_forward.rb b/lib/fluent/plugin/out_forward.rb
index 9a07acbe22..a0aca082a8 100644
--- a/lib/fluent/plugin/out_forward.rb
+++ b/lib/fluent/plugin/out_forward.rb
@@ -610,7 +610,17 @@ def verify_connection
       end

       def establish_connection(sock, ri)
+        start_time = Fluent::Clock.now
+        timeout = @sender.hard_timeout
+
         while ri.state != :established
+          # Check for timeout to prevent infinite loop
+          if Fluent::Clock.now - start_time > timeout
+            @log.warn "handshake timeout after #{timeout}s", host: @host, port: @port
+            disable!
+            break
+          end
+
           begin
             # TODO: On Ruby 2.2 or earlier, read_nonblock doesn't work expectedly.
             # We need rewrite around here using new socket/server plugin helper.
diff --git a/test/plugin/test_out_forward.rb b/test/plugin/test_out_forward.rb
index cb35a31743..6209a3b1ae 100644
--- a/test/plugin/test_out_forward.rb
+++ b/test/plugin/test_out_forward.rb
@@ -1347,4 +1347,27 @@ def plugin_id_for_test?
       end
     end
   end
+
+  test 'establish_connection_timeout' do
+    @d = d = create_driver(%[
+      hard_timeout 1
+      <server>
+        host #{TARGET_HOST}
+        port #{@target_port}
+      </server>
+    ])
+
+    node = d.instance.nodes.first
+    mock_sock = flexmock('socket')
+    mock_sock.should_receive(:read_nonblock).with(512).and_return('').at_least.once
+
+    ri = Fluent::Plugin::ForwardOutput::ConnectionManager::RequestInfo.new(:helo)
+
+    assert_true node.available?
+    node.establish_connection(mock_sock, ri)
+    assert_false node.available?
+
+    logs = d.logs
+    assert{ logs.any?{|log| log.include?('handshake timeout after 1.0s') } }
+  end
 end
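
**Illustrative sketch (not part of the patch)**:
The deadline check added to `establish_connection` can be tried in isolation with the minimal Ruby sketch below. It assumes nothing from Fluentd: `FakeHandshake`, `monotonic_now`, and `wait_for_handshake` are hypothetical names invented for illustration, and `Process.clock_gettime(Process::CLOCK_MONOTONIC)` stands in for `Fluent::Clock.now`. The real method also reads from the socket and calls `disable!` on timeout, which this sketch omits.

```ruby
# Hypothetical stand-in for the handshake state that the loop polls.
class FakeHandshake
  attr_reader :state

  # steps_until_established: number of polls before success,
  # or nil to simulate a peer that never completes the handshake.
  def initialize(steps_until_established)
    @remaining = steps_until_established
    @state = :helo
  end

  # Simulates one polling step of the handshake.
  def poll
    return if @remaining.nil?
    @remaining -= 1
    @state = :established if @remaining <= 0
  end
end

# Monotonic clock reading in seconds; unaffected by wall-clock changes.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Polls until the handshake is established or `timeout` seconds elapse.
# Returns true on success and false when the deadline has passed.
def wait_for_handshake(handshake, timeout:, poll_interval: 0.01)
  start_time = monotonic_now
  while handshake.state != :established
    # Same shape as the check in the patch: elapsed time versus timeout.
    return false if monotonic_now - start_time > timeout
    handshake.poll
    sleep poll_interval
  end
  true
end

puts wait_for_handshake(FakeHandshake.new(3), timeout: 1.0)    # healthy peer
puts wait_for_handshake(FakeHandshake.new(nil), timeout: 0.05) # stuck peer
```

Checking elapsed time at the top of each iteration bounds the loop even when every read returns no data, which is the situation the new test simulates with a mocked `read_nonblock` that always returns an empty string.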