Skip to content

Conversation

@yuichiro-naito
Copy link

@yuichiro-naito yuichiro-naito commented Nov 17, 2025

Recreate the event loop after daemonizing on BSD platforms.
Because kqueue(2) is not inherited by the child process.

Before reopening, we need to close event channel sockets that were initially created, so as not to leak the sockets. However, the libmonkey interface mk_event_channel_destroy tries to delete sockets from the kqueue. It will fail after daemonizing. We have to call mk_event_channel_destroy before daemonizing.

Fixes #11170


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Exposes explicit controls to manage the event-processing lifecycle.
  • Improvements

    • More consistent cleanup of event-processing resources for improved stability.
    • Daemon-mode on BSD-family systems now reinitializes event processing after daemonization to reduce runtime issues.

@coderabbitai
Copy link

coderabbitai bot commented Nov 17, 2025

Walkthrough

Two new exported APIs, flb_create_event_loop() and flb_destroy_event_loop(), are added and used to centralize event-loop creation/teardown. The event loop is destroyed before daemonizing on BSD-family systems and recreated after fork to avoid kqueue inheritance issues.

Changes

Cohort / File(s) Change Summary
Header declarations
include/fluent-bit/flb_lib.h
Added exported declarations: int flb_create_event_loop(flb_ctx_t *ctx); and int flb_destroy_event_loop(flb_ctx_t *ctx);.
Event loop lifecycle
src/flb_lib.c
Implemented flb_create_event_loop() and flb_destroy_event_loop() with allocation, channel setup/teardown, error cleanup; refactored flb_create() to call the new creator and rely on the new teardown path.
Daemonization (BSD)
src/fluent-bit.c
On BSD variants destroy the event loop before daemonizing and recreate it after fork; log and trigger config flush on recreation failure.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Lib as libfluent-bit
    participant OS as Kernel (BSD)

    rect rgba(220,240,255,0.25)
    Note over App,Lib: Startup
    App->>Lib: flb_create()
    Lib->>Lib: flb_create_event_loop(ctx)
    Lib->>OS: mk_event_loop_create / mk_event_channel_create
    end

    rect rgba(255,235,220,0.25)
    Note over App,Lib: Daemonize (BSD)
    App->>Lib: flb_main_run(daemon)
    Lib->>Lib: flb_destroy_event_loop(ctx)
    Lib->>OS: fork()
    OS->>OS: child created (kqueue not inherited)
    end

    rect rgba(220,255,235,0.25)
    Note over Lib,App: Post-fork recovery
    Lib->>Lib: flb_create_event_loop(ctx)
    Lib->>OS: mk_event_loop_create / mk_event_channel_create
    Lib->>App: ready / signal handling restored
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review error-cleanup and free paths in flb_create_event_loop() / flb_destroy_event_loop() for leaks or double-free.
  • Verify BSD-only conditional flow in src/fluent-bit.c and ensure no regressions on non-BSD platforms.
  • Check integration with flb_create() and daemonization for race conditions around signaling.

Suggested reviewers

  • edsiper

Poem

🐇 I munched a kqueue before the fork,

Pipes reset, no more strange dork,
I stitch the loop back in the child,
Now signals wake and logs run mild. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding event loop recreation after daemonizing on BSD platforms.
Linked Issues check ✅ Passed The PR implements the required fix for issue #11170 by extracting event loop creation/destruction into separate APIs and calling them around daemonization on BSD platforms.
Out of Scope Changes check ✅ Passed All changes directly address the linked issue: new event loop APIs and BSD-specific daemonization fix. No unrelated modifications detected.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73d20d0 and 4d12928.

📒 Files selected for processing (3)
  • include/fluent-bit/flb_lib.h (1 hunks)
  • src/flb_lib.c (2 hunks)
  • src/fluent-bit.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
include/fluent-bit/flb_lib.h (1)
src/flb_lib.c (2)
  • flb_create_event_loop (143-180)
  • flb_destroy_event_loop (182-205)
src/flb_lib.c (4)
lib/monkey/mk_core/mk_event.c (4)
  • mk_event_loop_create (61-88)
  • mk_event_channel_create (181-190)
  • mk_event_channel_destroy (193-202)
  • mk_event_loop_destroy (91-96)
src/flb_config.c (1)
  • flb_config_exit (488-672)
include/fluent-bit/flb_mem.h (2)
  • flb_free (126-128)
  • flb_calloc (84-96)
lib/monkey/include/monkey/mk_core/mk_event.h (1)
  • MK_EVENT_ZERO (135-138)
src/fluent-bit.c (2)
src/flb_lib.c (2)
  • flb_destroy_event_loop (182-205)
  • flb_create_event_loop (143-180)
src/flb_utils.c (2)
  • flb_utils_set_daemon (178-209)
  • flb_utils_error (69-158)

Comment on lines 182 to 218
int flb_destroy_event_loop(flb_ctx_t *ctx)
{
int ret;
struct flb_config *config;

if (ctx == NULL)
return 0;

config = ctx->config;
ret = mk_event_channel_destroy(config->ch_evl,
config->ch_notif[0],
config->ch_notif[1],
ctx->event_channel);
if (ret != 0) {
/* make sure to close file descriptors */
close(config->ch_notif[0]);
close(config->ch_notif[1]);
}

free(ctx->event_channel);

mk_event_loop_destroy(ctx->event_loop);
return 0;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Missing NULL checks and pointer hygiene issues.

  1. Missing NULL check for config: Line 190 dereferences ctx->config without checking if it's NULL.

  2. Missing NULL checks before freeing: Lines 201 and 203 free ctx->event_channel and destroy ctx->event_loop without checking if they're NULL. While this may work in practice, defensive programming requires these checks.

  3. Pointers not nulled after free: After freeing ctx->event_channel and destroying ctx->event_loop, the pointers are not set to NULL. If this function is called twice, it could lead to use-after-free or double-free.

  4. Inconsistent memory management (nitpick): Line 201 uses free() instead of flb_free(), which is inconsistent with the rest of the codebase.

Suggested fixes:

 int flb_destroy_event_loop(flb_ctx_t *ctx)
 {
     int ret;
     struct flb_config *config;

-    if (ctx == NULL)
+    if (ctx == NULL || ctx->config == NULL) {
         return 0;
+    }

     config = ctx->config;
-    ret = mk_event_channel_destroy(config->ch_evl,
-                                   config->ch_notif[0],
-                                   config->ch_notif[1],
-                                   ctx->event_channel);
-    if (ret != 0) {
-        /* make sure to close file descriptors */
-        close(config->ch_notif[0]);
-        close(config->ch_notif[1]);
+
+    if (ctx->event_channel != NULL) {
+        ret = mk_event_channel_destroy(config->ch_evl,
+                                       config->ch_notif[0],
+                                       config->ch_notif[1],
+                                       ctx->event_channel);
+        if (ret != 0) {
+            /* make sure to close file descriptors */
+            close(config->ch_notif[0]);
+            close(config->ch_notif[1]);
+        }
+        flb_free(ctx->event_channel);
+        ctx->event_channel = NULL;
     }

-    free(ctx->event_channel);
-
-    mk_event_loop_destroy(ctx->event_loop);
+    if (ctx->event_loop != NULL) {
+        mk_event_loop_destroy(ctx->event_loop);
+        ctx->event_loop = NULL;
+    }
+
     return 0;
 }
🤖 Prompt for AI Agents
In src/flb_lib.c around lines 182 to 205, add defensive NULL checks and pointer
hygiene: verify ctx->config is not NULL before using it; only call
mk_event_channel_destroy and close config->ch_notif[0/1] if config and the fds
are valid (non-negative); check ctx->event_channel and ctx->event_loop for NULL
before freeing/destroying them; after freeing/destroying, set ctx->event_channel
and ctx->event_loop to NULL (and reset any closed fd members to -1) to avoid
double-free/use-after-free; and replace the direct free() call with the project
allocator flb_free() for consistency.

Comment on lines 1418 to 1429
#if defined(__FreeBSD__) || defined(__NetBSD__) || \
defined(__OpenBSD__) || defined(__DragonFly__)
flb_destroy_event_loop(ctx);
#endif
flb_utils_set_daemon(config);
#if defined(__FreeBSD__) || defined(__NetBSD__) || \
defined(__OpenBSD__) || defined(__DragonFly__)
if (flb_create_event_loop(ctx) != 0)
flb_utils_error(FLB_ERR_CFG_FLUSH);
#endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Incorrect error code for event loop creation failure.

Line 1426 uses FLB_ERR_CFG_FLUSH which maps to "invalid flush value" in flb_utils_error(). This is misleading for an event loop creation failure. Consider logging a specific error message before calling flb_utils_error(), or define a new error code for this scenario.

Example:

 #if defined(__FreeBSD__) || defined(__NetBSD__) || \
     defined(__OpenBSD__) || defined(__DragonFly__)
         if (flb_create_event_loop(ctx) != 0)
+            flb_error("[daemon] failed to recreate event loop after daemonizing");
             flb_utils_error(FLB_ERR_CFG_FLUSH);
 #endif

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/fluent-bit.c around lines 1418 to 1427, the error path for event loop
creation uses FLB_ERR_CFG_FLUSH (which maps to "invalid flush value") and is
therefore misleading; replace this with a clear event-loop-specific failure
handling: either log a specific error message (e.g., "failed to create event
loop") via flb_utils_error or flb_utils_log before invoking flb_utils_error, or
better add and use a new appropriate error code (for example FLB_ERR_EVENT_LOOP)
in the error enum and map it in flb_utils_error so the reported message
correctly reflects an event loop creation failure.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/flb_lib.c (3)

143-187: Good improvements to error handling; consider resetting config->ch_evl on failure.

The function now properly addresses most critical issues from the previous review:

  • NULL validation for ctx is present.
  • Error paths correctly clean up allocated resources via goto labels.
  • No longer calls flb_stop() or flb_destroy() during initialization.
  • Pointers are set to NULL after cleanup.

However, config->ch_evl is assigned on line 158 but is not reset to NULL in the error paths (lines 183-186). If this function returns an error, config->ch_evl will point to freed memory, which could be problematic if any code (other than the immediate caller's cleanup) attempts to use it.

Consider this adjustment:

 error_1:
     mk_event_loop_destroy(ctx->event_loop);
     ctx->event_loop = NULL;
+    config->ch_evl = NULL;
     return FLB_LIB_ERROR;

163-163: Consider removing or adjusting the perror call.

flb_calloc can return NULL without setting errno (particularly in the OSSFUZZ failure-injection path), so perror("calloc") may print a misleading error message.

You can either remove it or replace it with a direct error message:

     ctx->event_channel = flb_calloc(1, sizeof(struct mk_event));
     if (!ctx->event_channel) {
-        perror("calloc");
+        flb_error("[lib] failed to allocate event_channel");
         goto error_1;
     }

189-218: Good defensive NULL checks; minor consistency improvements possible.

The function now properly checks for NULL before freeing/destroying and sets pointers to NULL afterward, addressing the main concerns from the previous review.

A few optional refinements for consistency and defensive programming:

  1. Line 208 uses free() instead of flb_free(), which is inconsistent with the rest of the codebase.
  2. Lines 205-206 close file descriptors without checking if they're valid or resetting them afterward, which could be improved for robustness.

Optional improvements:

         if (ret != 0) {
             /* make sure to close file descriptors */
-            close(config->ch_notif[0]);
-            close(config->ch_notif[1]);
+            if (config->ch_notif[0] >= 0) {
+                close(config->ch_notif[0]);
+                config->ch_notif[0] = -1;
+            }
+            if (config->ch_notif[1] >= 0) {
+                close(config->ch_notif[1]);
+                config->ch_notif[1] = -1;
+            }
         }
-        free(ctx->event_channel);
+        flb_free(ctx->event_channel);
         ctx->event_channel = NULL;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d12928 and 6aa2773.

📒 Files selected for processing (2)
  • src/flb_lib.c (2 hunks)
  • src/fluent-bit.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/fluent-bit.c
🧰 Additional context used
🧬 Code graph analysis (1)
src/flb_lib.c (3)
lib/monkey/mk_core/mk_event.c (4)
  • mk_event_loop_create (61-88)
  • mk_event_channel_create (181-190)
  • mk_event_loop_destroy (91-96)
  • mk_event_channel_destroy (193-202)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
lib/monkey/include/monkey/mk_core/mk_event.h (1)
  • MK_EVENT_ZERO (135-138)
🔇 Additional comments (1)
src/flb_lib.c (1)

264-269: LGTM! Clean integration of the new event loop creation API.

The refactoring successfully delegates event loop setup to flb_create_event_loop() and handles failures appropriately by cleaning up the config and context before returning NULL.

@cosmo0920
Copy link
Contributor

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Nov 18, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cosmo0920
Copy link
Contributor

The patches are really good but we need to add Signed-off line on each of commit.
So, could you follow the instructions on this? https://github.com/fluent/fluent-bit/pull/11171/checks?check_run_id=55564201840

@cosmo0920 cosmo0920 changed the title Recreate the event loop after daemonizing on BSD platforms. lib: Recreate the event loop after daemonizing on BSD platforms. Nov 18, 2025
@yuichiro-naito
Copy link
Author

I rebased my patch to master and squashed 2 commits into 1 commit.
I wrote the category name in the commit message, and also added 'Signed-off-by'.

Because kqueue(2) is not inherited by the child process.

Signed-off-by: Yuichiro NAITO <naito.yuichiro@gmail.com>
@yuichiro-naito
Copy link
Author

I updated the category in the commit message to 'lib:'.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
include/fluent-bit/flb_lib.h (1)

52-53: LGTM! Consider adding API documentation for these lifecycle functions.

The new public APIs are well-designed and correctly solve the BSD kqueue inheritance issue. The function signatures are consistent with the existing API, and the implementations (from the code snippets) show proper error handling and resource cleanup.

However, since these functions expose event loop lifecycle management as part of the public API, brief documentation would help library users understand:

  • When these functions should be called (e.g., around fork on BSD platforms)
  • Return value semantics (0 for success, FLB_LIB_ERROR/-1 for failure)
  • Relationship with flb_create()/flb_destroy() (typically not needed unless forking)

This suggestion is optional since the existing header style generally omits function documentation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 403a863 and 971f916.

📒 Files selected for processing (3)
  • include/fluent-bit/flb_lib.h (1 hunks)
  • src/flb_lib.c (2 hunks)
  • src/fluent-bit.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/flb_lib.c
  • src/fluent-bit.c
🧰 Additional context used
🧬 Code graph analysis (1)
include/fluent-bit/flb_lib.h (1)
src/flb_lib.c (2)
  • flb_create_event_loop (143-187)
  • flb_destroy_event_loop (189-218)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fluent-bit doesn't stop while running in background mode on BSD platforms

2 participants