
Raise default async discard threshold to ERROR #3880

Open: wants to merge 1 commit into base 2.x


rschmitt
Contributor

There are two basic scenarios where the ring buffer fills up. In the first, an application is simply logging too much and the flushing process can't keep up; here, a discard threshold of WARN or INFO is probably sufficient to mitigate the problem. In the second, flushing has stopped making progress altogether, e.g. due to a full or failed disk, and logging calls will then block indefinitely. This can result in a production outage.
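To make the two scenarios concrete, here is a small stdlib-only Java analogy. It is not Log4j code (Log4j's async loggers are built on the LMAX Disruptor): a bounded `ArrayBlockingQueue` stands in for the ring buffer, and the absence of any consumer thread stands in for a flush that has stopped making progress. The class name `StalledQueueDemo` is hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Analogy only: the bounded queue plays the role of the ring buffer, and the
// missing consumer plays the role of a stalled flush (full or failed disk).
final class StalledQueueDemo {

    static String[] run() {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(4);

        // Fill the buffer to capacity; with no consumer, nothing is ever drained.
        for (int i = 0; i < 4; i++) {
            buffer.add("event-" + i);
        }

        // Fail-fast enqueue: returns false immediately, like discarding the event.
        boolean discarded = !buffer.offer("event-4");

        // Blocking enqueue: put() would wait forever here, so a timed offer is
        // used instead to show that no space appears within 200 ms.
        boolean stillBlocked;
        try {
            stillBlocked = !buffer.offer("event-5", 200, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for space", e);
        }

        return new String[] {
            "size=" + buffer.size(),
            "discarded=" + discarded,
            "stillBlockedAfter200ms=" + stillBlocked
        };
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```

With a stalled consumer, a plain `put()` never returns; the timed `offer` exists only so the sketch terminates.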

This change sets the default discard threshold to ERROR, in order to better mitigate the scenario where the disk fills up, fails, or is in the process of failing. With this threshold, only FATAL events should block when the buffer is full (ERROR and less severe events are discarded), and a FATAL event would typically mean that the operation is already failing anyway.
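The level arithmetic behind "only FATAL blocks" can be checked with a simplified model. This sketch assumes, as the PR text implies, that an event is discarded when its level is equal to or less severe than the threshold; that is modeled on my understanding of Log4j's `Level.isLessSpecificThan`, which also returns true for equal levels. The `int` values mirror Log4j's standard `intLevel` values (lower is more severe); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the discard-threshold check; not Log4j code.
final class DiscardThresholdModel {

    enum Level {
        FATAL(100), ERROR(200), WARN(300), INFO(400), DEBUG(500), TRACE(600);

        final int intLevel;

        Level(int intLevel) {
            this.intLevel = intLevel;
        }

        /** True if this level is as severe as, or less severe than, the other one. */
        boolean isLessSpecificThanOrEqual(Level other) {
            return this.intLevel >= other.intLevel;
        }
    }

    /** Levels that would still block (not be discarded) while the queue is full. */
    static List<Level> blockingLevels(Level threshold) {
        List<Level> result = new ArrayList<>();
        for (Level level : Level.values()) {
            if (!level.isLessSpecificThanOrEqual(threshold)) {
                result.add(level);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("threshold INFO  -> blocks at " + blockingLevels(Level.INFO));
        System.out.println("threshold ERROR -> blocks at " + blockingLevels(Level.ERROR));
    }
}
```

Under this model an INFO threshold leaves FATAL, ERROR and WARN blocking, while an ERROR threshold leaves only FATAL.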


github-actions bot commented Aug 13, 2025

| Job | Requested goals | Build Tool Version | Build Outcome | Build Scan® |
|---|---|---|---|---|
| build-macos-latest | clean install | 3.9.8 | | Build Scan PUBLISHED |
| build-ubuntu-latest | clean install | 3.9.8 | | Build Scan PUBLISHED |
| build-windows-latest | clean install | 3.9.8 | | Build Scan PUBLISHED |
Generated by gradle/develocity-actions

@rschmitt
Contributor Author

More controversially, I think that asyncQueueFullPolicy should default to Discard for basically the same reasons. One weird thing about this is that the blocking queue-full policy is literally named DefaultAsyncQueueFullPolicy and is denoted by the property value Default. =\
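Until the default changes, discarding has to be opted into explicitly. A minimal sketch, assuming the documented Log4j 2 system properties `log4j2.asyncQueueFullPolicy` and `log4j2.discardThreshold` (the class name `DiscardPolicyOptIn` is hypothetical). As I understand it, the discard threshold is only consulted by the discarding policy, so both properties are set together, and they must be in place before Log4j initializes; passing `-Dlog4j2.asyncQueueFullPolicy=Discard -Dlog4j2.discardThreshold=ERROR` on the JVM command line should be equivalent.

```java
// Opt-in sketch; property names assumed from the Log4j 2 documentation.
final class DiscardPolicyOptIn {

    static void configure() {
        // "Discard" selects the discarding queue-full policy; "Default" is the blocking one.
        System.setProperty("log4j2.asyncQueueFullPolicy", "Discard");
        // Consulted by the discarding policy: events at ERROR or less severe
        // are dropped while the queue is full.
        System.setProperty("log4j2.discardThreshold", "ERROR");
    }

    public static void main(String[] args) {
        configure();
        System.out.println("asyncQueueFullPolicy=" + System.getProperty("log4j2.asyncQueueFullPolicy"));
        System.out.println("discardThreshold=" + System.getProperty("log4j2.discardThreshold"));
        // Application startup (and the first LogManager.getLogger() call) would follow here.
    }
}
```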

@vy
Member

vy commented Aug 15, 2025

> if flushing has stopped making progress altogether, e.g. due to a full or failed disk, then logging calls will block indefinitely. This can result in a production outage.

If logging is a vital component of your application and it doesn't work, I think it is reasonable to signal that the application is down. Consider this scenario in a cluster environment such as Kubernetes: the container reports down in its liveness probe due to logging buffer failures, the pod is taken down, and it is respawned in a new environment with sufficient logging capacity. This is what you'd want, instead of losing all logging for an indefinite amount of time. I think this is a good default. If you indeed want the other way around, it makes sense that you need to opt in with extra configuration, which is log4j2.discardThreshold in this case.
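One way the liveness scenario described here could be wired up is sketched below. Everything in it is hypothetical (not Log4j or Kubernetes API): the logging side records when it last made flush progress, and the probe reports down once no progress has been seen within a grace period. Hooking it to an appender and to an HTTP probe endpoint is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical liveness check for stalled logging; all names are illustrative.
final class LoggingLivenessCheck {

    private final AtomicReference<Instant> lastFlush;
    private final Duration gracePeriod;

    LoggingLivenessCheck(Instant start, Duration gracePeriod) {
        this.lastFlush = new AtomicReference<>(start);
        this.gracePeriod = gracePeriod;
    }

    /** Called by the (hypothetical) flushing side after each successful write. */
    void recordFlush(Instant when) {
        lastFlush.set(when);
    }

    /** Returns false once flushing has stalled for longer than the grace period. */
    boolean isAlive(Instant now) {
        return Duration.between(lastFlush.get(), now).compareTo(gracePeriod) <= 0;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2025-08-15T00:00:00Z");
        LoggingLivenessCheck check = new LoggingLivenessCheck(t0, Duration.ofSeconds(30));
        System.out.println(check.isAlive(t0.plusSeconds(10))); // within grace period
        System.out.println(check.isAlive(t0.plusSeconds(45))); // no progress for 45 s
        check.recordFlush(t0.plusSeconds(40));
        System.out.println(check.isAlive(t0.plusSeconds(45))); // progress resumed
    }
}
```

A real check would also need to tell an idle logger apart from a stuck one, and the probe itself should avoid logging through the pipeline it is monitoring.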

@remkop, @ppkarwasz, WDYT?

> More controversially, I think that asyncQueueFullPolicy should default to Discard for basically the same reasons. One weird thing about this is that the blocking queue-full policy is literally named DefaultAsyncQueueFullPolicy and is denoted by the property value Default. =\

This is a valid remark. I'd support a PR:

  1. renaming the default from Default to Discard, and
  2. translating Default usages to Discard, logging a WARN when doing so
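One possible reading of the two steps above, as a hypothetical sketch: the class name, method, and warning text are illustrative only and not Log4j API, and a real change would presumably report through Log4j's `StatusLogger` rather than collecting strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resolution of the queue-full policy property value.
final class QueueFullPolicyValues {

    static final String DEFAULT = "Default";
    static final String DISCARD = "Discard";

    /** Resolves the configured property value, recording a warning when it is translated. */
    static String resolve(String configured, List<String> warnings) {
        if (configured == null || configured.isEmpty()) {
            // Step 1: with nothing configured, Discard becomes the default.
            return DISCARD;
        }
        if (DEFAULT.equals(configured)) {
            // Step 2: the legacy "Default" value is mapped to Discard, with a WARN.
            warnings.add("WARN: asyncQueueFullPolicy=Default is treated as Discard");
            return DISCARD;
        }
        // Any other value (e.g. "Discard" or a custom policy class) passes through.
        return configured;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        System.out.println(resolve(null, warnings));
        System.out.println(resolve("Default", warnings));
        System.out.println(resolve("Discard", warnings));
        System.out.println(warnings);
    }
}
```

Only an explicit `Default` produces a warning; an unset value silently resolves to `Discard`.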

Labels: None yet
Projects: Status: To triage
2 participants