Raise default async discard threshold to ERROR #3880

rschmitt · 2025-08-13T19:21:49Z

There are two basic scenarios where the ring buffer fills up. One is that an application is simply logging too much and the flushing process can't keep up, and in this case a discard threshold of WARN or INFO is probably sufficient to mitigate the problem. However, if flushing has stopped making progress altogether, e.g. due to a full or failed disk, then logging calls will block indefinitely. This can result in a production outage.
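
A minimal sketch of the failure mode. It uses a plain java.util.concurrent.ArrayBlockingQueue as a stand-in for the async ring buffer. That is an assumption for illustration, not Log4j's implementation. Nothing drains the queue, so a blocking put() would never return. The sketch uses a timed offer() so that it terminates.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class StalledFlushDemo {
    // Tries to enqueue with a bounded wait. A plain put() would instead block
    // for as long as the queue stays full, which may be forever.
    static boolean enqueueWithTimeout(BlockingQueue<String> buffer, String event, long millis) {
        try {
            return buffer.offer(event, millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }

    public static void main(String[] args) {
        // Tiny stand-in for the ring buffer. Nothing ever drains it, which
        // models a flusher stuck behind a full or failed disk.
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(2);
        buffer.add("event-1");
        buffer.add("event-2");

        boolean accepted = enqueueWithTimeout(buffer, "event-3", 100);
        System.out.println("third event accepted: " + accepted); // prints "third event accepted: false"
    }
}
```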

This change sets the default discard threshold to ERROR, in order to better mitigate the scenario where the disk fills up, fails, or is in the process of failing. With this threshold, logging should only block at the FATAL level, which would typically mean that the operation is already failing anyway.
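
The following hypothetical model makes the threshold semantics concrete. It assumes the behavior I understand DiscardingAsyncQueueFullPolicy to have: when the queue is full, an event whose level is equal to or less severe than the threshold is dropped, and a more severe event falls back to the blocking path. The Level enum is defined locally and mirrors Log4j's standard intLevel values. Route and the method names are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

class DiscardThresholdSketch {
    // Local mirror of Log4j's standard levels; a lower number means more severe.
    enum Level {
        FATAL(100), ERROR(200), WARN(300), INFO(400), DEBUG(500), TRACE(600);

        final int intLevel;

        Level(int intLevel) {
            this.intLevel = intLevel;
        }
    }

    enum Route { DISCARD, ENQUEUE_AND_MAYBE_BLOCK }

    // With a full queue, events at or below the threshold's severity are
    // dropped; anything more severe takes the (possibly blocking) enqueue path.
    static Route routeWhenFull(Level event, Level threshold) {
        return event.intLevel >= threshold.intLevel ? Route.DISCARD : Route.ENQUEUE_AND_MAYBE_BLOCK;
    }

    // Lists the levels that can still block for a given threshold.
    static List<Level> blockingLevels(Level threshold) {
        List<Level> result = new ArrayList<>();
        for (Level level : Level.values()) {
            if (routeWhenFull(level, threshold) == Route.ENQUEUE_AND_MAYBE_BLOCK) {
                result.add(level);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("INFO threshold blocks on:  " + blockingLevels(Level.INFO));  // [FATAL, ERROR, WARN]
        System.out.println("ERROR threshold blocks on: " + blockingLevels(Level.ERROR)); // [FATAL]
    }
}
```

Under this model, a threshold of INFO still lets WARN, ERROR, and FATAL block on a full queue. A threshold of ERROR leaves only FATAL.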


github-actions · 2025-08-13T19:49:03Z

Job	Requested goals	Build Tool Version	Build Outcome
build-macos-latest	clean install	3.9.8	✅
build-ubuntu-latest	clean install	3.9.8	✅
build-windows-latest	clean install	3.9.8	✅

Generated by gradle/develocity-actions

rschmitt · 2025-08-15T02:06:45Z

More controversially, I think that asyncQueueFullPolicy should default to Discard for basically the same reasons. One weird thing about this is that the blocking queue-full policy is literally named DefaultAsyncQueueFullPolicy and is denoted by the property value Default. =\
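
For comparison, here is a sketch of opting in to discarding today from Java code. It assumes the system property names log4j2.asyncQueueFullPolicy and log4j2.discardThreshold as listed in the Log4j 2 configuration documentation. The same settings can be passed as -D JVM flags or placed in log4j2.component.properties. As I understand it, the threshold is only consulted when the policy is Discard, which is why the two defaults are related. optInToDiscard is a hypothetical helper.

```java
class QueueFullPolicyConfig {
    // Opts in to discarding when the async queue is full. These properties
    // must be set before Log4j initializes, e.g. first thing in main().
    static void optInToDiscard(String threshold) {
        System.setProperty("log4j2.asyncQueueFullPolicy", "Discard");
        System.setProperty("log4j2.discardThreshold", threshold);
    }

    public static void main(String[] args) {
        optInToDiscard("ERROR");
        // Obtain loggers and start the application only after this point.
        System.out.println(System.getProperty("log4j2.asyncQueueFullPolicy")); // Discard
    }
}
```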

vy · 2025-08-15T06:48:57Z

> if flushing has stopped making progress altogether, e.g. due to a full or failed disk, then logging calls will block indefinitely. This can result in a production outage.

If logging is a vital component of your application and it doesn't work, I think it is reasonable to signal that the application is down. Consider this scenario in a cluster environment such as Kubernetes: the container reports down in its liveness probes due to logging buffer failures, the pod is taken down, and it is re-spawned in a new environment with sufficient logging capacity. This is what you'd want, instead of losing all logging for an indefinite amount of time. I think this is a good default. If you do want it the other way around, it makes sense that you need to opt in via extra configuration, which is log4j2.discardThreshold in this case.
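
Here is a hypothetical sketch of how the liveness signal described above could be derived. The check reports down once the flusher has made no progress for longer than a configured stall limit. Nothing here is a Log4j API. LoggingLivenessCheck, recordFlushProgress, and the injected clock are assumptions for illustration. Wiring isLive() into an actual Kubernetes liveness endpoint is left out.

```java
import java.util.function.LongSupplier;

class LoggingLivenessCheck {
    private final LongSupplier clockMillis;
    private final long maxStallMillis;
    private volatile long lastProgressMillis;

    LoggingLivenessCheck(LongSupplier clockMillis, long maxStallMillis) {
        this.clockMillis = clockMillis;
        this.maxStallMillis = maxStallMillis;
        this.lastProgressMillis = clockMillis.getAsLong();
    }

    // Called by the (hypothetical) flusher each time it writes a batch out.
    void recordFlushProgress() {
        lastProgressMillis = clockMillis.getAsLong();
    }

    // What a liveness endpoint would report: down once flushing has stalled too long.
    boolean isLive() {
        return clockMillis.getAsLong() - lastProgressMillis <= maxStallMillis;
    }

    public static void main(String[] args) {
        long[] now = {0};
        LoggingLivenessCheck check = new LoggingLivenessCheck(() -> now[0], 5_000);
        now[0] = 3_000;
        System.out.println(check.isLive()); // true
        now[0] = 9_000;
        System.out.println(check.isLive()); // false: no flush progress for 9 s
        check.recordFlushProgress();
        System.out.println(check.isLive()); // true again
    }
}
```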

@remkop, @ppkarwasz, WDYT?

> More controversially, I think that asyncQueueFullPolicy should default to Discard for basically the same reasons. One weird thing about this is that the blocking queue-full policy is literally named DefaultAsyncQueueFullPolicy and is denoted by the property value Default. =\

This is a valid remark. I'd support a PR that:

- changes the default from Default to Discard, and
- translates existing Default usages to Discard, logging a WARN.
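
Here is one literal reading of the two points above, sketched as a hypothetical resolver for the configured property value. The method name, the warning text, and collecting warnings in a list (instead of logging them through Log4j's status logger) are all assumptions. Any other value, such as a custom policy class name, passes through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

class QueueFullPolicyNames {
    // Collected warnings; a real change would presumably log these instead.
    static final List<String> warnings = new ArrayList<>();

    // Hypothetical resolver for the asyncQueueFullPolicy value under the proposal.
    static String resolve(String configured) {
        if (configured == null) {
            return "Discard"; // proposed new default when nothing is configured
        }
        if (configured.equalsIgnoreCase("Default")) {
            warnings.add("asyncQueueFullPolicy=Default is deprecated; treating it as Discard");
            return "Discard";
        }
        return configured; // e.g. "Discard" or a custom policy class name
    }

    public static void main(String[] args) {
        System.out.println(resolve("Default")); // Discard
        System.out.println(warnings.size());    // 1
    }
}
```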

github-project-automation bot added this to Log4j bug tracker Aug 13, 2025

github-project-automation bot moved this to To triage in Log4j bug tracker Aug 13, 2025

rschmitt force-pushed the discard-threshold branch from 624c803 to 052f802 August 13, 2025 19:26
