feat: Add configurable stale_timeout for APRS-IS connections #223

Merged
hemna merged 3 commits into master from feature/configurable-stale-timeout on Mar 27, 2026

hemna (Collaborator) commented Mar 27, 2026

Summary

Add configurable stale_timeout for APRS-IS connections to allow faster detection of dead connections.

Problem

The stale connection threshold is hardcoded to 2 minutes in APRSISDriver.__init__(). In production environments using the listen command with the MQTT plugin, I observed connection stalls every 6-7 minutes. Each stall goes undetected for the full 2 minutes, so the roughly 9 reconnects per hour each lose 2+ minutes of data.

The issue appears to be silent TCP connection failures to APRS-IS servers, where the socket stays "connected" but no data flows. The TCP keepalive settings don't help because the connection isn't truly dead; data simply stops flowing (possibly due to APRS2.net load balancer behavior, NAT timeouts, or network issues).

Solution

Add a stale_timeout configuration option that defaults to 120 seconds for backward compatibility but allows users to reduce it for faster recovery.

Changes:

  • aprsd/conf/client.py: Add stale_timeout config option (default: 120 seconds)
  • aprsd/client/drivers/aprsis.py: Update APRSISDriver.__init__ to use the config value with backward compatibility fallback
  • tests/client/drivers/test_aprsis_driver.py: Update tests to handle the new configuration

Usage

[aprs_network]
# Reconnect after 60 seconds without data
stale_timeout = 60

Backward Compatibility

The default remains 120 seconds (2 minutes), and if the option is not present in the config, the code falls back to the same default. This ensures existing configurations continue to work without changes.
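The fallback and the stale check described above can be sketched roughly as follows. This is a minimal illustration, not the actual APRSISDriver code: `StaleTracker`, `mark_data`, `is_stale`, and the `SimpleNamespace` config objects are hypothetical stand-ins, and only the `stale_timeout` option name, the `max_delta` attribute name, and the 120-second default come from this PR.

```python
import datetime
from types import SimpleNamespace

DEFAULT_STALE_TIMEOUT = 120  # seconds; the default stated in this PR


class StaleTracker:
    """Hypothetical stand-in for the stale-detection part of a driver."""

    def __init__(self, conf):
        # Fall back to the default when the option is missing or unset,
        # so older configurations keep the 2-minute behavior.
        timeout = getattr(conf, "stale_timeout", None)
        if timeout is None:
            timeout = DEFAULT_STALE_TIMEOUT
        self.max_delta = datetime.timedelta(seconds=timeout)
        self.last_data = datetime.datetime.now()

    def mark_data(self, now=None):
        # Record the time at which data was last received.
        self.last_data = now or datetime.datetime.now()

    def is_stale(self, now=None):
        # Stale means no data has arrived for longer than max_delta.
        now = now or datetime.datetime.now()
        return now - self.last_data > self.max_delta


# A config without the option, and one with a shorter timeout.
legacy = StaleTracker(SimpleNamespace())
tuned = StaleTracker(SimpleNamespace(stale_timeout=60))
```

With both trackers last seeing data at the same moment, a 90-second silence would mark `tuned` as stale while `legacy` would keep waiting until the 120-second threshold passes.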

hemna added 3 commits March 27, 2026 10:05
Add a new 'stale_timeout' configuration option to the aprs_network config
group that allows users to customize how long to wait before considering
an APRS-IS connection stale.

Problem:
The stale connection threshold was hardcoded to 2 minutes. In environments
with frequent network hiccups or when using certain APRS-IS servers that
may drop connections silently, 2 minutes can be too long to wait before
reconnecting, resulting in significant data loss.

Solution:
- Add 'stale_timeout' option to aprsd/conf/client.py with default of 120s
- Update APRSISDriver.__init__ to use the config value
- Maintain backward compatibility by defaulting to 120s if not configured
- Update tests to handle the new configuration option

Usage:
  [aprs_network]
  # Reconnect after 60 seconds without data
  stale_timeout = 60

The default remains 120 seconds (2 minutes) for backward compatibility.

The APRSISDriver uses the @singleton decorator, which transforms the class
into a function. The test incorrectly tried to use __new__, which
doesn't work with decorated singletons. Instead, re-initialize the
existing instance after changing the config.

The singleton's max_delta was being modified by test_init_custom_stale_timeout
and not restored, causing test_is_stale_connection_false to fail because
it expected 2 minutes but got 60 seconds.
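The two test fixes described in these commits can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration: the `singleton` decorator below is a common function-returning pattern and not necessarily the one aprsd uses, and `Driver`, `CONF`, and `check_custom_stale_timeout` are hypothetical names. Only `max_delta` and the 120/60 second values come from the text above.

```python
import datetime


def singleton(cls):
    """Hypothetical singleton decorator: returns a function, not a class."""
    instances = {}

    def get_instance(*args, **kwargs):
        # Create the instance on first call, then always return the same one.
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


CONF = {"stale_timeout": 120}  # hypothetical stand-in for the real config


@singleton
class Driver:
    def __init__(self):
        self.max_delta = datetime.timedelta(seconds=CONF["stale_timeout"])


def check_custom_stale_timeout():
    driver = Driver()  # the shared instance; Driver is now a plain function
    original_timeout = CONF["stale_timeout"]
    original_delta = driver.max_delta
    try:
        CONF["stale_timeout"] = 60
        driver.__init__()  # re-run __init__ on the existing instance
        return driver.max_delta
    finally:
        # Restore shared state so later tests still see the 2-minute default.
        CONF["stale_timeout"] = original_timeout
        driver.max_delta = original_delta


observed = check_custom_stale_timeout()
```

Because the decorated name refers to a function, class-level construction tricks do not apply to it, and any attribute changed on the shared instance leaks into other tests unless it is restored, which is what the `finally` block guards against.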
hemna merged commit 27413ab into master on Mar 27, 2026
10 of 12 checks passed
hemna deleted the feature/configurable-stale-timeout branch on March 27, 2026 at 14:27