fix: relay reconnect loop, asyncio loop safety, and /ping phone_connected by dzp5103 · Pull Request #3 · Primospheres/hermes-androidd

dzp5103 · 2026-04-02T00:23:59Z

Summary

Fixes 4 bugs in the Android bridge relay system.

Bug #1 — RelayClient: MAX_RETRIES=5 hard cap removed

Old code set shouldReconnect=false after 5 failed attempts. User had to manually tap Connect after any network blip.

Fix: Removed retry counter. Loop runs forever while shouldReconnect=true. Backoff caps at 30s.

Bug #4 — RelayClient: race condition on isConnected

Fixed 3s sleep replaced with 200ms polling loop (up to 8s). Exits immediately when onOpen fires.

Bug #2 — android_relay.py: asyncio Future on wrong loop

asyncio.Lock/Event/Future created in calling thread. Fixed: created inside _run_loop() and using asyncio.get_running_loop().create_future().

Bug #3 — /ping missing phone_connected field

Kotlin /ping now returns phone_connected=true so _check_requirements() correctly detects live connection.

…ield Bug #1 + #4 (RelayClient.kt): - Remove MAX_RETRIES=5 hard cap — reconnect loops forever until user taps Disconnect (shouldReconnect=false). Phone network blips no longer require manual reconnect. - Replace blind 3-second post-connect sleep with a polling wait (up to 8s) that exits immediately when onOpen fires. Eliminates the race where isConnected was checked before onOpen had a chance to set it. - Exponential backoff: 1s → 2s → 4s → … capped at 30s. Bug #2 (android_relay.py _handle_http): - Replace state.loop.create_future() (called from another thread context) with asyncio.get_running_loop().create_future() which correctly binds the Future to the event loop the coroutine is already running in. Fixes 'Future attached to a different loop' errors on Python 3.10+. - asyncio.Lock / asyncio.Event objects moved from __init__ (calling thread) into _run_loop() so they are created inside the correct event loop. Bug #3 (RelayClient.kt /ping response): - /ping now returns phone_connected=true so android_tool.py _check_requirements() correctly detects a live connection instead of always falling back to False.

gemini-code-assist

Code Review

This pull request improves the connection reliability of the RelayClient by removing the hard retry cap, implementing a more robust connection timeout mechanism, and exposing a phone_connected status in the /ping endpoint. Additionally, it fixes critical asyncio concurrency issues in the Python relay tool by ensuring locks and futures are correctly bound to the active event loop. I have provided feedback regarding the exponential backoff logic, which currently resets on every failure, and suggested using monotonic time for timeout calculations to avoid issues with system clock adjustments.

gemini-code-assist · 2026-04-02T00:26:47Z

hermes-android-bridge/app/src/main/kotlin/com/hermesandroid/bridge/client/RelayClient.kt

        reconnectJob?.cancel()
        reconnectJob = scope?.launch {
            var backoff = 1000L
-            var retries = 0
-            while (shouldReconnect && !isConnected && retries < MAX_RETRIES) {
-                retries++
-                Log.i(TAG, "Reconnecting in ${backoff}ms... (attempt $retries/$MAX_RETRIES)")
-                notifyStatus(false, "Reconnecting in ${backoff / 1000}s... (attempt $retries/$MAX_RETRIES)")
+            var attempt = 0


The exponential backoff logic is currently reset on every failure. This happens because scheduleReconnect() is called from the onFailure and onClosed callbacks, and it unconditionally cancels the existing reconnectJob before starting a new one. Since backoff and attempt are local variables within the coroutine, they are reset to their initial values (1000L and 0) every time a connection fails, resulting in a constant 1-second retry loop.

To fix this, you should check if the reconnection loop is already active before attempting to start a new one. This allows the existing loop to continue to its next iteration, correctly doubling the backoff and incrementing the attempt counter.

Suggested change

reconnectJob?.cancel()

reconnectJob = scope?.launch {

var backoff = 1000L

var retries = 0

while (shouldReconnect && !isConnected && retries < MAX_RETRIES) {

retries++

Log.i(TAG, "Reconnecting in ${backoff}ms... (attempt $retries/$MAX_RETRIES)")

notifyStatus(false, "Reconnecting in ${backoff / 1000}s... (attempt $retries/$MAX_RETRIES)")

var attempt = 0

if (reconnectJob?.isActive == true) return

reconnectJob = scope?.launch {

var backoff = 1000L

var attempt = 0

gemini-code-assist · 2026-04-02T00:26:47Z

hermes-android-bridge/app/src/main/kotlin/com/hermesandroid/bridge/client/RelayClient.kt

+                val deadline = System.currentTimeMillis() + CONNECT_TIMEOUT_MS
+                while (!isConnected && System.currentTimeMillis() < deadline && shouldReconnect) {
+                    delay(200)
                }


Using System.currentTimeMillis() for calculating deadlines is unreliable because it is not monotonic and can change if the system clock is adjusted (e.g., via NTP or manual settings). For Android applications, it is a best practice to use android.os.SystemClock.elapsedRealtime() for measuring durations and timeouts.

Suggested change

val deadline = System.currentTimeMillis() + CONNECT_TIMEOUT_MS

while (!isConnected && System.currentTimeMillis() < deadline && shouldReconnect) {

delay(200)

}

val deadline = android.os.SystemClock.elapsedRealtime() + CONNECT_TIMEOUT_MS

while (!isConnected && android.os.SystemClock.elapsedRealtime() < deadline && shouldReconnect) {

delay(200)

}

gemini-code-assist bot reviewed Apr 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: relay reconnect loop, asyncio loop safety, and /ping phone_connected#3

fix: relay reconnect loop, asyncio loop safety, and /ping phone_connected#3
dzp5103 wants to merge 1 commit intomainfrom
fix/relay-bugs

dzp5103 commented Apr 2, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Apr 2, 2026

Uh oh!

gemini-code-assist bot Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dzp5103 commented Apr 2, 2026

Summary

Bug #1 — RelayClient: MAX_RETRIES=5 hard cap removed

Bug #4 — RelayClient: race condition on isConnected

Bug #2 — android_relay.py: asyncio Future on wrong loop

Bug #3 — /ping missing phone_connected field

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant