Conversation

Contributor

@hauleth hauleth commented Feb 26, 2025

This is a potential source of a problem: if there is an error in the 2nd or 3rd clause of the `with`, then we leak connections, which may eventually result in connection exhaustion. Instead we should fail loudly that there is something wrong with the connections. It should also help with debugging potential problems.

@hauleth hauleth requested a review from a team as a code owner February 26, 2025 08:20
@hauleth hauleth force-pushed the do-not-handle-errors-in-postgres-libcluster-strategy branch from 198a011 to b599f4a Compare February 26, 2025 08:21
{:noreply, put_in(state.meta, meta)}
else
reason ->
Logger.error(state.topology, "Failed to connect to Postgres: #{inspect(reason)}")
Contributor
What happens when this fails loudly? Does it take down the whole app with it?

Contributor Author

I would need to check exactly, but in theory, if the failure happens repeatedly, it can cause the whole application to go down.

Contributor

does it take down the whole app with it?

No, the Supervisor will try to restart the Postgres strategy process, and after exhausting the retries it will just leave it unstarted. But that change doesn't make much sense anyway, because the node will be out of the cluster in case of a Postgres failure. You probably want to forward already assigned values, send a connect message after a timeout, and handle them more gracefully.
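The suggestion above (keep the already-assigned state and send a connect message after a timeout, rather than crashing) could look roughly like the following sketch. The module name, the `connect/1` helper, and the timeout value are all hypothetical, not taken from the PR:

```elixir
defmodule Cluster.Strategy.PostgresSketch do
  # Hypothetical sketch only: on a failed connection, log the error,
  # keep the current state, and schedule a retry instead of crashing
  # the strategy process and relying on the Supervisor's restarts.
  use GenServer
  require Logger

  @reconnect_after :timer.seconds(5)

  def handle_info(:connect, state) do
    case connect(state) do
      {:ok, meta} ->
        {:noreply, put_in(state.meta, meta)}

      {:error, reason} ->
        Logger.error("Failed to connect to Postgres: #{inspect(reason)}")
        # Forward the already-assigned state and retry after a timeout.
        Process.send_after(self(), :connect, @reconnect_after)
        {:noreply, state}
    end
  end

  # Placeholder for the real connection logic (not part of this PR).
  defp connect(_state), do: {:error, :not_implemented}
end
```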

@hauleth hauleth force-pushed the do-not-handle-errors-in-postgres-libcluster-strategy branch from b599f4a to c33107a Compare February 27, 2025 19:37
@hauleth hauleth force-pushed the do-not-handle-errors-in-postgres-libcluster-strategy branch 2 times, most recently from 9c386f8 to 4d8320e Compare March 17, 2025 16:00
Contributor

abc3 commented Mar 25, 2025

leaking connections

In case of a failure in those clauses, the node will not handle notifications because there is no retry logic. So yeah, there's a potential place for a bug, but it doesn't lead to leaking connections.

@hauleth hauleth force-pushed the do-not-handle-errors-in-postgres-libcluster-strategy branch 4 times, most recently from a03fbbb to 1642320 Compare May 16, 2025 19:52
@hauleth hauleth force-pushed the do-not-handle-errors-in-postgres-libcluster-strategy branch from 1642320 to c7b4c06 Compare May 16, 2025 20:26
Contributor

abc3 commented May 26, 2025

@hauleth fyi supabase/libcluster_postgres#24

I've closed that PR because there's already one in progress that adds auto_reconnect. So, to fix the current issue, we just need to add `auto_reconnect` to the notification connection.
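Concretely, that fix relies on Postgrex's `auto_reconnect` option for the notifications connection, which makes Postgrex re-establish the connection after a failure instead of leaving the node deaf to notifications. A minimal sketch, with illustrative connection options and channel name:

```elixir
# Start the notifications connection with auto_reconnect enabled, so a
# dropped connection is re-established automatically by Postgrex.
# Hostname, database, and channel below are illustrative values only.
{:ok, notifications_pid} =
  Postgrex.Notifications.start_link(
    hostname: "localhost",
    database: "postgres",
    auto_reconnect: true
  )

# Subscribe to a channel; with auto_reconnect, Postgrex re-listens
# after reconnecting.
{:ok, _ref} = Postgrex.Notifications.listen(notifications_pid, "cluster")
```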

@hauleth hauleth closed this Jun 27, 2025