Pull Request Overview
This PR adds a health status specification to the Waku API, defining three health states (Unhealthy, MinimallyHealthy, Healthy) and an event source mechanism for health status changes, based on the implementation currently used in js-waku.
Key Changes:
- Addition of `HealthStatus` enum type with three health levels
- Introduction of event source mechanism for health status change notifications
- Extended documentation describing each health state
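To make the overview concrete, here is a minimal TypeScript sketch of the two pieces it mentions: a three-level health status and an event source that notifies listeners when the status changes. Only the three state names come from the PR; `HealthEventSource`, `onHealthChange`, `update`, and `getStatus` are hypothetical names for illustration, not the js-waku API.

```typescript
// Hypothetical sketch: only the three state names come from the PR.
type HealthStatus = "Unhealthy" | "MinimallyHealthy" | "Healthy";

type HealthListener = (status: HealthStatus) => void;

class HealthEventSource {
  private listeners: HealthListener[] = [];
  private current: HealthStatus = "Unhealthy";

  // Register a listener; the returned function removes it again.
  onHealthChange(listener: HealthListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Called by node internals (assumed); notifies only on an actual change.
  update(next: HealthStatus): void {
    if (next === this.current) return;
    this.current = next;
    this.listeners.forEach((listener) => listener(next));
  }

  getStatus(): HealthStatus {
    return this.current;
  }
}

const source = new HealthEventSource();
const seen: HealthStatus[] = [];
const unsubscribe = source.onHealthChange((s) => {
  seen.push(s);
});
source.update("MinimallyHealthy");
source.update("MinimallyHealthy"); // no event: status did not change
source.update("Healthy");
unsubscribe();
source.update("Unhealthy"); // listener already removed, so not recorded
```

Emitting only on change keeps consumers from having to de-duplicate repeated notifications themselves.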
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
fryorcraken
left a comment
You do have to specify how the event source is accessible (e.g. events property on WakuNode).
standards/application/waku-api.md
| ```yml | ||
| types: | ||
| HealthStatus: |
There was a problem hiding this comment.
One thing I initially considered, and have been thinking about again recently, is whether HealthStatus should apply not just at node level but also at a more fine-grained level.
For relay, health status makes more sense at shard level, because a node can be healthily connected to one shard and not to another.
For edge/light clients this becomes even more complicated, as there is the even finer-grained concept of content-topics, i.e. a node can have peers or subscriptions for certain content-topics and lose subscriptions to others.
If so, does it make more sense to notify the user of health per content-topic? Ref discussion thread:
https://discord.com/channels/1110799176264056863/1414975481438011392/1428769536139857989
We can always aggregate the health of all shards or content-topics and present it as node health, but I do think we will need granular health events to be reported to app devs. Consider Status, for example: if a user is part of 2 communities (which are nothing but 2 different content-topics) but at some point has subscriptions to only 1 content-topic, then we can't say the node is unhealthy; rather, reception for 1 community is unhealthy whereas the other is fine.
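A rough TypeScript sketch of the aggregation idea above, assuming health is tracked per content-topic and rolled up into one node-level value. The roll-up policy here (all healthy gives Healthy, all unhealthy gives Unhealthy, anything mixed gives MinimallyHealthy) is only one possible choice and is not defined in the PR; `aggregateHealth` and the topic strings are hypothetical.

```typescript
type HealthStatus = "Unhealthy" | "MinimallyHealthy" | "Healthy";

// Assumed policy, not from the spec: mixed results map to MinimallyHealthy.
function aggregateHealth(perTopic: Record<string, HealthStatus>): HealthStatus {
  const topics = Object.keys(perTopic);
  if (topics.length === 0) return "Unhealthy"; // nothing tracked yet
  if (topics.every((t) => perTopic[t] === "Healthy")) return "Healthy";
  if (topics.every((t) => perTopic[t] === "Unhealthy")) return "Unhealthy";
  return "MinimallyHealthy";
}

// Two communities, i.e. two content-topics (made-up names).
const perCommunity: Record<string, HealthStatus> = {
  "/community-a/1/chat/proto": "Healthy",
  "/community-b/1/chat/proto": "Unhealthy",
};
const nodeHealth = aggregateHealth(perCommunity);
```

With this policy the node is not reported as unhealthy while one community still works, and the per-topic map stays available for apps that want the granular view.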
There was a problem hiding this comment.
really good idea, @chaitanyaprem
I think this comes into the realm of the Waku App API, as we are now talking about the application concerns of a fragmented network and its use of content topics in an app
but we definitely can implement it in Waku API
There was a problem hiding this comment.
I agree that health per content topic may make sense.
But at this point in time, I would KISS it, and then review whether an application developer would want to expose this information to their user. Aka, not do per content topic just yet.
There was a problem hiding this comment.
review whether an application developer would want to expose this information to their user
IIRC, the Status app already has this requirement, unless there is some priority change; hence I was suggesting this.
But I would leave it up to priorities.
There was a problem hiding this comment.
IIRC, the Status app already has this requirement, unless there is some priority change; hence I was suggesting this.
But I would leave it up to priorities.
Please provide more context. What do they do with this information? How does the per-content-topic information impact application behaviour?
There was a problem hiding this comment.
I remember they wanted to show connectivity health at community level, but I'm not sure if the priority has changed now. Maybe @igor-sirotin or someone from the Status team can give more clarity on this
There was a problem hiding this comment.
I remember they wanted to show connectivity health at community level, but I'm not sure if the priority has changed now. Maybe @igor-sirotin or someone from the Status team can give more clarity on this
Good to know. If that's something they have or want to keep, then yes, we should do it from the get-go.
cc @osmaczko @jrainville to confirm if this feature still exists
cc @plopezlpz @jazzz FYI, there is a request to show connectivity at conversation level (can be done later in chat sdk)
There was a problem hiding this comment.
UI
I remember they wanted to show connectivity health at community level
I only know about this feature from this ticket, but I don't know about our desire to implement this on Status UI level.
It makes sense on paper, but in reality this might unnecessarily complicate UI. And I don't think this is of any priority at the moment.
cc @jrainville
Backend
In backend we manually trigger discovery (or peer exchange) when we don't have enough peers.
- In full client, we run discovery periodically if we "have enough connected peers" (messaging/waku/gowaku.go#L1773-L1776)
- In light client, we run peer exchange every 5 seconds (messaging/waku/gowaku.go#L605-L612)
Instead, we want to:
- trigger discovery for specific pubsub topic, when it's unhealthy
- trigger peer exchange for specific content topic, when it's unhealthy
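The targeted behaviour described in the list above could look roughly like the following TypeScript sketch, assuming per-topic health is available. `RecoveryActions`, `discover`, `peerExchange`, `recover`, and the topic strings are all hypothetical; they stand in for whatever the backend actually calls and are not status-go or Waku function names.

```typescript
type HealthStatus = "Unhealthy" | "MinimallyHealthy" | "Healthy";

// Hypothetical hooks standing in for the real discovery / peer-exchange calls.
interface RecoveryActions {
  discover(pubsubTopic: string): void;
  peerExchange(contentTopic: string): void;
}

// Trigger recovery only for the topics that are currently unhealthy.
function recover(
  pubsubHealth: Record<string, HealthStatus>,
  contentHealth: Record<string, HealthStatus>,
  actions: RecoveryActions
): void {
  Object.keys(pubsubHealth).forEach((topic) => {
    if (pubsubHealth[topic] === "Unhealthy") actions.discover(topic);
  });
  Object.keys(contentHealth).forEach((topic) => {
    if (contentHealth[topic] === "Unhealthy") actions.peerExchange(topic);
  });
}

// Record which actions fire, using made-up topic names.
const log: string[] = [];
recover(
  { "pubsub-a": "Healthy", "pubsub-b": "Unhealthy" },
  { "content-x": "Unhealthy", "content-y": "MinimallyHealthy" },
  {
    discover: (t) => { log.push("discover:" + t); },
    peerExchange: (t) => { log.push("px:" + t); },
  }
);
```

Compared with the periodic approach, this only does work for topics that need it, which is the reason granular health events are useful to the backend.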
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
standards/application/waku-api.md
| `Unhealthy` indicates that the node has lost connectivity for message reception, | ||
| sending, or both, and as a result, it cannot reliably process or transmit messages. | ||
| `MinimallyHealthy` indicates that the node meets the minimum operational requirements: connect to at least one peer with a protocol to send messages ([LIGHT-PUSH]() or [RELAY]()) and one peer with a protocol to receive messages ([FILTER ] or [RELAY]) |
There was a problem hiding this comment.
Need to use the proper names IMO (WAKU2-RELAY) and add the links please.
There was a problem hiding this comment.
I am using the names as they were originally defined in #65
do you want to change it across the file?
There was a problem hiding this comment.
🤷 Not sure how much it matters. @jimstir ?
The links are missing tho
There was a problem hiding this comment.
@weboko It is good practice to use the actual title of the spec at the time of writing, e.g. 11/WAKU2-RELAY. In the waku/specs repo this is not required or enforced, but will be in the rfc-index.
|
Almost approved, still need to fix
|
chaitanyaprem
left a comment
Once comments are addressed, LGTM
standards/application/waku-api.md
| ```yml | ||
| types: | ||
| HealthStatus: |
There was a problem hiding this comment.
review whether an application developer would want to expose this information to their user
IIRC, the Status app already has this requirement, unless there is some priority change; hence I was suggesting this.
But I would leave it up to priorities.
@fryorcraken I'll merge this PR after #87 in which I define |
| fields: | ||
| eventType: | ||
| type: string | ||
| default: "health" |
There was a problem hiding this comment.
The default value does not make sense here
Ivansete-status
left a comment
Thanks for it! 🙌
Adding my 2cs that I hope you find useful
Also, cc @fcecin as she will implement it on nwaku side 🥳
standards/application/waku-api.md
| HealthStatus: | ||
| type: enum | ||
| values: [Unhealthy, MinimallyHealthy, Healthy] | ||
| description: "Used to identify health of the operating node" |
There was a problem hiding this comment.
| HealthStatus: | |
| type: enum | |
| values: [Unhealthy, MinimallyHealthy, Healthy] | |
| description: "Used to identify health of the operating node" | |
| HealthState: | |
| type: enum | |
| values: [Unhealthy, Healthy] | |
| description: "Used to identify health of the operating node" | |
| HealthStateDetail: | |
| type: object | |
| fields: | |
| state: | |
| type: HealthState | |
| reason: | |
| type: string | |
| description: "Brings clear detail about why the node is in the given state." |
I think it is very important/compulsory to allow precise detail about why a node is unhealthy. For now, returning this detail as a string is enough, so that the API consumer can better understand why the node is not becoming healthy. That will also help us provide better assistance (in case someone asks).
The reason field will also give us the freedom to properly report topic health, as @chaitanyaprem suggested. We could even return JSON in the future with richer detail, which would help us extend the API if needed.
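As an illustration of the suggestion above, here is a TypeScript sketch of the two proposed types and one way a `reason` string might be filled in. The field names `state` and `reason` come from the suggested YAML; the helper `describeHealth` and the wording of its reason strings are assumptions for illustration only.

```typescript
// Mirrors the suggested YAML: a two-value state plus a free-form reason.
type HealthState = "Unhealthy" | "Healthy";

interface HealthStateDetail {
  state: HealthState;
  reason: string; // why the node is in the given state
}

// Hypothetical helper: derive the detail from a list of missing capabilities.
function describeHealth(missing: string[]): HealthStateDetail {
  if (missing.length === 0) {
    return { state: "Healthy", reason: "all required peers connected" };
  }
  return {
    state: "Unhealthy",
    reason: "missing peers for: " + missing.join(", "),
  };
}

const detail = describeHealth(["store", "filter"]);
```

A plain string keeps the first version simple; swapping it for a structured payload later would be a compatible extension as long as consumers treat `reason` as informational.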
There was a problem hiding this comment.
I think it is very important/compulsory to allow precise detail about why a node is unhealthy
Why? Please explain. The assumption behind "unhealthy" is "disconnected". Actually, we should probably change the names to "disconnected", "connected" and "partially connected".
standards/application/waku-api.md
| `MinimallyHealthy` indicates that the node meets the minimum operational requirements: | ||
| it is connected to at least one peer with a protocol to send messages ([LIGHTPUSH](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/19/lightpush.md) or [RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/11/relay.md)), | ||
| one peer with a protocol to receive messages ([FILTER](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/12/filter.md) or [RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/11/relay.md)), | ||
| and one peer with [STORE](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/13/store.md) service capabilities, | ||
| although performance or reliability may still be impacted. |
There was a problem hiding this comment.
I would not use the MinimallyHealthy state. Instead, only Unhealthy and Healthy.
Much simpler and the API consumer doesn't need to bother about lightpush, etc.
| `MinimallyHealthy` indicates that the node meets the minimum operational requirements: | |
| it is connected to at least one peer with a protocol to send messages ([LIGHTPUSH](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/19/lightpush.md) or [RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/11/relay.md)), | |
| one peer with a protocol to receive messages ([FILTER](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/12/filter.md) or [RELAY](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/11/relay.md)), | |
| and one peer with [STORE](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/13/store.md) service capabilities, | |
| although performance or reliability may still be impacted. |
There was a problem hiding this comment.
What is the expected behaviour when you are connected, but not to enough peers?
- Unhealthy/Disconnected but we still may receive messages?
- or Healthy/Connected, but poor reliability?
Due to the p2p nature of this, I don't think you can expect a mid-status of "partially connected".
eMule had something like that where it was orange when you had "some peers" but not enough.
There was a problem hiding this comment.
My 2cs
MinimallyHealthy or PartiallyConnected can mean,
- We are probably able to propagate your message, but we cannot validate it, e.g. no store service peer has been discovered.
- We have only one or just a few lightpush/filter service peers, so expect failures or high latency when sending/receiving messages.
- Core mode: we have dLow mesh peers (for the future: maybe not all shards are supported).
With such information, UX can be better from the app dev's point of view, as they can warn their users to expect issues or delays.
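To ground the discussion, here is a TypeScript sketch of how the three states could be derived from peer counts, following the `MinimallyHealthy` wording quoted earlier in this thread (at least one send peer via LIGHTPUSH or RELAY, one receive peer via FILTER or RELAY, and one STORE peer). Two parts are assumptions, not taken from the PR: the `healthyThreshold` parameter that separates `Healthy` from `MinimallyHealthy`, and treating a missing STORE peer as `Unhealthy`. `PeerCounts` and `classifyHealth` are hypothetical names.

```typescript
type HealthStatus = "Unhealthy" | "MinimallyHealthy" | "Healthy";

interface PeerCounts {
  relay: number;
  lightPush: number;
  filter: number;
  store: number;
}

// healthyThreshold is an assumed knob; the PR does not define when a node is Healthy.
function classifyHealth(peers: PeerCounts, healthyThreshold: number = 2): HealthStatus {
  const sendPeers = peers.lightPush + peers.relay; // LIGHTPUSH or RELAY
  const receivePeers = peers.filter + peers.relay; // FILTER or RELAY
  const storePeers = peers.store;

  // Below the quoted minimum requirements: treated as Unhealthy (assumption).
  if (sendPeers < 1 || receivePeers < 1 || storePeers < 1) return "Unhealthy";

  // The weakest capability decides between the two remaining states.
  const weakest = Math.min(sendPeers, receivePeers, storePeers);
  return weakest >= healthyThreshold ? "Healthy" : "MinimallyHealthy";
}

const lightClient = classifyHealth({ relay: 0, lightPush: 1, filter: 1, store: 1 });
```

Under the two-state proposal, the same function would collapse to a single boolean check, which is the simplification being argued for in this thread.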
There was a problem hiding this comment.
- Unhealthy/Disconnected but we still may receive messages?
This cannot happen. If a node receives messages at some point in time, that means it was healthy/connected at that time.
- Healthy/Connected, but poor reliability?
That cannot happen either. If poor reliability, then it should not be considered healthy/connected.
Due to the p2p nature of this, I don't think you can expect a mid-status of "partially connected".
Yes exactly.
eMule had something like that where it was orange when you had "some peers" but not enough.
Good point! Nevertheless, having a certain state should be the trigger for an action to be made by the user, e.g., switch wifi on, open ports in router/firewall, etc. I can't quite see a call to action on orange state.
|
I think we should move away from Health/Unhealthy in favour of
Simply because "health" was referring to the "connection health" and we lose this info in the process. |
| EventSource: | ||
| type: object | ||
| description: "Event source for Waku API events" |
There was a problem hiding this comment.
This description isn't 100% clear. We shouldn't have the whole term that we are describing in the description itself :)
Adding an example might help.
This PR adds a spec for health status to the Waku API and describes what is currently used in js-waku.
This is reused from the original attempt to define the Messaging API:
https://github.com/waku-org/specs/blob/17464d356a40cfa130c0df25b50769a4f38b7c45/standards/application/messaging-api.md#health-indicator
Resolves - logos-messaging/logos-messaging-js#2712