forkchoice grafana visualization #426
base: main
Conversation
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@g11tech The current design allows the observability API to briefly hold shared locks. Should we add rate limiting to prevent potential write starvation from excessive requests, or is the max slot cap sufficient?
```zig
while (blk: {
    self.sse_mutex.lock();
    defer self.sse_mutex.unlock();
    break :blk self.sse_active != 0;
})
```
why don't we have sse_active as an atomic value as well?
sse_active enforces a cap (MAX_SSE_CONNECTIONS), so we keep it behind a mutex to make the check+increment atomic and simple. Making it atomic would require a CAS loop (or rollback on overflow) to preserve the cap safely.
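For illustration, a minimal sketch of the CAS-loop alternative mentioned above, assuming sse_active became a std.atomic.Value(u32); the cap value and function names are hypothetical, not the PR's actual code:

```zig
const std = @import("std");

// Hypothetical alternative: enforce the connection cap without a mutex.
const MAX_SSE_CONNECTIONS: u32 = 8; // assumed value for illustration

var sse_active = std.atomic.Value(u32).init(0);

/// Reserve an SSE slot; returns false once the cap is reached.
fn tryAcquireSseSlot() bool {
    var current = sse_active.load(.acquire);
    while (current < MAX_SSE_CONNECTIONS) {
        // cmpxchgWeak returns null on success, or the value it observed
        // on failure, in which case we retry against that fresh value.
        current = sse_active.cmpxchgWeak(
            current,
            current + 1,
            .acq_rel,
            .acquire,
        ) orelse return true;
    }
    return false;
}

fn releaseSseSlot() void {
    _ = sse_active.fetchSub(1, .release);
}
```

The mutex version keeps check+increment trivially atomic; the CAS variant buys lock-freedom at the price of this retry loop.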
pkgs/cli/src/api_server.zig (outdated)
```diff
 /// Handle finalized checkpoint state endpoint
 /// Serves the finalized checkpoint lean state (BeamState) as SSZ octet-stream at /lean/states/finalized
-fn handleFinalizedCheckpointState(self: *const Self, request: *std.http.Server.Request) !void {
+fn handleFinalizedCheckpointState(self: *ApiServer, request: *std.http.Server.Request) !void {
```
why not Self?
Fixed, thanks for pointing out!
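For context, the Self idiom in question is Zig's @This() alias; a stripped-down sketch (not the real ApiServer):

```zig
const std = @import("std");

const ApiServer = struct {
    const Self = @This(); // alias for the enclosing type

    sse_mutex: std.Thread.Mutex = .{},
    sse_active: u32 = 0,

    // *Self and *ApiServer are equivalent here; the alias keeps method
    // signatures stable if the struct is ever renamed.
    fn handleFinalizedCheckpointState(self: *Self, request: *std.http.Server.Request) !void {
        _ = self;
        _ = request;
    }
};
```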
```zig
    .node_registry = registry_1,
});

if (api_server_handle) |handle| {
```
make an issue (to work on later) to move creation of the server to after chain creation, so we don't have to set it later and can pass the chain in during service creation
```diff
 // 5 Rebase forkchouce
 if (pruneForkchoice)
-    try self.forkChoice.rebase(latestFinalized.root, &canonical_view);
+    try self.forkChoice.rebase(latestFinalized.root, null);
```
why this change?
canonical_view can be stale by the time rebase() runs, so I pass null to force recomputation under rebase’s write lock. Holding the forkchoice lock across processFinalizationAdvancement would avoid recomputation but would block forkchoice updates/reads (onBlock/onAttestation/onInterval/snapshot) while DB/pruning work runs.
why would it be stale? only during syncing would I expect it to change that rapidly. also, we need to make processFinalizationAdvancement atomic, so getCanonicalViewAndAnalysis should probably also take the lock (have a getCanonicalViewAndAnalysisAndLock fn, I guess) and release it at the end of processFinalizationAdvancement (defer)
yeah, staleness can still happen outside sync: from my understanding, the rust bridge can run onBlock/onAttestation while this is in flight. If we held the forkchoice write lock across DB/pruning to keep it atomic, we'd block all forkchoice reads/writes (and the lock isn't re-entrant). Passing null just makes rebase() recompute under its own write lock, so it uses the current tree at prune time. If we want stronger guarantees, we can add an atomic analyze+rebase API later as part of a bigger concurrency update.
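A rough sketch of the shape being described, with stub types standing in for the real forkchoice internals (Root, CanonicalView, rw_lock, and getCanonicalViewUnlocked are assumptions; only the null-means-recompute-under-the-write-lock behavior is the point):

```zig
const std = @import("std");

// Stubs standing in for the real forkchoice types.
const Root = [32]u8;
const CanonicalView = struct { head: Root };

const ForkChoice = struct {
    rw_lock: std.Thread.RwLock = .{},
    head: Root = std.mem.zeroes(Root),

    // Internal helper, assumed to exist per the locked/unlocked pattern.
    fn getCanonicalViewUnlocked(self: *ForkChoice) CanonicalView {
        return .{ .head = self.head };
    }

    fn rebase(self: *ForkChoice, anchor_root: Root, canonical_view: ?CanonicalView) !void {
        // Exclusive lock: onBlock/onAttestation/snapshot wait here.
        self.rw_lock.lock();
        defer self.rw_lock.unlock();

        // A caller-supplied view may predate this lock acquisition;
        // null forces a recomputation against the current tree.
        const view = canonical_view orelse self.getCanonicalViewUnlocked();
        _ = anchor_root;
        _ = view;
        // ... prune the proto-array to the anchor using `view` ...
    }
};
```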
This reverts commit 854ca5d.
Summary

Adds real-time fork choice tree visualization via an /api/forkchoice/graph endpoint, with thread-safety fixes to the forkchoice module. Implements a locked/unlocked pattern for 5 critical functions and eliminates unsafe direct field access.

Key Features
Visualization API

Serves /metrics, /health, /events, /api/forkchoice/graph, and /lean/states/finalized.

Thread Safety
Locking Pattern Implementation

Fixed 5 functions missing thread-safety protection:

- getCanonicalView (read operations - lockShared)
- getCanonicalityAnalysis (read operations - lockShared)
- getCanonicalAncestorAtDepth (read operations - lockShared)
- rebase (write operations - exclusive lock)
- confirmBlock (write operations - exclusive lock)

Implemented the locked/unlocked pattern:

- Public entry points take the lock; *Unlocked versions exist for efficient internal calls (see the sketch below).
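A minimal sketch of that pattern, with assumed field names (rw_lock and head are illustrative, not the module's actual fields):

```zig
const std = @import("std");

const Root = [32]u8;

const ForkChoice = struct {
    rw_lock: std.Thread.RwLock = .{},
    head: Root = std.mem.zeroes(Root),

    /// Public accessor: replaces direct `self.forkChoice.head` reads.
    pub fn getHead(self: *ForkChoice) Root {
        self.rw_lock.lockShared(); // read path: shared lock
        defer self.rw_lock.unlockShared();
        return self.getHeadUnlocked();
    }

    /// *Unlocked variant for internal callers that already hold the lock.
    fn getHeadUnlocked(self: *ForkChoice) Root {
        return self.head;
    }
};
```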
Eliminated Direct Field Access

- self.forkChoice.head → getHead()
- self.forkChoice.fcStore → getLatestJustified(), getLatestFinalized()
- self.forkChoice.protoArray.* → getNodeCount(), getBlockSlot()

Concurrency Benefits
API Server Reliability
Security & Performance
Screenshot sample
NOTES