
Commit 89218a8 (parent 2a24121): docs: runner lifecycle (#3510)

1 file changed: 172 additions, 0 deletions.
# Runner Lifecycle

## Connection

```mermaid
sequenceDiagram
    participant R as Runner
    participant P as Pegboard
    participant RW as Runner Workflow

    note over R,RW: Phase 1: WebSocket Connection

    R->>P: WebSocket open
    R->>P: ToServerInit (name, version, totalSlots, lastCommandIdx)
    note over R: start ping interval (3s)
    note over R: start command ack interval (5min)

    P->>RW: Forward (ToServerInit)
    note over RW: ProcessInit activity
    note over RW: load state, process prepopulated actors

    note over R,RW: Phase 2: Initialize Runner

    RW->>P: ToClientInit (runnerId, lastEventIdx, metadata)
    P->>R: ToClientInit

    note over R: store runnerId
    note over R: store runnerLostThreshold from metadata

    note over R,RW: Phase 3: Resend Pending State

    note over R: processUnsentKvRequests
    note over R: resendUnacknowledgedEvents
    note over R: tunnel.resendBufferedEvents

    note over R,RW: Phase 4: Send Missed Commands

    RW->>P: ToClientCommands (missed commands)
    P->>R: ToClientCommands
    note over R: handleCommands

    note over R,RW: Phase 5: Complete Connection

    note over RW: InsertDb activity
    note over RW: write runner to database
    note over RW: update allocation indexes

    note over R: config.onConnected callback
```
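Phase 3 replays anything the workflow may not have seen yet. The sketch below covers only the `resendUnacknowledgedEvents` step. It assumes the runner buffers outgoing events under a monotonically increasing index, and that `lastEventIdx` in `ToClientInit` is the highest index the workflow has already processed. `EventBuffer`, `record`, and `resendAfter` are hypothetical names, not the runner SDK's API.

```typescript
// Hypothetical sketch: EventBuffer and its methods are illustrative names,
// not the actual runner SDK API.
interface BufferedEvent<T> {
  idx: number; // monotonically increasing index assigned by the runner
  payload: T;
}

class EventBuffer<T> {
  private events: BufferedEvent<T>[] = [];
  private nextIdx = 0;

  // Keep an outgoing event until the server confirms it; returns its index.
  record(payload: T): number {
    const idx = this.nextIdx++;
    this.events.push({ idx, payload });
    return idx;
  }

  // On ToClientInit: assume every event with idx <= lastEventIdx was already
  // processed by the workflow. Drop those and return the rest, oldest first,
  // so the caller can resend them in their original order.
  resendAfter(lastEventIdx: number): BufferedEvent<T>[] {
    this.events = this.events.filter((e) => e.idx > lastEventIdx);
    return [...this.events];
  }

  get size(): number {
    return this.events.length;
  }
}

// Usage: three events buffered, then the server reports lastEventIdx = 0.
const buffer = new EventBuffer<string>();
buffer.record("event-0");
buffer.record("event-1");
buffer.record("event-2");
const toResend = buffer.resendAfter(0);
console.log(toResend.map((e) => e.idx)); // [ 1, 2 ]
```

Because events stay buffered until the server reports an index, the same routine serves the reconnect path, where `ToClientInit` again carries the `lastEventIdx` to resume from.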
## Reconnect

```mermaid
sequenceDiagram
    participant R as Runner
    participant P as Pegboard
    participant RW as Runner Workflow

    note over R,RW: Phase 1: Detect Disconnection

    alt WebSocket error/close
        P--xR: connection lost
        note over R: start runner lost timeout (if threshold configured)
        note over R: schedule reconnect with backoff
        note over R: config.onDisconnected callback
    end

    note over R,RW: Phase 2: Reconnect

    note over R: calculate backoff delay
    note over R: increment reconnectAttempt counter

    R->>P: WebSocket open (reconnect)
    R->>P: ToServerInit (lastCommandIdx preserved)

    note over R: clear reconnect timeout
    note over R: clear runner lost timeout
    note over R: reset reconnectAttempt = 0

    P->>RW: Forward (ToServerInit)
    RW->>P: ToClientInit (lastEventIdx)
    P->>R: ToClientInit

    note over R,RW: Phase 3: Resynchronize

    note over R: if runnerId changed, clear event history

    note over R: processUnsentKvRequests
    note over R: resendUnacknowledgedEvents (from lastEventIdx)
    note over R: tunnel.resendBufferedEvents

    alt missed commands exist
        RW->>P: ToClientCommands (missed commands)
        P->>R: ToClientCommands
        note over R: handleCommands
    end

    note over R: config.onConnected callback
```
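On the runner side, Phases 1 and 2 of reconnect are mostly bookkeeping: a backoff counter and an optional runner-lost timer. The following is a minimal sketch of that bookkeeping using plain exponential backoff. The diagram does not give the real base delay, cap, or multiplier, so `DEFAULT_BACKOFF` holds placeholder values. `ReconnectState`, `onDisconnected`, `nextDelay`, and `onReconnected` are hypothetical names.

```typescript
// Hypothetical sketch: names and backoff parameters are placeholders, not
// the runner's actual values.
interface BackoffOptions {
  initialMs: number;
  maxMs: number;
  multiplier: number;
}

const DEFAULT_BACKOFF: BackoffOptions = { initialMs: 1000, maxMs: 30_000, multiplier: 2 };

// Exponential backoff: initialMs * multiplier^attempt, capped at maxMs.
function backoffDelay(attempt: number, opts: BackoffOptions = DEFAULT_BACKOFF): number {
  return Math.min(opts.maxMs, opts.initialMs * Math.pow(opts.multiplier, attempt));
}

class ReconnectState {
  reconnectAttempt = 0;
  private lostTimer: ReturnType<typeof setTimeout> | null = null;

  // Phase 1: arm the runner-lost timer only if a threshold was received
  // (in the diagram it comes from ToClientInit metadata).
  onDisconnected(runnerLostThresholdMs: number | undefined, onLost: () => void): void {
    if (runnerLostThresholdMs !== undefined && this.lostTimer === null) {
      this.lostTimer = setTimeout(() => {
        this.lostTimer = null;
        onLost();
      }, runnerLostThresholdMs);
    }
  }

  // Phase 2: calculate the backoff delay, then increment the counter.
  nextDelay(): number {
    const delay = backoffDelay(this.reconnectAttempt);
    this.reconnectAttempt += 1;
    return delay;
  }

  // After the WebSocket reopens: clear the lost timer and reset the counter.
  onReconnected(): void {
    if (this.lostTimer !== null) {
      clearTimeout(this.lostTimer);
      this.lostTimer = null;
    }
    this.reconnectAttempt = 0;
  }

  get lostTimerArmed(): boolean {
    return this.lostTimer !== null;
  }
}

// Usage: disconnect, three failed attempts, then a successful reconnect.
const state = new ReconnectState();
state.onDisconnected(15_000, () => console.log("runner lost"));
console.log(state.nextDelay(), state.nextDelay(), state.nextDelay()); // 1000 2000 4000
state.onReconnected();
console.log(state.reconnectAttempt, state.lostTimerArmed); // 0 false
```

Resetting `reconnectAttempt` only after the socket reopens means a run of failed attempts keeps lengthening the delay up to the cap, while one successful connection returns the next disconnect to the shortest delay. The sketch omits jitter for clarity.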
## Shutdown

```mermaid
sequenceDiagram
    participant R as Runner
    participant P as Pegboard
    participant RW as Runner Workflow
    participant A as Actors

    note over R,RW: Phase 1: Initiate Shutdown

    alt graceful shutdown
        R->>P: ToServerStopping
        P->>RW: Forward (ToServerStopping)
    else forced stop
        RW->>RW: receive Stop signal
    end

    note over R,RW: Phase 2: Drain Runner

    note over RW: handle_stopping
    note over RW: set state.draining = true
    note over RW: ClearDb activity (update_state = Draining)
    note over RW: remove from allocation indexes
    note over RW: set drain_ts, expired_ts

    note over RW: FetchRemainingActors activity
    loop for each actor
        RW->>A: GoingAway signal
        note over A: actor workflows begin stopping
    end

    note over R,RW: Phase 3: Wait for Actors

    note over R: waitForActorsToStop (max 120s)
    loop check every 100ms
        alt all actors stopped
            note over R: continue shutdown
        else websocket closed
            note over R: force continue shutdown
        else timeout reached
            note over R: force continue shutdown
        end
    end

    note over R,RW: Phase 4: Close WebSocket

    note over R: send ToServerStopping (if not sent)
    R->>P: WebSocket close (code=1000, reason=pegboard.runner_shutdown)
    note over R: clear ping interval
    note over R: clear ack interval
    note over R: tunnel.shutdown

    note over R: config.onShutdown callback

    note over R,RW: Phase 5: Complete Workflow

    note over RW: workflow exits drain loop after runner_lost_threshold

    note over RW: ClearDb activity (update_state = Stopped)
    note over RW: remove from active indexes
    note over RW: set stop_ts

    note over RW: FetchRemainingActors activity
    loop for each remaining actor
        RW->>A: Lost signal
        note over A: reschedule actors if needed
    end

    RW->>P: ToClientClose
    note over RW: workflow complete
```
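Phase 3 of shutdown is a bounded polling loop with three exits. The sketch below mirrors that loop, taking the 120 s cap and 100 ms poll interval from the diagram. The function is named after the `waitForActorsToStop` step, but its signature, the `WaitResult` type, and the injected `runningActorCount` and `isWebSocketOpen` predicates are assumptions made so the loop can run without a real runner.

```typescript
// Hypothetical sketch of the Phase 3 wait loop; the signature is assumed,
// only the 120s cap and 100ms poll interval come from the diagram.
type WaitResult = "all_stopped" | "websocket_closed" | "timeout";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForActorsToStop(
  runningActorCount: () => number,
  isWebSocketOpen: () => boolean,
  maxWaitMs = 120_000,
  pollIntervalMs = 100,
): Promise<WaitResult> {
  const deadline = Date.now() + maxWaitMs;
  for (;;) {
    // Checked in the same order as the alt branches in the diagram.
    if (runningActorCount() === 0) return "all_stopped";
    // With the socket gone, stop acknowledgements cannot arrive; stop waiting.
    if (!isWebSocketOpen()) return "websocket_closed";
    if (Date.now() >= deadline) return "timeout";
    await sleep(pollIntervalMs);
  }
}

// Usage: simulate three actors that finish one per poll.
let running = 3;
const countAndStopOne = () => {
  const current = running;
  if (running > 0) running -= 1;
  return current;
};
waitForActorsToStop(countAndStopOne, () => true, 5_000, 10).then((result) => {
  console.log(result); // all_stopped
});
```

Whatever the result, Phase 4 proceeds the same way: send `ToServerStopping` if it was not already sent, then close the socket with code 1000 and reason `pegboard.runner_shutdown`. Returning the exit reason instead of a boolean lets the caller log whether shutdown was clean or forced.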
