Conversation
Force-pushed 0ec6a44 to f5cef26
- PayloadJobCancellation replaces the single CancellationToken with new_fcu/resolved/deadline/any tokens for deterministic cancellation
- Flashblock loop refactored to a biased select! state machine
- Resolved gate prevents publishing flashblocks after getPayload
- Tracing spans: build_fallback, build_flashblock (with index/tx_count/gas_used), execute_pool_txs, seal_block, state_root
- New metrics: payload_job_cancellation_{resolved,new_fcu,deadline,complete,error}, flashblock_publish_suppressed_total
- CancellationReason enum with typed reason values
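A minimal std-only sketch of what a cancellation handle with typed reasons could look like (the names mirror the PR's description, but the layout and API here are assumptions, not the actual implementation; the real code presumably builds on CancellationToken rather than a raw atomic):

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;

/// Why the payload job was cancelled (hypothetical mirror of the PR's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CancellationReason {
    NewFcu = 1,
    Resolved = 2,
    Deadline = 3,
}

/// One shared handle; the first cancel wins and records its reason,
/// which is what makes the cancellation deterministic.
#[derive(Clone, Default)]
pub struct PayloadJobCancellation {
    reason: Arc<AtomicU8>, // 0 = not cancelled
}

impl PayloadJobCancellation {
    pub fn cancel(&self, reason: CancellationReason) {
        // Only the first caller transitions 0 -> reason; later cancels are no-ops.
        let _ = self
            .reason
            .compare_exchange(0, reason as u8, Ordering::AcqRel, Ordering::Acquire);
    }

    pub fn is_cancelled(&self) -> bool {
        self.reason.load(Ordering::Acquire) != 0
    }

    pub fn reason(&self) -> Option<CancellationReason> {
        match self.reason.load(Ordering::Acquire) {
            1 => Some(CancellationReason::NewFcu),
            2 => Some(CancellationReason::Resolved),
            3 => Some(CancellationReason::Deadline),
            _ => None,
        }
    }
}
```

The "first cancel wins" compare-exchange is what lets the metric/log reason stay stable even if, say, a deadline fires just after getPayload resolved.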
Force-pushed 4ae8d72 to 628c35b
mostly non-blocking comments, LGTM
Force-pushed da45de0 to c13537b
akundaz left a comment:

Looks pretty good! Left some comments, but nothing major.
#[derive(Debug)]
pub(super) struct OpPayloadBuilderInner<Pool, Client, BuilderTx, Tasks> {
Why did you encapsulate these into an inner struct? They're meant to be cheap to clone already; now you have Arcs nested inside each other.
Hmm, they're cheap but not O(1) cheap: 3 Arc clones, 2 Sender clones with atomics, Pool/Client/Tasks/BuilderTx clones, and a BuilderConfig deep clone,
vs. one Arc clone with a single atomic increment in the hot path of cloning the builder to send to blocking tasks (no clones were needed before, as everything was sync).
The nested Arc only adds one extra dereference, which is negligible compared to the extra allocations and Clone trait bounds.
ws_pub is only owned by BuilderInner, so we can just remove that inner Arc, that's true.
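To illustrate the trade-off being defended, here is a hedged sketch (field names and types are placeholder stand-ins, not the PR's actual generics): a flat struct whose Clone copies every field, vs. the inner-struct pattern where Clone is one Arc bump.

```rust
use std::sync::Arc;

// Placeholder stand-ins for the builder's generic components.
#[derive(Clone)]
struct Pool;
#[derive(Clone)]
struct Client;

/// Before: cloning copies every field individually
/// (several Arc/Sender clones plus a deep config clone).
#[derive(Clone)]
struct BuilderFlat {
    pool: Pool,
    client: Client,
    config: Vec<u8>, // stand-in for a BuilderConfig deep clone
}

/// After: all fields live behind one Arc, so Clone is a single
/// atomic refcount increment, regardless of how many fields there are.
struct Inner {
    pool: Pool,
    client: Client,
    config: Vec<u8>,
}

#[derive(Clone)]
struct Builder {
    inner: Arc<Inner>,
}

impl std::ops::Deref for Builder {
    type Target = Inner;
    fn deref(&self) -> &Inner {
        // The only cost the nesting adds on access: one pointer dereference.
        &self.inner
    }
}
```

The Deref impl keeps call sites unchanged (`builder.pool` still works), which is the usual way to hide the extra indirection.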
// If the main token got cancelled here, it means we received getPayload and we should
// drop everything and not update best_payload, to ensure that we return the same
// blocks as rollup-boost (to leverage caches).
// Block cancelled (new FCU, getPayload resolved, or deadline).
Since we're doing async now, can we race this function against the cancel token instead of needing to check the token in this function?
We already race it in phase 2, but the whole build_next_flashblock runs in a blocking task, so it is not async and can't yield back. Here we synchronously poll the cancellation token to catch a cancellation triggered between tx execution and publishing, to be sure we don't publish an outdated flashblock. This will be greatly improved with continuous building, as it separates building from publishing so we can correctly race it in an async context.
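The pattern described above can be sketched std-only (function name taken from the thread; the signature, the AtomicBool flag, and the publish sink are assumptions for illustration, since the real code uses the cancellation token and a websocket publisher):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical blocking build step: it runs inside spawn_blocking, cannot
/// `.await`, so it re-checks the cancellation flag synchronously between
/// executing transactions and publishing the flashblock.
fn build_next_flashblock(
    cancelled: &AtomicBool,
    published: &mut Vec<u32>,
    index: u32,
) -> bool {
    // ... execute pool transactions (omitted) ...

    // getPayload may have resolved (or a new FCU / deadline may have fired)
    // while we were executing; poll the flag so we never publish an
    // outdated flashblock.
    if cancelled.load(Ordering::Acquire) {
        return false; // suppress publish, counted by the suppression metric
    }

    published.push(index); // stand-in for publishing over websocket
    true
}
```

With continuous building, the publish step would live in async code and could be raced against the token with select! instead of being polled like this.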
cancellation: &super::cancellation::PayloadJobCancellation,
span: &tracing::Span,
) {
let reason_str = match cancellation.reason() {
nit: would be nice to have a Display impl for CancellationReason and to use labels for the payload job cancellation metric instead of defining separate metrics
I initially tried that but didn't find a way to define a single metric field in OpRBuilderMetrics with a custom label.
It's very awkward you can't put it in the struct. See https://github.com/flashbots/op-rbuilder/pull/388/changes for an example
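The suggested Display impl could look like this (variant names taken from the metric suffixes in the PR description; the labeled-counter call in the trailing comment is pseudo-API, since the actual metrics-crate wiring is exactly what the thread says is awkward):

```rust
use std::fmt;

#[derive(Debug, Clone, Copy)]
enum CancellationReason {
    NewFcu,
    Resolved,
    Deadline,
    Complete,
    Error,
}

impl fmt::Display for CancellationReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Stable snake_case labels, suitable as metric label values.
        let label = match self {
            Self::NewFcu => "new_fcu",
            Self::Resolved => "resolved",
            Self::Deadline => "deadline",
            Self::Complete => "complete",
            Self::Error => "error",
        };
        f.write_str(label)
    }
}

// With Display, one counter with a `reason` label could replace the five
// payload_job_cancellation_* metrics, roughly (pseudo-API):
// counter!("payload_job_cancellation", "reason" => reason.to_string()).increment(1);
```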
Also, make sure the logs look nice in staging with the new spans. rollup-boost spans are a bit difficult to go through and we should avoid that friction.
Spans are gated behind the telemetry flag, so for now they're more for local testing. There are plans to enable them in production once we have a proper trace collector, which should ensure they don't mess with the logs and are actually useful.
See #394, #397, #398 (structured cancellation, select! state machine, tracing spans).