Feat/new prometheus metrics#586
Conversation
…lemetryConfig, CLI flags, and telemetry mod behind prometheus feature
…rometheus instrumentation
|
Hey @dvansari65, I'm a bit confused by this and seeing #552. Is this building off of #552? The idea was that 552 is laying down the groundwork, then subsequent PRs to complete #584 would just be building off that groundwork |
|
@MicaiahReid I accidentally closed #552 instead of updating it. i deleted simnet events for updating promethius and directly updating from svm file in this #586 |
5419677 to
2129ad9
Compare
| } | ||
| Ok(_) => { | ||
| let _ = surfnet_svm.simnet_events_tx.send(SimnetEvent::info(format!( | ||
| "Metrics available at http://{}/metrics", |
There was a problem hiding this comment.
We should display http://localhost:9000/metrics or http://127.0.0.1:9000/metrics, not http://0.0.0.0:9000/metrics
There was a problem hiding this comment.
This comment hasn't been addressed yet
crates/core/src/telemetry.rs
Outdated
| //! | ||
| //! Feature `prometheus` enables a `/metrics` HTTP endpoint | ||
|
|
||
| #[cfg(feature = "prometheus")] |
There was a problem hiding this comment.
Rather than having #[cfg(feature = "prometheus")] a few times in this file, we should just have it once in the core/src/lib.rs file for the telemetry mod:
#[cfg(feature = "prometheus")]
pub mod telemtry;
crates/core/src/surfnet/locker.rs
Outdated
| let remote_account = client.get_account(pubkey, commitment_config).await?; | ||
| // Record how long mainnet fetch took | ||
| #[cfg(feature = "prometheus")] | ||
| crate::telemetry::metrics() |
There was a problem hiding this comment.
There might be other places we call client.get_account and client.get_multiple_accounts - should we instead put these in the remote.rs file in these actual fn definitions?
There was a problem hiding this comment.
yes we can do , this is better
crates/core/src/surfnet/svm.rs
Outdated
|
|
||
| match self.inner.send_transaction(tx.clone()) { | ||
| Ok(res) => Ok(res), | ||
| Ok(res) => { |
There was a problem hiding this comment.
The send_transaction RPC method is already recording transactions. Those will already be routed through this endpoint, so this is going to lead to duplicate transaction tracking. We should be good to remove the metrics calls from this file.
crates/core/src/surfnet/remote.rs
Outdated
| commitment_config: CommitmentConfig, | ||
| ) -> SurfpoolResult<GetAccountResult> { | ||
| #[cfg(feature = "prometheus")] | ||
| let fetch_start = std::time::Instant; |
There was a problem hiding this comment.
This currently breaks the prometheus feature build. fetch_start is being assigned std::time::Instant instead of std::time::Instant::now(), so the later elapsed() calls fail as well. I confirmed this with cargo check -p surfpool-core --features prometheus.
crates/core/src/surfnet/remote.rs
Outdated
| commitment_config: CommitmentConfig, | ||
| ) -> SurfpoolResult<Vec<GetAccountResult>> { | ||
| #[cfg(feature = "prometheus")] | ||
| let fetch_start = std::time::Instant; |
There was a problem hiding this comment.
This currently breaks the prometheus feature build. fetch_start is being assigned std::time::Instant instead of std::time::Instant::now(), so the later elapsed() calls fail as well. I confirmed this with cargo check -p surfpool-core --features prometheus.
crates/core/src/telemetry.rs
Outdated
| result | ||
| } | ||
|
|
||
| pub fn metrics() -> &'static SurfpoolMetrics { |
There was a problem hiding this comment.
metrics() panics unless telemetry init succeeded, but the RPC call sites invoke it unconditionally whenever the binary is compiled with the prometheus feature. That means running with metrics disabled, or continuing after init failure, can panic on the first instrumented request. We should make this a no-op when telemetry is not initialized, or gate all call sites
| } | ||
| Ok(_) => { | ||
| let _ = surfnet_svm.simnet_events_tx.send(SimnetEvent::info(format!( | ||
| "Metrics available at http://{}/metrics", |
There was a problem hiding this comment.
This comment hasn't been addressed yet
…d MeterProvider import, handle uninitialized metrics safely
|
Fixed std::time::Instant missing ::now() typo |

Closes #584
Metrics exposed:
surfpool_transactions_total{status}— transaction success/failure ratesurfpool_transaction_processing_ms— transaction processing latency histogramsurfpool_rpc_requests_total{method}— RPC request rate by methodsurfpool_rpc_latency_ms{method}— RPC request latency histogram by methodsurfpool_remote_fetch_latency_ms— mainnet account cloning latency histogram