Cherry-pick Prometheus per-query metrics to release #1135

Draft
justinfrevert wants to merge 1 commit into release/node-0.22.4-b from cherry-pick-metrics-to-0.22

Conversation

@justinfrevert
Contributor

…22100] (#904)

  • feat: add Prometheus per-query metrics to midnight data sources

Introduce MidnightDataSourceMetrics with public accessor methods, replacing the upstream McFollowerMetrics whose accessors are crate-private in partner-chains v1.8.1. The local observed_async_trait! macro now records call counts and timing histograms for all six midnight-specific data source methods (candidates, cnight observation, federated authority observation).

JIRA: PM-22100
Made-with: Cursor

  • ci: normalise scan job name (#823)

  • update scanner action to latest version

  • update scanner action to latest version

  • ci: rename workflow name from build to scan


  • chore: add change file for PM-22100 dbsync query metrics

Made-with: Cursor

  • feat: add sub-query SQL-level Prometheus timing for midnight data sources

Add individual Prometheus timing histograms for each SQL query inside midnight-node data source methods. This provides per-query latency visibility alongside the existing method-level timing, enabling precise identification of slow DBSync queries on mainnet.

13 sub-query timers added across 3 data sources:

  • cNight observation: 5 queries (block lookup + 4 concurrent UTXO queries)
  • Federated authority: 3 queries (block lookup + 2 governance UTXOs)
  • Candidates: 5 queries across 3 methods

Ref: PM-22100
Made-with: Cursor
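
To illustrate why each of the concurrent UTXO queries needs its own timer, here is a minimal sketch that times four queries running in parallel and records one sample per query. It is not the code from this PR: it uses scoped OS threads from the standard library instead of the node's async runtime, and `TimingLog`, `timed`, `observe_utxos`, and the `utxo_query_*` names are hypothetical.

```rust
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical in-memory sink standing in for a Prometheus histogram.
type TimingLog = Mutex<Vec<(&'static str, Duration)>>;

/// Run one query closure and record how long that query alone took.
fn timed<T>(name: &'static str, log: &TimingLog, query: impl FnOnce() -> T) -> T {
    let started = Instant::now();
    let result = query();
    log.lock().unwrap().push((name, started.elapsed()));
    result
}

/// Run four placeholder queries concurrently, each with its own timer.
/// Returns the total number of rows the fake queries "found".
fn observe_utxos(log: &TimingLog) -> usize {
    // Hypothetical query names; the real labels are not given in this PR text.
    let names = ["utxo_query_a", "utxo_query_b", "utxo_query_c", "utxo_query_d"];
    thread::scope(|scope| {
        let mut handles = Vec::new();
        for name in names {
            // Each fake query returns a single row.
            handles.push(scope.spawn(move || timed(name, log, || 1usize)));
        }
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<usize>()
    })
}

fn main() {
    let log: TimingLog = Mutex::new(Vec::new());
    let rows = observe_utxos(&log);
    println!("rows: {rows}, samples recorded: {}", log.lock().unwrap().len());
}
```

Because every query records its own duration, a single slow query stays visible even when the enclosing method's wall-clock time is dominated by a different query.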

  • refactor: extract sub-query timer helper to reduce repetition

Replace 13 inline timer patterns with a shared start_sub_query_timer() helper and SubQueryTimer RAII guard in the metrics module. Each call site reduces from 3 lines to 1.

Made-with: Cursor
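
The RAII guard pattern described in this commit can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: only the names `start_sub_query_timer` and `SubQueryTimer` come from the commit message. Their signatures here are guesses, and `QueryTimings` and `fake_block_lookup` are hypothetical stand-ins that use the standard library instead of a Prometheus client.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::Instant;

/// Hypothetical stand-in for a histogram keyed by query name.
#[derive(Clone, Default)]
pub struct QueryTimings {
    samples: Arc<Mutex<HashMap<String, Vec<f64>>>>,
}

impl QueryTimings {
    fn observe(&self, query_name: &str, seconds: f64) {
        let mut samples = self.samples.lock().unwrap();
        samples.entry(query_name.to_string()).or_default().push(seconds);
    }

    /// Number of samples recorded so far for one query name.
    pub fn count(&self, query_name: &str) -> usize {
        let samples = self.samples.lock().unwrap();
        samples.get(query_name).map_or(0, |v| v.len())
    }
}

/// RAII guard: the elapsed time is recorded when the guard is dropped.
#[must_use = "the timer records on drop; bind it to a named variable"]
pub struct SubQueryTimer {
    timings: QueryTimings,
    query_name: &'static str,
    started: Instant,
}

impl Drop for SubQueryTimer {
    fn drop(&mut self) {
        let seconds = self.started.elapsed().as_secs_f64();
        self.timings.observe(self.query_name, seconds);
    }
}

/// Assumed signature; the PR text names this helper but does not show it.
pub fn start_sub_query_timer(timings: &QueryTimings, query_name: &'static str) -> SubQueryTimer {
    SubQueryTimer { timings: timings.clone(), query_name, started: Instant::now() }
}

/// Hypothetical call site: one line of timing code per query.
fn fake_block_lookup(timings: &QueryTimings) -> u64 {
    let _timer = start_sub_query_timer(timings, "block_lookup");
    42 // placeholder for the real SQL query result
}

fn main() {
    let timings = QueryTimings::default();
    let block = fake_block_lookup(&timings);
    println!("block {block}, samples {}", timings.count("block_lookup"));
}
```

The guard must be bound to a named variable such as `_timer`. Writing `let _ = start_sub_query_timer(...)` drops it immediately and records a near-zero duration.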

  • refactor: remove method-level timing, keep SQL-level sub-query timers only

Remove the observed_async_trait! macro timing (method-level) since per-SQL-query timing provides more useful granularity. Simplify MidnightDataSourceMetrics to histogram-only (remove unused call counter). Rename metric to midnight_data_source_query_time_elapsed with query_name label for clarity.

Made-with: Cursor
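
For readers unfamiliar with labelled histograms, the sketch below shows how a single metric named `midnight_data_source_query_time_elapsed` can carry one series per `query_name` label value, rendered in the Prometheus text exposition format. Only the metric name and label name come from this commit. The bucket boundaries, the unit (seconds), the `block_lookup` label value, and the `QueryHistogram` type are assumptions, and a real node would use a Prometheus client library instead of hand-rolled rendering.

```rust
use std::collections::BTreeMap;

const METRIC_NAME: &str = "midnight_data_source_query_time_elapsed";
/// Hypothetical bucket upper bounds, assumed to be in seconds.
const BUCKETS: [f64; 4] = [0.01, 0.1, 1.0, 10.0];

/// Per-label-value state: non-cumulative bucket counts, sum, and count.
#[derive(Default)]
struct Series {
    bucket_counts: [u64; 4],
    sum: f64,
    count: u64,
}

/// One histogram metric with a single `query_name` label.
#[derive(Default)]
pub struct QueryHistogram {
    series: BTreeMap<String, Series>,
}

impl QueryHistogram {
    /// Record one observation for the given query name.
    pub fn observe(&mut self, query_name: &str, seconds: f64) {
        let s = self.series.entry(query_name.to_string()).or_default();
        // Samples above the last finite bound are counted only in +Inf.
        if let Some(i) = BUCKETS.iter().position(|le| seconds <= *le) {
            s.bucket_counts[i] += 1;
        }
        s.sum += seconds;
        s.count += 1;
    }

    /// Render all series; bucket values are cumulative, as Prometheus expects.
    /// Label values are not escaped, so this assumes plain identifier-like names.
    pub fn render(&self) -> String {
        let mut out = format!("# TYPE {METRIC_NAME} histogram\n");
        for (name, s) in &self.series {
            let mut cumulative = 0;
            for (le, n) in BUCKETS.iter().zip(s.bucket_counts) {
                cumulative += n;
                out += &format!(
                    "{METRIC_NAME}_bucket{{query_name=\"{name}\",le=\"{le}\"}} {cumulative}\n"
                );
            }
            out += &format!(
                "{METRIC_NAME}_bucket{{query_name=\"{name}\",le=\"+Inf\"}} {}\n",
                s.count
            );
            out += &format!("{METRIC_NAME}_sum{{query_name=\"{name}\"}} {}\n", s.sum);
            out += &format!("{METRIC_NAME}_count{{query_name=\"{name}\"}} {}\n", s.count);
        }
        out
    }
}

fn main() {
    let mut histogram = QueryHistogram::default();
    histogram.observe("block_lookup", 0.05);
    histogram.observe("block_lookup", 2.5);
    print!("{}", histogram.render());
}
```

Keeping one metric with a label, instead of one metric per query, lets dashboards aggregate across all queries or filter to a single `query_name` without any change to the metric definition.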

  • style: apply rustfmt to candidates data source

Made-with: Cursor

  • chore: update changes file to reflect SQL-level sub-query timing

Made-with: Cursor


Overview

🗹 TODO before merging

  • Ready

📌 Submission Checklist

  • Changes are backward-compatible (or flagged if breaking)
  • Pull request description explains why the change is needed
  • Self-reviewed the diff
  • I have included a change file, or skipped for this reason:
  • If the changes introduce a new feature, I have bumped the node minor version
  • Update documentation (if relevant)
  • Updated AGENTS.md if build commands, architecture, or workflows changed
  • No new todos introduced

🧪 Testing Evidence

Please describe any additional testing aside from CI:

  • Additional tests are provided (if possible)

🔱 Fork Strategy

  • Node Runtime Update
  • Node Client Update
  • Other:
  • N/A

Links

…22100] (#904)

* feat: add Prometheus per-query metrics to midnight data sources

Introduce MidnightDataSourceMetrics with public accessor methods,
replacing the upstream McFollowerMetrics whose accessors are
crate-private in partner-chains v1.8.1. The local observed_async_trait!
macro now records call counts and timing histograms for all six
midnight-specific data source methods (candidates, cnight observation,
federated authority observation).

JIRA: PM-22100
Made-with: Cursor

* ci: normalise scan job name (#823)

* update scanner action to latest version

Signed-off-by: Giles Cope <gilescope@gmail.com>

* update scanner action to latest version

Signed-off-by: Giles Cope <gilescope@gmail.com>

* ci: rename workflow name from build to scan

Signed-off-by: Giles Cope <gilescope@gmail.com>

---------

Signed-off-by: Giles Cope <gilescope@gmail.com>

* chore: add change file for PM-22100 dbsync query metrics

Made-with: Cursor

* feat: add sub-query SQL-level Prometheus timing for midnight data sources

Add individual Prometheus timing histograms for each SQL query inside
midnight-node data source methods. This provides per-query latency
visibility alongside the existing method-level timing, enabling
precise identification of slow DBSync queries on mainnet.

13 sub-query timers added across 3 data sources:
- cNight observation: 5 queries (block lookup + 4 concurrent UTXO queries)
- Federated authority: 3 queries (block lookup + 2 governance UTXOs)
- Candidates: 5 queries across 3 methods

Ref: PM-22100
Made-with: Cursor

* refactor: extract sub-query timer helper to reduce repetition

Replace 13 inline timer patterns with a shared start_sub_query_timer()
helper and SubQueryTimer RAII guard in the metrics module. Each call
site reduces from 3 lines to 1.

Made-with: Cursor

* refactor: remove method-level timing, keep SQL-level sub-query timers only

Remove the observed_async_trait! macro timing (method-level) since
per-SQL-query timing provides more useful granularity. Simplify
MidnightDataSourceMetrics to histogram-only (remove unused call counter).
Rename metric to midnight_data_source_query_time_elapsed with query_name
label for clarity.

Made-with: Cursor

* style: apply rustfmt to candidates data source

Made-with: Cursor

* chore: update changes file to reflect SQL-level sub-query timing

Made-with: Cursor

---------

Signed-off-by: Giles Cope <gilescope@gmail.com>
Co-authored-by: Squirrel <giles.cope@shielded.io>