Add tokio metrics#236

Open
sfackler wants to merge 10 commits into develop from tokio-metrics
@sfackler
Member

Before this PR

We didn't have any metrics tracking the state of tasks in the Tokio runtime.

After this PR

==COMMIT_MSG==
Added metrics tracking the state of the Tokio runtime.
==COMMIT_MSG==

Unfortunately, many of these rely on unstable Tokio APIs. As a result, you have to opt in with both the standard tokio_unstable cfg and a tokio_unstable Cargo feature in this crate.

@changelog-app

changelog-app bot commented Feb 21, 2025

Generate changelog in changelog/@unreleased

What do the change types mean?
  • feature: A new feature of the service.
  • improvement: An incremental improvement in the functionality or operation of the service.
  • fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
  • break: Has the potential to break consumers of this service's API, inclusive of both Palantir services
    and external consumers of the service's API (e.g. customer-written software or integrations).
  • deprecation: Advertises the intention to remove service functionality without any change to the
    operation of the service itself.
  • manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration,
    performing database surgery, ...) at the time of upgrade for it to succeed.
  • migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?
  • ❗The break and manual task changelog types will result in a major release!
  • 🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
  • ✨ All others will result in a minor version release.

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Added metrics tracking the state of the Tokio runtime.

Check the box to generate changelog(s)

  • Generate changelog entry

.precision_exact(0)
.min_value(Duration::from_micros(100))
.max_value(Duration::from_secs(10))
.build(),
Member Author

I am a bit unsure about what values are appropriate here. This setup gives us 20 buckets with these ranges:

0ns..65.536µs
65.536µs..131.072µs
131.072µs..262.144µs
262.144µs..524.288µs
524.288µs..1.048576ms
1.048576ms..2.097152ms
2.097152ms..4.194304ms
4.194304ms..8.388608ms
8.388608ms..16.777216ms
16.777216ms..33.554432ms
33.554432ms..67.108864ms
67.108864ms..134.217728ms
134.217728ms..268.435456ms
268.435456ms..536.870912ms
536.870912ms..1.073741824s
1.073741824s..2.147483648s
2.147483648s..4.294967296s
4.294967296s..8.589934592s
8.589934592s..17.179869184s
17.179869184s..18446744073.709551615s

The intent is to avoid spamming the metrics infrastructure with a huge number of buckets while still giving us enough information to go on. Future polls under 100µs or so are perfectly healthy, so I don't think we need to split those out, and above a few seconds the poll is so unreasonably long that its specific length doesn't matter much.
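
The ranges listed above are consistent with plain power-of-two buckets: the 100µs lower bound rounded down to a power of two (65,536 ns), the 10s upper bound rounded up to one (17,179,869,184 ns), and one catch-all bucket at each end. Below is a minimal sketch that reproduces the table under that assumption. `floor_pow2` and `bucket_ranges` are hypothetical helpers written for illustration; they are not Tokio's implementation or API.

```rust
use std::time::Duration;

/// Largest power of two that is <= `n`. `n` must be nonzero.
fn floor_pow2(n: u64) -> u64 {
    1u64 << (63 - n.leading_zeros())
}

/// Bucket boundaries in nanoseconds, as half-open `(start, end)` pairs.
/// Assumption: the minimum rounds down and the maximum rounds up to a
/// power of two, with one catch-all bucket below and one above.
fn bucket_ranges(min: Duration, max: Duration) -> Vec<(u64, u64)> {
    let lo = floor_pow2(min.as_nanos() as u64);
    let hi = (max.as_nanos() as u64).next_power_of_two();

    // Everything below the rounded minimum lands in a single bucket.
    let mut ranges = vec![(0, lo)];

    // Each following bucket doubles in width until the rounded maximum.
    let mut start = lo;
    while start < hi {
        ranges.push((start, start * 2));
        start *= 2;
    }

    // Everything at or above the rounded maximum lands in the last bucket.
    ranges.push((hi, u64::MAX));
    ranges
}

fn main() {
    let ranges = bucket_ranges(Duration::from_micros(100), Duration::from_secs(10));
    for (start, end) in &ranges {
        println!(
            "{:?}..{:?}",
            Duration::from_nanos(*start),
            Duration::from_nanos(*end)
        );
    }
}
```

With the values from the snippet under review, this yields 20 ranges, from `0ns..65.536µs` to `17.179869184s..18446744073.709551615s`. Raising the minimum or lowering the maximum by a factor of two removes one bucket from the corresponding end.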

Comment thread witchcraft-server/src/metrics/tokio.rs Outdated
@sfackler sfackler marked this pull request as ready for review February 23, 2025 19:45
@sfackler sfackler requested a review from a team February 23, 2025 19:45
//!
//! * `tokio.blocking.threads` (gauge) - The number of threads in Tokio's blocking pool.
//! * `tokio.blocking.threads.idle` (gauge) - The number of threads in Tokio's blocking pool that are idle.
//! * `tokio.tasks.polls` (gauge) - The number of individual poll calls to tasks.
Member Author

This is mostly useful to normalize the values of tokio.tasks.poll-duration-bucket into percentages.
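
As a rough illustration of that normalization, the sketch below divides each bucket's poll count by the total poll count. The function name `bucket_percentages` and the sample counts are hypothetical; in practice the division would more likely happen in the metrics query layer than in Rust code.

```rust
/// Converts per-bucket poll counts into percentages of the total poll count.
/// Returns all zeros when no polls have been recorded, to avoid dividing by zero.
fn bucket_percentages(bucket_counts: &[u64], total_polls: u64) -> Vec<f64> {
    if total_polls == 0 {
        return vec![0.0; bucket_counts.len()];
    }
    bucket_counts
        .iter()
        .map(|&count| count as f64 * 100.0 / total_polls as f64)
        .collect()
}

fn main() {
    // Made-up sample: counts for three buckets and the matching total.
    let counts = [50, 30, 20];
    let total: u64 = counts.iter().sum();
    for (index, pct) in bucket_percentages(&counts, total).iter().enumerate() {
        println!("bucket {index}: {pct:.1}%");
    }
}
```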

@stale

stale bot commented Jun 27, 2025

This PR has been automatically marked as stale because it has not been touched in the last 14 days. If you'd like to keep it open, please leave a comment or add the 'long-lived' label, otherwise it'll be closed in 7 days.

@stale stale bot added the stale label Jun 27, 2025
@sfackler sfackler removed the stale label Jun 27, 2025
@stale

stale bot commented Oct 18, 2025

This PR has been automatically marked as stale because it has not been touched in the last 14 days. If you'd like to keep it open, please leave a comment or add the 'long-lived' label, otherwise it'll be closed in 7 days.

@stale stale bot added the stale label Oct 18, 2025
@sfackler sfackler removed the stale label Jan 5, 2026