feat(zql): countdown-based sampling for MeasurePushOperator #5603

Draft
Karavil wants to merge 4 commits into rocicorp:main from Karavil:goblins/tdigest-sampling


@Karavil Karavil commented Feb 25, 2026

Problem

MeasurePushOperator.push() calls performance.now() and feeds every result into a TDigest on every single push. During large pokes (41K+ diffs from initial sync or batch mutations), this per-push measurement path adds measurable overhead. At 40K pushes, the measurement logic alone accounts for roughly 3x throughput reduction compared to skipping it entirely.

Solution

Countdown-based sampling: measure every Nth push instead of every push. resolveSampleEvery() computes the interval once at construction from a metricsSampleRate option, converting the rate into an integer countdown. The hot path is a single integer decrement and compare. No per-push Math.random(), no per-push branching beyond the countdown check.
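The countdown described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `CountdownSampler` and `measuredPush` are hypothetical names, the clock is injected as `now` so the sketch stays self-contained, and measuring the Nth push of each window (rather than the first) is an assumption.

```typescript
// Hypothetical stand-in for the sampling state; not the actual zql source.
class CountdownSampler {
  private sampleEvery: number; // 0 = never measure, N = measure every Nth push
  private countdown: number;

  constructor(sampleEvery: number) {
    this.sampleEvery = sampleEvery;
    this.countdown = sampleEvery;
  }

  // Returns true when the current push should be timed.
  shouldSample(): boolean {
    if (this.sampleEvery === 0) return false; // sampling disabled
    if (--this.countdown > 0) return false; // not the Nth push yet
    this.countdown = this.sampleEvery; // reset for the next window
    return true;
  }
}

// Wraps one push: only sampled pushes pay for the two clock reads
// and the call into the metrics recorder.
function measuredPush(
  sampler: CountdownSampler,
  now: () => number,
  push: () => void,
  record: (elapsedMs: number) => void,
): void {
  if (!sampler.shouldSample()) {
    push();
    return;
  }
  const start = now();
  push();
  record(now() - start);
}
```

With `new CountdownSampler(4)`, pushes 4, 8, and 12 of the first twelve are timed; with `new CountdownSampler(1)` every push is timed, which matches the default behavior.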

Two new optional properties on the metricsDelegate object:

  • disableMetrics: true: skip all measurement. The push method forwards directly to the output with zero instrumentation overhead.
  • metricsSampleRate: number: fraction of pushes to measure. 1 = every push (current default, no behavior change). 0.01 = every 100th push. 0 = disabled. Clamped to [0, 1], converted to interval via Math.round(1 / rate).
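To make the rate-to-interval conversion concrete, here is a small sketch of the clamp-then-round step. `sampleEveryFromRate` is a hypothetical helper, not the PR's `resolveSampleEvery()` (whose exact signature is not shown here), and the NaN fallback to 1 is an assumption added so the sketch never returns NaN.

```typescript
// Sketch of the rate -> interval conversion described in the bullets above.
// Returns 0 for "never measure", otherwise N meaning "measure every Nth push".
function sampleEveryFromRate(rate: number): number {
  // Assumption: treat NaN like "option not set" and keep the default of 1.
  if (Number.isNaN(rate)) return 1;
  const clamped = Math.min(1, Math.max(0, rate)); // clamp to [0, 1]
  if (clamped === 0) return 0; // rate 0 disables measurement
  return Math.round(1 / clamped);
}
```

Under this formula 0.5 maps to every 2nd push, 0.25 to every 4th, and 0.01 to every 100th. Because of the rounding, rates that are not of the form 1/N are approximated: 0.3 becomes every 3rd push, and any rate above roughly 0.67 rounds to an interval of 1, so it behaves like the default.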

Benchmark results

Measured with vitest bench. Each row reports throughput and mean time per iteration (lower mean is better). The "vs baseline" column compares each variant against the sampleRate=1 baseline.

2,000 pushes

| Variant | hz (ops/s) | Mean (ms) | vs baseline |
| --- | --- | --- | --- |
| sampleRate=1 (default) | 7,809 | 0.128 | baseline |
| sampleRate=0.01 | 17,958 | 0.056 | 2.30x faster |
| disableMetrics: true | 18,766 | 0.053 | 2.40x faster |
| sampleRate=0 | 19,134 | 0.052 | 2.45x faster |

15,000 pushes

| Variant | hz (ops/s) | Mean (ms) | vs baseline |
| --- | --- | --- | --- |
| sampleRate=1 (default) | 865 | 1.156 | baseline |
| sampleRate=0.01 | 2,287 | 0.437 | 2.64x faster |
| disableMetrics: true | 2,542 | 0.393 | 2.94x faster |
| sampleRate=0 | 2,605 | 0.384 | 3.01x faster |

40,000 pushes

| Variant | hz (ops/s) | Mean (ms) | vs baseline |
| --- | --- | --- | --- |
| sampleRate=1 (default) | 324 | 3.082 | baseline |
| sampleRate=0.01 | 847 | 1.181 | 2.61x faster |
| disableMetrics: true | 962 | 1.039 | 2.97x faster |
| sampleRate=0 | 963 | 1.039 | 2.97x faster |

The speedup is consistent across scales: sampleRate=0.01 cuts mean time per iteration by roughly 60% (56% to 62% in the tables above), which is over 90% of the gap between the baseline and disableMetrics: true, while still collecting statistically useful digest data. At 40K pushes, that is ~1.9ms saved per poke.
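The arithmetic behind these figures can be reproduced from the table values. The sketch below uses the 40K-push means; `savings` is a hypothetical helper written only for this illustration, and it treats the disableMetrics: true row as the floor (the cost of pushing with no measurement at all).

```typescript
// Derive savings from three mean times (ms per iteration) taken from the tables.
function savings(baselineMs: number, sampledMs: number, disabledMs: number) {
  const savedMs = baselineMs - sampledMs; // time saved by sampling
  const measurementCostMs = baselineMs - disabledMs; // total cost of measuring every push
  return {
    savedMs,
    totalReduction: savedMs / baselineMs, // share of total time removed
    overheadRecovered: savedMs / measurementCostMs, // share of measurement cost removed
  };
}

// 40,000 pushes: baseline 3.082 ms, sampleRate=0.01 1.181 ms, disableMetrics 1.039 ms.
const at40k = savings(3.082, 1.181, 1.039);
console.log(at40k.savedMs.toFixed(2)); // "1.90"
console.log(at40k.totalReduction.toFixed(2)); // "0.62"
console.log(at40k.overheadRecovered.toFixed(2)); // "0.93"
```

Read this way, sampling at 0.01 removes about 62% of the total time at 40K pushes, which corresponds to about 93% of the measurement overhead itself.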

Backward compatibility

Default sampleRate is 1, which means every push is measured. Existing consumers that do not pass disableMetrics or metricsSampleRate see identical behavior. The MetricsDelegate interface is unchanged. The new properties are read from the delegate object at construction time via duck typing, so no type changes propagate to callers.
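As an illustration of the duck-typed read, the sketch below pulls the two optional properties off a delegate of unknown shape and falls back to the defaults when they are missing or the delegate is not an object. `SamplingOptions` and `readSamplingOptions` are hypothetical names for this example; the PR's actual code may be structured differently.

```typescript
// Hypothetical shape for the resolved options; not part of MetricsDelegate.
interface SamplingOptions {
  disableMetrics: boolean;
  metricsSampleRate: number;
}

// Reads the optional properties without requiring them in the delegate's type.
function readSamplingOptions(delegate: unknown): SamplingOptions {
  const defaults: SamplingOptions = { disableMetrics: false, metricsSampleRate: 1 };
  // Non-object delegates (undefined, null, primitives) keep the defaults.
  if (typeof delegate !== "object" || delegate === null) return defaults;

  const bag = delegate as Record<string, unknown>;
  const rate = bag["metricsSampleRate"];
  return {
    disableMetrics: bag["disableMetrics"] === true,
    metricsSampleRate: typeof rate === "number" ? rate : defaults.metricsSampleRate,
  };
}
```

A delegate that only implements the existing interface resolves to `{ disableMetrics: false, metricsSampleRate: 1 }`, so existing callers keep measuring every push.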

Test plan

  • Existing MeasurePushOperator tests pass unchanged (no regressions in default behavior)
  • New sampling tests cover: default (sampleRate=1 measures every push), disableMetrics fast path, sampleRate=0 disabled path, fractional rates (0.5, 0.25), out-of-range clamping (values >1 and <0), non-object delegate fallback
  • Multi-scale benchmark at packages/zql/src/query/measure-push-operator.bench.ts validates overhead reduction at 2K, 15K, and 40K push counts


Adds configurable sampling to MeasurePushOperator to reduce per-push
overhead from performance.now() + tdigest stats computation.

Sampling is controlled via metricsDelegate options:
- disableMetrics: true skips all measurement
- metricsSampleRate: 0-1 controls sampling frequency

Default behavior is unchanged (sample rate = 1, measure every push).

* Add resolveSampleEvery() to convert delegate options to sample interval
* Add #sampleEvery/#sampleCountdown fields for countdown-based sampling
* Skip performance.now() + addMetric() calls for non-sampled pushes
@Karavil Karavil force-pushed the goblins/tdigest-sampling branch from 26f2390 to 7eafa8c Compare February 25, 2026 01:42
@Karavil Karavil changed the title feat(zql): add metrics sampling to MeasurePushOperator feat(zql): add configurable metrics sampling to MeasurePushOperator Feb 25, 2026
Alp added 2 commits February 24, 2026 20:51
Covers: default sampleRate=1 (every push), disableMetrics=true,
sampleRate=0, fractional rates (0.5→every 2nd, 0.25→every 4th),
clamping for out-of-range values (<0, >1).

41k pushes across four configurations: sampleRate=1 (default),
disableMetrics=true, sampleRate=0.01, and sampleRate=0 (disabled).
@Karavil Karavil changed the title feat(zql): add configurable metrics sampling to MeasurePushOperator feat(zql): add countdown-based sampling to MeasurePushOperator Feb 25, 2026
Add benchmark cases at 2K, 15K, and 40K push counts to show
how sampling overhead scales with poke size.

* Replace single 41K test with parameterized loop over [2K, 15K, 40K]
* Each scale tests all four variants: sampleRate=1, sampleRate=0.01,
  disableMetrics=true, sampleRate=0
@Karavil Karavil changed the title feat(zql): add countdown-based sampling to MeasurePushOperator feat(zql): countdown-based sampling for MeasurePushOperator Feb 25, 2026