
StackbiltCloudExporter: silent no-flush under low-QPS CF Worker isolates #10

@stackbilt-admin

Description

Filed after integration work in `Stackbilt-dev/aegis` (see aegis#432 + aegis#529). Reference impl `stackbilt-web#57` uses `LocalObserveExporter` (direct D1), so this is the first stress-test of the HTTP path under real CF Worker conditions.

Symptom

A Worker configured with:

```ts
createMonitoring({
  service: 'aegis-web',
  version: '...',
  enableTracing: true,
  tracingSampling: 0.2, // or 1.0
  stackbilt: {
    endpoint: 'https://stackbilder.com/api/observe/ingest',
    token: env.STACKBILT_OBSERVE_TOKEN,
    maxBatchSize: 1,
  },
})
```

...emits metrics/spans at normal call sites (`metrics.increment`, wrapped `trace()`), but nothing appears at the ingest endpoint. A manual `curl` POST to the same endpoint with the same token returns HTTP 202 and shows up in `/api/observe/summary` immediately, so the endpoint, token, and network path are all fine. The drop happens somewhere inside the library path.
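For reproduction, the manual probe is equivalent to something like this standalone Node script (the payload field names mirror the workaround exporter below and are assumptions, not documented API):

```ts
// Standalone probe (Node 18+, global fetch): POST one metric point directly.
// Payload shape is an assumption mirroring the workaround exporter below.
const res = await fetch('https://stackbilder.com/api/observe/ingest', {
  method: 'POST',
  headers: {
    authorization: `Bearer ${process.env.STACKBILT_OBSERVE_TOKEN}`,
    'content-type': 'application/json',
  },
  body: JSON.stringify({
    service: 'aegis-web',
    metrics: [{ name: 'probe.manual', value: 1, timestamp: Date.now() }],
  }),
});
console.log(res.status); // 202 here confirms endpoint + token + network path
```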

What was tried (all no-op)

| Attempt | Result |
| --- | --- |
| `maxBatchSize: 1`, so `StackbiltCloudExporter.maybeFlush`'s threshold check is `1 < 1 = false` and falls through to `flush()` on every emit | Not visible |
| `tracer.flush()` in `ctx.waitUntil` after each request (see the sketch below) | Not visible |
| `metrics.flush()` in `ctx.waitUntil` after each request (`MetricsCollector` has its own pre-exporter buffer) | Not visible |
| Both `metrics.flush()` + `tracer.flush()` under `Promise.all` + `ctx.waitUntil` | Not visible |
| Pre-warm the `createMonitoring` cache with the raw Env before the MCP `apiHandler` (Hono middleware is bypassed by OAuthProvider) | Not visible |
| `tracingSampling: 1.0` to rule out sample-drop | Not visible |
| Synchronous `await fetch(endpoint, ...)` as a direct probe from inside a request handler, bypassing the library | Also not visible — `wrangler tail` never showed `console.log` output for this Worker at all, even though it was clearly serving requests |
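The `ctx.waitUntil` attempts (rows 2-4) followed the standard Workers pattern, roughly like this (a sketch; the destructured shape of `createMonitoring`'s return value is an assumption, and `handleRequest` is a hypothetical stand-in for the real handler):

```ts
// Sketch of the flush-after-response attempts (rows 2-4 above).
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // Assumed shape: createMonitoring() exposes the collector and tracer.
    const { metrics, tracer } = createMonitoring({ /* config as above */ });
    const response = await handleRequest(request, env); // hypothetical handler
    // Keep the isolate alive past the response so buffered telemetry can drain.
    ctx.waitUntil(Promise.all([metrics.flush(), tracer.flush()]));
    return response;
  },
};
```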

Likely suspects

Without tail visibility the exact layer is opaque, but the ordered chain through the library is:

  1. `metrics.increment()` → `record()` pushes to `MetricsCollector.buffer`
  2. `record()` only flushes when `this.options.batchSize && buffer.length >= batchSize` — but `createMonitoring` wires `MetricsCollector` with `flushInterval: 30000` (a `setInterval`) and no explicit `batchSize`, so the size-triggered flush can never run. On CF Workers the `setInterval` is registered, but the timer is isolate-bound and may never actually fire between short request lifetimes (see the sketch after this list)
  3. An explicit `metrics.flush()` calls `await this.options.export.export(metrics)` — which is `StackbiltCloudExporter.export(items, kind)` — which pushes to `this.metrics[]` and calls `maybeFlush()`
  4. `maybeFlush()` with `maxBatchSize: 1` — the guard `totalItems < this.maxBatchSize` is false for any `totalItems >= 1`, so it should fall through to `flush()`
  5. `flush()` → `doFlush()` → `fetch(endpoint, ...)`
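A reconstruction of the step-2 wiring, as a sketch; every name and body here is inferred from the behavior described in this issue, not taken from the library source:

```ts
// Reconstructed sketch of MetricsCollector's buffering (step 2) - an
// inference from observed behavior, NOT the actual library source.
class MetricsCollectorSketch {
  private buffer: MetricPoint[] = [];

  constructor(private options: { batchSize?: number; flushInterval?: number; export: MetricsExporter }) {
    // createMonitoring wires flushInterval: 30000 and no batchSize.
    if (options.flushInterval) {
      // Isolate-bound timer: between short-lived requests the 30 s tick may
      // never fire before the isolate is recycled, so this path may be dead.
      setInterval(() => void this.flush(), options.flushInterval);
    }
  }

  record(point: MetricPoint) {
    this.buffer.push(point);
    // With batchSize undefined, this size-triggered flush can never run.
    if (this.options.batchSize && this.buffer.length >= this.options.batchSize) {
      void this.flush();
    }
  }

  async flush() {
    if (this.buffer.length === 0) return; // suspect in the first bullet below
    const batch = this.buffer.splice(0);
    await this.options.export.export(batch);
  }
}
```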

The break is somewhere in steps 3–5. Either:

  • `MetricsCollector.flush()` finds `buffer.length === 0` on the explicit call because `record()` wrote straight to the internal buffer but CF Worker memory / isolate state isn't what we expect
  • Or `exporter.maybeFlush()` has a subtle condition that prevents the HTTP POST (see the reconstruction sketched below)
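Likewise, a reconstruction of the exporter side (steps 3–5); again an inference from this issue, not the actual source:

```ts
// Reconstructed sketch of StackbiltCloudExporter's batching (steps 3-5);
// names and control flow are inferences, NOT the library source.
class StackbiltCloudExporterSketch {
  private metrics: MetricPoint[] = [];
  private spans: TraceSpan[] = [];

  constructor(private endpoint: string, private token: string, private maxBatchSize: number) {}

  async export(items: MetricPoint[] | TraceSpan[], kind: 'metrics' | 'spans') {
    if (kind === 'spans') this.spans.push(...(items as TraceSpan[]));
    else this.metrics.push(...(items as MetricPoint[]));
    await this.maybeFlush();
  }

  private async maybeFlush() {
    const totalItems = this.metrics.length + this.spans.length;
    // With maxBatchSize: 1 this guard is false for any totalItems >= 1,
    // so every export() should reach flush(), yet no POST is observed.
    if (totalItems < this.maxBatchSize) return;
    await this.flush();
  }

  async flush() {
    // doFlush() would drain both buffers and fetch(endpoint, ...); if step 4
    // really falls through, the drop has to be here or in the handoff above.
  }
}
```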

Workaround (shipped in aegis)

Bypass `createMonitoring`'s `stackbilt` wiring entirely. Construct `Logger` + `MetricsCollector` + `Tracer` manually. Pass a custom exporter that implements all three of `MetricsExporter`, `SpanExporter`, and `LogOutput` with zero internal buffering — every `export()` is an immediate `fetch()` POST, with `console.error` on any non-OK response.

```ts
class DirectCloudExporter implements MetricsExporter, SpanExporter, LogOutput {
  // Fields the original snippet referenced via `this.*`.
  constructor(private endpoint: string, private token: string, private service: string) {}

  async export(items: MetricPoint[] | TraceSpan[]) {
    // Spans carry a traceId; metric points don't, so one export() handles both.
    const isSpans = items.length > 0 && 'traceId' in items[0];
    await this.post(isSpans ? { service: this.service, spans: items } : { service: this.service, metrics: items });
  }

  async write(entry: LogEntry) {
    await this.post({ service: this.service, logs: [entry] });
  }

  private async post(payload: unknown) {
    // Zero buffering: every call is an immediate POST.
    const res = await fetch(this.endpoint, {
      method: 'POST',
      headers: { authorization: `Bearer ${this.token}`, 'content-type': 'application/json' },
      body: JSON.stringify(payload),
    });
    if (!res.ok) console.error(`[exporter] POST failed status=${res.status}`);
  }
}
```
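Wiring it in looks roughly like the following; the `Logger`/`MetricsCollector`/`Tracer` option names are assumptions based on the handoff described above, not confirmed signatures:

```ts
// Hypothetical wiring sketch - option names are assumptions, not the real API.
const exporter = new DirectCloudExporter(
  'https://stackbilder.com/api/observe/ingest',
  env.STACKBILT_OBSERVE_TOKEN,
  'aegis-web',
);

// No batchSize / flushInterval: every record lands at the exporter immediately.
const metrics = new MetricsCollector({ export: exporter });
const tracer = new Tracer({ export: exporter, sampling: 0.2 });
const logger = new Logger({ output: exporter });
```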

Full implementation at `Stackbilt-dev/aegis` `web/src/lib/direct-cloud-exporter.ts` + `web/src/monitoring.ts` (landed in aegis#530, 2026-04-18).

Verification that the workaround works

With 20% sampling, MCP call volume produces proportional trace counts at stackbilder.com/observe within seconds. Pre-workaround, `aegis-web` showed a single trace (the manual curl from debugging). Post-workaround, traces grow in real time as requests fire: 2 → 8 after 25 MCP calls, i.e. 6 sampled spans, within expected variance for a 20% rate.

Ask

If you can repro: the fix is probably in `StackbiltCloudExporter.maybeFlush()` or the `MetricsCollector` ↔ exporter handoff under CF Worker isolate timing. The `DirectCloudExporter` shape could also be a candidate for inclusion in the library as an alternative to `StackbiltCloudExporter` — especially useful for low-QPS Workers where batching isn't a cost concern.

References

  • aegis#432 (parent, dogfood instrumentation)
  • aegis#529 (debug thread)
  • aegis#530 (workaround PR)
  • `stackbilt-web#57` (reference impl using `LocalObserveExporter`)
