Filed after integration work in `Stackbilt-dev/aegis` (see aegis#432 + aegis#529). Reference impl `stackbilt-web#57` uses `LocalObserveExporter` (direct D1), so this is the first stress-test of the HTTP path under real CF Worker conditions.
## Symptom
A Worker configured with:
```ts
createMonitoring({
  service: 'aegis-web',
  version: '...',
  enableTracing: true,
  tracingSampling: 0.2, // or 1.0
  stackbilt: {
    endpoint: 'https://stackbilder.com/api/observe/ingest',
    token: env.STACKBILT_OBSERVE_TOKEN,
    maxBatchSize: 1,
  },
})
```
...emits metrics/spans at normal call sites (`metrics.increment`, wrapped `trace()`), but nothing appears at the ingest endpoint. Manual `curl POST` to the same endpoint with the same token returns HTTP 202 and shows up in `/api/observe/summary` immediately, so the endpoint + token + network path are fine. The library path is where the drop happens.
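For concreteness, the manual probe was equivalent to the following; the payload shape is an assumption, mirroring what the workaround exporter further down sends:

```ts
// TS equivalent of the manual curl probe (payload shape is an assumption,
// mirroring what the workaround exporter below sends). Runs inside a handler
// where `env` is in scope.
const res = await fetch('https://stackbilder.com/api/observe/ingest', {
  method: 'POST',
  headers: {
    authorization: `Bearer ${env.STACKBILT_OBSERVE_TOKEN}`,
    'content-type': 'application/json',
  },
  body: JSON.stringify({ service: 'aegis-web', metrics: [] }),
});
console.log(res.status); // 202 on success; data then visible in /api/observe/summary
```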
## What was tried (all no-op)
| Attempt | Result |
| --- | --- |
| `maxBatchSize: 1`, so `StackbiltCloudExporter.maybeFlush` has threshold `1 < 1 = false` → falls through to `flush()` on every emit | Not visible |
| `tracer.flush()` in `ctx.waitUntil` after each request | Not visible |
| `metrics.flush()` in `ctx.waitUntil` after each request (`MetricsCollector` has its own pre-exporter buffer) | Not visible |
| Both `metrics.flush()` + `tracer.flush()` under `Promise.all` + `ctx.waitUntil` (sketched below) | Not visible |
| Pre-warming the `createMonitoring` cache with the raw `Env` before the MCP `apiHandler` (Hono middleware is bypassed by `OAuthProvider`) | Not visible |
| `tracingSampling: 1.0` to rule out sample-drop | Not visible |
| Synchronous `await fetch(endpoint, ...)` as a direct probe from inside a request handler, bypassing the library | Also not visible: `wrangler tail` never showed `console.log` output for this Worker at all, even though it was clearly serving requests |
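The `ctx.waitUntil` variants (rows 2–4) all followed this shape; `getMonitoring` and `handle` are illustrative stand-ins, not the actual aegis plumbing:

```ts
// Sketch of the ctx.waitUntil flush attempts (rows 2–4 above); the handler
// shape and the getMonitoring/handle helpers are hypothetical stand-ins.
interface Env { STACKBILT_OBSERVE_TOKEN: string }
declare function getMonitoring(env: Env): {
  metrics: { flush(): Promise<void> };
  tracer: { flush(): Promise<void> };
};
declare function handle(req: Request, env: Env): Promise<Response>;

export default {
  async fetch(req: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const { metrics, tracer } = getMonitoring(env);
    const res = await handle(req, env);
    // Drain both pre-exporter buffers after responding, without blocking the response.
    ctx.waitUntil(Promise.all([metrics.flush(), tracer.flush()]));
    return res;
  },
};
```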
## Likely suspects
Without tail visibility the exact layer is opaque, but the ordered chain through the library is:
1. `metrics.increment()` → `record()` pushes to `MetricsCollector.buffer`
2. `record()` only flushes when `this.options.batchSize && buffer.length >= batchSize`, but `createMonitoring` wires `MetricsCollector` with `flushInterval: 30000` (a `setInterval`) and no explicit `batchSize`. On CF Workers that interval-driven flush is isolate-bound and may never actually run between short request lifetimes.
3. Explicit `metrics.flush()` calls `await this.options.export.export(metrics)`, which is `StackbiltCloudExporter.export(items, kind)`, which pushes to `this.metrics[]` and calls `maybeFlush()`
4. `maybeFlush()` with `maxBatchSize: 1`: the condition `totalItems < this.maxBatchSize` is false for any `totalItems >= 1`, so it should fall through to `flush()` (paraphrased in the sketch after this list)
5. `flush()` → `doFlush()` → `fetch(endpoint, ...)`
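For steps 4–5 without the source open, the quoted condition paraphrases to roughly this; everything except the `totalItems < maxBatchSize` check (the class skeleton, the `spans` buffer name) is an assumption, not the library source:

```ts
// Reconstruction of the step-4/5 path from the condition quoted above;
// field names other than `metrics` are assumptions.
class CloudExporterSketch {
  private metrics: unknown[] = [];
  private spans: unknown[] = [];
  constructor(private maxBatchSize: number) {}
  maybeFlush(): void | Promise<void> {
    const totalItems = this.metrics.length + this.spans.length;
    if (totalItems < this.maxBatchSize) return; // with maxBatchSize: 1, never taken
    return this.flush(); // so every export() should reach the HTTP POST
  }
  private async flush(): Promise<void> {
    // drain the buffers and POST the batch to the ingest endpoint
  }
}
```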
The break is somewhere in steps 3–5. Either:
- `MetricsCollector.flush()` finds `buffer.length === 0` on the explicit call because `record()` wrote straight to the internal buffer but CF Worker memory / isolate state isn't what we expect
- Or `exporter.maybeFlush()` has a subtle condition that prevents the HTTP POST
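To make the isolate-timing suspicion concrete: a collector whose only flush trigger is a timer can silently drop everything on Workers, as in this hypothetical sketch (not library code):

```ts
// Hypothetical illustration of the suspected failure mode, not library code.
// Workers timers only make progress while an event is in flight, and isolates
// can be evicted between requests, so a 30s interval flush may never fire for
// a Worker serving short requests; without a batchSize, record() never flushes.
class TimerOnlyCollector<T> {
  private buffer: T[] = [];
  constructor(private exportFn: (items: T[]) => Promise<void>, intervalMs = 30_000) {
    setInterval(() => void this.flush(), intervalMs); // the sole flush trigger
  }
  record(item: T): void {
    this.buffer.push(item); // buffered, never exported on the hot path
  }
  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    await this.exportFn(this.buffer.splice(0));
  }
}
```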
## Workaround (shipped in aegis)
Bypass `createMonitoring`'s `stackbilt` wiring entirely. Construct `Logger` + `MetricsCollector` + `Tracer` manually. Pass a custom exporter that implements all three of `MetricsExporter`, `SpanExporter`, and `LogOutput` with zero internal buffering: every `export()` is an immediate `fetch()` POST, with `console.error` on a non-OK response.
```ts
// Condensed from web/src/lib/direct-cloud-exporter.ts; interface types come
// from the monitoring library.
class DirectCloudExporter implements MetricsExporter, SpanExporter, LogOutput {
  constructor(private service: string, private endpoint: string, private token: string) {}
  async export(items: MetricPoint[] | TraceSpan[]): Promise<void> {
    // Spans carry a traceId; metric points don't.
    const isSpans = items.length > 0 && 'traceId' in items[0];
    await this.post(isSpans ? { service: this.service, spans: items } : { service: this.service, metrics: items });
  }
  async write(entry: LogEntry): Promise<void> {
    await this.post({ service: this.service, logs: [entry] });
  }
  private async post(payload: unknown): Promise<void> {
    const res = await fetch(this.endpoint, {
      method: 'POST',
      headers: { authorization: `Bearer ${this.token}`, 'content-type': 'application/json' },
      body: JSON.stringify(payload),
    });
    if (!res.ok) console.error(`[exporter] POST failed status=${res.status}`);
  }
}
```
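The manual wiring then looks roughly like the following; the constructor option names (`export`, `batchSize`, `sampling`, `output`) are inferred from the internals quoted under Likely suspects and should be checked against the real `web/src/monitoring.ts` linked below:

```ts
// Sketch of the manual wiring; option names are inferences from this report,
// not confirmed library API. No flushInterval anywhere: nothing is timer-bound.
const exporter = new DirectCloudExporter('aegis-web', endpoint, token);
const metrics = new MetricsCollector({ export: exporter, batchSize: 1 });
const tracer = new Tracer({ export: exporter, sampling: 0.2 });
const logger = new Logger({ output: exporter });
```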
Full implementation at `Stackbilt-dev/aegis` `web/src/lib/direct-cloud-exporter.ts` + `web/src/monitoring.ts` (landed in aegis#530, 2026-04-18).
## Verification that the workaround works
With 20% sampling, MCP call volume produces proportional trace counts at stackbilder.com/observe within seconds. Pre-workaround, `aegis-web` showed a single trace (the manual curl from debugging). Post-workaround, traces grow in real time as requests fire: 2 → 8 after 25 MCP calls, i.e. 6 sampled spans against an expected 25 × 0.2 = 5, well within variance for 20% sampling.
## Ask
If you can repro: the fix is probably in `StackbiltCloudExporter.maybeFlush()` or in the handoff between `MetricsCollector` and the exporter under CF Worker isolate timing. The `DirectCloudExporter` shape could also be a candidate for inclusion in the library as an alternative to `StackbiltCloudExporter`, especially useful for low-QPS Workers where batching isn't a cost concern.
## References
- aegis#432 (parent, dogfood instrumentation)
- aegis#529 (debug thread)
- aegis#530 (workaround PR)
- `stackbilt-web#57` (reference impl using `LocalObserveExporter`)