[codex] Trajectory store schema + retention by furukama · Pull Request #643 · HybridAIOne/hybridclaw

furukama · 2026-04-28T18:36:32Z

Summary

Implements #488 by making captured skill-run trajectories schema version 2 records with first-class fields for input, output, model, tools used, outcome, and score.

What changed

Adds top-level trajectory record fields for input, output, model, tools_used, outcome, and score while preserving the raw skill_run event payload for compatibility.
Adds trajectory retention config under adaptiveSkills.trajectoryCapture with a 365-day default and per-tenant overrides via retentionDaysByTenant.
Runs trajectory JSONL pruning from the existing Adaptive Skills periodic maintenance path and removes empty date partitions after pruning.
Documents the retention policy and schema-related config in Adaptive Skills and reference docs.
Adds coverage for trajectory schema capture, retention compaction behavior, and runtime config normalization.

Validation

biome check --write src
biome check src
tsc --noEmit --noUnusedLocals --noUnusedParameters with Node 22
vitest run --configLoader runner --config vitest.unit.config.ts tests/skills-observation.test.ts tests/skills-inspection.test.ts with Node 22
vitest run --configLoader runner --config vitest.integration.config.ts tests/config-reload.integration.test.ts with Node 22

Closes #488.

furukama · 2026-04-28T18:41:21Z

@claude Do a thorough review of this PR in terms of: KISS, YAGNI, DRY, Fail Fast, Secure by Default and performance/speed

claude · 2026-04-28T18:41:39Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

Copilot

Pull request overview

Implements trajectory schema v2 and configurable retention/pruning for captured skill-run trajectories, including docs and test coverage updates for the new behavior.

Changes:

Upgrades trajectory JSONL records to schema v2 with first-class fields (input/output/model/tools/outcome/score) while retaining the raw skill_run event payload.
Adds trajectory retention configuration (global + per-tenant overrides) and normalizes it during runtime config reload.
Integrates trajectory pruning into the periodic Adaptive Skills inspection/maintenance path and documents the new config.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/skills-observation.test.ts	Updates expectations to assert schema v2 trajectory fields are captured.
tests/skills-inspection.test.ts	Adds coverage for tenant-specific trajectory retention pruning and empty-partition cleanup.
tests/config-reload.integration.test.ts	Adds coverage for retention config normalization on reload.
src/skills/skills-inspection.ts	Runs scheduled trajectory pruning during periodic inspection.
src/skills/skill-run-trajectories.ts	Implements schema v2 record building (incl. score/tools) and JSONL retention pruning by tenant.
src/skills/adaptive-skills-types.ts	Extends Adaptive Skills config types with retention fields.
src/config/runtime-config.ts	Adds defaults + normalization for trajectory retention settings.
docs/development/reference/configuration.md	Documents trajectory retention configuration.
docs/development/extensibility/adaptive-skills.md	Documents retention policy and trajectory capture config fields.
docs/content/reference/configuration.md	Mirrors retention config documentation for published docs.
docs/content/extensibility/adaptive-skills.md	Mirrors retention policy documentation for published docs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

furukama · 2026-04-28T21:04:04Z

@claude Do a thorough review of this PR in terms of: KISS, YAGNI, DRY, Fail Fast, Secure by Default and performance/speed

claude · 2026-04-28T21:04:19Z

Claude finished @furukama's task in 2m 33s —— View job

PR Review: KISS · YAGNI · DRY · Fail Fast · Secure by Default · Performance

Reading changed files
Analyzing against KISS / YAGNI / DRY
Analyzing against Fail Fast / Secure by Default
Analyzing performance & speed
Writing review

Overall this is a solid, well-structured PR. The permissions model, test coverage, and config normalization are all good. A few issues worth addressing — one is on the hot path and should be fixed before merge; the rest are minor clean-up items.

🔴 Performance — Synchronous DB read in the write hot path

buildTrajectoryScore (skill-run-trajectories.ts:201) calls getAgentSkillScores synchronously during recordSkillRunTrajectory, which fires on every skill_run event for enabled agents. This blocks the event loop on every skill run.

// skill-run-trajectories.ts:206
const [score] = getAgentSkillScores({
  agentId: event.agent_id,
  skillName: event.skill_id,
  limit: 1,
});

The score snapshot (agent_skill) is a point-in-time read of a derived metric. It is not part of the run's causal data — the run happened before the score was updated. Options in increasing simplicity:

Preferred: Pass the score in from the call site (the observer already has it, or it can be fetched once before writing). This is an arg, not an internal query.
Acceptable: Drop agent_skill from the write-time record entirely. The score can be computed at read time via a join/lookup. The run score (scoreSkillRunOutcome) has no DB dependency and can stay.

Copilot flagged this too. Fix this →

🟡 YAGNI / KISS — `tenant_id` is a premature abstraction

In buildSkillRunTrajectoryRecord (lines 246-247), tenant_id and agent_id are always assigned the same value:

tenant_id: event.agent_id,
agent_id: event.agent_id,

There's no multi-tenancy concept in the codebase today — agents are tenants. Adding a separate tenant_id field that carries identical data ships API surface for a future that may never come. If tenant ≠ agent becomes a requirement, it can be added then via a schema v3. Until then, this violates YAGNI and doubles the lookup ambiguity in readTrajectoryTenantId.

🟡 KISS / PERF — File read in `readTrajectoryTenantId` is unnecessary

pruneExpiredSkillRunTrajectories opens every .jsonl file to read and parse the first 8192 bytes just to retrieve the tenant ID (lines 335-366). But the filename is the tenant ID — files are written as ${safeFilePart(agentId)}.jsonl (line 145). Since normalizeRetentionDaysByTenant normalizes config keys by tenantId.trim(), just storing them with a safeFilePart key in the normalized config would allow a direct filename-based lookup with zero file I/O:

// Current: reads each file's first line to get raw tenantId, then double-looks up
const fileTenantId = readTrajectoryTenantId(filePath) ?? path.basename(fileEntry.name, '.jsonl');

// Simpler: filename already contains the safe tenant ID
const fileTenantId = path.basename(fileEntry.name, '.jsonl');

The only reason to read the file is to recover the raw (pre-safeFilePart) tenant ID for config key lookup. If config normalization stores keys in safeFilePart form (a one-line change to normalizeRetentionDaysByTenant), the file reads become unnecessary. This also fixes the double-lookup in tenantRetentionDays:

// skill-run-trajectories.ts:315-317 — double lookup for same reason
const configured =
  overrides[normalizedTenantId] ??
  overrides[safeFilePart(normalizedTenantId)];  // ← only needed if keys aren't pre-normalized

Fix this →

🟡 DRY — Two nearly identical agent-ID enable-check functions

isTrajectoryCaptureEnabledForAgentId (line 114) and isTrajectoryCaptureEnabledForAgent (line 128) both trim an agent ID and check it against enabledAgentIds. The only difference is one takes config: RuntimeConfig and the other takes a pre-resolved string[]. They share no implementation. A single internal helper taking (agentId: string | null | undefined, enabledAgentIds: string[]) would serve both.

🟡 Fail Fast — Silent incomplete score write

buildTrajectoryScore (line 212) catches a DB error, logs a warning, and continues — writing a trajectory record with agent_skill: null. There's no way for the caller to distinguish a "no score yet" null from a "DB error" null. This is a soft fail that the caller treats as success.

The simplest fix: propagate the error and let recordSkillRunTrajectory's own try/catch handle it uniformly (it logs all capture failures). Alternatively, add an error boolean to SkillRunTrajectoryScore to mark degraded records. At minimum, the distinction matters for data quality analysis downstream.

🟡 KISS — Overloaded `pruneExpiredSkillRunTrajectories` input object

The function signature (line 378) accepts { config?, adaptiveSkills?, now? } where both config and adaptiveSkills are independently optional but silently dependent — config is still needed for resolveSkillRunTrajectoryStoreDir even when adaptiveSkills is provided. A caller passing only adaptiveSkills would get the correct retention logic but stale store-dir resolution from getRuntimeConfig().

The internal coupling is invisible from the call site. Consider consolidating to a single resolved-config shape passed by callers, or keep config as the only entry point and let the function extract adaptiveSkills itself.

🟢 Secure by Default — Good

Directory mode 0o700, file mode 0o600 on all trajectory writes. ✓
safeFilePart uses an allowlist pattern to prevent path injection from agent IDs. ✓
readTrajectoryTenantId validates typeof candidate === 'string' before using parsed values. ✓
readTrajectoryTenantId reads only 8192 bytes — limits exposure to large files.

One minor note: if a JSONL file's first line is > 8192 bytes, JSON.parse will throw and null is returned silently. A file in this state is not pruned by tenant policy (falls back to filename-based lookup which uses the default retention). This is safe but worth a log line at debug level so operators can detect corruption.

🟢 Test — One flaky test (Copilot already flagged)

skills-inspection.test.ts:277 — the pruning test computes date partitions using Date.now() without freezing time. Tests that span UTC midnight will produce different date strings for the write and the prune cutoff, making them flaky. Copilot's suggested vi.useFakeTimers() fix is the right call.

Summary

Area	Finding	Severity
Performance	Sync DB read (`getAgentSkillScores`) in `skill_run` event hot path	🔴 Fix before merge
YAGNI	`tenant_id === agent_id` always — premature abstraction	🟡
KISS/PERF	`readTrajectoryTenantId` opens every file in prune path; filename is sufficient	🟡
DRY	Two near-identical agent-ID enable-check functions	🟡
Fail Fast	Silent null score on DB error; no signal to caller	🟡
KISS	Overloaded `pruneExpiredSkillRunTrajectories` input object	🟡
Test	Flaky midnight race in pruning test	🟡
Security	Large first-line silently skips file during prune	🟢

feat: add trajectory retention schema

5a6d9e5

furukama marked this pull request as ready for review April 28, 2026 18:40

Copilot AI review requested due to automatic review settings April 28, 2026 18:40

Copilot started reviewing on behalf of furukama April 28, 2026 18:41 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

Comment thread src/skills/skill-run-trajectories.ts Outdated

Comment thread tests/skills-inspection.test.ts Outdated

Benedikt Koehler added 4 commits April 28, 2026 23:06

fix: address trajectory review comments

6b2c9ce

refactor: dedupe trajectory agent checks

3093571

fix: omit derived agent skill trajectory score

4348eba

refactor: simplify trajectory pruning config

abe032c

furukama merged commit 4445485 into main Apr 28, 2026
5 checks passed

furukama deleted the codex/issue-488-trajectory-store branch April 28, 2026 21:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[codex] Trajectory store schema + retention#643

[codex] Trajectory store schema + retention#643
furukama merged 5 commits intomainfrom
codex/issue-488-trajectory-store

furukama commented Apr 28, 2026

Uh oh!

furukama commented Apr 28, 2026

Uh oh!

claude Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

furukama commented Apr 28, 2026

Uh oh!

claude Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

furukama commented Apr 28, 2026

Summary

What changed

Validation

Uh oh!

furukama commented Apr 28, 2026

Uh oh!

claude Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

furukama commented Apr 28, 2026

Uh oh!

claude Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: KISS · YAGNI · DRY · Fail Fast · Secure by Default · Performance

🔴 Performance — Synchronous DB read in the write hot path

🟡 YAGNI / KISS — tenant_id is a premature abstraction

🟡 KISS / PERF — File read in readTrajectoryTenantId is unnecessary

🟡 DRY — Two nearly identical agent-ID enable-check functions

🟡 Fail Fast — Silent incomplete score write

🟡 KISS — Overloaded pruneExpiredSkillRunTrajectories input object

🟢 Secure by Default — Good

🟢 Test — One flaky test (Copilot already flagged)

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude Bot commented Apr 28, 2026 •

edited

Loading

claude Bot commented Apr 28, 2026 •

edited

Loading

🟡 YAGNI / KISS — `tenant_id` is a premature abstraction

🟡 KISS / PERF — File read in `readTrajectoryTenantId` is unnecessary

🟡 KISS — Overloaded `pruneExpiredSkillRunTrajectories` input object