Skip to content

[codex] Trajectory store schema + retention#643

Merged
furukama merged 5 commits intomainfrom
codex/issue-488-trajectory-store
Apr 28, 2026
Merged

[codex] Trajectory store schema + retention#643
furukama merged 5 commits intomainfrom
codex/issue-488-trajectory-store

Conversation

@furukama
Copy link
Copy Markdown
Contributor

Summary

Implements #488 by making captured skill-run trajectories schema version 2 records with first-class fields for input, output, model, tools used, outcome, and score.

What changed

  • Adds top-level trajectory record fields for input, output, model, tools_used, outcome, and score while preserving the raw skill_run event payload for compatibility.
  • Adds trajectory retention config under adaptiveSkills.trajectoryCapture with a 365-day default and per-tenant overrides via retentionDaysByTenant.
  • Runs trajectory JSONL pruning from the existing Adaptive Skills periodic maintenance path and removes empty date partitions after pruning.
  • Documents the retention policy and schema-related config in Adaptive Skills and reference docs.
  • Adds coverage for trajectory schema capture, retention compaction behavior, and runtime config normalization.

Validation

  • biome check --write src
  • biome check src
  • tsc --noEmit --noUnusedLocals --noUnusedParameters with Node 22
  • vitest run --configLoader runner --config vitest.unit.config.ts tests/skills-observation.test.ts tests/skills-inspection.test.ts with Node 22
  • vitest run --configLoader runner --config vitest.integration.config.ts tests/config-reload.integration.test.ts with Node 22

Closes #488.

@furukama furukama marked this pull request as ready for review April 28, 2026 18:40
Copilot AI review requested due to automatic review settings April 28, 2026 18:40
@furukama
Copy link
Copy Markdown
Contributor Author

@claude Do a thorough review of this PR in terms of: KISS, YAGNI, DRY, Fail Fast, Secure by Default and performance/speed

@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements trajectory schema v2 and configurable retention/pruning for captured skill-run trajectories, including docs and test coverage updates for the new behavior.

Changes:

  • Upgrades trajectory JSONL records to schema v2 with first-class fields (input/output/model/tools/outcome/score) while retaining the raw skill_run event payload.
  • Adds trajectory retention configuration (global + per-tenant overrides) and normalizes it during runtime config reload.
  • Integrates trajectory pruning into the periodic Adaptive Skills inspection/maintenance path and documents the new config.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/skills-observation.test.ts Updates expectations to assert schema v2 trajectory fields are captured.
tests/skills-inspection.test.ts Adds coverage for tenant-specific trajectory retention pruning and empty-partition cleanup.
tests/config-reload.integration.test.ts Adds coverage for retention config normalization on reload.
src/skills/skills-inspection.ts Runs scheduled trajectory pruning during periodic inspection.
src/skills/skill-run-trajectories.ts Implements schema v2 record building (incl. score/tools) and JSONL retention pruning by tenant.
src/skills/adaptive-skills-types.ts Extends Adaptive Skills config types with retention fields.
src/config/runtime-config.ts Adds defaults + normalization for trajectory retention settings.
docs/development/reference/configuration.md Documents trajectory retention configuration.
docs/development/extensibility/adaptive-skills.md Documents retention policy and trajectory capture config fields.
docs/content/reference/configuration.md Mirrors retention config documentation for published docs.
docs/content/extensibility/adaptive-skills.md Mirrors retention policy documentation for published docs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/skills/skill-run-trajectories.ts Outdated
Comment thread tests/skills-inspection.test.ts Outdated
@furukama
Copy link
Copy Markdown
Contributor Author

@claude Do a thorough review of this PR in terms of: KISS, YAGNI, DRY, Fail Fast, Secure by Default and performance/speed

@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Claude finished @furukama's task in 2m 33s —— View job


PR Review: KISS · YAGNI · DRY · Fail Fast · Secure by Default · Performance

  • Reading changed files
  • Analyzing against KISS / YAGNI / DRY
  • Analyzing against Fail Fast / Secure by Default
  • Analyzing performance & speed
  • Writing review

Overall this is a solid, well-structured PR. The permissions model, test coverage, and config normalization are all good. A few issues worth addressing — one is on the hot path and should be fixed before merge; the rest are minor clean-up items.


🔴 Performance — Synchronous DB read in the write hot path

buildTrajectoryScore (skill-run-trajectories.ts:201) calls getAgentSkillScores synchronously during recordSkillRunTrajectory, which fires on every skill_run event for enabled agents. This blocks the event loop on every skill run.

// skill-run-trajectories.ts:206
const [score] = getAgentSkillScores({
  agentId: event.agent_id,
  skillName: event.skill_id,
  limit: 1,
});

The score snapshot (agent_skill) is a point-in-time read of a derived metric. It is not part of the run's causal data — the run happened before the score was updated. Options in increasing simplicity:

  • Preferred: Pass the score in from the call site (the observer already has it, or it can be fetched once before writing). This is an arg, not an internal query.
  • Acceptable: Drop agent_skill from the write-time record entirely. The score can be computed at read time via a join/lookup. The run score (scoreSkillRunOutcome) has no DB dependency and can stay.

Copilot flagged this too. Fix this →


🟡 YAGNI / KISS — tenant_id is a premature abstraction

In buildSkillRunTrajectoryRecord (lines 246-247), tenant_id and agent_id are always assigned the same value:

tenant_id: event.agent_id,
agent_id: event.agent_id,

There's no multi-tenancy concept in the codebase today — agents are tenants. Adding a separate tenant_id field that carries identical data ships API surface for a future that may never come. If tenant ≠ agent becomes a requirement, it can be added then via a schema v3. Until then, this violates YAGNI and doubles the lookup ambiguity in readTrajectoryTenantId.


🟡 KISS / PERF — File read in readTrajectoryTenantId is unnecessary

pruneExpiredSkillRunTrajectories opens every .jsonl file to read and parse the first 8192 bytes just to retrieve the tenant ID (lines 335-366). But the filename is the tenant ID — files are written as ${safeFilePart(agentId)}.jsonl (line 145). Since normalizeRetentionDaysByTenant normalizes config keys by tenantId.trim(), just storing them with a safeFilePart key in the normalized config would allow a direct filename-based lookup with zero file I/O:

// Current: reads each file's first line to get raw tenantId, then double-looks up
const fileTenantId = readTrajectoryTenantId(filePath) ?? path.basename(fileEntry.name, '.jsonl');

// Simpler: filename already contains the safe tenant ID
const fileTenantId = path.basename(fileEntry.name, '.jsonl');

The only reason to read the file is to recover the raw (pre-safeFilePart) tenant ID for config key lookup. If config normalization stores keys in safeFilePart form (a one-line change to normalizeRetentionDaysByTenant), the file reads become unnecessary. This also fixes the double-lookup in tenantRetentionDays:

// skill-run-trajectories.ts:315-317 — double lookup for same reason
const configured =
  overrides[normalizedTenantId] ??
  overrides[safeFilePart(normalizedTenantId)];  // ← only needed if keys aren't pre-normalized

Fix this →


🟡 DRY — Two nearly identical agent-ID enable-check functions

isTrajectoryCaptureEnabledForAgentId (line 114) and isTrajectoryCaptureEnabledForAgent (line 128) both trim an agent ID and check it against enabledAgentIds. The only difference is one takes config: RuntimeConfig and the other takes a pre-resolved string[]. They share no implementation. A single internal helper taking (agentId: string | null | undefined, enabledAgentIds: string[]) would serve both.


🟡 Fail Fast — Silent incomplete score write

buildTrajectoryScore (line 212) catches a DB error, logs a warning, and continues — writing a trajectory record with agent_skill: null. There's no way for the caller to distinguish a "no score yet" null from a "DB error" null. This is a soft fail that the caller treats as success.

The simplest fix: propagate the error and let recordSkillRunTrajectory's own try/catch handle it uniformly (it logs all capture failures). Alternatively, add an error boolean to SkillRunTrajectoryScore to mark degraded records. At minimum, the distinction matters for data quality analysis downstream.


🟡 KISS — Overloaded pruneExpiredSkillRunTrajectories input object

The function signature (line 378) accepts { config?, adaptiveSkills?, now? } where both config and adaptiveSkills are independently optional but silently dependent — config is still needed for resolveSkillRunTrajectoryStoreDir even when adaptiveSkills is provided. A caller passing only adaptiveSkills would get the correct retention logic but stale store-dir resolution from getRuntimeConfig().

The internal coupling is invisible from the call site. Consider consolidating to a single resolved-config shape passed by callers, or keep config as the only entry point and let the function extract adaptiveSkills itself.


🟢 Secure by Default — Good

  • Directory mode 0o700, file mode 0o600 on all trajectory writes. ✓
  • safeFilePart uses an allowlist pattern to prevent path injection from agent IDs. ✓
  • readTrajectoryTenantId validates typeof candidate === 'string' before using parsed values. ✓
  • readTrajectoryTenantId reads only 8192 bytes — limits exposure to large files.

One minor note: if a JSONL file's first line is > 8192 bytes, JSON.parse will throw and null is returned silently. A file in this state is not pruned by tenant policy (falls back to filename-based lookup which uses the default retention). This is safe but worth a log line at debug level so operators can detect corruption.


🟢 Test — One flaky test (Copilot already flagged)

skills-inspection.test.ts:277 — the pruning test computes date partitions using Date.now() without freezing time. Tests that span UTC midnight will produce different date strings for the write and the prune cutoff, making them flaky. Copilot's suggested vi.useFakeTimers() fix is the right call.


Summary

Area Finding Severity
Performance Sync DB read (getAgentSkillScores) in skill_run event hot path 🔴 Fix before merge
YAGNI tenant_id === agent_id always — premature abstraction 🟡
KISS/PERF readTrajectoryTenantId opens every file in prune path; filename is sufficient 🟡
DRY Two near-identical agent-ID enable-check functions 🟡
Fail Fast Silent null score on DB error; no signal to caller 🟡
KISS Overloaded pruneExpiredSkillRunTrajectories input object 🟡
Test Flaky midnight race in pruning test 🟡
Security Large first-line silently skips file during prune 🟢

@furukama furukama merged commit 4445485 into main Apr 28, 2026
5 checks passed
@furukama furukama deleted the codex/issue-488-trajectory-store branch April 28, 2026 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Roadmap 10.2] Trajectory store schema + retention

2 participants