feat(action-brain): atlas replacement for pre-rebase health/ingest PR #4 (GIT-48) #11

Draft
ab0991-oss wants to merge 12 commits into master from atlas/git-48-fix

Conversation

@ab0991-oss
Owner

Replacement PR for pre-rebase branch staff/git-48-wacli-health-checks (#4) per GIT-1046 audit.

This avoids rewriting the original staff branches and carries forward the atlas remediation branch for review against current master.

@ab0991-oss
Owner Author

Addressed Staff structural findings from GIT-1046 re-review.

Implemented:

  • Restored the store-qualified source identity contract end-to-end (store_key::MsgID) and re-enabled fail-closed handling of ambiguous bare IDs across multi-store batches.
  • Reintroduced owner context sanitization before prompt interpolation, with schema caps.
  • Rebased collector logic onto the hardened checkpoint path (explicit checkpoint_read_failed, strict checkpoint validation, lock/reclaim semantics, same-timestamp cursor union/cap).
  • Removed repeated linear source-message scans in ingest paths by introducing prebuilt source-message indices.
  • Added/updated regressions in extractor + ingest + operations + collector tests for the above behaviors.
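
For reviewers who want a concrete picture of the checkpoint hardening, here is a minimal sketch, not the code on this branch. It assumes a JSON checkpoint with hypothetical `lastTimestamp` and `idsAtLastTimestamp` fields and an assumed cap of 500 cursor IDs. Only the `checkpoint_read_failed` outcome name comes from this thread. The point is that a corrupt checkpoint yields an explicit failure rather than silently behaving like "no checkpoint", and that IDs seen at the same timestamp are unioned and capped.

```typescript
// Hypothetical checkpoint shape; the field names are assumptions for illustration.
interface CollectorCheckpoint {
  lastTimestamp: string; // ISO-8601 time of the newest processed message
  idsAtLastTimestamp: string[]; // IDs already processed at exactly that time
}

type CheckpointReadResult =
  | { ok: true; checkpoint: CollectorCheckpoint | null } // null: no checkpoint yet
  | { ok: false; error: "checkpoint_read_failed"; reason: string };

const MAX_CURSOR_IDS = 500; // assumed cap, not a value taken from the PR

function readFailed(reason: string): CheckpointReadResult {
  return { ok: false, error: "checkpoint_read_failed", reason };
}

// Strict validation: anything that is not a well-formed checkpoint is an explicit failure.
function parseCheckpoint(raw: string | null): CheckpointReadResult {
  if (raw === null) return { ok: true, checkpoint: null };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return readFailed("invalid JSON");
  }
  if (typeof parsed !== "object" || parsed === null) return readFailed("not an object");
  const fields = parsed as Record<string, unknown>;
  const ts = fields["lastTimestamp"];
  const ids = fields["idsAtLastTimestamp"];
  if (typeof ts !== "string" || Number.isNaN(Date.parse(ts))) {
    return readFailed("bad lastTimestamp");
  }
  if (!Array.isArray(ids) || !ids.every((id) => typeof id === "string")) {
    return readFailed("bad idsAtLastTimestamp");
  }
  return { ok: true, checkpoint: { lastTimestamp: ts, idsAtLastTimestamp: ids as string[] } };
}

// Same-timestamp batches union their IDs with the stored ones; a newer timestamp starts fresh.
function mergeCursor(
  prev: CollectorCheckpoint | null,
  batchTimestamp: string,
  batchIds: string[],
): CollectorCheckpoint {
  const carried =
    prev !== null && prev.lastTimestamp === batchTimestamp ? prev.idsAtLastTimestamp : [];
  const union = new Set<string>(carried);
  batchIds.forEach((id) => union.add(id));
  // Keep only the most recently added IDs once the assumed cap is exceeded.
  return {
    lastTimestamp: batchTimestamp,
    idsAtLastTimestamp: Array.from(union).slice(-MAX_CURSOR_IDS),
  };
}
```

A caller would stop the run on `ok: false` instead of re-ingesting from scratch, which is the fail-closed behavior the bullet above describes.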

Validation:

  • bun test v1.3.11 (af24e281)
    Migration 2 applied: slugify_existing_pages
    Migration 3 applied: unique_chunk_index
    Migration 4 applied: access_tokens_and_mcp_log
    3 migration(s) applied
    Migration 2 applied: slugify_existing_pages
    Migration 3 applied: unique_chunk_index
    Migration 4 applied: access_tokens_and_mcp_log
    3 migration(s) applied

    1/20 pages, 1 chunks embedded
    2/20 pages, 2 chunks embedded
    3/20 pages, 3 chunks embedded
    4/20 pages, 4 chunks embedded
    5/20 pages, 5 chunks embedded
    6/20 pages, 6 chunks embedded
    7/20 pages, 7 chunks embedded
    8/20 pages, 8 chunks embedded
    9/20 pages, 9 chunks embedded
    10/20 pages, 10 chunks embedded
    11/20 pages, 11 chunks embedded
    12/20 pages, 12 chunks embedded
    13/20 pages, 13 chunks embedded
    14/20 pages, 14 chunks embedded
    15/20 pages, 15 chunks embedded
    16/20 pages, 16 chunks embedded
    17/20 pages, 17 chunks embedded
    18/20 pages, 18 chunks embedded
    19/20 pages, 19 chunks embedded
    20/20 pages, 20 chunks embedded

Embedded 20 chunks across 20 pages

1/5 pages, 1 chunks embedded
2/5 pages, 2 chunks embedded
3/5 pages, 3 chunks embedded
4/5 pages, 4 chunks embedded
5/5 pages, 5 chunks embedded

Embedded 5 chunks across 5 pages

1/2 pages, 0 chunks embedded
2/2 pages, 1 chunks embedded

Embedded 1 chunks across 2 pages
Migration 2 applied: slugify_existing_pages
Migration 3 applied: unique_chunk_index
Migration 4 applied: access_tokens_and_mcp_log
3 migration(s) applied
Skip reason: DATABASE_URL not set
Skipping E2E sync tests (DATABASE_URL not set)

@ab0991-oss
Owner Author

Follow-up after Staff structural review findings.

Implemented and pushed fixes on this replacement branch head (3ba99aa):

  • Restored store-qualified source identity contract (store_key::MsgID) end-to-end, including resolver/extractor/ingest/ops paths.
  • Reinstated fail-closed behavior for ambiguous bare source_message_id across multi-store batches.
  • Rebased collector checkpoint hardening (strict validation, explicit checkpoint_read_failed, lock semantics, same-second cursor-ID union + cap behavior).
  • Reintroduced source-identity/checkpoint regression coverage and added owner-context sanitization regression test.
  • Replaced repeated linear source-message scans with prebuilt source index lookups in ingest paths.
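
The source identity and index changes above can be illustrated together. The following is a rough sketch under stated assumptions, not the resolver on this branch. The `SourceMessage` fields, `buildSourceIndex`, and `resolveSourceId` are hypothetical names; only the `store_key::MsgID` format and the fail-closed rule for ambiguous bare IDs come from the bullets above.

```typescript
// Hypothetical message shape; field names are assumptions for illustration.
interface SourceMessage {
  storeKey: string;
  msgId: string;
  text: string;
}

type Resolution =
  | { kind: "found"; message: SourceMessage }
  | { kind: "not_found" }
  | { kind: "ambiguous"; candidates: string[] };

interface SourceIndex {
  byQualifiedId: Map<string, SourceMessage>;
  byBareId: Map<string, SourceMessage[]>;
}

function qualifiedId(message: SourceMessage): string {
  return `${message.storeKey}::${message.msgId}`;
}

// Built once per batch so each lookup is a map access instead of a linear scan.
function buildSourceIndex(messages: SourceMessage[]): SourceIndex {
  const byQualifiedId = new Map<string, SourceMessage>();
  const byBareId = new Map<string, SourceMessage[]>();
  messages.forEach((message) => {
    const key = qualifiedId(message);
    if (byQualifiedId.has(key)) return; // first occurrence wins (an assumption)
    byQualifiedId.set(key, message);
    const sameBare = byBareId.get(message.msgId) ?? [];
    sameBare.push(message);
    byBareId.set(message.msgId, sameBare);
  });
  return { byQualifiedId, byBareId };
}

function resolveSourceId(index: SourceIndex, sourceMessageId: string): Resolution {
  const exact = index.byQualifiedId.get(sourceMessageId);
  if (exact !== undefined) return { kind: "found", message: exact };
  const bareMatches = index.byBareId.get(sourceMessageId) ?? [];
  const [only] = bareMatches;
  if (bareMatches.length === 1 && only !== undefined) return { kind: "found", message: only };
  if (bareMatches.length > 1) {
    // Fail closed: never guess which store a bare ID belongs to.
    return { kind: "ambiguous", candidates: bareMatches.map(qualifiedId) };
  }
  return { kind: "not_found" };
}
```

Returning `ambiguous` as its own outcome, rather than picking the first match, forces the caller to reject or skip the item explicitly when the same bare ID appears in more than one store.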

Verification run:

  • bun test test/action-brain/collector.test.ts test/action-brain/ingest-runner.test.ts test/action-brain/operations.test.ts test/action-brain/extractor.test.ts
  • Result: 65 pass, 0 fail

Skills evidence: /review

ab0991-oss and others added 12 commits April 20, 2026 04:42
… (v0.10.2)

Adds src/action-brain/ingest-runner.ts — cron-ready auto-ingest pipeline
that reads new wacli messages, runs LLM extraction, and stores results.
Checkpoint-aware: skips already-processed messages. Staleness gate bails
if wacli data is older than --stale-after-hours (default 24h).
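
As an illustration of the checkpoint skip and staleness gate described above, here is a minimal sketch. It assumes messages carry an ISO-8601 `timestamp` and that the checkpoint is a single timestamp. `WacliMessage`, `isStale`, and `planIngestRun` are hypothetical names, not the exports of ingest-runner.ts. Treating an empty message list as "nothing to do" rather than stale is also an assumption.

```typescript
// Hypothetical message shape for illustration only.
interface WacliMessage {
  id: string;
  timestamp: string; // ISO-8601
}

const DEFAULT_STALE_AFTER_HOURS = 24; // matches the documented --stale-after-hours default

type IngestPlan =
  | { status: "stale"; newestMessageAt: string }
  | { status: "ok"; toProcess: WacliMessage[] };

function isStale(newestMessageAt: Date, now: Date, staleAfterHours: number): boolean {
  const ageMs = now.getTime() - newestMessageAt.getTime();
  return ageMs > staleAfterHours * 60 * 60 * 1000;
}

function planIngestRun(
  messages: WacliMessage[],
  checkpointTimestamp: string | null,
  now: Date,
  staleAfterHours: number = DEFAULT_STALE_AFTER_HOURS,
): IngestPlan {
  if (messages.length === 0) return { status: "ok", toProcess: [] };

  // Find the newest message to decide whether the data is fresh enough to trust.
  const newestMs = messages.reduce(
    (max, message) => Math.max(max, Date.parse(message.timestamp)),
    Number.NEGATIVE_INFINITY,
  );
  const newest = new Date(newestMs);
  if (isStale(newest, now, staleAfterHours)) {
    return { status: "stale", newestMessageAt: newest.toISOString() };
  }

  // Checkpoint-aware: only messages strictly newer than the checkpoint are processed.
  if (checkpointTimestamp === null) return { status: "ok", toProcess: messages };
  const cutoffMs = Date.parse(checkpointTimestamp);
  return {
    status: "ok",
    toProcess: messages.filter((message) => Date.parse(message.timestamp) > cutoffMs),
  };
}
```

Running the gate before extraction means a stalled wacli sync is reported as stale instead of producing an empty but apparently successful run.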

Also:
- action_engine: createItemWithResult() returns idempotency signal
- extractor: owner context injection for better extraction accuracy
- operations: action_brief reads checkpoint automatically; action_ingest_auto
  operation wires preflight + collect + extract + store in one call
- cli: `gbrain action run` command (checkpoint-path, stale-after-hours, wacli-limit flags)
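
To make the "idempotency signal" concrete, here is one possible shape, sketched against an in-memory store. Only the method name `createItemWithResult` comes from this commit message. The `ActionItem` fields, the `created` flag, and `InMemoryActionStore` are assumptions for illustration.

```typescript
// Hypothetical item shape; the real engine's fields are not shown in this PR page.
interface ActionItem {
  idempotencyKey: string;
  title: string;
}

interface CreateResult {
  item: ActionItem;
  created: boolean; // false means the key already existed and nothing was written
}

class InMemoryActionStore {
  private readonly items = new Map<string, ActionItem>();

  createItemWithResult(item: ActionItem): CreateResult {
    const existing = this.items.get(item.idempotencyKey);
    if (existing !== undefined) return { item: existing, created: false };
    this.items.set(item.idempotencyKey, item);
    return { item, created: true };
  }
}
```

With a signal like this, an ingest run can count newly stored items separately from duplicates when it re-reads messages it has already seen.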

Closes GIT-47.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
…rmalization

Adds stabilizeCommitments() pipeline step to extractCommitments() so LLM
output gets message-grounded actor/source IDs on every extraction run.
Adds 295-line test suite covering actor reassignment, entity normalization,
and edge cases for the new stabilization path.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
- Add clarifying comment to resolveSourceMessage() explaining the intentional
  single-message LLM source_message_id fallback behavior
- Add comment to parseOptionalDate() explaining the intentional throw-on-bad-date
  safety gate (prevents checkpoint advancement on bad LLM output)
- Add comment to shouldPersistCheckpoint explaining the checkpoint write guard
- Add comment marking unreachable return [] in extractor retry loop
- Update CLAUDE.md extractor description: "two-tier Haiku→Sonnet" → accurate
  description (Sonnet default; quality gate uses Haiku→Sonnet escalation)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
… quality gate test

- Update CHANGELOG v0.10.2 release date to 2026-04-17
- Forward ownerName/ownerAliases/retryCount/throwOnError into quality gate extractor calls
- Add quality gate owner context test (65 tests pass)
- Expand e2e-live-validation.ts matcher with alias + type handling
- Add e2e-live-validation-metrics.test.ts for matchCommitment unit tests
- Add P2 TODOs: shared utils refactor + N+1 fix (identified in pre-landing review)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
…terpolation

Strips newlines, control chars, and enforces length caps (name: 100, alias: 50,
max 10 aliases) to prevent prompt injection via MCP-supplied owner context params.
Adds maxLength/maxItems to the MCP schema for early validation.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
…, fix alias truncation

- Expand sanitizeOwnerString to strip U+0000–U+001F (full control-char set, not just \r\n\0)
- Accept maxLen param so alias sanitization uses MAX_ALIAS_LEN directly, eliminating double-truncation
- Align owner_name schema maxLength to 100 (was 200, mismatch with runtime constant)
- Fix authorise normalizer regex to use callback replacement, avoiding undefined backreference for bare suffix-less form
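
The owner-context sanitization described in the two commits above could look roughly like the sketch below. The limits (name 100, alias 50, at most 10 aliases), the `sanitizeOwnerString` name with its `maxLen` parameter, and `MAX_ALIAS_LEN` come from the commit messages. Everything else is assumed, including `sanitizeOwnerContext`, `MAX_OWNER_NAME_LEN`, `MAX_ALIASES`, and the choice to replace control characters with a space rather than delete them.

```typescript
const MAX_OWNER_NAME_LEN = 100; // hypothetical constant name; value from the commit message
const MAX_ALIAS_LEN = 50;
const MAX_ALIASES = 10; // hypothetical constant name; value from the commit message

// Replaces every C0 control character (U+0000 to U+001F, which covers \r, \n, \t and \0)
// with a space, collapses whitespace runs, then truncates once to the caller's limit.
function sanitizeOwnerString(value: string, maxLen: number): string {
  return value
    .replace(/[\u0000-\u001F]/g, " ")
    .replace(/\s+/g, " ")
    .trim()
    .slice(0, maxLen)
    .trim();
}

interface OwnerContext {
  ownerName: string;
  ownerAliases: string[];
}

function sanitizeOwnerContext(ownerName: string, ownerAliases: string[]): OwnerContext {
  return {
    ownerName: sanitizeOwnerString(ownerName, MAX_OWNER_NAME_LEN),
    ownerAliases: ownerAliases
      .map((alias) => sanitizeOwnerString(alias, MAX_ALIAS_LEN))
      .filter((alias) => alias.length > 0)
      .slice(0, MAX_ALIASES),
  };
}
```

Passing the limit in as `maxLen` means each value is truncated exactly once, which is the double-truncation fix the second commit mentions.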

Co-Authored-By: Paperclip <noreply@paperclip.ing>