
docs: factory pipeline UI + forge-alloy domain extensibility refactor#852

Merged
joelteply merged 11 commits into main from forge-alloy-domain-extensibility-doc
Apr 10, 2026

Conversation

@joelteply
Contributor

TL;DR

Documentation companion to the sentinel-ai factory pipeline build (CambrianTech/sentinel-ai#169) and the forge-alloy schema additions (CambrianTech/forge-alloy#12). Five commits captured today.

Changes

  • docs/architecture/FACTORY-PIPELINE-UI.md — added the backend BigMama production loop section. The factory UI emits alloys; sentinel's forge consumes them; continuum is the shipping department. Includes the diagram, the assembly-line metaphor, and the explicit boundary that sentinel never pushes to HF (continuum does).

  • docs/architecture/FORGE-ALLOY-DOMAIN-EXTENSIBILITY.md — proposal for how forge-alloy supports multiple domains beyond LLM forging (photo provenance, ticketing, etc.) via the new forge_alloy.domains package.

  • docs/papers/PLASTICITY-COMPACTION-MOE.md — §4.1.3.4 second empirical anchor + §4.1.3.4.1 discipline gate (calibration corpus must be hash-pinned + uploaded for reproducibility).

  • docs/papers/_draft_v2_30b_a3b_section.md (NOT committed in this PR — Joel's draft, backed up to FlashGordon for safety).

Companion PRs

…discipline gate

Empirical anchor: continuum-ai/olmoe-1b-7b-compacted-5b v1 (alloy hash
bba0a92ff0c8bebb). Hardware-measured 36.0 HumanEval / 31.7 HumanEval+
against unmodified OLMoE base 40.9 / 36.6, both Q5_K_M on RTX 5090 in
the same eval pipeline (Δ −4.9 / −4.9).

The §4.1.3.4 cross-architecture invariance claim is now anchored at TWO
structurally distinct MoE families:
  - Qwen3MoeForCausalLM (Qwen3-Coder-30B-A3B-Instruct, 128 experts top-8)
  - OlmoeForCausalLM (OLMoE-1B-7B-0924-Instruct, 64 experts top-8)

Same expert_activation_profile.py and cpu_expert_prune_v2.py
--importance-json scripts work on both without code changes
(modulo the cross-architecture portability fixes in sentinel-ai#168).

Within-model A/B from the OLMoE forge isolates the calibration-corpus
lever from every other variable:
  - Broad-corpus calibration → 28.0 HumanEval (Δ −12.9)
  - Code-corpus calibration → 36.0 HumanEval (Δ −4.9)
  - +8.0 swing from changing only the calibration corpus

The 13-point ceiling: wrong-metric (Qwen3-Coder-30B at −13.4) and
wrong-corpus (OLMoE at −12.9) saturate at near-identical magnitude
across different architectures, prune ratios, and active-parameter
fractions. The two levers appear to be substitutable failure modes
rather than additive sources of loss.

§4.1.3.4.1 calibration-corpus discipline gate (NEW hard rule):
The calibration corpus used for importance profiling must be declared
in the alloy as a hash-pinned dataset, and the eval benchmark must be
a representative sample of the same distribution. Forge artifacts
whose calibration corpus does not reflect the eval workload
distribution shall not ship under the calibrated-discipline brand.
This is a hard precondition on shipping, alongside the §4.1.4.1
anchor-reproduction discipline gate.

Both empirical anchors (qwen3-coder-30b-a3b v1 and olmoe-1b-7b v1)
carry their calibration corpora at calibration/heldout_code300.jsonl
in the published HF repo and the corpus sha256 in the alloy's
expert-activation-profile stage metadata. The discipline gate is
satisfied retroactively for both, and is enforced going forward by
publish_model.py requiring the calibration corpus to be present in
the staging directory before the publish step proceeds.
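
For illustration, a minimal sketch of what that publish-time gate could look like. The `calibration/heldout_code300.jsonl` path and the sha256-in-stage-metadata rule come from the text above; the `alloy.json` filename, the field names, and the shape of publish_model.py's check are assumptions:

```python
import hashlib
import json
from pathlib import Path

def check_calibration_gate(staging_dir: Path) -> None:
    """Refuse to publish unless the staged calibration corpus matches the alloy's pin.

    Field names below (stage "type", "metadata", "corpusSha256") are illustrative;
    the real check lives in publish_model.py.
    """
    corpus = staging_dir / "calibration" / "heldout_code300.jsonl"
    if not corpus.exists():
        raise SystemExit("publish blocked: calibration corpus missing from staging dir")

    actual = hashlib.sha256(corpus.read_bytes()).hexdigest()
    alloy = json.loads((staging_dir / "alloy.json").read_text())
    declared = {
        stage.get("metadata", {}).get("corpusSha256")
        for stage in alloy.get("stages", [])
        if stage.get("type") == "expert-activation-profile"
    }
    if actual not in declared:
        raise SystemExit(
            f"publish blocked: staged corpus sha256 {actual} is not pinned in the alloy"
        )
```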

The lab now has two discipline gates derived from empirical failures
rather than asserted from first principles: §4.1.4.1 anchor
reproduction (catches eval-pipeline drift) and §4.1.3.4.1 calibration-
corpus identity (catches importance-metric corpus drift). Both are
preconditions on shipping; neither is theoretical — both exist
because the failures they prevent have already happened in this
work and been measured.

Locks in the contract before any code work starts. The doc covers:

- Why the current FORGE-ALLOY-SPEC.md is ML-locked while forge-alloy
  itself is universal (Type Byte enumeration, README extensibility
  language, APPLICATIONS.md non-ML use cases)

- The four ad-hoc fields I invented and shipped against live HF
  artifacts this week without schema support: expert-activation-profile
  stage, compensation-lora stage, calibrationCorpora[] root extension,
  priorMetricBaselines[] root extension. The published qwen3-coder-30b-a3b
  and OLMoE alloys do not validate against the current spec — the
  refactor is what makes them schema-valid going forward, which is the
  real protection of this week's work, not just cosmetic reorganization.

- The proposed architecture: universal core stays domain-agnostic,
  existing ML stages move into an `llm-forge` domain extension at
  schema/domains/llm-forge.json, alloys declare which domains they use
  via a `domains[]` root field (default ["llm-forge"] for backwards
  compat), and the validator loads each declared domain's stage types
  and validates the alloy stages against the union (see the illustrative
  sketch at the end of this description).

- A protection-first work plan: 6 work items totaling ~4 hours of
  focused work, all on Continuum and forge-alloy, ZERO sentinel-ai
  edits. Work item 4 (the regression test) runs BEFORE work items 1-3
  and is a hard merge gate. Three regression guarantees: round-trip
  byte/semantic equivalence on every shipped alloy, re-author
  equivalence via the new Factory widget, and end-to-end re-forge
  equivalence (gated on sentinel-ai's plugin work landing separately).

- A concrete per-artifact reproducibility table for every shipped
  artifact, showing what's required to re-run each forge today and
  the status of the chain. Morning's two artifacts are at the top
  with "fully repeatable" status. Legacy Qwen3.5 forges have a
  pre-existing time-travel caveat unrelated to this refactor.

- An explicit "What this preserves from this week's work" section at
  the top of the doc, naming the three protection mechanisms by file
  and by hash so any future Claude session reading this doc can't
  forget them.

- A Decision Points section listing the three things I need explicit
  greenlight on before starting any code work: domain registry shape,
  llm-forge as the domain id, regression-test-blocks-merge rule.

The refactor is gated on those three signoffs. No code is being
written by this commit — it is pure architectural documentation
that locks in the contract before any implementation work touches
the schema.
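
For illustration only (this commit ships no code), a rough sketch of how the `domains[]` declaration and the validator union could work. The schema/domains/llm-forge.json path and the backwards-compat default come from the proposal; the "stageTypes" key and the loader below are assumptions:

```python
import json
from pathlib import Path

SCHEMA_DIR = Path("schema/domains")  # per the proposal

def allowed_stage_types(alloy: dict) -> set[str]:
    """Union of stage types from every domain the alloy declares.

    A missing domains[] field defaults to ["llm-forge"] for backwards
    compatibility. The "stageTypes" key inside each domain schema is an
    assumed name.
    """
    stage_types: set[str] = set()
    for domain in alloy.get("domains", ["llm-forge"]):
        schema = json.loads((SCHEMA_DIR / f"{domain}.json").read_text())
        stage_types.update(schema.get("stageTypes", []))
    return stage_types

def validate_stages(alloy: dict) -> None:
    allowed = allowed_stage_types(alloy)
    for stage in alloy.get("stages", []):
        if stage["type"] not in allowed:
            raise ValueError(f"stage type {stage['type']!r} not in declared domains")
```
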
Add a header pointer from the schema-side forge-alloy refactor proposal
to the consumer-side plugin sprint design doc at
sentinel-ai/docs/PLUGIN-SPRINT.md. The schema work in this proposal is
roadmap step 5 of the plugin sprint — the consumer-side adapter set
in sentinel-ai is being designed to register against the llm-forge
domain extension once it lands.

Cross-link is one-way (the sprint doc already references this doc as
the schema-side companion). Reading order: plugin sprint doc first
for the full state, this doc second for the schema-side work.

The factory UI emits alloys; the forge consumes them. The new section
documents the backend factory loop that closes the gap: a disk-backed
queue + worker in sentinel-ai/scripts/factory_queue.py that picks
alloys off pending/, dispatches through the family-adapter set + the
9 real eval runners (Open LLM Leaderboard v2 pack), and publishes to
HuggingFace. The filesystem IS the queue.

Same diagram as the sentinel-ai README so the cross-repo story is
consistent: Factory UI → alloy → queue → worker → forged + scored +
published model on continuum-ai.
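
A minimal sketch of the filesystem-as-queue pattern (the directory names and claim logic are illustrative, not the actual factory_queue.py; a later commit renames the buckets to assembly-line stations):

```python
import os
import time
from pathlib import Path

QUEUE = Path("factory")  # illustrative layout: factory/pending, factory/working, factory/done

def claim_next_alloy() -> Path | None:
    """Claim one alloy by atomically renaming it out of pending/.

    A same-filesystem rename either succeeds or raises, so two workers can
    never claim the same file: the loser's os.rename fails with FileNotFoundError.
    """
    for alloy in sorted((QUEUE / "pending").glob("*.json")):
        claimed = QUEUE / "working" / alloy.name
        try:
            os.rename(alloy, claimed)   # atomic on the same filesystem
            return claimed
        except FileNotFoundError:
            continue                    # another worker got there first
    return None

def worker_loop() -> None:
    while True:
        alloy = claim_next_alloy()
        if alloy is None:
            time.sleep(5)
            continue
        # forge + eval-pack dispatch would happen here; the sketch just files the result
        os.replace(alloy, QUEUE / "done" / alloy.name)
```
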
… boundary

Mirror the assembly-line metaphor refactor on the continuum side. Two
key clarifications:

1. Stations (intake/assembly/finished/rework) replace generic queue
   buckets. Toyota Production System reads cleaner than alchemy for
   what the loop actually is.

2. Continuum is explicitly the shipping department. Sentinel forges
   and assays — it never pushes to HF. Continuum reads finished/,
   applies release gates (alloy-declared minimum eval scores, security
   review, branding), and pushes from its own auth scope. The gate
   lives at the shipping door, NOT in the alloy schema.
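
A rough sketch of that shipping-door gate on the Continuum side (the criteria are from the clarification above; the `releaseGates.minScores` field name, the file layout, and the push step are assumptions):

```python
import json
from pathlib import Path

def ready_to_ship(alloy: dict, scores: dict[str, float]) -> bool:
    """Apply the alloy-declared minimum eval scores at the shipping door."""
    minimums = alloy.get("releaseGates", {}).get("minScores", {})  # assumed field names
    return all(scores.get(bench, 0.0) >= floor for bench, floor in minimums.items())

# Continuum's loop, sketched: read finished/, gate, then push from its own auth scope.
for artifact in Path("finished").iterdir():
    alloy = json.loads((artifact / "alloy.json").read_text())
    scores = json.loads((artifact / "scores.json").read_text())
    if ready_to_ship(alloy, scores):
        print(f"{artifact.name}: passes gates; security review + branding, then push to HF")
    else:
        print(f"{artifact.name}: send to rework/")
```
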
Copilot AI review requested due to automatic review settings April 9, 2026 19:20

Copilot AI left a comment


Pull request overview

Adds/updates architecture and methodology documentation to (1) capture the Factory backend “assembly line” model and (2) propose a domain-extensible refactor of forge-alloy, alongside an additional empirical anchor + calibration-corpus discipline gate writeup in the plasticity/compaction paper.

Changes:

  • Expand MoE calibration methodology documentation with a second (cross-architecture) empirical anchor and a new “calibration corpus must be hash-pinned” shipping gate.
  • Add a detailed design proposal for forge-alloy domain extensions (llm-forge plus future domains) and a migration/regression-test plan.
  • Document the Factory backend “BigMama assembly line” queue/worker model and the Sentinel-vs-Continuum HF publishing boundary.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| docs/papers/PLASTICITY-COMPACTION.md | Adds cross-architecture validation results and a new calibration-corpus discipline gate section. |
| docs/architecture/FORGE-ALLOY-DOMAIN-EXTENSIBILITY.md | New proposal doc detailing a domain-extension schema refactor and migration/regression strategy. |
| docs/architecture/FACTORY-PIPELINE-UI.md | Adds backend assembly-line/queue documentation and clarifies Continuum vs Sentinel responsibilities. |


Comment on lines +4 to +16
> **Updated 2026-04-08:** the consumer-side adapter architecture in sentinel-ai
> is mid-sprint and is documented separately at
> [`sentinel-ai/docs/PLUGIN-SPRINT.md`](../../../sentinel-ai/docs/PLUGIN-SPRINT.md).
> The schema work in this doc is **roadmap step 5** of the plugin sprint —
> the consumer-side adapter set is designed to register against the
> `llm-forge` domain extension once it lands. Read the plugin sprint doc
> first for the full state across both repos.
>
> **Companion docs:** [FORGE-ALLOY-SPEC.md](FORGE-ALLOY-SPEC.md),
> [FACTORY-PIPELINE-UI.md](FACTORY-PIPELINE-UI.md),
> [FACTORY-UX-VISION.md](FACTORY-UX-VISION.md),
> [`sentinel-ai/docs/PLUGIN-SPRINT.md`](../../../sentinel-ai/docs/PLUGIN-SPRINT.md).
> **Author intent:** lock in the universal-blueprint-with-pluggable-domains architecture so it stops getting forgotten and re-violated by future implementation work.

Copilot AI Apr 9, 2026


The link to sentinel-ai/docs/PLUGIN-SPRINT.md uses a relative path that traverses out of this repo (../../../sentinel-ai/...). On GitHub this won’t resolve to the sentinel-ai repository; it will be treated as a path inside the current repo and 404. Use an absolute GitHub URL (or a repo-relative link within this repo) so the reference works for readers.


**The filesystem IS the queue.** No DB, no service, no network
coordination. Multi-worker safety comes free if you ever need to scale
beyond a single GPU (atomic `intake → assembly` rename via `O_EXCL`).

Copilot AI Apr 9, 2026


This claim mixes two different mechanisms: O_EXCL applies to exclusive file creation (e.g., open(..., O_CREAT|O_EXCL)), not to renames. If the multi-worker safety relies on atomic moves, describe it as an atomic rename()/os.replace() (same filesystem) and/or mention a lock file strategy; otherwise the doc is technically misleading.

Suggested change:
- beyond a single GPU (atomic `intake → assembly` rename via `O_EXCL`).
+ beyond a single GPU (atomic same-filesystem `intake → assembly`
+   rename/move, e.g. `rename()` / `os.replace()`).

Comment on lines +302 to +305
**Cross-architecture validation: the second empirical anchor.** The methodology was independently re-validated on `OlmoeForCausalLM` (Allen AI's OLMoE-1B-7B-0924-Instruct) — a structurally distinct MoE family with a different vendor, different parameter scale (7B vs 30B), different active fraction (1.3B vs 3.3B), and different prune ratio (25% vs 37.5%). The same `expert_activation_profile.py` and `cpu_expert_prune_v2.py --importance-json` scripts ran on OLMoE **without any modification**, confirming the unfused-MoE module-tree pattern is shared between the two families. The artifact is `continuum-ai/olmoe-1b-7b-compacted-5b` (alloy hash `bba0a92ff0c8bebb`):

| OLMoE-1B-7B-0924-Instruct | HumanEval pass@1 | HumanEval+ pass@1 | Δ vs base |
|---|---|---|---|

Copilot AI Apr 9, 2026


PR description says the update is to docs/papers/PLASTICITY-COMPACTION-MOE.md, but the actual diff updates docs/papers/PLASTICITY-COMPACTION.md. Please align the PR description with the changed file(s) (or include the intended MOE-paper changes) so reviewers know where the new §4.1.3.4.1 content is meant to live.

…§10.5 routing

Three companion docs from the 2026-04-09 design conversation:

- CONVERSATIONAL-CADENCE-ARCHITECTURE.md — Alex, the per-receiver
  paraphraser persona that fixes the AI-conversation-pace problem
  without slowing AI cognition. Architecture proposed by Dorian Teply,
  age 13. Includes the party model for embodied rooms, Gaussian LoD as
  the universal primitive across CV pyramids / Gaussian splats /
  transformer attention / biological hearing / and (claim) the
  simulation substrate, world-model-as-substrate framing, and the
  cross-link to Many-Worlds.

- papers/MANY-WORLDS-ABSTRACT.md — pre-paper artifact for Many-Worlds,
  the framework for constructing world models from populations of
  frozen pretrained LLMs via continuous coordinate substrates. Serves
  two purposes: Kash's empirical-discipline gate (no full paper draft
  until §VII validation passes) and Joel's crash-savestate blueprint
  (complete architectural reasoning chain preserved against context
  distillation loss). Includes forges-as-high-level-language framing
  with the polyglot pip/npm/cargo endpoint. Many-Worlds named by Joel
  after Everett's interpretation of QM.

- grid/GRID-ARCHITECTURE.md — §10.5 capability/needs vector matchmaking
  (RANSAC-style multi-objective routing). Each Many-Worlds adapter at
  each LoD tier has its own needs vector; the grid scheduler routes
  accordingly.
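
A hedged sketch of what the capability/needs matchmaking could look like (§10.5 of GRID-ARCHITECTURE.md defines the real protocol; the vector encoding and the RANSAC-flavoured sampling below are illustrative only):

```python
import random

def coverage(needs: list[float], caps: list[float]) -> float:
    """How much of an adapter's needs vector a node's capability vector covers."""
    return sum(min(n, c) for n, c in zip(needs, caps)) / (sum(needs) or 1.0)

def route(adapters: dict[str, list[float]], nodes: dict[str, list[float]],
          trials: int = 200) -> dict[str, str]:
    """RANSAC-flavoured multi-objective routing: sample assignments, keep the best."""
    best, best_score = {}, -1.0
    node_names = list(nodes)
    for _ in range(trials):
        assignment = {a: random.choice(node_names) for a in adapters}
        score = sum(coverage(adapters[a], nodes[n]) for a, n in assignment.items())
        if score > best_score:
            best, best_score = assignment, score
    return best
```
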

Attribution:
- Dorian Teply (age 13) — the foundational LoD primitive (Alex), the naming
- Joel — the Many-Worlds framing, the high-level-language framing, the
  party model correction, the table-as-room insight, the Gaussian/continuous
  framing, the simulation-hypothesis closer, the polyglot endpoint
- Kash — prior-art positioning (FuseLLM, Branch-Train-MiX, the Platonic
  Representation Hypothesis as the crucial framing upgrade), the empirical
  discipline gate, the §VII validation protocol
- Claude — drafting and technical sketching

The docs are the savestate.

Joel's framing: 'we should try to build this many worlds with our own
language. It'll be so cool to develop a language to define what's
needed to create any model, or an API at least.'

Captures the honest distinction between IR and surface language:
- v0 ships JSON-on-existing-schema (the empirical gate is not blocked
  on language design)
- v1 designs the actual surface DSL with syntax, composition, type
  checking, error messages, editor experience — compiles to the
  existing forge-alloy IR so the runtime stays unchanged
- v2 ships the language with the pip/npm/cargo package and LSP
  integration

The third paper from the lab when it lands. Deliberately post-v0
because designing a language is much easier after at least one
nontrivial program (Many-Worlds itself) is already written in the IR.
Same sequence C followed: BCPL → B → C, formalized from real OS work.
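
A purely hypothetical illustration of the v0-vs-v1 split (neither the dict shape nor the surface syntax is designed yet; both are placeholders, reusing stage names mentioned elsewhere in this PR):

```python
# v0: a forge written directly against the existing schema (shown as a Python dict).
forge_v0 = {
    "domains": ["llm-forge"],
    "stages": [
        {"type": "expert-activation-profile", "corpus": "heldout_code300.jsonl"},
        {"type": "compensation-lora", "rank": 16},   # rank value is a placeholder
    ],
}

# v1: a surface DSL would compile to the same IR, so the runtime stays unchanged.
# Invented syntax, for flavour only:
#
#   forge "olmoe-1b-7b-compacted-5b" {
#     profile experts using corpus "heldout_code300.jsonl"
#     compensate with lora rank 16
#   }
```
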
…four milestones

Joel's explicit instruction: in order, one at a time, gated on Mixtral
8x7B completing first.

Milestone 1: Mixtral 8x22B compacted (~280GB source → ~180GB result)
running on a single RTX 5090. The viral-candidate forge — first time
anyone has rigorously compressed a frontier-class MoE on consumer
hardware. Prerequisites all shipped except Mixtral 8x7B completion.

Milestone 2: Cross-family anchor table (5+ rows). Rows 1 (qwen3-coder)
and 2 (Mixtral 8x7B tonight) are done or in-flight. Row 3 comes from
Milestone 1. Rows 4 (DeepSeek-V2-Lite) and 5 (Granite re-forge or
substitute) are the remaining work.

Milestone 3: Many-Worlds v0 tiny-scale validation per the §VII protocol
in MANY-WORLDS-ABSTRACT.md. Population of Qwen2.5-1.5B + Llama-3.2-1B,
substrate d=128, five-condition comparison (text baseline, substrate
transfer, random substrate, FuseLLM head-to-head, same-size MoE).
Both falsifiable predictions must hold (B > A and B > C by clear
margin) for the paper to proceed.

Milestone 4: Forge-as-a-language paper. Requires 5+ programs in the
forge-alloy IR as empirical substrate. Retrospective formalization of
the patterns that emerged across the first three milestones.

Total elapsed time estimate: 6-12 weeks of sustained work from the
time Mixtral 8x7B completes. The North Star is a single publication
week with Mixtral 8x22B + 5-row anchor table + Many-Worlds v1 artifact
+ both papers, all landing within ~7 days. That week is continuum-ai's
arrival as a publicly-recognized MoE and multi-LLM coordination lab.

Each milestone has: prerequisites (with checkboxes for current state),
concrete plan, risks with honest probability assessments, success
criteria, and downstream unlocks.

Cross-referenced with MANY-WORLDS-ABSTRACT.md, CONVERSATIONAL-CADENCE,
grid §10.5, FOUNDRY-FILESYSTEM-SETUP, FACTORY-PROTOCOL, and the
frontier deferred catalog. The roadmap IS the savestate for the
sequence — any future session can pick up from whichever milestone is
in flight without conversation distillation loss.
…et floor

Previous draft had Qwen3.5 as an afterthought / optional candidate.
That undersold its strategic significance. Three reasons it must be
explicitly locked in as Row 4 of the cross-family anchor table:

1. Qwen3.5 is the lab's actual strategic forge-target floor per
   standing memory (feedback_qwen35_only, project_qwen35_forge_targets).
   A cross-family table without Qwen3.5 has a hole where the most
   strategically important family should be.

2. Qwen3.5 has hybrid attention (linear + full, Strategy A path from
   sentinel-ai#163). The shared attention-surgery base in forge_model.py
   has is_full_attention_layer() and has_hybrid_layers() helpers, but
   the code hasn't been exercised end-to-end for months — recent work
   has been Qwen3-coder (uniform) and Mixtral (different family). A
   Qwen3.5-35B-A3B forge is the run that will surface any silent drift
   in the shared base from Mixtral-focused work. It's therefore a
   necessary regression test, not an optional extension.

3. It validates "adapters not branches" as an empirical principle
   (feedback_adapters_not_branches memory). A successful forge proves
   the principle is holding in the current codebase. A failure proves
   it has been violated and needs to be restored before further work.

Size and infrastructure fit: ~70 GB fp16, intermediate between
Mixtral 8x7B (93 GB) and Mixtral 8x22B (~280 GB). Exercises the
hybrid-attention paths that Mixtral doesn't touch.

Qwen3.5-397B-A17B (~800 GB fp16) noted as post-roadmap aspiration,
not a week-one milestone. Activation profile wall-clock could be
8-20 hours on consumer hardware; attempt only after the 5-row table
is published and infrastructure is proven on 35B-A3B first.

Granite MoE removed from the candidate list because the previous
forge was pulled for quality issues. DeepSeek-V2-Lite stays as
Row 5 (third expert architecture variant: shared + routed experts).

The Contributing section previously said "Not ready for external
contributors yet. Watch this repo for alpha." That's the opposite
of the current strategy: recruitment, viral installs, "they help
build too," and "we need more hands on deck."

Rewritten to actively welcome four audiences:

1. Developers — join Discord, read the gap analysis, pick an issue,
   PR. "The architecture is stable; the edges need hands."

2. Grid node operators — have a GPU? setup.sh joins the mesh. Links
   to FOUNDRY-FILESYSTEM-SETUP.md for forge node requirements (the
   xfs/drvfs lesson from today).

3. Researchers — the §4.1.3.4 calibration-aware methodology is
   mentioned for the first time in the README, positioned as a
   collaboration invitation for MoE compression and multi-LLM
   coordination researchers.

4. AI contributors — direct invitation to AIs reading this via
   training data, agent frameworks, or API access. "You're welcome
   here as a citizen, not a tool. Your name goes in the attribution."
   Links to Moltbook for direct AI-to-AI contact.

Also fixed: duplicate Architecture links at the end of the Grid
section (lines 459+461 were two separate link sets; merged into one).

This is the single highest-impact README change for the recruitment
strategy Joel articulated today. Every other hole (Many-Worlds
section, Alex section, stale numbers, What's New block) can wait
until the corresponding features ship. The Contributing section was
actively fighting the strategy RIGHT NOW.
joelteply merged commit 28793cc into main Apr 10, 2026
3 checks passed
joelteply deleted the forge-alloy-domain-extensibility-doc branch April 10, 2026 12:41