… parse Checkpoint commit. NOT the architectural fix.

The 3 published continuum-ai/* alloys (qwen3-coder-30b-a3b-compacted-19b-256k, olmoe-1b-7b-compacted-5b, qwen2.5-coder-7b-compacted) now validate against ForgeAlloy.model_validate_json() instead of failing with 5-6 errors each. Done by extending the core types with sentinel-ai-specific fields (expert-activation-profile, compensation-lora, keepExpertsPerLayer, priorMetricBaselines, calibrationCorpora, etc.) and relaxing several required fields to optional.

This is the WRONG layer — these belong in an llm-forge domain extension per FORGE-ALLOY-DOMAIN-EXTENSIBILITY.md, not bolted into the universal core. Sentinel-ai is supposed to be a black-box consumer of the universal contract, not a shape that the core mirrors field-for-field.

Committing as a checkpoint so the work isn't lost while the domain-registry refactor (work items 0-5 in the extensibility doc) lands properly. The next commit moves every field added here out of types.py and into a domain extension module, restoring the universal core to its pre-checkpoint shape plus only the 'domains[]' registry hook.
Roadmap step 5 from sentinel-ai/docs/PLUGIN-SPRINT.md and the schema-side proposal in continuum/docs/architecture/FORGE-ALLOY-DOMAIN-EXTENSIBILITY.md. Adds the domain-extension package that the bd4349d checkpoint commit on this branch SHOULD have built instead of bolting ML-specific fields into the universal core. Per the never-lose-work rule, the bd4349d state is preserved on the wip/types-additive-checkpoint-bd4349d branch and is not destroyed by this commit.

Per TDD/TDValidation discipline: test first, then implementation. The contract test is python/tests/test_domain_extension_layout.py; the existing python/tests/test_regression_published_alloys.py acts as the end-to-end gate that the 17 published continuum-ai/* artifacts still validate cleanly through the post-refactor schema.

== What landed

python/forge_alloy/domains/ — new package

base.py
DomainExtension ABC. Each registered extension owns:
- id (the string the alloy's domains[] field carries)
- stage_types() → dict[str, type] (Pydantic models for stages this domain owns)
- root_extensions() → dict[str, type] (Pydantic models for root fields this domain adds)

registry.py
DomainRegistry — id-string → DomainExtension class lookup. Mirrors scripts/adapters/registry.py and scripts/eval_runners/registry.py in sentinel-ai. Strict exact-match dispatch; idempotent same-class re-registration; raises when a different class is registered against an existing id (silent shadowing is the forbidden pattern). KeyError on an unknown id includes the full registered list and the file/registration recipe for adding the missing one.

llm_forge.py
LlmForgeDomain — registered against id 'llm-forge'.
Owns every ML-specific stage type: source-config, prune, train, lora, compact, quant, package, eval, publish, deploy, expert-prune, expert-activation-profile, compensation-lora, context-extend, modality, deliver.

Owns every ML-specific root extension:
- calibrationCorpora: list[CalibrationCorpusRef]
- priorMetricBaselines: list[PriorMetricBaseline]

Today this module RE-EXPORTS the ML types from forge_alloy.types, where they currently live (the bd4349d checkpoint state). Consumers can import from EITHER path:
- from forge_alloy import ExpertPruneStage (legacy public API)
- from forge_alloy.domains.llm_forge import ExpertPruneStage (new path)

Both resolve to the same class object today. The full extraction (moving the actual class definitions out of types.py into llm_forge.py) is a follow-up refactor commit. The dependency direction is strict and enforced by test_universal_core_does_not_import_llm_forge: extensions → core, never core → extensions.

photo_provenance.py
PhotoProvenanceDomain — stub, registered against id 'photo-provenance'. Empty stage_types and root_extensions today. Witness that the registry handles non-ML domains without any change to the universal core. Real schemas land when the first photo-provenance artifact ships (camera enclave → edits → publish chain).

ticketing.py
TicketingDomain — stub, registered against id 'ticketing'. Empty schemas today. Witness for the venue-ticket / FedEx-delivery / concert-ticket use case from forge-alloy's APPLICATIONS.md.

__init__.py
Module-level singleton plus register_domain / resolve_domain / registered_domains helpers. Eager imports of llm_forge, photo_provenance, and ticketing register all three at package import time. Adding a new domain is exactly one new file + one import + one register() call here.
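The base.py / registry.py contract described above can be sketched roughly as follows. Names follow the commit message (DomainExtension, DomainRegistry, stage_types, root_extensions); the bodies and error messages are illustrative, not the actual implementation:

```python
from abc import ABC, abstractmethod


class DomainExtension(ABC):
    """One pluggable domain: owns its stage types and root-field models."""

    id: str  # the string the alloy's domains[] field carries

    @abstractmethod
    def stage_types(self) -> dict[str, type]:
        """Stage-type name -> model class, for stages this domain owns."""

    @abstractmethod
    def root_extensions(self) -> dict[str, type]:
        """Root field name -> model class, for fields this domain adds."""


class DomainRegistry:
    """Strict id-string -> DomainExtension class lookup; no silent shadowing."""

    def __init__(self) -> None:
        self._extensions: dict[str, type[DomainExtension]] = {}

    def register(self, ext_cls: type[DomainExtension]) -> None:
        existing = self._extensions.get(ext_cls.id)
        if existing is not None and existing is not ext_cls:
            # A different class against an existing id is an error, never a shadow.
            raise ValueError(
                f"domain id {ext_cls.id!r} already registered to {existing.__name__}"
            )
        self._extensions[ext_cls.id] = ext_cls  # same-class re-register is a no-op

    def resolve(self, domain_id: str) -> type[DomainExtension]:
        try:
            return self._extensions[domain_id]
        except KeyError:
            # Unknown id: surface the full registered list so the fix is obvious.
            raise KeyError(
                f"unknown domain {domain_id!r}; registered: {sorted(self._extensions)}"
            ) from None
```

Registering the same class twice is idempotent; registering a different class under an id that is already taken raises, which is the "no silent shadowing" property the commit calls out.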
== Schema gaps caught by the regression test (real bugs, fixed inline)

The python/tests/test_regression_published_alloys.py end-to-end gate exposed several places where the schema was silently dropping fields that the published continuum-ai/* alloys actually carry. These were real bugs (fields the schema didn't know about, dropped on validation, missing on round-trip). The fix adds the missing fields to the schema and allows extras everywhere artifact-specific extras land:

AlloyHardware:
+ device_targets: list[str], alias='deviceTargets' (every published alloy carries this — it was being silently dropped)
+ extra='allow' for any future hardware-tier extras

AlloyResults:
+ forged_params_b: float, alias='forgedParamsB' (MoE-specific param count for the morning's qwen3-coder-30b-a3b and OLMoE flagships — published values were 19.66 and 5.x)
+ active_params_b: float, alias='activeParamsB' (unchanged through expert pruning per § 4.1.3.4)
+ extra='allow' so artifact-specific result extras (fourRunProgression, lossFunctionAblation on v2-7b-coder-compensated) round-trip cleanly

BenchmarkResult:
+ score, base_score, delta, calibrated, samples_path, base_samples_path, result_hash, base_result_hash, metric — all fields that the publish pipeline (alloy_to_card.py) and the Tier 4 reproducibility test (sentinel-ai/tests/reproducibility/test_published_alloys_scoring.py) both consume, but that the schema was hiding behind a generic open 'metrics' dict. Now they're first-class.

All other BaseModel classes:
model_config now has extra='allow', so artifact-specific extras (notes, methodology anchor URLs, custom provenance fields) are preserved verbatim through the round-trip. The schema's named fields stay the canonical surface that publish_model.py + alloy_to_card.py read; extras are recognized as artifact-specific provenance and don't cause silent data loss.
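The silent-drop bug and the extras-round-trip fix are easy to see without Pydantic. In the real schema the mechanism is model_config with extra='allow'; below is a stdlib-only toy stand-in (PermissiveModel and the futureTierHint key are made-up names for illustration):

```python
class PermissiveModel:
    """Toy stand-in for a Pydantic model with extra='allow': the named
    fields are the canonical surface, unknown keys round-trip verbatim
    instead of being dropped on validation."""

    fields = ("deviceTargets",)  # the schema's named (canonical) keys

    def __init__(self, data: dict) -> None:
        self.known = {k: data[k] for k in self.fields if k in data}
        # With extra='forbid'/'ignore' semantics, everything below is lost.
        self.extras = {k: v for k, v in data.items() if k not in self.fields}

    def dump(self) -> dict:
        return {**self.known, **self.extras}


raw = {"deviceTargets": ["cuda", "mps"], "futureTierHint": "edge"}
assert PermissiveModel(raw).dump() == raw  # nothing silently dropped
```

This is exactly the property the regression gate checks: validate a published alloy, dump it back out, and require the result to match the on-disk file.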
== Test status

python/tests/test_domain_extension_layout.py: 17 passed
python/tests/test_regression_published_alloys.py: 3 passed (qwen3-coder-30b-a3b, olmoe-1b-7b, qwen2.5-coder-7b)
Combined: 20 forge-alloy tests, 0 failures

Cross-repo sanity: sentinel-ai's reproducibility + unit-test suite is still 60 passed / 2 xfailed after this change (the xfails are the same priorMetricBaselines.samplesHash gap that closes in roadmap step 8).

Side fix: python/tests/test_regression_published_alloys.py
- sys.path now includes python/ so the script and pytest both find forge_alloy without the caller having to set PYTHONPATH
- expected_alloy_hash_prefix for qwen3-coder-30b-a3b updated from aa61c4bdf463847c → 011970c80c2f3429 to reflect the post-correction state pushed in sentinel-ai commit 1bc32d2 (the canonical-evalplus humaneval_plus correction)
- semantic_equivalent treats int/float as numerically equivalent when their values match (Pydantic coerces int → float on Optional[float] fields, and the round-trip emits float)
- the round-trip uses exclude_unset=True (preserves null fields) instead of exclude_none=True (which was dropping them)

Side fix: .gitignore now excludes __pycache__, *.pyc, *.pyo, and .pytest_cache so Python bytecode never sneaks into commits.

== Next

Roadmap step 6: vision-safety integration (Qwen3VLAdapter consults the existing scripts/vision_safety.py whitelist). Step 7 unifies the modelHash convention across publish_model.py and the backfill tools. Step 8 closes the priorMetricBaselines.samplesHash schema gap and uploads the calibration corpora alongside the model weights.
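The int/float-tolerant comparison from the side fix can be sketched as a small recursive helper. The actual semantic_equivalent lives in test_regression_published_alloys.py and may differ in detail; this version shows the intended semantics:

```python
def semantic_equivalent(a, b) -> bool:
    """Deep-compare two parsed-JSON values, treating an int and a float
    with the same numeric value (19 vs 19.0) as equivalent, since
    validation coerces int -> float on Optional[float] fields and the
    round-trip emits float."""
    # bool is an int subclass in Python; keep True distinct from 1.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            semantic_equivalent(a[k], b[k]) for k in a
        )
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            semantic_equivalent(x, y) for x, y in zip(a, b)
        )
    return a == b
```

Note the bool guard: without it, True would compare equal to 1 through the numeric branch, which is almost never what a schema round-trip check wants.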
The alloy IS the part spec. In the assembly-line metaphor every part has a spec sheet that travels with it down the line; the alloy carries the recipe, source, integrity attestation, AND the gate the part must clear before the shipping department releases it. Sentinel-ai forges and assays — it NEVER reads acceptanceCriteria. Continuum (the shipping department) reads BOTH the assayed scores written into the finished/ manifest AND the alloy's acceptanceCriteria, and decides ship vs rework. Same alloy → same gate verdict on any forge run by anyone, anywhere — the spec is portable.

New types:
- BenchmarkAcceptance — per-benchmark floor + § 4.1.3.4 anchorDelta gate
- AcceptanceHardware — maxVramGb + deviceTier
- AcceptanceIntegrity — modelHashRequired + samplesPathRequired
- AcceptanceCriteria — top-level container

ForgeAlloy.acceptance_criteria is Optional[AcceptanceCriteria] (default None) — backwards compatible: every existing published continuum-ai/* alloy keeps loading. The field serializes under the camelCase alias 'acceptanceCriteria' to match every other alloy field on disk.

The § 4.1.3.4 anchorDelta semantic: a negative value means 'the forged score must be within |delta| points BELOW the base anchor, measured in the same eval pipeline'. The morning's qwen3-coder-30b shipped at delta -3.7 against the 92.1 base anchor; the catalog's v2 re-forge alloy declares anchorDelta: -3.7 to lock in the same gate.

8 new tests, 25/25 forge-alloy passing.
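The gate semantics above can be sketched as a small continuum-side checker. gate_verdict and the 'humaneval' benchmark key are hypothetical names for illustration, not the library API; the numbers reuse the commit's 92.1 base anchor and -3.7 anchorDelta:

```python
def gate_verdict(criteria: dict, assayed_scores: dict) -> tuple[bool, list[str]]:
    """Decide ship vs rework from an acceptanceCriteria-shaped dict and the
    assayed scores in the finished/ manifest. Illustrative sketch only."""
    failures = []
    for bench, rule in criteria.get("benchmarks", {}).items():
        score = assayed_scores.get(bench)
        if score is None:
            failures.append(f"{bench}: no assayed score in the manifest")
            continue
        if "min" in rule and score < rule["min"]:
            failures.append(f"{bench}: {score} below floor {rule['min']}")
        anchor, delta = rule.get("anchor"), rule.get("anchorDelta")
        # Negative anchorDelta: forged score may sit at most |delta| points
        # below the base anchor, measured in the same eval pipeline.
        if anchor is not None and delta is not None and score < anchor + delta:
            failures.append(f"{bench}: {score} outside anchor tolerance {delta}")
    return (not failures, failures)


# 92.1 base anchor with anchorDelta -3.7 puts the effective floor at 88.4.
criteria = {"benchmarks": {"humaneval": {"min": 80.0, "anchor": 92.1, "anchorDelta": -3.7}}}
assert gate_verdict(criteria, {"humaneval": 89.0})[0]      # within tolerance: ship
assert not gate_verdict(criteria, {"humaneval": 88.0})[0]  # too far below: rework
```

Because the criteria travel inside the alloy, the same check yields the same verdict on any forge run, which is the portability claim above.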
…iven)
The seeder shouldn't be hardcoding training defaults. Each family
adapter knows what corpus/step-count/LR works best for its
architecture and model size. Recipes declare INTENT
({type: train, method: lora}) and the family adapter fills in the
rest at execution time via default_train_params(ctx).
These three fields go from required to Optional with a default of
None. The schema no longer rejects intent-only train stages.
Backwards compat: every existing alloy that DOES specify them still
validates fine, because the prior types are still accepted alongside
None.
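The intent-filling pattern can be sketched as a plain merge. resolve_train_params is a hypothetical helper, not the adapter API, and the default values are illustrative:

```python
def resolve_train_params(stage: dict, adapter_defaults: dict) -> dict:
    """Merge an intent-only train stage with family-adapter defaults:
    recipe-specified values win; None or missing fields are filled in
    at execution time."""
    resolved = dict(stage)
    for key, value in adapter_defaults.items():
        if resolved.get(key) is None:
            resolved[key] = value
    return resolved


# Intent-only recipe stage: declares WHAT, not HOW.
stage = {"type": "train", "method": "lora"}
# What a family adapter's default_train_params(ctx) might return (made-up values):
defaults = {"domain": "code", "steps": 1000, "learning_rate": 2e-4}
resolved = resolve_train_params(stage, defaults)
assert resolved["steps"] == 1000
# An explicit recipe value is never overridden:
assert resolve_train_params({"type": "train", "steps": 50}, defaults)["steps"] == 50
```

Treating an explicit None the same as a missing key is what makes the Optional[...] = None schema change and this hook compose: the seeder can emit the field or omit it, and the adapter fills it either way.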
TL;DR
Schema additions and relaxations needed by the 2026-04-09 sentinel-ai factory-pipeline work (CambrianTech/sentinel-ai#169). Three changes, all backwards compatible with every existing published continuum-ai/* alloy.
Changes
1. AcceptanceCriteria — the part spec, gate-as-alloy-field

New top-level optional field on ForgeAlloy and four new model classes:
- BenchmarkAcceptance — per-benchmark min (0..1), optional anchorDelta (the § 4.1.3.4 discipline gate: the forged score must be within Δ of the base anchor in the same eval pipeline), optional anchor
- AcceptanceHardware — maxVramGb, deviceTier
- AcceptanceIntegrity — modelHashRequired, samplesPathRequired
- AcceptanceCriteria — top-level container with benchmarks: dict[str, BenchmarkAcceptance], hardware, integrity

The alloy IS the part spec. In the assembly-line metaphor, every part has a spec sheet that travels with it down the line. AcceptanceCriteria is that spec — declared by the recipe author, self-contained in the alloy file. Sentinel-ai forges + assays; continuum (the shipping department) reads BOTH the assayed scores AND the alloy's acceptanceCriteria, and decides ship vs rework.

2.
ExpertActivationProfileStage — the § 4.1.3.4 calibration-aware metric stage

Added to the discriminated AlloyStage union. It was missing from the schema even though the morning's qwen3-coder-30b-a3b-compacted-19b-256k flagship used it. This fills the gap so intent-only alloys with the calibration-profile stage validate cleanly.

3.
TrainStage — domain, steps, learning_rate made Optional

Was: required fields, so the seeder had to hardcode default values.
Now: Optional[T] = None; the family adapter's default_train_params(ctx) hook fills them in at execution time.

The right architectural pattern: recipes declare INTENT ({type: train, method: lora}); the family adapter knows what works for its architecture and model size, and fills in domain/steps/LR/etc. at runtime. Recipe authors override only when they want to override.

Backwards compat: every existing alloy that DOES specify domain/steps/learningRate still validates, and the values are still used as-is. The Optional change only affects intent-only alloys.

Tests
Companion PR
Sentinel-ai side: CambrianTech/sentinel-ai#169 — uses these schema changes via the forge_alloy.types.AcceptanceCriteria import in seed_factory_queue.py and the family adapter's default_train_params() hook in alloy_executor.py + transform_stages.py.