This document defines class-based model profiles for abyss-stack.
It does not name one true vendor or one permanent model brand. It defines the infra-facing profile classes the stack should be able to host.
For family- or variant-specific operating notes, use MODEL_CARDS.
abyss-stack owns runtime profile posture, not agent-layer tier meaning.
The stack should answer:
- what class of model is being hosted
- what latency and memory posture it implies
- what context budget posture it expects
- where the profile belongs in local storage and serving policy
Profile class: spark
Use for:
- fast routing
- quick structure checks
- lightweight archive or summary passes
Expected posture:
- lowest latency budget
- smallest VRAM and RAM expectation
- short working context
Profile class: workhorse
Use for:
- ordinary planning and execution
- default bounded task work
- most multi-step local routes
Expected posture:
- moderate latency budget
- moderate context budget
- best balance for steady local operation
Profile class: deep
Use for:
- synthesis
- contradiction arbitration
- expensive reasoning passes
- rare high-cost judgment
Expected posture:
- highest latency budget
- highest VRAM and RAM requirement
- should be invoked selectively rather than by default
Profile class: archive
Use for:
- distillation
- summary packs
- entity and decision extraction
- writeback preparation
Expected posture:
- optimized for structured output and compression fidelity
- does not need to be the deepest model
- may prefer throughput stability over depth
Each runtime profile should be able to name:
- profile_class
- latency_budget
- context_budget_class
- storage_tier
- serving_path
- quantization_or_runtime_variant
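As a minimal sketch, the named fields could be carried in a small record type. The field names come from the list above; the class names `ProfileClass` and `RuntimeProfile`, the string types, and every example value are assumptions made for illustration, not part of abyss-stack.

```python
from dataclasses import dataclass, fields
from enum import Enum


class ProfileClass(Enum):
    # Class names as used in the return-posture note of this document.
    SPARK = "spark"
    WORKHORSE = "workhorse"
    DEEP = "deep"
    ARCHIVE = "archive"


@dataclass(frozen=True)
class RuntimeProfile:
    """Hypothetical record: one attribute per field a profile should name."""

    profile_class: ProfileClass
    latency_budget: str
    context_budget_class: str
    storage_tier: str
    serving_path: str
    quantization_or_runtime_variant: str

    def __post_init__(self):
        # Reject profiles that leave any field unnamed (None or empty string).
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == "":
                raise ValueError(f"runtime profile must name {f.name}")


# Illustrative values only; none of these strings are defined by the stack.
example_profile = RuntimeProfile(
    profile_class=ProfileClass.SPARK,
    latency_budget="lowest",
    context_budget_class="short",
    storage_tier="active",
    serving_path="local",
    quantization_or_runtime_variant="example-variant",
)
```

The quantization field is an ordinary value on the record rather than part of the class identity, so one class can be re-pointed at a different variant without changing the profile class itself.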
Keep the split explicit:
- MODEL_PROFILES says what class of runtime posture a lane belongs to
- MODEL_CARDS says which concrete family or variant currently fits that lane
- promotion still belongs to machine-fit, pilot, and benchmark surfaces
The stack should keep heavier profiles explicit about storage placement.
Preferred rule:
- fast and frequently used assets stay on the active runtime root when practical
- colder and heavier assets may live on the mounted heavy-data tier when available
- path policy must remain explicit so /abyss absence does not silently spill onto the system disk
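The storage rule above can be sketched as a resolver that fails loudly instead of falling back. This is only an illustration under stated assumptions: the tier names `"active"` and `"heavy"`, the function `resolve_asset_root`, and the exception `HeavyTierUnavailable` are hypothetical, and only the `/abyss` path comes from this document.

```python
import os
from pathlib import Path


class HeavyTierUnavailable(RuntimeError):
    """Raised when a heavy-tier asset is requested but the tier is not mounted."""


def resolve_asset_root(storage_tier, runtime_root, heavy_root="/abyss",
                       is_mounted=os.path.ismount):
    """Return the directory root for an asset, never falling back silently.

    storage_tier: "active" or "heavy" (hypothetical tier names).
    runtime_root: the active runtime root, supplied by the caller.
    is_mounted:   injectable check so the policy can be tested without a mount.
    """
    if storage_tier == "active":
        return Path(runtime_root)
    if storage_tier == "heavy":
        if not is_mounted(heavy_root):
            # Explicit failure: do not spill heavy assets onto the system disk.
            raise HeavyTierUnavailable(
                f"{heavy_root} is not mounted; refusing to fall back to {runtime_root}"
            )
        return Path(heavy_root)
    raise ValueError(f"unknown storage tier: {storage_tier!r}")
```

Raising when the heavy tier is missing keeps the placement decision visible to the operator, which is the intent of the last rule.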
- do not publish model brands as doctrine
- do not encode human role meaning here
- do not turn runtime profiles into routing truth
- do not treat one quantization as permanent canon
Return posture should also remain class-based.
- spark should use the thinnest anchor-only rebuild
- workhorse should be the default checkpoint-first return class
- deep may allow richer selective recall
- archive should stay summary-first rather than becoming generic continuation
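One way to keep return posture class-based is a plain lookup keyed by profile class. The sketch below is hypothetical: the dictionary `RETURN_POSTURE`, the function `return_posture_for`, and the posture label strings are shorthand invented here for the four postures described above.

```python
# Hypothetical labels summarizing the return posture named for each class.
RETURN_POSTURE = {
    "spark": "anchor-only",
    "workhorse": "checkpoint-first",
    "deep": "selective-recall",
    "archive": "summary-first",
}


def return_posture_for(profile_class: str) -> str:
    """Look up the return posture for a profile class.

    Unknown classes raise instead of defaulting, so a mislabeled lane
    cannot quietly receive a generic continuation posture.
    """
    try:
        return RETURN_POSTURE[profile_class]
    except KeyError:
        raise ValueError(f"unknown profile class: {profile_class!r}") from None
```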