- `ChronosConfig`: extends `MiniMindConfig` with lookahead, shared experts, λ1/λ2, and hybrid attention params
- `LookaheadRouter`: lightweight dense predictor inserted after block 0; outputs `[B, S, K+1, E]` routing probs
- `TemporalLocalityLoss`: L_total = L_CE + λ1·L_balance + λ2·Σ‖E_t − E_{t-1}‖²
- `ChronosMOEFeedForward`: shared experts always in VRAM, with a soft-gating fallback on cache miss
- `MLAAttention`: Multi-head Latent Attention; the KV cache stores latent vectors (8-16x smaller)
- `SlidingWindowAttention`: KV cache capped at `window_size` tokens, O(1) memory per decode step
- Hybrid attention: even layers → MLA, odd layers → SlidingWindow
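The temporal-locality objective above can be sketched in plain Python. This is a minimal illustration, assuming `E_t` is the router's expert-probability vector at step t and that `ce` and `balance` are the already-computed cross-entropy and load-balance terms (the real loss would operate on tensors):

```python
def temporal_locality_loss(ce, balance, expert_probs, lam1, lam2):
    """L_total = L_CE + lam1 * L_balance + lam2 * sum ||E_t - E_{t-1}||^2.

    expert_probs: list of per-step expert probability vectors (E_0, E_1, ...).
    The smoothness term penalizes routing that jumps between experts from
    one step to the next, which is what makes prefetching predictable.
    """
    smooth = 0.0
    for prev, cur in zip(expert_probs, expert_probs[1:]):
        # squared L2 distance between consecutive routing vectors
        smooth += sum((c - p) ** 2 for p, c in zip(prev, cur))
    return ce + lam1 * balance + lam2 * smooth
```

For example, two steps that route to entirely different experts (`[1, 0]` then `[0, 1]`) contribute a smoothness penalty of 2, while identical routing contributes 0.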
- `ExpertStore`: three-tier VRAM/RAM/SSD storage with a dynamic pinned-memory safety limit
- `AsyncPrefetcher`: background thread that prefetches SSD→RAM, driven by `LookaheadRouter` predictions
- `CacheManager`: unified cache interface for the inference engine
- `ChronosInferenceEngine`: end-to-end decode loop with async prefetch + soft gating
- `cluster_layout.py`: co-occurrence matrix + Louvain/greedy clustering for a sequential SSD layout
- `ChronosAutoTuner`: extends the Optuna TPE search with λ1/λ2/`lookahead_steps` dimensions
- `chronos eval`: Phase 1 validation (t+1/t+2 lookahead accuracy + LRU cache hit rate)
- `chronos benchmark`: Phase 3 (PPL, TPS, KV cache memory, layer-wise analysis)
- `chronos export`: expert cluster layout generation
- 8-test smoke suite (`tests/test_smoke.py`)
- GitHub Actions CI (Python 3.10/3.11, lint + tests)
- PyPI packaging (`pyproject.toml`, `MANIFEST.in`)
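The VRAM tier of the expert store and the LRU cache hit rate reported by `chronos eval` can be modeled with a toy sketch. The class and method names below (`VramCache`, `access`, `hit_rate`) are illustrative, not the project's actual API; real expert weights and the RAM/SSD tiers are omitted:

```python
from collections import OrderedDict


class VramCache:
    """Toy LRU cache over expert IDs for the VRAM tier.

    A miss here is where the real engine would fall back to soft
    gating over resident experts while the AsyncPrefetcher pulls
    the missing expert up from RAM or SSD.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = OrderedDict()  # expert_id -> weights (elided)
        self.hits = 0
        self.misses = 0

    def access(self, expert_id):
        """Return True on a cache hit, False on a miss (with LRU eviction)."""
        if expert_id in self.slots:
            self.slots.move_to_end(expert_id)  # mark most-recently-used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.slots) >= self.capacity:
            self.slots.popitem(last=False)  # evict least-recently-used
        self.slots[expert_id] = None
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Routing a token stream through `access` and reading `hit_rate()` afterwards mirrors the cache-hit-rate metric in the Phase 1 evaluation; better lookahead predictions would show up directly as a higher rate.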