
feat: MLX-native training stack and upstream MythosConfig compatibility#66

Open
santome5954-lang wants to merge 2 commits into kyegomez:main from santome5954-lang:claude/pensive-noether-0728cd

Conversation

@santome5954-lang

Summary

This PR contributes the full MLX (Apple Silicon) training and inference stack for OpenMythos, along with upstream MythosConfig field compatibility fixes. The changes enable training and running OpenMythos natively on Apple Silicon without PyTorch.

  • open_mythos/main.py: Added 5 upstream-compatible fields to MythosConfig (n_kv_heads, act_threshold, lora_rank, max_output_tokens, dropout) with backward-compatible defaults — fixes the TypeError raised when loading variants.py configs (see the config sketch after this list)
  • pyproject.toml: Replaced incorrect torch = "*" dependency with actual MLX runtime (mlx >= 0.16, transformers >= 4.40)
  • README.md: Added trained checkpoint table (1b-mythos, step_060000, loss 1.0225)
  • train.py: Full MLX training loop with cosine LR scheduling, warmup, checkpoint resumption, and multi-phase fine-tuning support (see the LR-schedule sketch after this list)
  • prepare_data.py: GPT-2 tokenized dataset builder producing the .npy training format (see the data-prep sketch after this list)
  • eval_inference.py / eval.py: Inference evaluation across checkpoints with perplexity measurement
  • engine/mlx_engine.py: Low-level MLX token generation engine (see the decode-loop sketch after this list)
  • serve.py / mcp_server.py / open_mythos/mcp_server.py: Local inference HTTP + MCP servers for Claude Code integration
  • open_mythos/full_model.py: DeepSeekV2Lite-compatible inference-only model for MLX
  • open_mythos/router.py: Task routing helper (local vs API)
  • example_mlx.py / example_deepseek.py: Runnable MLX usage examples
  • example.py: Ported from PyTorch to MLX
  • open_mythos/__init__.py: Aligned exports with actual MLX class names
  • HANDOFF.md: Session handoff document with architecture overview, training history, and next-action recommendations
  • docs/executive_report.html: Cost-effectiveness report (local MLX training: $0.19 vs cloud A100: $71.25 — 99.7% savings)
  • docs/technical_report.html: Full technical reference covering architecture, all training phases, 2b divergence analysis
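
For reference, a minimal sketch of the MythosConfig compatibility change described above. The field names come from this PR; the default values shown here are illustrative placeholders, not the actual defaults:

```python
from dataclasses import dataclass

@dataclass
class MythosConfig:
    # ...existing architecture fields elided...

    # Upstream-compatibility fields added by this PR. Values here are
    # placeholders; the point is that every new field carries a default,
    # so older call sites (and variants.py before the fix) that never
    # pass them still construct the config without a TypeError.
    n_kv_heads: int = 8
    act_threshold: float = 0.99
    lora_rank: int = 0
    max_output_tokens: int = 512
    dropout: float = 0.0
```

With defaults in place, mythos_1b() in variants.py can pass any subset of these fields and still build a valid config.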
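The train.py schedule (cosine decay with linear warmup) follows the standard formulation; this is a generic sketch, not the exact code from train.py:

```python
import math

def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int,
          min_lr: float = 0.0) -> float:
    """Linear warmup to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The multi-phase M+/M++/M+++/M4 runs in the table below then correspond to restarting training with a 10x-lower base LR at each phase boundary.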
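The .npy format produced by prepare_data.py can be illustrated with a short sketch: tokenize raw text with the GPT-2 tokenizer and save a flat token array. File names and the function signature here are hypothetical:

```python
import numpy as np
from transformers import AutoTokenizer

def build_dataset(text_path: str, out_path: str = "train.npy") -> None:
    """Tokenize a raw text file with GPT-2 BPE and save a flat uint32 array."""
    tok = AutoTokenizer.from_pretrained("gpt2")
    with open(text_path, "r", encoding="utf-8") as f:
        ids = tok(f.read())["input_ids"]
    np.save(out_path, np.array(ids, dtype=np.uint32))
```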
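And a minimal autoregressive decode loop in MLX, to show the shape of what engine/mlx_engine.py does; a generic sketch, not the engine's actual interface:

```python
import mlx.core as mx

def generate(model, prompt_ids, max_tokens: int = 64, temperature: float = 0.0):
    """Greedy (or temperature-sampled) decoding, one token at a time."""
    tokens = mx.array(prompt_ids)[None, :]            # (1, T)
    for _ in range(max_tokens):
        logits = model(tokens)                        # (1, T, vocab); full recompute, no KV cache
        next_logits = logits[:, -1, :]
        if temperature > 0:
            next_tok = mx.random.categorical(next_logits / temperature)
        else:
            next_tok = mx.argmax(next_logits, axis=-1)
        tokens = mx.concatenate([tokens, next_tok[None, :]], axis=1)
        mx.eval(tokens)                               # force MLX's lazy graph to evaluate
    return tokens[0].tolist()
```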

Training Results

1b-mythos was trained to convergence over 60,000 steps on an Apple M2 Ultra (64 GB):

| Phase | LR   | Steps           | Best loss |
|-------|------|-----------------|-----------|
| M+    | 1e-5 | 0 → 45,000      | 1.0960    |
| M++   | 1e-6 | 45,000 → 55,000 | 1.0462    |
| M+++  | 1e-7 | 55,000 → 60,000 | 1.0269    |
| M4    | 1e-8 | 60,000 → 65,000 | 1.0225    |

Best checkpoint: ckpt/1b-mythos/step_060000.npz
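
For scale: if the reported loss is mean per-token cross-entropy in nats, the 1.0225 best loss corresponds to a perplexity of exp(1.0225) ≈ 2.78, the quantity eval_inference.py measures.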

Test plan

  • `python3 -c "from open_mythos.variants import mythos_1b; print(mythos_1b())"` — no TypeError
  • `python3 -c "from open_mythos.main import OpenMythos, MythosConfig; m = OpenMythos(MythosConfig()); print('ok')"` — imports cleanly
  • `python3 example_mlx.py` — runs without errors on Apple Silicon
  • `python3 eval_inference.py` — generates samples from a checkpoint (requires a checkpoint file)

🤖 Generated with Claude Code

santome5954 and others added 2 commits May 1, 2026 17:06
…t checkpoint

- open_mythos/main.py: add 5 config-compatibility fields to MythosConfig
  (n_kv_heads, act_threshold, lora_rank, max_output_tokens, dropout)
  with backward-compatible defaults; fixes variants.py TypeError
- pyproject.toml: replace incorrect torch dep with actual runtime deps
  (mlx, numpy, loguru, transformers)
- README.md: document 1b-mythos best checkpoint (step_060000, loss 1.0225)

MLX architecture and train.py VARIANTS unchanged; step_060000.npz verified
loadable after config update.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- train.py: full MLX training script with cosine LR schedule and checkpoint resumption
- eval_inference.py / eval.py: inference evaluation across 4 checkpoints × 3 prompts
- prepare_data.py: GPT-2 tokenized dataset builder for .npy training format
- engine/mlx_engine.py: low-level MLX token generation engine
- serve.py / mcp_server.py / open_mythos/mcp_server.py: local inference HTTP + MCP servers
- open_mythos/full_model.py: DeepSeekV2Lite-compatible inference-only model
- open_mythos/router.py: task routing helper (local vs API)
- example_mlx.py / example_deepseek.py: runnable MLX usage examples
- example.py: ported from torch to mlx
- open_mythos/__init__.py: aligned exports with actual MLX class names
- HANDOFF.md: next-session handoff with architecture, training history, and action items
- docs/executive_report.html: cost-effectiveness report ($0.19 local vs $71.25 cloud)
- docs/technical_report.html: full technical reference with architecture and training analysis

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>