
feat: MLX-native training stack and upstream MythosConfig compatibility#66

Open
santome5954-lang wants to merge 2 commits into kyegomez:main from santome5954-lang:claude/pensive-noether-0728cd

Conversation

@santome5954-lang

Summary

This PR contributes the full MLX (Apple Silicon) training and inference stack for OpenMythos, along with upstream MythosConfig field compatibility fixes. The changes enable training and running OpenMythos natively on Apple Silicon without PyTorch.

  • open_mythos/main.py: Added 5 upstream-compatible fields to MythosConfig (n_kv_heads, act_threshold, lora_rank, max_output_tokens, dropout) with backward-compatible defaults — fixes the TypeError raised when loading variants.py configs (see the config sketch after this list)
  • pyproject.toml: Replaced incorrect torch = "*" dependency with actual MLX runtime (mlx >= 0.16, transformers >= 4.40)
  • README.md: Added trained checkpoint table (1b-mythos, step_060000, loss 1.0225)
  • train.py: Full MLX training loop with cosine LR scheduling, warmup, checkpoint resumption, and multi-phase fine-tuning support (see the LR-schedule sketch after this list)
  • prepare_data.py: GPT-2 tokenized dataset builder producing the .npy training format (see the data-prep sketch after this list)
  • eval_inference.py / eval.py: Inference evaluation across checkpoints with perplexity measurement
  • engine/mlx_engine.py: Low-level MLX token generation engine (see the decode-loop sketch after this list)
  • serve.py / mcp_server.py / open_mythos/mcp_server.py: Local inference HTTP + MCP servers for Claude Code integration
  • open_mythos/full_model.py: DeepSeekV2Lite-compatible inference-only model for MLX
  • open_mythos/router.py: Task routing helper (local vs API)
  • example_mlx.py / example_deepseek.py: Runnable MLX usage examples
  • example.py: Ported from PyTorch to MLX
  • open_mythos/__init__.py: Aligned exports with actual MLX class names
  • HANDOFF.md: Session handoff document with architecture overview, training history, and next-action recommendations
  • docs/executive_report.html: Cost-effectiveness report (local MLX training: $0.19 vs cloud A100: $71.25 — 99.7% savings)
  • docs/technical_report.html: Full technical reference covering architecture, all training phases, 2b divergence analysis
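
For reference, a minimal sketch of the MythosConfig compatibility change described above. The field names come from this PR; the default values shown here are illustrative placeholders, not the actual defaults:

```python
from dataclasses import dataclass

@dataclass
class MythosConfig:
    # ...existing architecture fields elided...

    # Upstream-compatibility fields added by this PR. Values here are
    # placeholders; the point is that every new field carries a default,
    # so older call sites (and variants.py before the fix) that never
    # pass them still construct the config without a TypeError.
    n_kv_heads: int = 8
    act_threshold: float = 0.99
    lora_rank: int = 0
    max_output_tokens: int = 512
    dropout: float = 0.0
```

With defaults in place, mythos_1b() in variants.py can pass any subset of these fields and still build a valid config.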
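The train.py schedule (cosine decay with linear warmup) follows the standard formulation; this is a generic sketch, not the exact code from train.py:

```python
import math

def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int,
          min_lr: float = 0.0) -> float:
    """Linear warmup to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The multi-phase M+/M++/M+++/M4 runs in the table below then correspond to restarting training with a 10x-lower base LR at each phase boundary.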
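The .npy format produced by prepare_data.py can be illustrated with a short sketch: tokenize raw text with the GPT-2 tokenizer and save a flat token array. File names and the function signature here are hypothetical:

```python
import numpy as np
from transformers import AutoTokenizer

def build_dataset(text_path: str, out_path: str = "train.npy") -> None:
    """Tokenize a raw text file with GPT-2 BPE and save a flat uint32 array."""
    tok = AutoTokenizer.from_pretrained("gpt2")
    with open(text_path, "r", encoding="utf-8") as f:
        ids = tok(f.read())["input_ids"]
    np.save(out_path, np.array(ids, dtype=np.uint32))
```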
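And a minimal autoregressive decode loop in MLX, to show the shape of what engine/mlx_engine.py does; a generic sketch, not the engine's actual interface:

```python
import mlx.core as mx

def generate(model, prompt_ids, max_tokens: int = 64, temperature: float = 0.0):
    """Greedy (or temperature-sampled) decoding, one token at a time."""
    tokens = mx.array(prompt_ids)[None, :]            # (1, T)
    for _ in range(max_tokens):
        logits = model(tokens)                        # (1, T, vocab); full recompute, no KV cache
        next_logits = logits[:, -1, :]
        if temperature > 0:
            next_tok = mx.random.categorical(next_logits / temperature)
        else:
            next_tok = mx.argmax(next_logits, axis=-1)
        tokens = mx.concatenate([tokens, next_tok[None, :]], axis=1)
        mx.eval(tokens)                               # force MLX's lazy graph to evaluate
    return tokens[0].tolist()
```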

Training Results

1b-mythos was trained to convergence over 60,000 steps on an Apple M2 Ultra (64 GB):

| Phase | LR   | Steps           | Best loss |
|-------|------|-----------------|-----------|
| M+    | 1e-5 | 0 → 45,000      | 1.0960    |
| M++   | 1e-6 | 45,000 → 55,000 | 1.0462    |
| M+++  | 1e-7 | 55,000 → 60,000 | 1.0269    |
| M4    | 1e-8 | 60,000 → 65,000 | 1.0225    |

Best checkpoint: ckpt/1b-mythos/step_060000.npz
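
For scale: if the reported loss is mean per-token cross-entropy in nats, the 1.0225 best loss corresponds to a perplexity of exp(1.0225) ≈ 2.78, the quantity eval_inference.py measures.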

Test plan

  • `python3 -c "from open_mythos.variants import mythos_1b; print(mythos_1b())"` — no TypeError
  • `python3 -c "from open_mythos.main import OpenMythos, MythosConfig; m = OpenMythos(MythosConfig()); print('ok')"` — imports cleanly
  • `python3 example_mlx.py` — runs without errors on Apple Silicon
  • `python3 eval_inference.py` — generates samples from a checkpoint (requires a checkpoint file)

🤖 Generated with Claude Code

santome5954 and others added 2 commits May 1, 2026 17:06
…t checkpoint

- open_mythos/main.py: add 5 config-compatibility fields to MythosConfig
  (n_kv_heads, act_threshold, lora_rank, max_output_tokens, dropout)
  with backward-compatible defaults; fixes variants.py TypeError
- pyproject.toml: replace incorrect torch dep with actual runtime deps
  (mlx, numpy, loguru, transformers)
- README.md: document 1b-mythos best checkpoint (step_060000, loss 1.0225)

MLX architecture and train.py VARIANTS unchanged; step_060000.npz verified
loadable after config update.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- train.py: full MLX training script with cosine LR schedule and checkpoint resumption
- eval_inference.py / eval.py: inference evaluation across 4 checkpoints × 3 prompts
- prepare_data.py: GPT-2 tokenized dataset builder for .npy training format
- engine/mlx_engine.py: low-level MLX token generation engine
- serve.py / mcp_server.py / open_mythos/mcp_server.py: local inference HTTP + MCP servers
- open_mythos/full_model.py: DeepSeekV2Lite-compatible inference-only model
- open_mythos/router.py: task routing helper (local vs API)
- example_mlx.py / example_deepseek.py: runnable MLX usage examples
- example.py: ported from torch to mlx
- open_mythos/__init__.py: aligned exports with actual MLX class names
- HANDOFF.md: next-session handoff with architecture, training history, and action items
- docs/executive_report.html: cost-effectiveness report ($0.19 local vs $71.25 cloud)
- docs/technical_report.html: full technical reference with architecture and training analysis

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>