feat: MLX-native training stack and upstream MythosConfig compatibility #66
Open
santome5954-lang wants to merge 2 commits into kyegomez:main
Conversation
…t checkpoint

- open_mythos/main.py: add 5 config-compatibility fields to MythosConfig (n_kv_heads, act_threshold, lora_rank, max_output_tokens, dropout) with backward-compatible defaults; fixes variants.py TypeError
- pyproject.toml: replace incorrect torch dep with actual runtime deps (mlx, numpy, loguru, transformers)
- README.md: document 1b-mythos best checkpoint (step_060000, loss 1.0225)

MLX architecture and train.py VARIANTS unchanged; step_060000.npz verified loadable after config update.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
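For context, a minimal sketch of what the backward-compatible field additions might look like. The field names come from the commit message above; the default values shown are illustrative assumptions, not the actual values in open_mythos/main.py:

```python
# Hypothetical reconstruction: field names are from the commit message,
# default values are assumptions chosen to be backward compatible.
from dataclasses import dataclass

@dataclass
class MythosConfig:
    # ... existing fields unchanged ...
    n_kv_heads: int = 8            # KV heads for grouped-query attention
    act_threshold: float = 0.0     # activation-gating threshold
    lora_rank: int = 0             # 0 = LoRA disabled
    max_output_tokens: int = 2048  # generation cap
    dropout: float = 0.0           # no dropout by default
```

Because every new field carries a default, existing call sites such as `MythosConfig()` and the `variants.py` constructors keep working unchanged.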
- train.py: full MLX training script with cosine LR schedule and checkpoint resumption
- eval_inference.py / eval.py: inference evaluation across 4 checkpoints × 3 prompts
- prepare_data.py: GPT-2 tokenized dataset builder for .npy training format
- engine/mlx_engine.py: low-level MLX token generation engine
- serve.py / mcp_server.py / open_mythos/mcp_server.py: local inference HTTP + MCP servers
- open_mythos/full_model.py: DeepSeekV2Lite-compatible inference-only model
- open_mythos/router.py: task routing helper (local vs API)
- example_mlx.py / example_deepseek.py: runnable MLX usage examples
- example.py: ported from torch to mlx
- open_mythos/__init__.py: aligned exports with actual MLX class names
- HANDOFF.md: next-session handoff with architecture, training history, and action items
- docs/executive_report.html: cost-effectiveness report ($0.19 local vs $71.25 cloud)
- docs/technical_report.html: full technical reference with architecture and training analysis

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
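As a rough illustration of the warmup-plus-cosine setup that train.py describes, a sketch using MLX's built-in schedule helpers. The 60,000-step total matches the training run reported below; the warmup length and peak learning rate are placeholders, not values taken from this PR:

```python
# Sketch of warmup + cosine decay using mlx.optimizers schedule helpers.
# Warmup length and peak LR are illustrative placeholders.
import mlx.optimizers as optim

warmup_steps, total_steps, peak_lr = 1_000, 60_000, 3e-4

warmup = optim.linear_schedule(0.0, peak_lr, warmup_steps)          # ramp up
decay = optim.cosine_decay(peak_lr, total_steps - warmup_steps)     # then decay
schedule = optim.join_schedules([warmup, decay], [warmup_steps])

# Passing the schedule as the learning rate makes the optimizer
# evaluate it at each update step.
optimizer = optim.AdamW(learning_rate=schedule)
```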
Summary
This PR contributes the full MLX (Apple Silicon) training and inference stack for OpenMythos, along with upstream MythosConfig field compatibility fixes. The changes enable training and running OpenMythos natively on Apple Silicon without PyTorch.

- open_mythos/main.py: Added 5 upstream-compatible fields to MythosConfig (n_kv_heads, act_threshold, lora_rank, max_output_tokens, dropout) with backward-compatible defaults; fixes the TypeError raised when loading variants.py configs
- pyproject.toml: Replaced the incorrect torch = "*" dependency with the actual MLX runtime (mlx >= 0.16, transformers >= 4.40)
- README.md: Added trained checkpoint table (1b-mythos, step_060000, loss 1.0225)
- train.py: Full MLX training loop with cosine LR scheduling, checkpoint resumption, warmup, and multi-phase fine-tuning support
- prepare_data.py: GPT-2 tokenized dataset builder producing the .npy training format (a sketch follows this list)
- eval_inference.py / eval.py: Inference evaluation across checkpoints with perplexity measurement
- engine/mlx_engine.py: Low-level MLX token generation engine
- serve.py / mcp_server.py / open_mythos/mcp_server.py: Local inference HTTP + MCP servers for Claude Code integration
- open_mythos/full_model.py: DeepSeekV2Lite-compatible inference-only model for MLX
- open_mythos/router.py: Task routing helper (local vs API)
- example_mlx.py / example_deepseek.py: Runnable MLX usage examples
- example.py: Ported from torch to mlx
- open_mythos/__init__.py: Aligned exports with actual MLX class names
- HANDOFF.md: Session handoff document with architecture overview, training history, and next-action recommendations
- docs/executive_report.html: Cost-effectiveness report (local MLX training $0.19 vs cloud A100 $71.25, a 99.7% saving)
- docs/technical_report.html: Full technical reference covering the architecture, all training phases, and the 2b divergence analysis
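A rough sketch of the tokenize-to-.npy flow described for prepare_data.py, assuming Hugging Face transformers and NumPy; the file names and the uint16 packing are illustrative assumptions, not confirmed details of the script:

```python
# Sketch of a GPT-2 tokenize-to-.npy builder. Input/output paths
# and the uint16 packing are assumptions for illustration.
import numpy as np
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

with open("corpus.txt") as f:            # hypothetical input file
    text = f.read()

ids = tokenizer(text)["input_ids"]
# GPT-2's 50257-token vocab fits in uint16, halving the on-disk size.
tokens = np.array(ids, dtype=np.uint16)
np.save("train_tokens.npy", tokens)      # hypothetical output path
print(f"wrote {tokens.size} tokens")
```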
Training Results
1b-mythos trained to convergence over 60,000 steps on an Apple M2 Ultra (64 GB):
Best checkpoint: ckpt/1b-mythos/step_060000.npz
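For reference, one way to reproduce the "verified loadable" check on this checkpoint. The flat .npz weight layout and the load_weights call are standard MLX conventions assumed here, not details confirmed from this PR's code:

```python
# Sketch of a checkpoint-load sanity check, assuming OpenMythos is an
# mlx.nn.Module and the .npz stores a flat name -> array mapping.
import mlx.core as mx
from open_mythos.main import OpenMythos, MythosConfig

weights = mx.load("ckpt/1b-mythos/step_060000.npz")   # dict[str, mx.array]
model = OpenMythos(MythosConfig())
model.load_weights(list(weights.items()))             # raises on shape mismatch
print(f"loaded {len(weights)} tensors")
```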
Test plan
- python3 -c "from open_mythos.variants import mythos_1b; print(mythos_1b())" (no TypeError)
- python3 -c "from open_mythos.main import OpenMythos, MythosConfig; m = OpenMythos(MythosConfig()); print('ok')" (imports cleanly)
- python3 example_mlx.py (runs without errors on Apple Silicon)
- python3 eval_inference.py (generates samples from a checkpoint; requires the checkpoint file)

🤖 Generated with Claude Code