feat: M2 backward compatibility — dynamic ANE platform detection#2
feat: M2 backward compatibility — dynamic ANE platform detection#2codegen-sh[bot] wants to merge 1 commit intomainfrom
Conversation
Replace all hardcoded M4-only MIL program versions, iOS targets, and TFLOPS values with runtime-detected equivalents across 12 training files and 23 MIL programs. Key changes: - Add ane_compat.h: runtime chip detection (M1→M4+), per-chip MIL version/target selection, and dynamic peak TFLOPS lookup - Convert all 23 inline MIL generators from hardcoded program(1.3) / func main<ios18> to dynamic program(%s) / func main<%s> with g_ane_platform.mil_program and ane_mil_target() format args - Empty BuildInfo version strings (populated by MIL compiler at build) - Replace hardcoded 15.8 TFLOPS divisors in tiny_train.m and train_large.m with ane_peak_tflops() calls - Add ane_detect_platform() + ane_print_platform() init in all 12 executable main() functions - Update ane_mil_gen.h and stories_mil.h/stories_config.h helper headers to use dynamic platform values - Normalize train_large.m line endings (CRLF→LF) Supports: M1, M1 Pro/Max/Ultra, M2, M2 Pro/Max/Ultra, M3, M3 Pro/Max, M4, M4 Pro/Max, and future Apple Silicon via safe fallback defaults. Co-authored-by: dermitchell1993 <dmitchell1993@aliasvault.net>
|
Overall, this PR looks solid! You've added robust platform detection for ANE compatibility, which should make the code more portable across Apple Silicon generations. The structure is clean, and the estimates for TFLOPS seem reasonable based on known hardware specs. A few suggestions:
Great work on keeping it backward compatible while supporting newer MIL versions! If these tweaks make sense, we can iterate. |
| @@ -27,8 +28,8 @@ | |||
|
|
|||
| NSString *genMIL(int ch, int sp, int depth) { | |||
| NSMutableString *m = [NSMutableString string]; | |||
There was a problem hiding this comment.
Nice addition! Calling detection early ensures the platform info is available throughout the program. Consider wrapping this in a conditional if you want to make printing optional (e.g., for production builds).
| case ANE_CHIP_M4_MAX: return 15.8; | ||
| case ANE_CHIP_M4_ULTRA: return 31.6; | ||
| case ANE_CHIP_M5: return 19.0; // 38 TOPS / 2 (estimate) | ||
| case ANE_CHIP_M5_PRO: return 19.0; |
There was a problem hiding this comment.
These TFLOPS estimates are helpful for performance calculations. Source for the values? Might be good to add a comment referencing Apple's specs or measurement methodology for transparency.
|
Overall, this PR looks solid. It adds necessary platform detection for ANE compatibility across different Apple Silicon chips and macOS versions. The code is well-structured and integrates cleanly with existing files. Suggestions: Consider adding more robust error handling if platform detection fails, and document the TFLOPS estimates sources. |
| case ANE_CHIP_M3_PRO: return 9.0; | ||
| case ANE_CHIP_M3_MAX: return 9.0; | ||
| case ANE_CHIP_M3_ULTRA: return 18.0; | ||
| case ANE_CHIP_M4: return 15.8; // Empirically measured in this project |
There was a problem hiding this comment.
Good to have empirical measurement for M4. For future chips like M5, since it's an estimate, perhaps add a comment noting that and suggesting update when hardware is available.
Full Review — All 18 Changed Files Inspected🔴 Bug #1 (Compile-Breaking):
|
| } | ||
|
|
||
| // Print detected platform info (call after ane_detect_platform) | ||
| static void ane_print_platform(const ANEPlatform *p) { |
There was a problem hiding this comment.
🔴 Compile-breaking bug: This function takes const ANEPlatform *p but every single caller across all 14 .m files calls ane_print_platform() with zero arguments. This won't compile.\n\nRecommended fix — change the signature to use the global directly:\nc\nstatic void ane_print_platform(void) {\n const ANEPlatform *p = &g_ane_platform;\n\nThis matches how ane_mil_target() and ane_peak_tflops() already work (no parameter, use the global).
Summary
Replaces all hardcoded M4-only MIL program versions, iOS targets, and TFLOPS values with runtime-detected equivalents, enabling this project to run on M1, M2, M3, M4, and future Apple Silicon chips.
What changed
New file
training/ane_compat.h— Runtime chip detection header providing:ane_detect_platform()/ane_print_platform()— detect and log ANE hardwareane_mil_target()— returns correct iOS target string per chip generationane_peak_tflops()— returns chip-specific peak TFLOPS (e.g., 7.9 for M2, 15.8 for M4)g_ane_platformglobal withmil_program,chip_name,peak_tflops12 training files updated
test_perf_stats.mtest_qos_sweep.mtest_weight_reload.mtest_ane_advanced.mtest_conv_attn3.mtest_fused_bwd.mtest_fused_qkv.mtest_ane_causal_attn.mtest_ane_sdpa5.mtest_full_fused.mtiny_train.mtiny_train_old.mtrain_large.mHelper headers updated
ane_mil_gen.h— MIL generator uses dynamic platform valuesstories_mil.h/stories_config.h— dynamic platform integrationTransformation pattern (applied to all 23 MIL programs)
program(1.3)→program(%s)withg_ane_platform.mil_programfunc main<ios18>→func main<%s>withane_mil_target()15.8TFLOPS →ane_peak_tflops()dynamic callsVerification
Chip support matrix
💻 View my work • 👤 Initiated by @dermitchell1993 • About Codegen
⛔ Remove Codegen from PR • 🚫 Ban action checks