Dream merge: dynamic platform detection + canonical MIL syntax #5
Draft
codegen-sh[bot] wants to merge 3 commits into main from
Port upstream PR #6 (imperatormk): fixes MIL scalar type syntax, moving from the M4-only shorthand to the canonical verbose format that compiles on all Apple Silicon (M1/M2/M3/M4).

Changes:
- program(1.3) to program(1.0), ios18 to ios16 target
- Scalar type shorthand to canonical verbose format
- Simplified buildInfo dict (no M4-specific version strings)
- fp16 I/O fallback: g_fp16_io flag with auto-retry on compile failure for M1/M2, where the cast op is unsupported
- Dynamic IOSurface byte calculation (bpe: 2 for fp16, 4 for fp32)

Tested on M1 Pro, macOS 26.3 (per upstream PR author).
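The dynamic IOSurface byte calculation described above reduces to choosing bytes per element from the I/O precision flag. A minimal sketch, assuming the surface holds a flat count of tensor elements with no row padding; `io_surface_bytes` is a hypothetical helper name, not the function in model.h:

```c
#include <assert.h>
#include <stddef.h>

/* I/O precision flag: 0 = fp32 I/O, 1 = fp16 I/O (M1/M2 fallback). */
int g_fp16_io = 0;

/* Hypothetical helper: bytes to allocate for an IOSurface holding `elems`
 * tensor elements at the current I/O precision. Assumes no padding. */
static size_t io_surface_bytes(size_t elems) {
    assert(g_fp16_io == 0 || g_fp16_io == 1); /* flag is a boolean */
    size_t bpe = g_fp16_io ? 2 : 4;           /* fp16 = 2 bytes, fp32 = 4 bytes */
    return elems * bpe;
}
```

For a [1, 768, 1, 256] tensor (196608 elements) this yields 786432 bytes with fp32 I/O and 393216 bytes with fp16 I/O; hardcoding 4 would over-allocate by 2× once the fp16 fallback kicks in.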
train.m includes ane_mil_gen.h (via backward.h -> model.h), which declares extern int g_fp16_io, but train.m never defined it, producing an undefined-symbol linker error.

Changes:
- train.m: add g_fp16_io = 0 at file scope; wrap model_compile_kernels with auto-retry (try fp32; on failure set g_fp16_io = 1 and retry with fp16)
- model.h: compile_conv_kernel IOSurface byte calculation now uses g_fp16_io ? 2 : 4 (was hardcoded to 4)
- .gitignore: add train binary + test/probe binaries
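The auto-retry wrapper follows a simple try-fp32-then-fp16 pattern. A minimal sketch, assuming the compile step can be modeled as a function that returns success or failure; the `compile_fn` signature, `compile_with_fp16_fallback`, and both stubs are hypothetical and do not reproduce the real `model_compile_kernels` in train.m:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Defined at file scope, which is exactly what the linker fix adds. */
int g_fp16_io = 0;

/* Hypothetical compile signature: returns true if all kernels compiled. */
typedef bool (*compile_fn)(void *model);

/* Try fp32 I/O first; if compilation fails (e.g. the cast op is unsupported
 * on M1/M2), switch to fp16 I/O and retry once. */
static bool compile_with_fp16_fallback(compile_fn compile, void *model) {
    g_fp16_io = 0;
    if (compile(model)) return true;
    fprintf(stderr, "fp32 I/O compile failed; retrying with fp16 I/O\n");
    g_fp16_io = 1;
    return compile(model);
}

/* Hypothetical stub mimicking M1/M2: only fp16 I/O compiles. */
static bool stub_compile_fp16_only(void *model) { (void)model; return g_fp16_io == 1; }

/* Hypothetical stub mimicking a chip where fp32 I/O compiles directly. */
static bool stub_compile_always_ok(void *model) { (void)model; return true; }
```

With `stub_compile_always_ok` the flag stays 0 after the first attempt; with `stub_compile_fp16_only` the first attempt fails, the flag flips to 1, and the retry succeeds. Leaving the flag set after the fallback matters because later allocations (such as the IOSurface byte calculation) read it.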
Integrates both PR #3 (M1/M2 canonical verbose MIL syntax + fp16 I/O fallback) and PR #4 (runtime chip/OS detection via ane_compat.h) into a unified solution that works everywhere AND optimizes per-platform.

Changes across 16 files:
- Add training/ane_compat.h: runtime platform detection library (chip family, macOS version, MIL target selection, peak TFLOPS)
- Convert all 38 hardcoded program(1.0) -> program(%s) with the g_ane_platform.mil_program dynamic argument
- Convert all 44 hardcoded func main<ios16> -> func main<%s> with the ane_mil_target() dynamic argument
- Replace hardcoded 0.019 TFLOPS constant with ane_peak_tflops()
- Add #include ane_compat.h and platform init to 14 consumer files
- Preserve PR #3's fp16 I/O auto-retry mechanism for M1/M2
- Use canonical verbose buildInfo syntax (universal compatibility)

Co-authored-by: dermitchell1993 <dmitchell1993@aliasvault.net>
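Replacing the hardcoded program(1.0) and func main<ios16> with %s placeholders means the MIL prologue is built with a format call at runtime. A minimal sketch of that idea; the `ane_platform_t` struct, `select_platform`, and `format_mil_prologue` are hypothetical, and the macOS-version policy (15+ gets program(1.3)/ios18, everything else gets the canonical program(1.0)/ios16) is an illustrative assumption, not the actual rules in ane_compat.h:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of g_ane_platform.mil_program and ane_mil_target(). */
typedef struct {
    const char *mil_program; /* e.g. "1.0" */
    const char *mil_target;  /* e.g. "ios16" */
} ane_platform_t;

/* Assumed, illustrative policy keyed only on the macOS major version. */
static ane_platform_t select_platform(int macos_major) {
    if (macos_major >= 15) return (ane_platform_t){"1.3", "ios18"};
    return (ane_platform_t){"1.0", "ios16"}; /* canonical, compiles on M1+ */
}

/* Build the program header and main-function opening that used to be
 * hardcoded. The buildInfo attribute is omitted here for brevity.
 * Returns snprintf's result (the untruncated length, or < 0 on error). */
static int format_mil_prologue(char *buf, size_t len, ane_platform_t p) {
    return snprintf(buf, len, "program(%s)\n{\n    func main<%s>(",
                    p.mil_program, p.mil_target);
}
```

Because the version strings are now format arguments rather than literal text, any call site that previously appended the header verbatim has to switch to a formatting call, which is why appendString:MIL_HDR becomes appendFormat:MIL_HDR, g_ane_platform.mil_program in the Objective-C sources.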
929cbe2 to ffd8272
What This Does
Unifies PR #3 (M1/M2 canonical verbose MIL syntax + fp16 I/O fallback) with PR #4 (runtime chip/OS detection) into a single solution that works everywhere AND optimizes per-platform.
The Core Insight
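PR #3 and PR #4 address orthogonal dimensions: PR #3 makes the generated MIL compile on every Apple Silicon chip (canonical verbose syntax plus the fp16 I/O fallback), while PR #4 detects the chip and macOS version at runtime to pick per-platform settings (MIL target, peak TFLOPS).
By combining them, we get both. No conflicts, no compromises.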
Changes (16 files, +377 / -98)
New:
- training/ane_compat.h: runtime platform detection (chip family M1→M5, macOS 13→15+, peak TFLOPS, MIL target selection)

Converted across 15 files:
- program(1.0) → program(%s) with g_ane_platform.mil_program
- func main<ios16> → func main<%s> with ane_mil_target()
- 0.019 TFLOPS → ane_peak_tflops() (chip-specific)
- appendString:MIL_HDR → appendFormat:MIL_HDR, g_ane_platform.mil_program

Preserved:
- buildInfo syntax (universal CoreML compatibility)
- int g_fp16_io = 0 linker fix in train.m

Platform Detection Matrix
Peak TFLOPS (FP16, estimated):
- M1: 5.5
- M2: 7.9
- M3: 9.0
- M4: 15.8
- M5: 19.0
- Ultra variants: 2×
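The chip-specific replacement for the old 0.019 TFLOPS constant can be pictured as a table lookup over the matrix above, with Ultra parts doubled. A minimal sketch; the `chip_family_t` enum, `k_peak_tflops`, and `peak_tflops` are hypothetical names, and the real ane_peak_tflops() may detect Ultra variants or store its table differently:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical chip-family enum; ane_compat.h's own names may differ. */
typedef enum { CHIP_M1, CHIP_M2, CHIP_M3, CHIP_M4, CHIP_M5, CHIP_COUNT } chip_family_t;

/* Estimated peak FP16 TFLOPS per base chip, taken from the PR's matrix. */
static const double k_peak_tflops[CHIP_COUNT] = {5.5, 7.9, 9.0, 15.8, 19.0};

/* Hypothetical analogue of ane_peak_tflops(): Ultra variants count as 2x. */
static double peak_tflops(chip_family_t chip, bool is_ultra) {
    assert((int)chip >= 0 && (int)chip < (int)CHIP_COUNT); /* valid family */
    double base = k_peak_tflops[chip];
    return is_ultra ? 2.0 * base : base;
}
```

Keeping the numbers in a single table indexed by the detected family means adding a new chip is a one-line change, rather than another hardcoded constant scattered across consumer files.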
Supersedes #3 and #4.
Initiated by @dermitchell1993.