
Dream merge: dynamic platform detection + canonical MIL syntax #5

Draft
codegen-sh[bot] wants to merge 3 commits into main from
codegen-bot/dream-merge-compat-detect-b4e9c2

codegen-sh bot commented Mar 4, 2026

What This Does

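Unifies PR #3 (M1/M2 canonical verbose MIL syntax + fp16 I/O fallback) with PR #4 (runtime chip/OS detection) into a single change that compiles on all Apple Silicon and still tunes per platform: the MIL target, the MIL program version, and the peak-TFLOPS estimate are chosen at runtime.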

The Core Insight

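PR #3 and PR #4 address orthogonal dimensions: PR #3 makes the generated MIL compile on every chip (canonical verbose syntax, fp16 I/O fallback), while PR #4 detects the chip and macOS version at runtime so the MIL target and peak-TFLOPS figure can be chosen per platform.

By combining them, we get both. No conflicts, no compromises.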

Changes (16 files, +377 / -98)

New:

  • training/ane_compat.h — Runtime platform detection (chip family M1→M5, macOS 13→15+, peak TFLOPS, MIL target selection)
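A minimal sketch of how the chip-family and macOS-version detection might work, assuming both are derived from sysctl strings: on macOS, `machdep.cpu.brand_string` reports values such as "Apple M1 Pro" and `kern.osproductversion` values such as "14.4.1". The helper names are hypothetical, not the actual API of ane_compat.h, and the sysctl calls are compiled only on Apple platforms so the parsing logic can be checked anywhere.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#ifdef __APPLE__
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: extract N from a brand string such as "Apple M3 Max".
 * Returns 0 when no "Apple M<digits>" prefix is found. */
static int parse_chip_family(const char *brand) {
    assert(brand != NULL);
    const char *p = strstr(brand, "Apple M");
    if (p == NULL) return 0;
    p += strlen("Apple M");
    char *end = NULL;
    long n = strtol(p, &end, 10);
    if (end == p || n <= 0) return 0;
    return (int)n;
}

/* Hypothetical helper: major version from "14.4.1" -> 14 (0 if unparsable). */
static int parse_os_major(const char *version) {
    assert(version != NULL);
    return atoi(version);
}

#ifdef __APPLE__
/* Query the running system; only meaningful on macOS. */
static int detect_chip_family(void) {
    char buf[128];
    size_t len = sizeof buf;
    if (sysctlbyname("machdep.cpu.brand_string", buf, &len, NULL, 0) != 0) return 0;
    return parse_chip_family(buf);
}

static int detect_os_major(void) {
    char buf[32];
    size_t len = sizeof buf;
    if (sysctlbyname("kern.osproductversion", buf, &len, NULL, 0) != 0) return 0;
    return parse_os_major(buf);
}
#endif
```

Returning 0 for an unrecognized chip lets callers fall back to the most conservative settings instead of failing outright.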

Converted across 15 files:

  • 38× program(1.0) → program(%s) with g_ane_platform.mil_program
  • 44× func main<ios16> → func main<%s> with ane_mil_target()
  • 1× hardcoded 0.019 TFLOPS → ane_peak_tflops() (chip-specific)
  • appendString:MIL_HDR → appendFormat:MIL_HDR, g_ane_platform.mil_program
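In plain C, the same substitution can be sketched with snprintf. `format_mil_prologue` is a hypothetical helper, and the template keeps only the two tokens this PR parameterizes; the real MIL text (buildInfo header, function signature and body) is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fills the two version slots that this PR turns into
 * format arguments. Returns the number of characters written, or -1 if the
 * buffer was too small and the output was truncated. */
static int format_mil_prologue(char *out, size_t cap,
                               const char *mil_program, const char *mil_target) {
    assert(out != NULL && mil_program != NULL && mil_target != NULL);
    int n = snprintf(out, cap, "program(%s)\nfunc main<%s>",
                     mil_program, mil_target);
    if (n < 0 || (size_t)n >= cap) return -1;
    return n;
}
```

Checking the snprintf return value guards against emitting a silently truncated header.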

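Preserved:

  • PR #3's fp16 I/O auto-retry for M1/M2
  • Canonical verbose buildInfo syntax (universal compatibility)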

Platform Detection Matrix

| macOS | MIL target | Program version | Example chips |
|---|---|---|---|
| 13 (Ventura) | ios16 | 1.0 | M1, M2 |
| 14 (Sonoma) | ios17 | 1.0 | M2, M3 |
| 15+ (Sequoia) | ios18 | 1.3 | M3, M4, M5 |
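The matrix reduces to a lookup on the macOS major version. A sketch, assuming the major version is already known (for example parsed from `kern.osproductversion`); `mil_version_for_macos` is a hypothetical name, and mapping versions older than 13 to the ios16 row is an assumption the table does not cover.

```c
#include <assert.h>
#include <string.h>

/* One row of the platform detection matrix. */
typedef struct {
    const char *mil_target;   /* substituted into func main<%s> */
    const char *mil_program;  /* substituted into program(%s) */
} mil_version;

/* Hypothetical lookup mirroring the table:
 * 13 -> ios16 / 1.0, 14 -> ios17 / 1.0, 15 and later -> ios18 / 1.3.
 * Anything below 13 falls back to ios16 / 1.0 (assumption). */
static mil_version mil_version_for_macos(int macos_major) {
    if (macos_major >= 15) return (mil_version){"ios18", "1.3"};
    if (macos_major == 14) return (mil_version){"ios17", "1.0"};
    return (mil_version){"ios16", "1.0"};
}
```

Because the last row is open-ended, macOS 26 selects ios18 / 1.3 under this mapping.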

TFLOPS (FP16 est.)

M1: 5.5 → M2: 7.9 → M3: 9.0 → M4: 15.8 → M5: 19.0 (Ultra variants 2×)
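A sketch of a per-chip lookup in the spirit of ane_peak_tflops(), using the estimates listed above and the 2× Ultra rule. `peak_tflops_estimate` and its `is_ultra` parameter are hypothetical; how the real function distinguishes Ultra parts is not described in this PR.

```c
#include <assert.h>
#include <stdbool.h>

/* Estimated peak FP16 TFLOPS, indexed by M-series generation (index 0 unused).
 * These are the PR's estimates, not measured values. */
static const double kBaseTflops[] = {0.0, 5.5, 7.9, 9.0, 15.8, 19.0};

/* Hypothetical stand-in for ane_peak_tflops(): 0.0 for an unknown family,
 * doubled for Ultra variants. */
static double peak_tflops_estimate(int family, bool is_ultra) {
    int n = (int)(sizeof kBaseTflops / sizeof kBaseTflops[0]);
    if (family < 1 || family >= n) return 0.0;
    double t = kBaseTflops[family];
    return is_ultra ? 2.0 * t : t;
}
```

Returning 0.0 for unknown chips makes a missing entry easy to detect instead of reporting a plausible-looking but wrong figure.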

Supersedes #3 and #4.


Initiated by @dermitchell1993

codegen-sh bot and others added 3 commits March 5, 2026 15:29
Port upstream PR #6 (imperatormk) - fixes MIL scalar type syntax
from M4-only shorthand to canonical verbose format that compiles
on all Apple Silicon (M1/M2/M3/M4).

Changes:
- program(1.3) to program(1.0), ios18 to ios16 target
- Scalar type shorthand to canonical verbose format
- Simplified buildInfo dict (no M4-specific version strings)
- fp16 I/O fallback: g_fp16_io flag with auto-retry on compile
  failure for M1/M2 where cast op is unsupported
- Dynamic IOSurface byte calculation (bpe: 2 for fp16, 4 for fp32)

Tested on M1 Pro, macOS 26.3 (per upstream PR author).
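The fp16 I/O fallback amounts to a two-attempt compile: try fp32 I/O, and if that fails, set g_fp16_io and retry with fp16 I/O, sizing IOSurface buffers from the same flag (2 bytes per element for fp16, 4 for fp32). In this sketch only g_fp16_io and the 2/4 rule come from the PR; the callback type, `compile_with_fp16_fallback`, `io_bytes`, and the stub compilers are hypothetical stand-ins for the real kernel-compile path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

int g_fp16_io = 0;  /* 0: fp32 I/O (first attempt), 1: fp16 I/O fallback */

/* IOSurface byte count for a dense buffer of `elems` elements. */
static size_t io_bytes(size_t elems) {
    size_t bpe = g_fp16_io ? 2 : 4;
    return elems * bpe;
}

/* Hypothetical compile hook: returns true on success. */
typedef bool (*compile_fn)(void *ctx);

/* Try fp32 I/O first; on failure switch to fp16 I/O and retry once. */
static bool compile_with_fp16_fallback(compile_fn compile, void *ctx) {
    assert(compile != NULL);
    g_fp16_io = 0;
    if (compile(ctx)) return true;
    g_fp16_io = 1;
    return compile(ctx);
}

/* Stub that always compiles (e.g. a chip that accepts fp32 I/O). */
static bool fake_compile_ok(void *ctx) { (void)ctx; return true; }

/* Stub that only succeeds with fp16 I/O, simulating the M1/M2 failure. */
static bool fake_compile_rejects_fp32(void *ctx) { (void)ctx; return g_fp16_io == 1; }
```

Deriving the byte size from g_fp16_io rather than hardcoding 4 keeps buffer sizes consistent with whichever I/O type the retry settled on, which appears to be the point of the model.h change in the next commit.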
train.m includes ane_mil_gen.h (via backward.h -> model.h), which
declares extern int g_fp16_io, but train.m never defined it,
producing an undefined-symbol linker error.

Changes:
- train.m: add g_fp16_io = 0 at file scope, wrap model_compile_kernels
  with auto-retry (try fp32, on fail set g_fp16_io=1, retry fp16)
- model.h: compile_conv_kernel IOSurface byte calculation now uses
  g_fp16_io ? 2 : 4 (was hardcoded to 4)
- .gitignore: add train binary + test/probe binaries

Integrates both PR #3 (M1/M2 canonical verbose MIL syntax + fp16 I/O
fallback) and PR #4 (runtime chip/OS detection via ane_compat.h) into
a unified solution that works everywhere AND optimizes per-platform.

Changes across 16 files:
- Add training/ane_compat.h: runtime platform detection library
  (chip family, macOS version, MIL target selection, peak TFLOPS)
- Convert all 38 hardcoded program(1.0) -> program(%s) with
  g_ane_platform.mil_program dynamic argument
- Convert all 44 hardcoded func main<ios16> -> func main<%s> with
  ane_mil_target() dynamic argument
- Replace hardcoded 0.019 TFLOPS constant with ane_peak_tflops()
- Add #include ane_compat.h and platform init to 14 consumer files
- Preserve PR #3's fp16 I/O auto-retry mechanism for M1/M2
- Use canonical verbose buildInfo syntax (universal compatibility)

Co-authored-by: dermitchell1993 <dmitchell1993@aliasvault.net>
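Since 14 consumer files each gain a platform-init call, the initialization is presumably idempotent. A sketch of that pattern, assuming a single shared record in the style of g_ane_platform; the struct fields, `ane_platform_init_sketch`, and the fixed values set by `detect_platform` are hypothetical. The plain flag is not thread-safe; dispatch_once or pthread_once would be the usual choices if initialization can race.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shape of the shared platform record. */
typedef struct {
    bool initialized;
    int chip_family;          /* e.g. 2 for M2 */
    int macos_major;          /* e.g. 14 */
    const char *mil_program;  /* e.g. "1.0" */
    const char *mil_target;   /* e.g. "ios17" */
} ane_platform_sketch;

static ane_platform_sketch g_platform_sketch;
static int g_detect_calls = 0;  /* counts detections to show idempotence */

/* Stand-in for real sysctl-based detection; values are fixed for illustration. */
static void detect_platform(ane_platform_sketch *p) {
    g_detect_calls++;
    p->chip_family = 2;
    p->macos_major = 14;
    p->mil_program = "1.0";
    p->mil_target = "ios17";
}

/* Safe to call from every consumer file; detection runs only on the first call.
 * Not thread-safe: assumes the first call happens before threads start. */
static const ane_platform_sketch *ane_platform_init_sketch(void) {
    if (!g_platform_sketch.initialized) {
        detect_platform(&g_platform_sketch);
        g_platform_sketch.initialized = true;
    }
    return &g_platform_sketch;
}
```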
codegen-sh bot force-pushed the codegen-bot/dream-merge-compat-detect-b4e9c2 branch from 929cbe2 to ffd8272 March 5, 2026 15:30