[Refactor] Generalize magi_compile for non-module callables and simplify internals #17

Open
cennn wants to merge 1 commit into SandAI-org:main from cennn:chore/perf-entrypoints-threshold-tuning-v2

cennn (Collaborator) commented on Apr 3, 2026

🗂️ PR Category

  • ✨ New Feature
  • 🚀 Optimization (performance, memory, etc.)
  • 💥 Breaking Change
  • 🐛 Bug Fix
  • 🛠️ Development / Refactoring
  • 📚 Documentation
  • 🧹 Chore (Dependencies, CI/CD, Configuration, etc.)
  • 🧪 Testing

📝 Description

Non-module callable support

magi_compile now works uniformly on any callable class/instance, not just nn.Module:

  • Class decoration hooks __init__ (same mechanism for Module and non-module).
  • Instance decoration patches the target method on the instance directly.
  • Bound method decoration compiles on __self__.
  • New method_name parameter allows explicit method selection (defaults to forward).
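
The decoration modes listed above can be illustrated with a small, self-contained sketch. This is not the magi_compile implementation: toy_compile, _patch_instance, and the calls counter are hypothetical names used only for illustration, and the "compiled" entry simply forwards to the original method. It assumes the target object allows instance attribute assignment (no __slots__).

```python
import functools
import inspect


def _patch_instance(instance, name):
    """Shadow instance.<name> with a wrapper stored on the instance itself."""
    original = getattr(instance, name)  # bound method looked up via the class

    @functools.wraps(original)
    def compiled_entry(*args, **kwargs):
        compiled_entry.calls += 1  # stand-in for "run the compiled artifact"
        return original(*args, **kwargs)

    compiled_entry.calls = 0
    # An instance attribute takes precedence over the class-level function.
    setattr(instance, name, compiled_entry)


def toy_compile(obj=None, *, method_name="forward"):
    """Hypothetical stand-in showing class / instance / bound-method handling."""
    if obj is None:
        # Used as @toy_compile(method_name=...): return the real decorator.
        return functools.partial(toy_compile, method_name=method_name)

    if inspect.isclass(obj):
        # Class decoration: hook __init__ so each new instance gets patched.
        original_init = obj.__init__

        @functools.wraps(original_init)
        def hooked_init(self, *args, **kwargs):
            original_init(self, *args, **kwargs)
            _patch_instance(self, method_name)

        obj.__init__ = hooked_init
        return obj

    if inspect.ismethod(obj):
        # Bound method: patch the owning instance found on __self__.
        _patch_instance(obj.__self__, obj.__name__)
        return getattr(obj.__self__, obj.__name__)

    # Plain instance: patch the selected method on the instance directly.
    _patch_instance(obj, method_name)
    return obj


@toy_compile(method_name="run")
class Scaler:  # a plain class, not an nn.Module
    def __init__(self, k):
        self.k = k

    def run(self, x):
        return self.k * x


class Adder:
    def __init__(self, b):
        self.b = b

    def forward(self, x):
        return x + self.b


scaler = Scaler(3)                       # class decoration
adder = toy_compile(Adder(1))            # instance decoration
bound = toy_compile(Adder(5).forward)    # bound-method decoration
```

In this sketch all three modes end up in the same per-instance patch, which mirrors the idea of using one mechanism regardless of whether the owner is an nn.Module.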

AOT instance compilation fix

Previously, AOT mode for instance-level compilation (magi_compile(model)) had issues with dispatch_to_compiled_fwd — the old code special-cased nn.Module with isinstance checks and _magi_original_forward hacks. This is replaced by a clean two-category dispatch:

  • Method path (target_method_name is set): swap target_function.__code__ and bind to instance.
  • Function path: swap inspect.unwrap(obj).__code__ directly.

Both JIT and AOT use the same dispatch_to_compiled_fwd with a single dispatch_via_method flag — no more isinstance(obj, nn.Module) branches.
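
The two dispatch categories can be sketched in plain Python as follows. This is a minimal illustration of temporarily swapping __code__, not the actual dispatch_to_compiled_fwd: swap_code, dispatch, and the compiled_* functions are hypothetical, and the "compiled" functions are ordinary Python standing in for compiled bytecode. It assumes both functions have no closure variables, since CPython rejects a __code__ assignment when the free-variable counts differ.

```python
import contextlib
import inspect
import types


@contextlib.contextmanager
def swap_code(target_function, replacement_function):
    """Temporarily run replacement bytecode under target_function's identity."""
    original_code = target_function.__code__
    target_function.__code__ = replacement_function.__code__
    try:
        yield
    finally:
        # Always restore, even if the call raised.
        target_function.__code__ = original_code


def dispatch(obj, compiled_fn, *args, target_method_name=None):
    """Hypothetical two-category dispatch: method path vs. function path."""
    if target_method_name is not None:
        # Method path: swap the class-level function's code, bind to the instance.
        target_function = getattr(type(obj), target_method_name)
        with swap_code(target_function, compiled_fn):
            return types.MethodType(target_function, obj)(*args)
    # Function path: swap the code of the innermost wrapped function.
    with swap_code(inspect.unwrap(obj), compiled_fn):
        return obj(*args)


class Doubler:
    def forward(self, x):
        return 2 * x


def compiled_forward(self, x):  # stand-in for compiled code
    return ("compiled", 2 * x)


def square(x):
    return x * x


def compiled_square(x):  # stand-in for compiled code
    return ("compiled", x * x)


doubler = Doubler()
method_result = dispatch(doubler, compiled_forward, 4, target_method_name="forward")
function_result = dispatch(square, compiled_square, 3)
```

Neither branch inspects the type of obj; the only decision is whether a target method name is set, which is the property the description attributes to the new dispatch.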

Internal simplification

  • Remove patch.object during first JIT compilation (unnecessary — Dynamo traces compiled_entry bytecode directly, not instance.__dict__).
  • Remove _state_dispatches_via_method, _magi_compile_instance (merged into _magi_compile_bound_method).
  • Flatten dispatch_to_compiled_fwd: 4 sub-context-managers inlined into one.
  • Remove redundant owner traversal in _mark_static_shapes (Dynamo handles self).
  • Rename internal attrs for clarity: _target_callable -> original_entry, _compiled_callable -> compiled_entry, compiled_code -> jit_compiled_code, original_code_object -> original_code_for_hook.
  • Centralize state attr names via get_attr_name_for_* helpers.

Tests

  • Add non-module class/instance/method perf tests across all 3 suites (MLP, NormResidual, Pointwise).
  • Use cuda_benchmark in API timing tests for stability.
  • Lighten timing test model (dim=128×4 layers, was 256×8) for faster CI.

- Unify api.py dispatch: consolidate method_name / owner_cls / owner_obj
  inference, remove __call__ fallback, simplify entry kind identification
- Remove unnecessary patch.object during first JIT compilation
- Remove redundant _state_dispatches_via_method and dispatch_via_method
- Remove redundant callable check in MagiCompileState.__init__
- Flatten dispatch_to_compiled_fwd: inline 4 sub-context-managers into one
- Remove owner traversal in _mark_static_shapes (Dynamo handles self)
- Simplify _lazy_init_magi_state: single obj param, unified model_tag
- Rename internal state attrs: _target_callable -> original_entry,
  _compiled_callable -> compiled_entry
- Lighten timing test model: dim=128 x 4 layers (was 256 x 8)
- Use cuda_benchmark in api_tests for stable timing
- Add non-module class/instance/method perf tests across all suites
cennn changed the title from "[Refactor] Simplify magi_compile internals and add non-module perf tests" to "[Refactor] Generalize magi_compile for non-module callables and simplify internals" on Apr 3, 2026