v0.6.0
What's New
1. Torch 2.4 Compatibility (#145)
MegaBlocks now supports Torch 2.4!
2. New CI/CD
MegaBlocks has new GitHub Actions workflows for better CI/CD! On every PR, MegaBlocks now automatically lints and formats the code (#131) and runs tests on a GPU (#127).
3. Remove Weight Parallelism (#137)
Weight parallelism was unused, so we removed it.
4. Shared Experts (#109)
MegaBlocks now implements shared experts, following the DeepSeekMoE paper; see the sketch below.
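A minimal sketch of the idea, not MegaBlocks' actual API: the class name, layer sizes, and the stand-in routed layer below are all illustrative. A shared expert is a dense FFN that processes every token alongside the routed experts, with the two outputs summed.

```python
import torch
import torch.nn as nn

class MoEWithSharedExpert(nn.Module):
    """Illustrative DeepSeekMoE-style shared expert wrapper (hypothetical names)."""

    def __init__(self, hidden_size: int, ffn_hidden_size: int, routed_moe: nn.Module):
        super().__init__()
        # The routed mixture-of-experts layer, e.g. a MegaBlocks dMoE.
        self.routed_moe = routed_moe
        # The shared expert: a dense FFN that bypasses the router and
        # processes every token.
        self.shared_expert = nn.Sequential(
            nn.Linear(hidden_size, ffn_hidden_size),
            nn.GELU(),
            nn.Linear(ffn_hidden_size, hidden_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the routed experts' output with the shared expert's output.
        return self.routed_moe(x) + self.shared_expert(x)

# Usage with a stand-in routed layer, just to show the shapes:
routed = nn.Linear(512, 512)  # placeholder for a real routed MoE layer
layer = MoEWithSharedExpert(hidden_size=512, ffn_hidden_size=2048, routed_moe=routed)
print(layer(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```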
Bug Fixes
- Better handle incompatible ffn sizes (#108)
- Fix AMP for memory optimized options (#111)
- Don't save moe lb-loss tensors (#119); a sketch of the change follows this list
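The gist of #119, as a hedged sketch (the buffer and function names here are illustrative, not necessarily MegaBlocks' exact internals): when the load-balancing loss weight is zero, the loss tensors are no longer buffered, since they would never contribute to training.

```python
# Global buffer of (tokens_per_expert, expert_scores) pairs collected
# during the forward pass and consumed when the auxiliary loss is computed.
_LOAD_BALANCING_LOSS = []

def save_load_balancing_loss(loss_tensors, moe_loss_weight: float):
    # Before the fix, tensors were buffered unconditionally, pinning
    # activation memory even when the auxiliary loss was disabled.
    if moe_loss_weight == 0:
        return  # nothing to save: the loss would be scaled to zero anyway
    _LOAD_BALANCING_LOSS.append(loss_tensors)
```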
What's Changed
- Remove turbo by @dblalock in #96
- Update README.md by @dakinggg in #98
- Fix for `ffn_hidden_size` of 128, and better error message for incompatible ffn sizes, by @snarayan21 in #108
- Add Shared Expert by @vchiley in #109
- Fix AMP for memory optimized options by @mvpatel2000 in #111
- bump and pin versions by @vchiley in #112
- dont save moe lb-loss tensors if args.moe_loss_weight=0 by @michael-go in #119
- bump by @vchiley in #116
- Minor changes to batched_load_balancing_loss function by @ShashankMosaicML in #121
- Migrate tests to pytest + add GA by @eitanturok in #127
- Change Runner in GA by @eitanturok in #129
- Clean up setup.py by @eitanturok in #128
- only run GA if repo owner is Databricks by @eitanturok in #135
- GA to Lint + Format MegaBlocks by @eitanturok in #131
- bump ci-testing to v0.1.2 by @eitanturok in #138
- remove weight parallelism by @eitanturok in #137
- refactor testing by @eitanturok in #140
- Type Checking by @eitanturok in #141
- Bump torch to <2.4.1 by @eitanturok in #145
New Contributors
- @dakinggg made their first contribution in #98
- @michael-go made their first contribution in #119
- @ShashankMosaicML made their first contribution in #121
Full Changelog: v0.5.1...v0.6.0