
optimize: improve inference scripts with mmgp and torchao support #13

Open

shangvo wants to merge 2 commits into showlab:main from shangvo:feature/optimize-inference

Conversation

shangvo commented Mar 12, 2025

Optimization for Inference Scripts

Changes

  • Added inference_mmgp.py for memory optimization using mmgp
  • Added inference_torchao.py for model quantization and acceleration
  • Updated requirements with new dependencies:
    • para-attn: for first block caching
    • mmgp: for memory management
    • torchao: for model quantization

Benefits

  • Reduced peak memory usage via mmgp's memory-profile management
  • Faster inference through torchao model quantization
  • Profiles covering a range of hardware configurations
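As a rough illustration of why quantization speeds up inference and cuts memory: weight-only int8 quantization stores each weight in 1 byte instead of 4, at the cost of a small rounding error. The sketch below shows the idea in plain NumPy; it is a hypothetical illustration of the technique, not torchao's actual API (torchao applies this per-layer inside the model).

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
# Rounding error is bounded by half a quantization step (scale / 2).
err = np.abs(dequantize(q, s) - w).max()
```

In exchange for that bounded error, the int8 weights take a quarter of the memory, and int8 matmul kernels run faster on supported hardware.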

Dependencies

Added new requirements in requirements_new.txt:

  • para-attn
  • mmgp
  • torchao

Please make sure to install these new dependencies before running the optimized inference scripts.
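Assuming a standard pip environment, installation would look like this (package names taken from the list above):

```shell
# Install the new dependencies individually…
pip install para-attn mmgp torchao

# …or from the requirements file added in this PR:
pip install -r requirements_new.txt
```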
