
Generate a quantized ONNX model #799

@qiuzhewei

Description

Make sure you have already checked the examples and documentation before submitting an issue.

How would you like to use ModelOpt

  • Hello, my goal is to deploy a quantized INT8 model with Triton Inference Server. Before I can do that, I need to produce a quantized INT8 ONNX model. I could not find any guidance in the documentation on how to generate an ONNX model after quantization; a sketch of what I expected the workflow to look like is below. Can you provide more information on this? Thanks!
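
For reference, this is roughly the workflow I was expecting, pieced together from the ModelOpt README. It is only a rough sketch: the config name `INT8_DEFAULT_CFG` and the assumption that `torch.onnx.export` preserves the inserted Q/DQ nodes are my guesses, not something I have confirmed works.

```python
# Rough sketch (unverified): quantize a PyTorch model with ModelOpt,
# then export it (with Q/DQ nodes) to ONNX for serving from Triton.
import torch
import torchvision
import modelopt.torch.quantization as mtq

model = torchvision.models.resnet50(weights=None).eval().cuda()

def forward_loop(m):
    # Feed a handful of calibration batches so ModelOpt can collect
    # activation ranges; real calibration data should be used instead
    # of random tensors.
    for _ in range(8):
        m(torch.randn(4, 3, 224, 224, device="cuda"))

# INT8 post-training quantization (config name taken from the ModelOpt
# docs, but treat it as an assumption).
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# Export to ONNX; I am assuming torch.onnx.export keeps the inserted
# quantize/dequantize nodes in the exported graph.
dummy_input = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy_input, "resnet50_int8_qdq.onnx", opset_version=17)
```

If the intended path is different (for example, quantizing an existing ONNX model directly), a pointer to the relevant docs or example would be great.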

Who can help?

  • ?

System information

  • Container used (if applicable): ?
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): ?
  • CPU architecture (x86_64, aarch64): ?
  • GPU name (e.g. H100, A100, L40S): ?
  • GPU memory size: ?
  • Number of GPUs: ?
  • Library versions (if applicable):
    • Python: ?
    • ModelOpt version or commit hash: ?
    • CUDA: ?
    • PyTorch: ?
    • Transformers: ?
    • TensorRT-LLM: ?
    • ONNXRuntime: ?
    • TensorRT: ?
  • Any other details that may help: ?
