
Generate a quantized ONNX model #799

@qiuzhewei

Description

Make sure you have already checked the examples and documentation before submitting an issue.

How would you like to use ModelOpt

  • Hello, my goal is to deploy a quantized INT8 model with Triton Inference Server. Before I can do that, I need to produce a quantized INT8 ONNX model. I could not find any guidance in the documentation on how to generate an ONNX model after quantization; a sketch of what I expected the workflow to look like is below. Can you provide more information on this? Thanks!
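
For reference, this is roughly the workflow I was expecting, pieced together from the ModelOpt README. It is only a rough sketch: the config name `INT8_DEFAULT_CFG` and the assumption that `torch.onnx.export` preserves the inserted Q/DQ nodes are my guesses, not something I have confirmed works.

```python
# Rough sketch (unverified): quantize a PyTorch model with ModelOpt,
# then export it (with Q/DQ nodes) to ONNX for serving from Triton.
import torch
import torchvision
import modelopt.torch.quantization as mtq

model = torchvision.models.resnet50(weights=None).eval().cuda()

def forward_loop(m):
    # Feed a handful of calibration batches so ModelOpt can collect
    # activation ranges; real calibration data should be used instead
    # of random tensors.
    for _ in range(8):
        m(torch.randn(4, 3, 224, 224, device="cuda"))

# INT8 post-training quantization (config name taken from the ModelOpt
# docs, but treat it as an assumption).
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# Export to ONNX; I am assuming torch.onnx.export keeps the inserted
# quantize/dequantize nodes in the exported graph.
dummy_input = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy_input, "resnet50_int8_qdq.onnx", opset_version=17)
```

If the intended path is different (for example, quantizing an existing ONNX model directly), a pointer to the relevant docs or example would be great.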

Who can help?

  • ?

System information

  • Container used (if applicable): ?
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): ?
  • CPU architecture (x86_64, aarch64): ?
  • GPU name (e.g. H100, A100, L40S): ?
  • GPU memory size: ?
  • Number of GPUs: ?
  • Library versions (if applicable):
    • Python: ?
    • ModelOpt version or commit hash: ?
    • CUDA: ?
    • PyTorch: ?
    • Transformers: ?
    • TensorRT-LLM: ?
    • ONNXRuntime: ?
    • TensorRT: ?
  • Any other details that may help: ?
