How to convert FP16 or FP8 model files to NVFP4 model files #770

Description

@Alan-D-Chen

Hello! First of all, I would like to express my gratitude for your tremendous contributions to the LLM field, which have driven significant advances in the TensorRT-LLM framework. I am a computer science student using TensorRT-LLM for the first time, and I have a question about how to convert FP16 or FP8 model files to NVFP4 model files.
Here is what I have read and done so far. I learned that the conversion can be accomplished either with https://github.com/NVIDIA/Model-Optimizer or with the TensorRT-LLM framework at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags?version=1.2.0rc2.post1. For the basic environment, I have already created a container with the command below:

```bash
docker run -it --name tensor-llm-alanchen --ipc host --gpus all \
  --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 \
  -v /data2:/data2 -v /data3:/data3 -v /data4:/data4 \
  nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4
```
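To confirm the environment before attempting any conversion, I can run a quick check inside the container. This is a minimal sketch that only assumes the image above and its bundled PyTorch:

```python
# Sanity check inside the container: confirm that TensorRT-LLM imports
# cleanly and that the GPUs passed via --gpus all are visible to PyTorch.
import torch
import tensorrt_llm

print("TensorRT-LLM version:", tensorrt_llm.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())
```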

Could you please tell me the specific steps to convert FP16 or FP8 model files to NVFP4 model files? Alternatively, are there any easy-to-read tutorials that I can refer to?
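From the Model Optimizer documentation, my current (possibly wrong) understanding is that this is post-training quantization: load the FP16/FP8 checkpoint, run a calibration pass, quantize with an NVFP4 recipe, and export a checkpoint that TensorRT-LLM can consume. The sketch below reflects that understanding and is only an illustration, not a confirmed recipe: it assumes `mtq.NVFP4_DEFAULT_CFG` and `modelopt.torch.export.export_hf_checkpoint` are available in the installed modelopt version, and the model name, calibration texts, and output path are placeholders I made up.

```python
# Rough PTQ-to-NVFP4 sketch with NVIDIA Model Optimizer (modelopt).
# Model name, calibration data, and export path are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# A handful of prompts for calibration; a real run would use a proper
# calibration set (e.g. a few hundred representative samples).
calib_texts = ["The quick brown fox jumps over the lazy dog."] * 8

def forward_loop(m):
    # modelopt calls this to collect calibration statistics.
    for text in calib_texts:
        inputs = tokenizer(text, return_tensors="pt").to(m.device)
        with torch.no_grad():
            m(**inputs)

# Quantize to NVFP4 using the default NVFP4 recipe (assumed to exist
# in the installed modelopt version).
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)

# Export a Hugging Face-style quantized checkpoint for TensorRT-LLM.
export_hf_checkpoint(model, export_dir="/data2/llama-3.1-8b-nvfp4")
```

If a script-based route is easier, the Model Optimizer repository also appears to ship end-to-end PTQ examples (under examples/llm_ptq at the time of writing) that wrap this same quantize-then-export flow. Is this the right approach?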
