
Optimize the Default Precision of Model Weights for HuggingFace Runtime #2046


Open
fyuan1316 opened this issue Feb 4, 2025 · 1 comment · May be fixed by #2047

Comments

@fyuan1316

Proposal: Add Default torch_dtype="auto" to model_kwargs in HuggingFace Runtime for Optimized Model Weight Precision

Description

Currently, the model_kwargs field in the HuggingFaceSettings class is defined as Optional[dict] with a default value of None. Because no default is provided, models may load with a suboptimal configuration, particularly with respect to the data precision of the model weights during loading and inference.

To address this issue, I suggest modifying the HuggingFaceSettings class to include a default value for torch_dtype in model_kwargs. Specifically, I suggest setting torch_dtype="auto" as the default behavior. This change would allow the HuggingFace runtime to automatically select the most appropriate data type for model weights based on the available hardware (e.g., CPU, GPU, or TPU) and the model architecture.

Proposed Changes

I suggest adding a model_validator method to the HuggingFaceSettings class to ensure that model_kwargs is initialized with a default value of {"torch_dtype": "auto"} if not explicitly provided.
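A minimal sketch of what this validator could look like, assuming a Pydantic v2 settings class; the fields other than model_kwargs are placeholders, not the actual HuggingFaceSettings definition:

```python
from typing import Optional

from pydantic import BaseModel, model_validator


class HuggingFaceSettings(BaseModel):
    # Placeholder field for illustration; the real class has more settings.
    task: str = ""
    model_kwargs: Optional[dict] = None

    @model_validator(mode="after")
    def _default_torch_dtype(self):
        # Initialise model_kwargs if absent, then default torch_dtype to
        # "auto" unless the user explicitly provided a value.
        if self.model_kwargs is None:
            self.model_kwargs = {}
        self.model_kwargs.setdefault("torch_dtype", "auto")
        return self
```

With this in place, HuggingFaceSettings() yields model_kwargs == {"torch_dtype": "auto"}, while an explicit model_kwargs={"torch_dtype": "float32"} is left untouched.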

Benefits

  1. Improved Data Precision: By setting torch_dtype="auto" as the default, the HuggingFace runtime can automatically select the optimal data type for model weights based on the hardware and model architecture. This can lead to better performance and memory utilization.
  2. Simplified Configuration: Users no longer need to explicitly set torch_dtype in model_kwargs if they want to use the default behavior. This reduces configuration complexity and minimizes the risk of errors.
  3. Consistency: Providing a default value ensures that all instances of HuggingFaceSettings have a consistent and predictable configuration, making the code more robust and easier to maintain.

Potential Drawbacks

  1. Compatibility Issues: Some models or use cases may not be compatible with torch_dtype="auto". In such cases, users can override the default by explicitly setting torch_dtype in model_kwargs.
  2. Performance Overhead: Automatically selecting the data type may introduce a small performance overhead during model initialization. However, this overhead is expected to be negligible compared to the benefits of improved data precision.
@fyuan1316
Author

While investigating another issue (#2023), I discovered this problem. Optimizing the precision of the model weights can significantly reduce GPU memory usage; in the case of Phi-2, selecting an appropriate data precision cuts the memory usage roughly in half.
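A rough back-of-the-envelope illustration of the halving effect, assuming Phi-2's published size of about 2.7B parameters (weights only, ignoring activations and KV cache):

```python
# Hypothetical estimate: memory for ~2.7B weights at two precisions.
n_params = 2_700_000_000

fp32_gib = n_params * 4 / 2**30  # float32: 4 bytes per weight
fp16_gib = n_params * 2 / 2**30  # float16: 2 bytes per weight

print(f"fp32 ≈ {fp32_gib:.1f} GiB, fp16 ≈ {fp16_gib:.1f} GiB")
# fp32 ≈ 10.1 GiB, fp16 ≈ 5.0 GiB
```

Since Phi-2 ships float16 weights, torch_dtype="auto" loads them at 2 bytes each instead of upcasting to float32, which is where the roughly 2x saving comes from.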
