
Optimize the Default Precision of Model Weights for HuggingFace Runtime #2046


Open
fyuan1316 opened this issue Feb 4, 2025 · 1 comment · May be fixed by #2047

Comments

@fyuan1316

Proposal: Add Default torch_dtype="auto" to model_kwargs in HuggingFace Runtime for Optimized Model Weight Precision

Description

Currently, the model_kwargs field in the HuggingFaceSettings class is defined as Optional[dict] with a default value of None. Because no default is provided, models may load with a suboptimal configuration, particularly with respect to the data precision of the model weights during loading and inference.

To address this issue, I suggest modifying the HuggingFaceSettings class to include a default value for torch_dtype in model_kwargs. Specifically, I suggest setting torch_dtype="auto" as the default behavior. This change would allow the HuggingFace runtime to automatically select the most appropriate data type for model weights based on the available hardware (e.g., CPU, GPU, or TPU) and the model architecture.

Proposed Changes

I suggest adding a model_validator method to the HuggingFaceSettings class to ensure that model_kwargs is initialized with a default value of {"torch_dtype": "auto"} if not explicitly provided.
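A minimal sketch of what this validator could look like, assuming a Pydantic v2 settings class; the fields other than model_kwargs are placeholders, not the actual HuggingFaceSettings definition:

```python
from typing import Optional

from pydantic import BaseModel, model_validator


class HuggingFaceSettings(BaseModel):
    # Placeholder field for illustration; the real class has more settings.
    task: str = ""
    model_kwargs: Optional[dict] = None

    @model_validator(mode="after")
    def _default_torch_dtype(self):
        # Initialise model_kwargs if absent, then default torch_dtype to
        # "auto" unless the user explicitly provided a value.
        if self.model_kwargs is None:
            self.model_kwargs = {}
        self.model_kwargs.setdefault("torch_dtype", "auto")
        return self
```

With this in place, HuggingFaceSettings() yields model_kwargs == {"torch_dtype": "auto"}, while an explicit model_kwargs={"torch_dtype": "float32"} is left untouched.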

Benefits

  1. Improved Data Precision: By setting torch_dtype="auto" as the default, the HuggingFace runtime can automatically select the optimal data type for model weights based on the hardware and model architecture. This can lead to better performance and memory utilization.
  2. Simplified Configuration: Users no longer need to explicitly set torch_dtype in model_kwargs if they want to use the default behavior. This reduces configuration complexity and minimizes the risk of errors.
  3. Consistency: Providing a default value ensures that all instances of HuggingFaceSettings have a consistent and predictable configuration, making the code more robust and easier to maintain.

Potential Drawbacks

  1. Compatibility Issues: Some models or use cases may not be compatible with torch_dtype="auto". In such cases, users can override the default by explicitly setting torch_dtype in model_kwargs.
  2. Performance Overhead: Automatically selecting the data type may introduce a small performance overhead during model initialization. However, this overhead is expected to be negligible compared to the benefits of improved data precision.
@fyuan1316
Author

While investigating another issue (#2023), I discovered this problem. Optimizing the precision of the model weights can significantly reduce GPU memory usage; in the case of Phi-2, selecting an appropriate data precision cuts the memory usage roughly in half.
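A rough back-of-the-envelope illustration of the halving effect, assuming Phi-2's published size of about 2.7B parameters (weights only, ignoring activations and KV cache):

```python
# Hypothetical estimate: memory for ~2.7B weights at two precisions.
n_params = 2_700_000_000

fp32_gib = n_params * 4 / 2**30  # float32: 4 bytes per weight
fp16_gib = n_params * 2 / 2**30  # float16: 2 bytes per weight

print(f"fp32 ≈ {fp32_gib:.1f} GiB, fp16 ≈ {fp16_gib:.1f} GiB")
# fp32 ≈ 10.1 GiB, fp16 ≈ 5.0 GiB
```

Since Phi-2 ships float16 weights, torch_dtype="auto" loads them at 2 bytes each instead of upcasting to float32, which is where the roughly 2x saving comes from.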
