Problem Statement
Currently, there's no way to directly query and locate a model file on Hugging Face using only its SHA256 hash value. This creates inefficiency when managing multiple models locally and trying to determine if a specific file is already downloaded under a different filename.
Use Case
As a user who frequently works with multiple AI models:
I often have models stored locally with varying filenames
When I need to download a model, I want to verify if I already have it locally (even under a different name)
With only the SHA256 hash of a model file, I currently cannot easily find its corresponding Hugging Face URL
Example
If I have a file with SHA256 hash d99e39955c9d3d0350d8fb7c75e40c64a2b2eaeb003883d7c941fd2e8747b28c, I should be able to query the API and discover it corresponds to https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/blob/main/parakeet-tdt-0.6b-v2.nemo.
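For context, the local half of this workflow already works today with the standard library. The following is a minimal sketch, assuming models live under a single directory; the helper names `sha256_of_file` and `find_local_copies` are illustrative and not part of any existing library.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte model files never
        # have to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_local_copies(model_dir: Path, expected_sha256: str) -> List[Path]:
    """Return every regular file under model_dir whose digest matches."""
    expected = expected_sha256.strip().lower()
    return sorted(
        path
        for path in model_dir.rglob("*")
        if path.is_file() and sha256_of_file(path) == expected
    )
```

This finds a file regardless of its local name, but it cannot answer the reverse question (which Hub repository a given hash belongs to), which is the gap this request is about.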
Proposed Solution
Add a new method to the HfApi class that allows lookup by SHA256, such as:
def get_file_by_sha256(sha256_hash: str) -> List[FileInfo]:
    """
    Returns information about files matching the provided SHA256 hash.

    Args:
        sha256_hash: The SHA256 hash to look up

    Returns:
        List of FileInfo objects containing repo_id, path, and other metadata
    """
Benefits
This feature would:
Save storage space by avoiding duplicate downloads
Improve model management workflows
Enhance model traceability across the platform
Support verification of model file integrity
Streamline sharing by allowing reference via hash rather than full paths
Implementation Notes
This could be implemented by:
Creating a SHA256-to-file mapping in the backend
Enhancing the existing search functionality to handle SHA256 queries
Adding the corresponding API endpoints
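To make the mapping idea concrete, here is a rough in-memory sketch of a SHA256-to-file index. Everything in it is hypothetical: `FileInfo` and `Sha256Index` are names invented for illustration, the fields follow the docstring in the proposal, and a real backend would presumably use a database index rather than a Python dict.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical record; fields follow the proposal's docstring."""
    repo_id: str
    path: str
    sha256: str


class Sha256Index:
    """Maps a SHA256 digest to every known file with that content."""

    def __init__(self, files: Iterable[FileInfo] = ()) -> None:
        self._by_hash: Dict[str, List[FileInfo]] = defaultdict(list)
        for info in files:
            self.add(info)

    def add(self, info: FileInfo) -> None:
        # Normalise case so lookups are case-insensitive.
        self._by_hash[info.sha256.lower()].append(info)

    def get_file_by_sha256(self, sha256_hash: str) -> List[FileInfo]:
        # .get() avoids creating empty entries for unknown hashes.
        return list(self._by_hash.get(sha256_hash.strip().lower(), []))
```

The lookup returns a list rather than a single result because the same file content can legitimately appear in several repositories (forks, mirrors, re-uploads).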
I'm happy to provide additional information or discuss this feature request further if needed.
Current Workaround
Without this feature, I could technically use web scraping to collect all the models, then query their SHA256 hashes and cache them locally. However, this would put a significant strain on Hugging Face’s infrastructure. Additionally, I’d need to set up regular checks to fetch newly uploaded models. While this approach is feasible, it’s far from ideal.
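For reference, a per-repository version of this workaround might look like the sketch below. It assumes that `HfApi.model_info(repo_id, files_metadata=True)` in `huggingface_hub` returns siblings whose `lfs` entry carries a `sha256` key for LFS-tracked files; that matches my understanding of the current client, but treat it as an assumption. The helper name `lfs_hashes_for_repo` is made up for illustration, and the client is passed in as an argument so the function itself has no hard dependency.

```python
from typing import Any, Dict


def lfs_hashes_for_repo(api: Any, repo_id: str) -> Dict[str, str]:
    """Map each LFS-tracked filename in a repo to its SHA256 digest.

    `api` is expected to behave like huggingface_hub.HfApi (assumption).
    """
    info = api.model_info(repo_id, files_metadata=True)
    hashes: Dict[str, str] = {}
    for sibling in info.siblings or []:
        lfs = getattr(sibling, "lfs", None)
        if lfs is not None:
            # Only LFS-tracked files are assumed to report a SHA256.
            hashes[sibling.rfilename] = lfs["sha256"]
    return hashes
```

With the real client this would be called as `lfs_hashes_for_repo(HfApi(), "nvidia/parakeet-tdt-0.6b-v2")`. It only helps when the candidate repository is already known, and it costs one request per repository, which is why it does not scale to searching the whole Hub.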
Environment
HF Hub API version: [version you're using]
Python version: [your Python version]