
Model File Lookup by SHA256 Hash #3069


Open
cgf120 opened this issue May 9, 2025 · 1 comment

cgf120 commented May 9, 2025

Problem Statement
Currently, there's no way to directly query and locate a model file on Hugging Face using only its SHA256 hash value. This creates inefficiency when managing multiple models locally and trying to determine if a specific file is already downloaded under a different filename.
Use Case
As a user who frequently works with multiple AI models:

I often have models stored locally with varying filenames
When I need to download a model, I want to check whether I already have it locally (even under a different name)
With only the SHA256 hash of a model file, I currently cannot easily find its corresponding Hugging Face URL
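The local half of this workflow needs no Hub changes: hashing every file under a models directory already reveals which files are identical under different names. The sketch below is a minimal illustration using only the standard library; the helper names `sha256_of_file`, `index_by_sha256` and `duplicates` are hypothetical, and the 1 MiB chunk size is an arbitrary choice.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Read 1 MiB at a time so multi-GB checkpoints never sit fully in memory.
CHUNK_SIZE = 1 << 20


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA256 digest of a file, streaming it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_sha256(root: Path) -> Dict[str, List[Path]]:
    """Map each SHA256 digest to every file under `root` with that content."""
    index: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            index[sha256_of_file(path)].append(path)
    return dict(index)


def duplicates(index: Dict[str, List[Path]]) -> Dict[str, List[Path]]:
    """Keep only digests shared by several files (same bytes, different names)."""
    return {h: paths for h, paths in index.items() if len(paths) > 1}
```

For example, `index_by_sha256(Path("~/models").expanduser()).get(digest, [])` lists every local copy of a file with a given digest, whatever it is named. What is still missing is the remote half: turning that digest into a Hugging Face URL.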

Example
If I have a file with SHA256 hash d99e39955c9d3d0350d8fb7c75e40c64a2b2eaeb003883d7c941fd2e8747b28c, I should be able to query the API and discover it corresponds to https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/blob/main/parakeet-tdt-0.6b-v2.nemo.
Proposed Solution
Add a new method to the HfApi class that allows lookup by SHA256, such as:
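```python
from typing import List


class HfApi:
    # Proposed addition to huggingface_hub.HfApi (no such method exists yet);
    # FileInfo is the return type suggested in this issue, not an existing class.
    def get_file_by_sha256(self, sha256_hash: str) -> List["FileInfo"]:
        """
        Returns information about files matching the provided SHA256 hash.

        Args:
            sha256_hash: The SHA256 hash to look up

        Returns:
            List of FileInfo objects containing repo_id, path, and other metadata
        """
        raise NotImplementedError
```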

Benefits
This feature would:

Save storage space by avoiding duplicate downloads
Improve model management workflows
Enhance model traceability across the platform
Support verification of model file integrity
Streamline sharing by allowing files to be referenced by hash rather than by full path

Implementation Notes
This could be implemented by:

Creating a SHA256-to-file mapping in the backend
Enhancing the existing search functionality to handle SHA256 queries
Adding the corresponding API endpoints
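The SHA256-to-file mapping mentioned in the implementation notes can be sketched as a small in-memory index. This is only an illustration of the data shape, not how Hugging Face's backend works: `Sha256Index` and `FileMatch` are hypothetical names, and the blob URL format with a default `main` revision is an assumption modeled on the example URL above. The lookup returns a list, matching the proposed `List[FileInfo]`, because identical files can live in several repositories (forks, mirrors, re-uploads).

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

_SHA256_RE = re.compile(r"[0-9a-f]{64}")


@dataclass(frozen=True)
class FileMatch:
    """Hypothetical stand-in for the FileInfo objects proposed above."""
    repo_id: str
    path: str
    revision: str = "main"

    @property
    def url(self) -> str:
        return f"https://huggingface.co/{self.repo_id}/blob/{self.revision}/{self.path}"


class Sha256Index:
    """In-memory SHA256 -> files mapping; one digest may map to many repos."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, List[FileMatch]] = defaultdict(list)

    @staticmethod
    def _normalize(sha256_hash: str) -> str:
        # Accept upper- or lower-case hex, reject anything that is not 64 hex chars.
        h = sha256_hash.strip().lower()
        if not _SHA256_RE.fullmatch(h):
            raise ValueError(f"not a SHA256 hex digest: {sha256_hash!r}")
        return h

    def add(self, sha256_hash: str, match: FileMatch) -> None:
        self._by_hash[self._normalize(sha256_hash)].append(match)

    def get_file_by_sha256(self, sha256_hash: str) -> List[FileMatch]:
        # .get() avoids inserting empty lists into the defaultdict on a miss.
        return list(self._by_hash.get(self._normalize(sha256_hash), []))
```

Normalizing and validating the digest before the lookup means a mistyped or truncated hash fails loudly instead of silently returning no matches.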

I'm happy to provide additional information or discuss this feature request further if needed.
Environment

HF Hub API version: [version you're using]
Python version: [your Python version]


cgf120 commented May 9, 2025

Without this feature, I could technically crawl the Hub to list every model, fetch the SHA256 of each of its files, and cache the results locally. However, this would put significant load on Hugging Face's infrastructure, and I would also need to re-run it regularly to pick up newly uploaded models. While this approach is feasible, it's far from ideal.
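A middle ground between crawling everything and a dedicated endpoint is to check only a short list of candidate repositories. The sketch below assumes a recent `huggingface_hub` in which `HfApi.list_repo_tree(repo_id, recursive=True)` yields file entries whose `lfs` attribute (when the file is stored in Git LFS) carries the object's SHA256 as `lfs.sha256`; the function name `find_sha256_in_repos` and the `main`-branch URL format are assumptions. Only LFS-tracked files can match this way, since small non-LFS files expose a git blob id, which is a different hash. It still costs one tree listing per repository, so it narrows the problem rather than solving it.

```python
from typing import Iterable, List, Optional


def find_sha256_in_repos(
    sha256_hash: str,
    repo_ids: Iterable[str],
    api: Optional[object] = None,
) -> List[str]:
    """Return blob URLs of LFS files in `repo_ids` whose SHA256 matches."""
    if api is None:
        # Imported lazily so the function can also be exercised with a stub client.
        from huggingface_hub import HfApi
        api = HfApi()
    target = sha256_hash.strip().lower()
    urls: List[str] = []
    for repo_id in repo_ids:
        # One tree listing per repository; folder entries have no `lfs` attribute.
        for entry in api.list_repo_tree(repo_id, recursive=True):
            lfs = getattr(entry, "lfs", None)
            # Non-LFS files only carry a git blob id, so they are skipped here.
            if lfs is not None and lfs.sha256 == target:
                urls.append(f"https://huggingface.co/{repo_id}/blob/main/{entry.path}")
    return urls
```

Passing the client in as `api` keeps the lookup logic testable offline and lets callers reuse an authenticated `HfApi` instance.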
