
feat: resolve HF Hub cache layout in model directory scanning #466

Open
AlexWorland wants to merge 6 commits into jundot:main from AlexWorland:feature/hf-cache-discovery


@AlexWorland

Summary

Models downloaded via huggingface-cli or huggingface_hub use an indirect cache layout (models--Org--Name/refs/main → commit hash → snapshots/<hash>/) instead of a flat directory. When a user adds their HF cache directory (e.g., ~/.cache/huggingface/hub) to model directories in settings, the scanner didn't understand this indirection and found nothing.
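The indirection described above can be sketched in a few lines. This is an illustration only, not the PR's code: the function name `resolve_hf_cache_entry_sketch` is hypothetical, and the exact validation rules are assumptions drawn from the layout described in this summary.

```python
from pathlib import Path
from typing import Optional


def resolve_hf_cache_entry_sketch(entry: Path) -> Optional[Path]:
    """Return the active snapshot dir for a models--Org--Name entry, else None."""
    prefix = "models--"
    name = entry.name
    # Only names shaped like models--<org>--<name> count as cache entries.
    if not name.startswith(prefix) or "--" not in name[len(prefix):]:
        return None
    try:
        # refs/main holds the commit hash of the active revision.
        commit = (entry / "refs" / "main").read_text().strip()
    except OSError:
        # Missing or unreadable refs/main: not a resolvable entry.
        return None
    if not commit:
        return None
    snapshot = entry / "snapshots" / commit
    return snapshot if snapshot.is_dir() else None
```

A flat model directory, or an entry whose `refs/main` points at a snapshot that does not exist, yields `None` in this sketch, so the caller can fall back to its normal handling.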

This change teaches the existing two-level scanner to resolve HF Hub cache entries to their active snapshot. No directories are scanned automatically; users opt in by adding their HF cache path to model directories. The resolved snapshot flows through the same _is_model_dir() → _register_model() pipeline as all other models.

Changes

| File | Change |
| --- | --- |
| omlx/model_discovery.py | Add _resolve_hf_cache_entry(), which resolves models--Org--Name/ to snapshots/<hash>/ via refs/main; integrate into discover_models() between Level 1 and Level 2 checks |
| omlx/admin/routes.py | Add HF cache resolution to list_hf_models() dashboard endpoint; extract duplicated dedupe/size/append logic into _add_model() helper |
| tests/test_model_discovery.py | 13 new tests: 6 unit tests for _resolve_hf_cache_entry() edge cases, 7 integration tests for discover_models() with HF cache layouts |

How it works

discover_models(model_dir):
  for each subdir:
    1. Is it an adapter?           → skip
    2. Has config.json?            → Level 1: register directly
    3. Is it models--Org--Name/?   → NEW: resolve snapshot, register if valid model
    4. Otherwise                   → Level 2: scan as org folder

Existing flat and org-nested layouts are completely unaffected — the HF cache check only fires for directories matching the models--*--* naming pattern, and the continue ensures it doesn't fall through to the org scan.
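A rough sketch of that control flow, under stated assumptions: `discover_models_sketch` and `_snapshot_for` are hypothetical stand-ins for the real functions, the adapter check is omitted, a bare `config.json` test stands in for `_is_model_dir()`, and the result keys are invented for illustration.

```python
from pathlib import Path
from typing import Dict, Optional


def _snapshot_for(entry: Path) -> Optional[Path]:
    # Compact, hypothetical stand-in for the PR's _resolve_hf_cache_entry().
    if not entry.name.startswith("models--") or entry.name.count("--") < 2:
        return None
    try:
        commit = (entry / "refs" / "main").read_text().strip()
    except OSError:
        return None
    snapshot = entry / "snapshots" / commit
    return snapshot if commit and snapshot.is_dir() else None


def discover_models_sketch(model_dir: Path) -> Dict[str, Path]:
    found: Dict[str, Path] = {}
    for subdir in sorted(p for p in model_dir.iterdir() if p.is_dir()):
        # Level 1: the directory itself is a model.
        if (subdir / "config.json").is_file():
            found[subdir.name] = subdir
            continue
        # HF cache entry: register the resolved snapshot if it is a model.
        snapshot = _snapshot_for(subdir)
        if snapshot is not None:
            if (snapshot / "config.json").is_file():
                found[subdir.name] = snapshot
            continue  # never treat a resolved cache entry as an org folder
        # Level 2: treat the directory as an org folder and scan its children.
        for child in sorted(p for p in subdir.iterdir() if p.is_dir()):
            if (child / "config.json").is_file():
                found[f"{subdir.name}/{child.name}"] = child
    return found
```

In this sketch an entry whose snapshot cannot be resolved falls through to the org scan, where it matches nothing; whether the real implementation skips such entries outright is not stated here.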

Testing

tests/test_model_discovery.py    88 passed (75 existing + 13 new)

New test coverage:

  • _resolve_hf_cache_entry(): valid entry, regular dir, missing org separator, missing refs/main, missing snapshot, whitespace stripping
  • discover_models(): single/multiple HF cache models, model_path points to snapshot, missing config.json skipped, mixed flat+HF cache, mixed org+HF cache, no fallthrough to org scan
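Tests like these need a fake cache layout on disk. The helpers below are a minimal sketch of how such fixtures might be built; the names, signatures, default commit hash, and config contents are all assumptions, not the helpers in tests/test_model_discovery.py.

```python
import json
from pathlib import Path


def make_bare_hf_cache_entry(root: Path, org: str, name: str,
                             commit: str = "abc123") -> Path:
    """Create entry + refs/main + an empty snapshot dir; return the snapshot."""
    entry = root / f"models--{org}--{name}"
    (entry / "refs").mkdir(parents=True)
    # Trailing newline so a resolver under test has to strip whitespace.
    (entry / "refs" / "main").write_text(commit + "\n")
    snapshot = entry / "snapshots" / commit
    snapshot.mkdir(parents=True)
    return snapshot


def make_hf_cache_model(root: Path, org: str, name: str,
                        commit: str = "abc123") -> Path:
    """Bare entry plus a config.json, so the snapshot looks like a model."""
    snapshot = make_bare_hf_cache_entry(root, org, name, commit)
    (snapshot / "config.json").write_text(json.dumps({"model_type": "llama"}))
    return snapshot
```

With pytest, `root` would typically be the built-in `tmp_path` fixture; the bare variant covers the "missing config.json skipped" case, and the full variant covers the happy path.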

AlexWorland and others added 6 commits March 29, 2026 14:10
Resolve models stored in HuggingFace Hub's cache format
(models--Org--Name/snapshots/<hash>/) so they appear in both
the model discovery engine and the dashboard model list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
13 new tests covering _resolve_hf_cache_entry() edge cases and
discover_models() integration with HF cache directory layouts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove redundant is_file() check before read_text() — the try/except
OSError already handles missing refs/main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Match existing test file conventions — no other test class uses
inline section dividers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deduplicate the bare HF cache directory setup (entry + refs + snapshot)
into a shared helper. _make_hf_cache_model now calls through to it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>