
[Enhancement] Five internal/fetcher fetchers and markdown_extraction have no tests #25

@wiebe-vandendriessche

Problem or motivation

internal/fetcher/ has test files only for ModelAPIFetcher and ModelReadmeFetcher. The following have no tests at all:

  • DatasetAPIFetcher — fetches dataset metadata from the HF Hub API
  • DatasetReadmeFetcher — fetches and parses dataset README/model-card YAML
  • DatasetTreeFetcher — fetches the dataset file tree with security metadata (cursor-based pagination)
  • ModelTreeFetcher — fetches the model file tree (same pagination logic, different endpoint)
  • ModelSearcher — searches for models on the HF Hub
  • markdown_extraction.go — shared front-matter YAML splitting and string extraction helpers used by all readme fetchers

ModelTreeFetcher and DatasetTreeFetcher in particular implement cursor-based pagination (capped at maxTreePages = 10) with non-trivial state. A bug in that loop, such as not following the next cursor or stopping early, would silently truncate the file list and, with it, the security scan data.

Proposed solution

Add a test file for each untested fetcher using httptest.NewServer (the same pattern already used in model_api_fetcher_test.go). Cover a successful fetch, a 404 response (HFError wrapping), pagination across two pages for the tree fetchers, and empty result sets. For markdown_extraction.go, add table-driven tests for splitFrontMatter covering missing delimiters, invalid YAML, and empty input.

Alternatives considered

None — httptest-based tests are already established in this package and straightforward to extend.

Additional context

Affected files: dataset_api_fetcher.go, dataset_readme_fetcher.go, dataset_tree_fetcher.go, model_tree_fetcher.go, model_search.go, markdown_extraction.go.

Labels: enhancement (New feature or request), tests (Write tests)