Fix check_dataset entry point by adding missing main function #471

Merged
JoelNiklaus merged 1 commit into huggingface:main from JoelNiklaus:fix/check-dataset-entry-point on Mar 11, 2026

JoelNiklaus (Contributor) commented Mar 11, 2026

Problem

The check_dataset CLI tool fails with an ImportError when invoked:

ImportError: cannot import name 'main' from 'datatrove.tools.check_dataset'

Root cause

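The pyproject.toml entry point declares check_dataset = "datatrove.tools.check_dataset:main", but the module had no main function; the CLI logic lived directly under if __name__ == "__main__":. The console-script wrapper generated at install time imports main from that module, so the command failed with an ImportError before any CLI logic could run.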
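For reviewers, a minimal self-contained sketch of the failure mode. It mimics what an installed console-script wrapper roughly does (from <module> import main, then call it) against an in-memory stand-in module. The module name demo_check_dataset and both module bodies are hypothetical; real wrappers generated by pip or uv also handle sys.argv and the exit code, which this sketch leaves out.

```python
import sys
import types

# Hypothetical module bodies; the real check_dataset.py logic is not reproduced here.
OLD_SOURCE = '''
if __name__ == "__main__":
    print("running check_dataset")  # skipped when the module is imported
'''

NEW_SOURCE = '''
def main():
    return "running check_dataset"

if __name__ == "__main__":
    main()
'''


def install_module(name: str, source: str) -> None:
    """Register an in-memory module, standing in for the installed .py file."""
    module = types.ModuleType(name)
    # Executed as an import: __name__ == name, so the __main__ guard body is skipped.
    exec(source, module.__dict__)
    sys.modules[name] = module


def run_console_script(spec: str) -> str:
    """Mimic a generated wrapper: 'from <module> import <func>', then call it."""
    module_name, func_name = spec.split(":")
    namespace: dict = {}
    try:
        exec(f"from {module_name} import {func_name}", namespace)
    except ImportError as exc:
        return f"ImportError: {exc}"
    return namespace[func_name]()


install_module("demo_check_dataset", OLD_SOURCE)
print(run_console_script("demo_check_dataset:main"))  # reports the ImportError

install_module("demo_check_dataset", NEW_SOURCE)
print(run_console_script("demo_check_dataset:main"))  # runs the CLI logic
```

When the module is imported rather than executed, __name__ is the module name instead of "__main__", so the guarded code never runs and there is no main attribute for the wrapper to import.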

Solution

Wrap the CLI logic in a main() function and call it from the if __name__ == "__main__": block, matching the pattern used by all other tools in datatrove/tools/.

Testing

  • Verified that all other tools (merge_stats, launch_pickled_pipeline, failed_logs, inspect_data, jobs_status, track_jobs) already follow this pattern.
  • Reinstalled with uv pip install -e . and confirmed check_dataset no longer raises ImportError.

Made with Cursor


Note

Low Risk
Low-risk change that only restructures the check_dataset CLI entry point to match the script declared in pyproject.toml and avoid an ImportError.

Overview
Fixes the check_dataset CLI script entry point by introducing a main() function and invoking it from the __main__ guard.

This aligns src/datatrove/tools/check_dataset.py with the pyproject.toml console script (datatrove.tools.check_dataset:main) so check_dataset can be executed without import-time failures.

Written by Cursor Bugbot for commit 0f1d83d.

JoelNiklaus merged commit 01dbaf3 into huggingface:main on Mar 11, 2026
5 checks passed