
Add datatrove tag to synthetic dataset cards #473

Merged
JoelNiklaus merged 1 commit into huggingface:main from JoelNiklaus:feat/add-datatrove-tag on Mar 16, 2026

Conversation

Contributor

JoelNiklaus commented Mar 16, 2026

Problem

Dataset cards generated with DataTrove do not carry a `datatrove` tag, so these datasets cannot be discovered or filtered by that tag on the Hugging Face Hub.

Solution

Add `"datatrove"` as a permanent tag in the dataset card generator's tag set, alongside the existing `"synthetic"` tag. Every new synthetic dataset card will now include a `- datatrove` entry under `tags` in its YAML frontmatter.
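As a rough illustration of the effect on the rendered card, the sketch below builds a frontmatter block whose tag list always starts with the two fixed tags. It is not the code from this PR: `BASE_TAGS` and `render_frontmatter` are hypothetical names, and a tuple is used instead of a set only to keep the output order deterministic.

```python
# Hypothetical sketch; names are illustrative and not DataTrove's actual code.
BASE_TAGS = ("synthetic", "datatrove")  # tags every generated card carries


def render_frontmatter(extra_tags=()):
    """Return a YAML frontmatter block whose tag list starts with BASE_TAGS."""
    tags = list(BASE_TAGS)
    for tag in extra_tags:
        if tag not in tags:  # keep insertion order, skip duplicates
            tags.append(tag)
    lines = ["---", "tags:"]
    lines += [f"- {tag}" for tag in tags]
    lines.append("---")
    return "\n".join(lines)


if __name__ == "__main__":
    # The extra tag here is an arbitrary example value.
    print(render_frontmatter(["text-generation"]))
```

Because the fixed tags are merged first and duplicates are skipped, passing `"datatrove"` again as an extra tag would not produce a second entry.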

Testing

Existing tests pass; none of them assert on specific tag values in the rendered card.
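Since no existing test checks tag values, a follow-up test could pin the new behaviour. The sketch below shows one possible shape, assuming the rendered card is available as a string: `frontmatter_tags` is a hypothetical helper that reads only a simple `tags:` list, and `SAMPLE_CARD` is a hand-written stand-in rather than real generator output.

```python
# Hypothetical sketch of a tag assertion; not part of this PR.
SAMPLE_CARD = """---
tags:
- synthetic
- datatrove
---
# Example dataset card
"""


def frontmatter_tags(card_text):
    """Extract the entries of the `tags:` list from a card's YAML frontmatter."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter at all
    tags, in_tags = [], False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # end of frontmatter
            break
        if stripped == "tags:":
            in_tags = True
        elif in_tags and stripped.startswith("- "):
            tags.append(stripped[2:].strip())
        elif in_tags:
            in_tags = False  # any other line ends the tag list
    return tags


def test_card_has_datatrove_tag():
    tags = frontmatter_tags(SAMPLE_CARD)
    assert "datatrove" in tags
    assert "synthetic" in tags
```

In a real test, the sample string would be replaced by the output of the card generator, and a proper YAML parser would be a more robust choice than this line-based reader.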

Made with Cursor


Note

Low Risk
Low risk: adds a single constant tag to the dataset card YAML frontmatter, with no changes to inference, uploads, or data processing logic.

Overview
Synthetic dataset cards generated by `InferenceDatasetCardGenerator` now always include the `datatrove` tag (in addition to `synthetic`) in the README YAML frontmatter, improving discoverability and filtering on the Hugging Face Hub.

Written by Cursor Bugbot for commit 5fd2d3f.

Every dataset card generated by DataTrove will now include the
`datatrove` tag, making these datasets discoverable on the Hub.

Made-with: Cursor
JoelNiklaus merged commit b1058f4 into huggingface:main on Mar 16, 2026
5 checks passed