
Conversation

@michaelfeil
Contributor

What does this PR do?

I've wanted to submit this PR for a long time. The tokenizer heuristic is incorrectly tuned, and it's a bad idea to spawn as many tokenizer threads as there are CPU cores.

  • If you only have 4 cores, leave 1 for the candle backend. Otherwise the cores tend to clog up, and we want to keep resources free for launching CUDA kernels.
  • Running an inference benchmark (e.g. with https://huggingface.co/TaylorAI/bge-micro, the smallest model I know of) shows that you rarely need more than 2 tokenizer threads. For https://huggingface.co/TaylorAI/bge-micro with multiple raw C clients over localhost, I see benefits from up to 6 tokenizers on a full H100 GPU. Even 16 is overkill, regardless of how many CPU cores you have.
  • Spawning 208 CPU threads adds roughly ~16s of cold-start time, so I cap the pool at 64.
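To make the intent of the bullets above concrete, here is a minimal sketch of such a capped heuristic. The function name and the cap constant are illustrative assumptions for this sketch, not necessarily the exact names or values used in the PR diff:

```rust
/// Hypothetical sketch: pick a tokenizer worker count from the CPU count.
/// Reserve one core for the inference backend (e.g. for launching CUDA
/// kernels) and cap the pool, since beyond a handful of tokenizer threads
/// there is no throughput benefit and cold-start time grows.
fn default_tokenization_workers(num_cpus: usize) -> usize {
    // Leave at least one core for the backend, keep at least one worker,
    // and never exceed the cap (64 here, as an assumed value).
    num_cpus.saturating_sub(1).clamp(1, 64)
}

fn main() {
    // e.g. on a 208-core machine this yields 64 workers instead of 208,
    // avoiding the multi-second thread-spawn penalty at startup.
    println!("{}", default_tokenization_workers(208));
}
```

The `clamp` keeps the behavior sensible at both extremes: a single-core machine still gets one worker, and a many-core machine stops at the cap.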

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@michaelfeil michaelfeil changed the title Tokenization workers heuristic. feat: better Tokenization # workers heuristic Nov 25, 2025
@alvarobartt alvarobartt self-requested a review November 25, 2025 12:24
@alvarobartt alvarobartt added this to the v1.9.0 milestone Nov 25, 2025
