[Misc] Update TokenizerLike interface and move get_cached_tokenizer
#29730
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Documentation preview: https://vllm--29730.org.readthedocs.build/en/29730/
Code Review
This pull request refactors the tokenizer handling by updating the TokenizerLike interface, moving get_cached_tokenizer, and introducing HfTokenizer. The changes improve code structure and align tokenizer behavior with Hugging Face conventions. I've found a few issues related to type correctness in the protocol and a missing parameter in the MistralTokenizer implementation that should be addressed.
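To make the kind of interface discussed here concrete, the following is a minimal sketch of a structural tokenizer protocol that exposes `pad_token_id` alongside `encode`, `decode` and `convert_ids_to_tokens`. It is not vLLM's actual `TokenizerLike` definition: the names `TokenizerLikeSketch`, `ToyCharTokenizer` and `pad_batch`, as well as the exact parameter lists, are assumptions made for illustration only.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class TokenizerLikeSketch(Protocol):
    """Hypothetical stand-in for a TokenizerLike-style protocol (not vLLM's real one)."""

    @property
    def pad_token_id(self) -> int | None: ...

    def encode(self, text: str, add_special_tokens: bool = True) -> list[int]: ...

    def decode(self, ids: list[int], skip_special_tokens: bool = False) -> str: ...

    def convert_ids_to_tokens(
        self, ids: list[int], skip_special_tokens: bool = False
    ) -> list[str]: ...


class ToyCharTokenizer:
    """Toy character-level tokenizer, used only to exercise the protocol."""

    def __init__(self) -> None:
        self._pad_token_id = 0  # assumption: code point 0 doubles as the pad token

    @property
    def pad_token_id(self) -> int | None:
        return self._pad_token_id

    def encode(self, text: str, add_special_tokens: bool = True) -> list[int]:
        # One token per character; this toy tokenizer adds no special tokens.
        return [ord(ch) for ch in text]

    def decode(self, ids: list[int], skip_special_tokens: bool = False) -> str:
        if skip_special_tokens:
            ids = [i for i in ids if i != self._pad_token_id]
        return "".join(chr(i) for i in ids)

    def convert_ids_to_tokens(
        self, ids: list[int], skip_special_tokens: bool = False
    ) -> list[str]:
        if skip_special_tokens:
            ids = [i for i in ids if i != self._pad_token_id]
        return [chr(i) for i in ids]


def pad_batch(tokenizer: TokenizerLikeSketch, texts: list[str]) -> list[list[int]]:
    """Right-pad encoded texts to equal length using the protocol's pad_token_id."""
    pad_id = tokenizer.pad_token_id
    if pad_id is None:
        raise ValueError("tokenizer does not define a pad token")
    encoded = [tokenizer.encode(text) for text in texts]
    width = max((len(ids) for ids in encoded), default=0)
    return [ids + [pad_id] * (width - len(ids)) for ids in encoded]
```

Because the protocol is structural, `ToyCharTokenizer` satisfies it without inheriting from it, and a caller such as `pad_batch` can rely on `pad_token_id` being present on any conforming tokenizer.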
…r` (vllm-project#29730) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…r` (vllm-project#29730) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
Purpose
- Add `pad_token_id` to `TokenizerLike` interface to be used in Score API.
- Update `__call__`, `encode`, `decode` and `convert_ids_to_tokens`; apply them to `MistralTokenizer` as well. cc @patrickvonplaten
- Update `from_pretrained` to be in line with `TokenizerRegistry.get_tokenizer`.
- Move `get_cached_tokenizer` to `vllm.tokenizers.hf` (with back-compatibility)

Test Plan
Test Result