fix tokenizer regex issue with Mistral-based models#1049

Open
amanning3390 wants to merge 1 commit into ml-explore:main from amanning3390:fix-mistral-regex-tokenizer
@amanning3390

Summary

  • Pass fix_mistral_regex=True to the AutoTokenizer.from_pretrained call in tokenizer_utils.py. This corrects the faulty regex pattern that otherwise leads to incorrect tokenization.
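
For illustration, the change amounts to roughly the following. This is a minimal sketch, not the actual diff: the helper names `tokenizer_kwargs` and `load_tokenizer` are hypothetical and do not come from tokenizer_utils.py, and it assumes an installed transformers version that recognizes the `fix_mistral_regex` flag.

```python
from typing import Any, Dict, Optional


def tokenizer_kwargs(extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build keyword arguments for AutoTokenizer.from_pretrained.

    Hypothetical helper: defaults fix_mistral_regex to True while still
    letting callers override it explicitly.
    """
    kwargs: Dict[str, Any] = {"fix_mistral_regex": True}
    kwargs.update(extra or {})
    return kwargs


def load_tokenizer(model_path: str, **extra: Any):
    """Load a tokenizer with the regex fix enabled (hypothetical wrapper)."""
    # Imported lazily so the kwargs helper can be used without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_path, **tokenizer_kwargs(extra))
```

Keeping the flag as an overridable default means a caller can still pass `fix_mistral_regex=False` if a particular tokenizer needs the original pattern.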

Test plan

  • Load the affected tokenizer (e.g., Jackrong/MLX-Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2-bf16) and verify that the incorrect-regex warning no longer appears
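
One way to automate this check is sketched below. It rests on several assumptions: that the warning is emitted through the `transformers` logger or the `warnings` module, and that its text contains the phrase "incorrect regex pattern". The helpers `collect_messages` and `regex_warning_seen` are hypothetical names introduced here.

```python
import logging
import warnings
from typing import Callable, List


class _ListHandler(logging.Handler):
    """Logging handler that stores formatted messages in a list."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def collect_messages(load: Callable[[], object],
                     logger_name: str = "transformers") -> List[str]:
    """Run load() and return warning-level messages it produced.

    Captures both records sent to the named logger (and its children)
    and anything raised through the warnings module.
    """
    handler = _ListHandler()
    logger = logging.getLogger(logger_name)
    logger.addHandler(handler)
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            load()
    finally:
        logger.removeHandler(handler)
    return handler.messages + [str(w.message) for w in caught]


def regex_warning_seen(messages: List[str],
                       needle: str = "incorrect regex pattern") -> bool:
    """Return True if any message contains the assumed warning phrase."""
    return any(needle in message for message in messages)
```

In use, `load` would be a zero-argument callable that wraps the tokenizer load, and the check passes when `regex_warning_seen(collect_messages(load))` is false.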

Related

🤖 Generated with Claude Code

Add fix_mistral_regex=True to AutoTokenizer.from_pretrained to fix
incorrect regex pattern that leads to incorrect tokenization.

See: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>