Description
When running Devstral-Small-2 (or any model using Mistral's tekken v13 tokenizer) via mlx_lm.server, the generated output contains raw BPE byte-fallback characters (Ġ, U+0120) instead of spaces.
Reproduction
python -m mlx_lm.server --model mlx-community/Devstral-Small-2-24B-Instruct-2512-bf16
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mlx-community/Devstral-Small-2-24B-Instruct-2512-bf16",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "hello"}
]
}'
Expected output:
Hello! How can I assist you today?
Actual output:
Hello!ĠHowĠcanĠIĠassistĠyouĠtoday?
Environment
- mlx-lm: 0.31.2
- mlx: 0.31.1
- transformers: 5.3.0
- Model: mlx-community/Devstral-Small-2-24B-Instruct-2512-bf16
Root cause
Devstral uses Mistral's tekken v13 tokenizer format (tekken.json) rather than the standard HuggingFace tokenizer. The NaiveStreamingDetokenizer does not handle this format correctly, emitting raw BPE space-prefix tokens (Ġ) instead of converting them to spaces.
PR #363 previously proposed a MistralStreamingDetokenizer with native tekken support but was closed without being merged.
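The symptom is characteristic of byte-level BPE: in the GPT-2-style byte-to-unicode table, "unprintable" bytes are shifted into visible codepoints at 256 and above, so the space byte 0x20 surfaces as U+0120 ("Ġ") whenever token strings are emitted without being decoded back to bytes. A minimal sketch of that mapping and its inverse (illustrative only, not mlx-lm's actual detokenizer code):

```python
def bytes_to_unicode():
    # Standard GPT-2 byte-to-unicode table: printable bytes map to
    # themselves; the remaining bytes are shifted to codepoints 256+.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)  # byte 0x20 (space) lands on 256 + 32 = U+0120 "Ġ"
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

def decode_byte_fallback(text: str) -> str:
    # Reverse the table: map each visible stand-in back to its raw
    # byte, then decode the byte string as UTF-8.
    unicode_to_byte = {v: k for k, v in bytes_to_unicode().items()}
    raw = bytes(unicode_to_byte.get(ch, ord(ch)) for ch in text)
    return raw.decode("utf-8", errors="replace")

print(decode_byte_fallback("Hello!ĠHowĠcanĠIĠassistĠyouĠtoday?"))
# → Hello! How can I assist you today?
```

A correct detokenizer applies this inverse mapping before emitting text; the buggy path appears to concatenate the raw token strings instead.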
Notes
- The issue is reproducible with both mlx_lm.server and third-party servers that use mlx-lm under the hood
- GGUF variants of the same model are unaffected (the issue is specific to the MLX tokenizer path)