fix: resolve abnormal stop and add tool parser in GLM-4.7 #640
lkevincc0 wants to merge 1254 commits into ztxz16:master
C++: support the Qwen3 template
Improve scaling calculation in fastllm-cuda.cu and bug fix
fix: array-format eos_token_id was wrongly parsed as 0, causing token 0 to trigger EOS
Pull request overview
This PR adds a specialized tool parser for the GLM-4.7 MOE model, based on the vLLM project implementation. The main changes include a new Python tool parser class that extends the existing GLM-4.5 parser with updated regex patterns, and C++ modifications to handle EOS tokens when they're specified as arrays.
Key changes:
- Adds `Glm47MoeModelToolParser` class with GLM-4.7-specific regex patterns for parsing tool calls
- Updates C++ code to properly handle array-format `eos_token_id` configurations
- Initializes `bos_token_id` and `eos_token_id` with default values of -1
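As a rough illustration of the array-format handling described above, the sketch below contrasts a scalar-only parse with one that also accepts a list. It is written in Python for brevity and is not the C++ code from this PR. The function names, the fallback-to-0 behaviour of the naive variant, and the sample ids are all assumptions made for the example.

```python
from typing import Any, Dict, Set, Tuple


def parse_eos_naive(config: Dict[str, Any]) -> int:
    """Scalar-only variant (hypothetical): an array silently becomes 0."""
    value = config.get("eos_token_id")
    # Assumption: mirrors a JSON accessor that yields 0 for non-integer values.
    return value if isinstance(value, int) else 0


def parse_eos(config: Dict[str, Any]) -> Tuple[int, Set[int]]:
    """Array-aware variant: returns (scalar id or -1, set of all EOS ids)."""
    value = config.get("eos_token_id")
    if isinstance(value, list):
        # Array format: keep the scalar at its default of -1, fill only the set.
        return -1, {int(v) for v in value}
    if isinstance(value, int):
        return value, {value}
    return -1, set()


# Sample ids are made up for illustration.
print(parse_eos_naive({"eos_token_id": [100, 101]}))
print(parse_eos({"eos_token_id": [100, 101]}))
```

With the naive variant, any token whose id happens to be 0 would be treated as EOS, which matches the symptom this PR describes.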
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tools/fastllm_pytools/openai_server/tool_parsers/glm47_moe_tool_parser.py | New tool parser for GLM-4.7 that overrides regex patterns from GLM-4.5 parser to handle different tool call format |
| tools/fastllm_pytools/openai_server/tool_parsers/__init__.py | Adds import and export of the new Glm47MoeModelToolParser class |
| src/model.cpp | Adds logic to parse array-format eos_token_id and adds explanatory comments |
| include/models/basellm.h | Initializes bos_token_id and eos_token_id to -1 as default values |
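The "override regex patterns in a subclass" approach listed in the table can be sketched roughly as follows. Everything here is hypothetical: the class names, the attribute names, and the assumed tag layouts (a newline after the function name in the older format, none in the newer one) are stand-ins for illustration. The actual patterns are in glm47_moe_tool_parser.py.

```python
import re
from typing import Dict, List, Tuple


class BaseToolParser:
    """Hypothetical stand-in for the older parser; not the real class."""

    call_regex = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
    # Assumed older layout: function name, newline, then the arguments.
    detail_regex = re.compile(r"([^\n]*)\n(.*)", re.DOTALL)
    arg_regex = re.compile(
        r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
    )

    def parse(self, text: str) -> List[Tuple[str, Dict[str, str]]]:
        calls = []
        for body in self.call_regex.findall(text):
            match = self.detail_regex.match(body)
            if match is None:
                continue  # body does not fit this parser's expected layout
            name = match.group(1).strip()
            arg_text = match.group(2) or ""
            args = {k.strip(): v.strip() for k, v in self.arg_regex.findall(arg_text)}
            calls.append((name, args))
        return calls


class NewFormatToolParser(BaseToolParser):
    """Hypothetical subclass: only the pattern changes, parse() is inherited."""

    # Assumed newer layout: the name may be followed directly by <arg_key>.
    detail_regex = re.compile(r"(.*?)(<arg_key>.*)?$", re.DOTALL)


new_style = "<tool_call>get_weather<arg_key>city</arg_key><arg_value>Paris</arg_value></tool_call>"
print(BaseToolParser().parse(new_style))       # the older pattern finds no call
print(NewFormatToolParser().parse(new_style))  # the overridden pattern does
```

Keeping the parsing logic in the base class and swapping only the compiled patterns keeps the new parser small, which appears to be the intent of the change described in the table.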
    for (auto &it : ids.array_items()) {
        this->eos_token_ids.insert(it.int_value());
    }
    // Don't set eos_token_id integer - leave it as -1
When eos_token_id is in array format, the code intentionally leaves eos_token_id as -1 (based on the default value from basellm.h). However, this causes a problem in ResetLogitsOfEOS functions (lines 1402, 1418, 1440, 1457 in basellm.cpp) where the code unconditionally accesses logit[this->eos_token_id]. When eos_token_id is -1, this creates an invalid array access with a negative index.
The code should either:
- Skip adding eos_token_id to the list when it's -1, or
- Only process eos_token_ids set in this case
This is a critical bug that will cause undefined behavior or crashes when using models with array-format eos_token_id.
Suggested change:

    - for (auto &it : ids.array_items()) {
    -     this->eos_token_ids.insert(it.int_value());
    - }
    - // Don't set eos_token_id integer - leave it as -1
    + bool firstSet = false;
    + for (auto &it : ids.array_items()) {
    +     int id = it.int_value();
    +     this->eos_token_ids.insert(id);
    +     // Also set eos_token_id to a valid value (first element) to avoid invalid index usage elsewhere
    +     if (!firstSet) {
    +         this->eos_token_id = id;
    +         firstSet = true;
    +     }
    + }
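The two options listed in the review comment (skip `eos_token_id` when it is -1, or rely only on the `eos_token_ids` set) amount to a bounds check before indexing. A minimal sketch in Python, assuming the function's job is to suppress EOS logits in place; `mask_eos_logits` is a hypothetical name and is not the C++ `ResetLogitsOfEOS` implementation:

```python
from typing import Iterable, List

NEG_INF = float("-inf")


def mask_eos_logits(logits: List[float], eos_token_id: int,
                    eos_token_ids: Iterable[int]) -> None:
    """Suppress every known EOS id in place, ignoring unset or invalid ids."""
    candidates = set(eos_token_ids)
    candidates.add(eos_token_id)
    for token_id in candidates:
        # In C++ a negative index is undefined behaviour; in Python it would
        # silently touch the last element, so the check is needed either way.
        if 0 <= token_id < len(logits):
            logits[token_id] = NEG_INF


logits = [0.1, 0.2, 0.3, 0.4]
mask_eos_logits(logits, -1, {1})  # scalar left unset, one id in the set
print(logits)
```

Compared with the suggested change above, this keeps -1 as an explicit "unset" marker rather than copying the first array element into the scalar; which approach fits better depends on how the rest of the code uses `eos_token_id`.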
e1b5c57 to dce3e6c
Reference implementation: glm47_moe_tool_parser.py
Tool calling
Testing
Before the change
Launch command:
    python -m ftllm.cli serve \
        /mnt/models/glm-4-7-awq \
        --device cuda \
        --moe_device "{'cuda':5,'cpu':15}" \
        --enable_amx True \
        --host 0.0.0.0 \
        --port 8000 \
        --model_name llm \
        --think True \
        --tool_call_parser glm45 \
        --cuda_embedding \
        --cache_fast True \
        --cuda_shared_expert True

Output:
After the change
Launch command:
    python -m ftllm.cli serve \
        /mnt/models/glm-4-7-awq \
        --device cuda \
        --moe_device "{'cuda':5,'cpu':15}" \
        --enable_amx True \
        --host 0.0.0.0 \
        --port 8000 \
        --model_name llm \
        --think True \
        --tool_call_parser glm47 \
        --cuda_embedding \
        --cache_fast True \
        --cuda_shared_expert True

Output:
Unexpected stop
The symbol `!` has token id 0 in GLM-4.7, which caused the unexpected stop.
Testing
Output for different prompts:
After the change