fix: resolve abnormal stop and add tool parser in GLM-4.7 by lkevincc0 · Pull Request #640 · ztxz16/fastllm

lkevincc0 · 2025-12-31T02:54:34Z

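Add a dedicated tool parser for GLM-4.7.
Fix the bug where GLM-4.7 stops generating abnormally.

Note: this change does not update the automatic selection logic in get_tool_parser_auto in abstract_tool_parser.py. For the glm4_moe model type the default therefore remains glm45. To get correct tool-call parsing with GLM-4.7, you must explicitly pass --tool_call_parser glm47.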

Tool calling

Test

import json
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="http://0.0.0.0:8000/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current stock price",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {"type": "string", "description": "The stock symbol, e.g. AAPL"},
            },
            "required": ["symbol"]
        }
    }
}]

messages = [{"role": "user", "content": "What's the price of Apple stock?"}]

response = client.chat.completions.create(
    model="llm",
    messages=messages,
    tools=tools,
    stream=True,
)

# Initialize streaming collection variables
reasoning_content = ""
content = ""
final_tool_calls = {}
reasoning_started = False
content_started = False
# Process streaming response
for chunk in response:
    if not chunk.choices:
        continue

    delta = chunk.choices[0].delta

    # Streaming reasoning process output
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        if not reasoning_started and delta.reasoning_content.strip():
            print("\n🧠 Thinking Process:")
            reasoning_started = True
        reasoning_content += delta.reasoning_content
        print(delta.reasoning_content, end="", flush=True)

    # Streaming answer content output
    if hasattr(delta, 'content') and delta.content:
        if not content_started and delta.content.strip():
            print("\n\n💬 Answer Content:")
            content_started = True
        content += delta.content
        print(delta.content, end="", flush=True)

    # Streaming tool call information (parameter concatenation)
    if delta.tool_calls:
        for tool_call in delta.tool_calls:
            idx = tool_call.index
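            if idx not in final_tool_calls:
                final_tool_calls[idx] = tool_call
                # The first fragment may carry arguments=None; start from ""
                final_tool_calls[idx].function.arguments = tool_call.function.arguments or ""
            else:
                # Later fragments append argument text (None treated as "")
                final_tool_calls[idx].function.arguments += tool_call.function.arguments or ""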

# Output final tool call information
if final_tool_calls:
    print("\n📋 Function Calls Triggered:")
    for idx, tool_call in final_tool_calls.items():
        print(f"  {idx}: Function Name: {tool_call.function.name}, Parameters: {tool_call.function.arguments}")
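The loop above merges streamed tool-call fragments by their index and concatenates the argument text. A minimal sketch of the same accumulation using plain dicts instead of the OpenAI client objects, assuming each fragment carries an index, an optional function name, and an argument piece that may be None (the fragment layout and merge_tool_call_deltas are hypothetical):

```python
from typing import Dict, List


def merge_tool_call_deltas(deltas: List[dict]) -> Dict[int, dict]:
    """Accumulate streamed tool-call fragments keyed by their index.

    Each fragment is assumed to look like
    {"index": int, "name": str or None, "arguments": str or None}.
    """
    merged: Dict[int, dict] = {}
    for d in deltas:
        slot = merged.setdefault(d["index"], {"name": None, "arguments": ""})
        # The function name usually arrives only in the first fragment.
        if d.get("name"):
            slot["name"] = d["name"]
        # Argument text arrives in pieces; treat None as an empty piece.
        slot["arguments"] += d.get("arguments") or ""
    return merged


if __name__ == "__main__":
    stream = [
        {"index": 0, "name": "get_stock_price", "arguments": None},
        {"index": 0, "arguments": '{"symbol": '},
        {"index": 0, "arguments": '"AAPL"}'},
    ]
    print(merge_tool_call_deltas(stream))
```

Keying by index rather than by arrival order keeps parallel tool calls separate even when their fragments interleave.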

Before

Launch command:

python -m ftllm.cli serve \
    /mnt/models/glm-4-7-awq \
    --device cuda \
    --moe_device "{'cuda':5,'cpu':15}" \
    --enable_amx True \
    --host 0.0.0.0 \
    --port 8000 \
    --model_name llm \
    --think True \
    --tool_call_parser glm45 \
    --cuda_embedding \
    --cache_fast True \
    --cuda_shared_expert True

Output:

💬 Answer Content:
<think>
The user is asking for the price of Apple stock. I need to use the get_stock_price function to get this information. Apple's stock symbol is AAPL, which is well-known. I have all the required parameters to make this function call.</think>

After

Launch command:

python -m ftllm.cli serve \
    /mnt/models/glm-4-7-awq \
    --device cuda \
    --moe_device "{'cuda':5,'cpu':15}" \
    --enable_amx True \
    --host 0.0.0.0 \
    --port 8000 \
    --model_name llm \
    --think True \
    --tool_call_parser glm47 \
    --cuda_embedding \
    --cache_fast True \
    --cuda_shared_expert True

Output:

💬 Answer Content:
<think>
The user is asking for the price of Apple stock. I have a function called "get_stock_price" that can get the current stock price. The function requires a "symbol" parameter, which should be the stock symbol. For Apple, the stock symbol is commonly known as "AAPL". I have all the required information to make this function call.

Let me call the get_stock_price function with symbol "AAPL".</think>I'll get the current stock price for Apple (AAPL) for you.
📋 Function Calls Triggered:
  0: Function Name: get_stock_price, Parameters: {"symbol": "AAPL"}

Unexpected stop

In GLM-4.7 the symbol "!" has token ID 0, which caused generation to stop unexpectedly.

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.7", trust_remote_code=True)

tokens = [18, 47524, 115937, 20, 0, 92]

print("=" * 50)
print("Token analysis:")
print("=" * 50)
for i, t in enumerate(tokens):
    decoded = tokenizer.decode([t])
    print(f"  Token[{i}] ID={t:6d} -> '{decoded}' (repr: {repr(decoded)})")

print("\nSpecial token info:")
print(f"  eos_token_id: {tokenizer.eos_token_id}")
print(f"  pad_token_id: {tokenizer.pad_token_id if hasattr(tokenizer, 'pad_token_id') else 'N/A'}")
print(f"  bos_token_id: {tokenizer.bos_token_id if hasattr(tokenizer, 'bos_token_id') else 'N/A'}")

# Check whether token 0 is a special token
print(f"\nToken 0 decodes to: '{tokenizer.decode([0])}'")

# Check eos_token_ids (in case it is a list)
if hasattr(tokenizer, 'eos_token_id'):
    eos = tokenizer.eos_token_id
    if isinstance(eos, list):
        print(f"\nEOS token IDs (multiple): {eos}")
        for e in eos:
            print(f"  {e} -> '{tokenizer.decode([e])}'")

==================================================
Token analysis:
==================================================
  Token[0] ID=    18 -> '3' (repr: '3')
  Token[1] ID= 47524 -> '^{' (repr: '^{')
  Token[2] ID=115937 -> '202' (repr: '202')
  Token[3] ID=    20 -> '5' (repr: '5')
  Token[4] ID=     0 -> '!' (repr: '!')
  Token[5] ID=    92 -> '}' (repr: '}')

Special token info:
  eos_token_id: 151329
  pad_token_id: 151329
  bos_token_id: None

Token 0 decodes to: '!'
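One of this PR's commits attributes the stop to an array-format eos_token_id being parsed as 0, so token 0 ("!") matched EOS. The Python sketch below reproduces that failure mode under one stated assumption: the scalar accessor used on the config value falls back to 0 when the value is an array. The helper names and the ID list are hypothetical illustrations, not fastllm's code; only 151329 appears in the output above.

```python
from typing import Set, Union

EosField = Union[int, list]


def as_int_or_zero(value) -> int:
    # Assumed behavior: a scalar accessor that silently returns 0 for non-numbers.
    return value if isinstance(value, int) else 0


def buggy_eos(eos_field: EosField) -> Set[int]:
    # Treats the field as a scalar, so an array collapses to {0}.
    return {as_int_or_zero(eos_field)}


def fixed_eos(eos_field: EosField) -> Set[int]:
    # Accepts both a single ID and an array of IDs.
    if isinstance(eos_field, list):
        return {int(x) for x in eos_field}
    return {int(eos_field)}


def should_stop(token_id: int, eos_ids: Set[int]) -> bool:
    return token_id in eos_ids


if __name__ == "__main__":
    eos_field = [151329, 151336]  # illustrative array-format eos_token_id
    bang = 0                      # token ID of "!" in GLM-4.7 (see output above)
    print(should_stop(bang, buggy_eos(eos_field)))  # True: "!" ends generation
    print(should_stop(bang, fixed_eos(eos_field)))  # False: "!" is ordinary text
```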

Test

from openai import OpenAI
client = OpenAI(
    api_key="sk-",
    base_url="http://0.0.0.0:8000/v1",
)
MODEL = "llm"

q = r"""
Please reason step by step, and put your final answer within \\boxed{}.
Let $n \geq 6$ be a positive integer. We call a positive integer $n$-Norwegian if it has three distinct positive divisors whose sum is equal to $n$. Let $f(n)$ denote the smallest $n$-Norwegian positive integer. Let $M=3^{2025!}$ and for a non-negative integer $c$ define 
\begin{equation*}
    g(c)=\frac{1}{2025!}\left\lfloor \frac{2025! f(M+c)}{M}\right\rfloor.
\end{equation*}
We can write 
\begin{equation*}
    g(0)+g(4M)+g(1848374)+g(10162574)+g(265710644)+g(44636594)=\frac{p}{q}
\end{equation*}
where $p$ and $q$ are coprime positive integers. What is the remainder when $p+q$ is divided by $99991$?
"""

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": q}],
    stream=True,
    max_tokens=2000
)
for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if hasattr(delta, 'content') and delta.content:
        print(delta.content, end="", flush=True)

Before (both runs stop exactly where the "!" of 2025! should be generated):

<think>
We need to solve a complex problem. Let's parse it carefully.

We have n ≥ 6 positive integer. We call a positive integer "n-Norwegian" if it has three distinct positive divisors whose sum is equal to n.

Define f(n) as the smallest n-Norwegian positive integer.

Then M = 3^{2025

<think>
We are given a problem with many steps. Let's parse and understand it.

We have n ≥ 6 positive integer. A positive integer is called n-Norwegian if it has three distinct positive divisors whose sum is equal to n. f(n) is the smallest n-Norwegian positive integer.

Define M = 3^{2025

Output for a different prompt:

<think>
The user wants me to repeat the string "5!" five times.

1.  **Identify the string to repeat:** "5!"
2.  **Identify the number of repetitions:** 5
3.  **Determine the output format:** Usually, repetition implies concatenation (e.g., "5

After

<think>
We need to understand the problem. Let's parse.

We have n ≥ 6 a positive integer. A positive integer m (maybe they say positive integer n-Norwegian if it has three distinct positive divisors whose sum equals n). So for a given n (≥6), we say a positive integer k is n-Norwegian if there exist three distinct positive divisors of k (i.e., three distinct numbers that divide k) such that their sum equals n.

Define f(n) = smallest n-Norwegian positive integer.

Now M = 3^{2025!}. For a non-negative integer c, define:

g(c) = (1/2025!) * floor( (2025! * f(M + c)) / M ).

We then consider sum: g(0) + g(4M) + g(1848374) + g(10162574) + g(265710644) + g(44636594) = p/q in lowest terms. Compute p+q mod 99991.

We need to compute that.

First, understand f(n). For given n, we want the smallest positive integer k such that there exist three distinct positive divisors
[...]


Copilot

Pull request overview

This PR adds a specialized tool parser for the GLM-4.7 MOE model, based on the vLLM project implementation. The main changes include a new Python tool parser class that extends the existing GLM-4.5 parser with updated regex patterns, and C++ modifications to handle EOS tokens when they're specified as arrays.

Key changes:

Adds Glm47MoeModelToolParser class with GLM-4.7-specific regex patterns for parsing tool calls
Updates C++ code to properly handle array-format eos_token_id configurations
Initializes bos_token_id and eos_token_id with default values of -1

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
tools/fastllm_pytools/openai_server/tool_parsers/glm47_moe_tool_parser.py	New tool parser for GLM-4.7 that overrides regex patterns from GLM-4.5 parser to handle different tool call format
tools/fastllm_pytools/openai_server/tool_parsers/__init__.py	Adds import and export of the new Glm47MoeModelToolParser class
src/model.cpp	Adds logic to parse array-format eos_token_id and adds explanatory comments
include/models/basellm.h	Initializes bos_token_id and eos_token_id to -1 as default values
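The table says the new parser overrides the GLM-4.5 parser's regex patterns, but the patterns themselves are not shown in this PR. As a rough illustration only, here is a self-contained regex parser for an assumed GLM-style markup of the form <tool_call>NAME<arg_key>K</arg_key><arg_value>V</arg_value></tool_call>. The tag names, the tolerance for optional newlines between pieces, and parse_tool_calls are assumptions, not the actual Glm47MoeModelToolParser.

```python
import json
import re
from typing import Dict, List

# Assumed markup (not taken from this PR):
#   <tool_call>NAME<arg_key>K</arg_key><arg_value>V</arg_value>...</tool_call>
# Whitespace and newlines between the pieces are tolerated, so the same
# patterns accept both "NAME\n<arg_key>..." and "NAME<arg_key>..." layouts.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
NAME_RE = re.compile(r"^\s*([^<\s]+)")
ARG_RE = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)


def parse_tool_calls(text: str) -> List[Dict[str, str]]:
    """Extract {"name", "arguments"} dicts; arguments are JSON-encoded strings."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        name_match = NAME_RE.match(body)
        if name_match is None:
            continue  # malformed block: no function name
        args = {key.strip(): value for key, value in ARG_RE.findall(body)}
        calls.append({"name": name_match.group(1), "arguments": json.dumps(args)})
    return calls


if __name__ == "__main__":
    compact = ("<tool_call>get_stock_price<arg_key>symbol</arg_key>"
               "<arg_value>AAPL</arg_value></tool_call>")
    print(parse_tool_calls(compact))
```

A parser whose patterns require a newline after the function name would miss the compact layout entirely, which is one plausible way a format change could yield the empty "before" result above; that link is a guess, not something the PR states.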


Copilot · 2026-01-03T10:32:13Z

                    for (auto &it : ids.array_items()) {
                        this->eos_token_ids.insert(it.int_value());
                    }
+                    // Don't set eos_token_id integer - leave it as -1


When eos_token_id is in array format, the code intentionally leaves eos_token_id as -1 (based on the default value from basellm.h). However, this causes a problem in ResetLogitsOfEOS functions (lines 1402, 1418, 1440, 1457 in basellm.cpp) where the code unconditionally accesses logit[this->eos_token_id]. When eos_token_id is -1, this creates an invalid array access with a negative index.

The code should either:

Skip adding eos_token_id to the list when it's -1, or

Only process eos_token_ids set in this case

This is a critical bug that will cause undefined behavior or crashes when using models with array-format eos_token_id.

Suggested change

-                    for (auto &it : ids.array_items()) {
-                        this->eos_token_ids.insert(it.int_value());
-                    }
-                    // Don't set eos_token_id integer - leave it as -1
+                    bool firstSet = false;
+                    for (auto &it : ids.array_items()) {
+                        int id = it.int_value();
+                        this->eos_token_ids.insert(id);
+                        // Also set eos_token_id to a valid value (first element) to avoid invalid index usage elsewhere
+                        if (!firstSet) {
+                            this->eos_token_id = id;
+                            firstSet = true;
+                        }
+                    }
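Copilot's suggested change and its first alternative (skip -1) can be combined. The Python sketch below mirrors the suggested C++ loop and adds a guarded stand-in for ResetLogitsOfEOS. The function names are hypothetical, masking with -inf is an assumption about what the reset does, and the toy vocabulary is illustrative only.

```python
from typing import List, Set, Tuple

UNSET = -1  # default for eos_token_id (basellm.h initializes it to -1)


def load_eos_array(ids: List[int]) -> Tuple[int, Set[int]]:
    """Mirror of the suggested change: the first array element becomes the scalar ID."""
    eos_token_id = UNSET
    eos_token_ids: Set[int] = set()
    first_set = False
    for token_id in ids:
        eos_token_ids.add(token_id)
        if not first_set:
            eos_token_id = token_id
            first_set = True
    return eos_token_id, eos_token_ids


def reset_logits_of_eos(logits: List[float], eos_token_id: int,
                        eos_token_ids: Set[int]) -> None:
    """Hypothetical stand-in for ResetLogitsOfEOS that skips invalid IDs.

    In Python, logits[-1] would silently touch the last vocabulary entry
    instead of crashing, so the range check matters even outside C++.
    """
    candidates = set(eos_token_ids)
    if eos_token_id != UNSET:
        candidates.add(eos_token_id)
    for token_id in candidates:
        if 0 <= token_id < len(logits):
            logits[token_id] = float("-inf")


if __name__ == "__main__":
    eos_id, eos_ids = load_eos_array([3, 4])  # toy vocabulary of size 6
    logits = [0.0] * 6
    reset_logits_of_eos(logits, eos_id, eos_ids)
    print(eos_id, sorted(eos_ids), logits)
```

With an empty array, eos_token_id stays -1 even after the suggested change, which is why the sketch also keeps the range check from the first option.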

黄宇扬 and others added 30 commits May 12, 2025 16:44

fix

1cab6f8

V 0.0.1.1

a83fd52

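Update some commands in the README

b3c8ce7

Update README

879536e

Try to fix compilation with older GCC/MSVC

be59281

Add unit tests for Tokenizers

c21f26d

Fix multiple elseif logic issues; fix phi3

d2f5664

Support reverse slicing and greater-than/less-than comparisons; support Qwen3 template inference

fe32a1a

Fix the Tokenizer's normalization of special symbols; fix phi3 and glm4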

3221845

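Add AVX512VNNI acceleration

27e7be3

Rewrite multicuda MLP in a more general form; support fp8 parallelism

a43c2a0

Fix some bugs in hybrid tensor parallelism; hybrid tensor parallelism now supports all precisions

d8e7307

Remove unused code

73ff7a5

Fix hybrid tensor parallelism for MoE

0628cda

Speed up MoE hybrid tensor parallelism

b1a6727

Temporarily comment out AVX_VNNI code (unstable speed)

56946f4

Optimize multi-NUMA

778461b

Enable avx512vnni

69311bf

Slightly speed up MoE on CPU and NUMA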

150b613

fix

849ffe2

Merge pull request ztxz16#521 from TylunasLi/develop

58f06f7

C++ support for the Qwen3 template

Merge branch 'master' of https://www.github.com/ztxz16/fastllm

dc69586

fix

952c987

Fix compilation issues on ARM

53b4fbc

fix

0393315

Change the default top_k for ds to 10

6618a3b

V 0.0.1.2

96fe234

update wechat

44d29e0

Path fix

adbed7b

hot fix

b1ab76f

黄宇扬 and others added 23 commits November 18, 2025 16:04

numas supports int8

866a8f0

add deepseek_v32

56ecef7

Modify allocate for compatibility with older gcc

a39a631

fix

bbc5c81

v0.1.5.0

f1e0fc1

fix

4452bec

Merge pull request ztxz16#627 from lovedheart/patch-1

fd7fa68

Improve scaling calculation in fastllm-cuda.cu and bug fix

fix

ac4f802

Fix a bug when running the next model on CPU only

cf366b3

Speed up CPU permuteself

e063bc1

Add a hint when NUMA allocation fails

4ab5f05

Fix compilation on ARM (gguf not yet supported)

8782ae8

Merge branch 'master' of https://www.github.com/ztxz16/fastllm

a4f917b

Add some AMX preparation code

3dd6694

Initial AMX support

972a30d

v1.5.1

78d1732

fix compile for amx

59883bb

fix

da39153

fix

930ce8b

revert

73d6d8c

Add GLM-4.7 MOE tool parser and update __init__ for tool parsers

f82395e

fix: array-format eos_token_id was wrongly parsed as 0, causing token 0 to trigger EOS

e7ceafa

Merge pull request #1 from lkevincc0/glm47-stop-fix

47fb36a

fix: array-format eos_token_id was wrongly parsed as 0, causing token 0 to trigger EOS

Copilot AI review requested due to automatic review settings January 3, 2026 10:29

Copilot started reviewing on behalf of lkevincc0 January 3, 2026 10:29

Copilot AI reviewed Jan 3, 2026


lkevincc0 changed the title ~~feat: Add GLM-4.7 MOE tool parser~~ fix: resolve abnormal stop and add tool parser in GLM-4.7 Jan 3, 2026

ztxz16 force-pushed the master branch 3 times, most recently from e1b5c57 to dce3e6c March 2, 2026 08:37

lkevincc0 wants to merge 1254 commits into ztxz16:master from lkevincc0:master
