
fix: resolve abnormal stop and add tool parser in GLM-4.7 #640

Open
lkevincc0 wants to merge 1254 commits into ztxz16:master from lkevincc0:master

Conversation


@lkevincc0 lkevincc0 commented Dec 31, 2025

  • Add a dedicated tool parser for GLM-4.7
  • Fix a bug that caused GLM-4.7 to stop generation abnormally

Reference implementation: glm47_moe_tool_parser.py

Note: this change does not update the get_tool_parser_auto auto-selection logic in abstract_tool_parser.py. For the glm4_moe model type the default therefore remains glm45. To get correct tool-call parsing with GLM-4.7, you must explicitly specify glm47.
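As a hypothetical illustration of the dispatch involved (the real get_tool_parser_auto in abstract_tool_parser.py may differ in structure and names), the auto-selection boils down to a model-type-to-parser mapping that still points glm4_moe at glm45:

```python
# Hypothetical sketch of the auto-selection logic described above; this is
# not the actual fastllm implementation, only an illustration of why an
# explicit --tool_call_parser glm47 is currently required for GLM-4.7.
def get_tool_parser_auto(model_type: str) -> str:
    mapping = {
        "glm4_moe": "glm45",  # GLM-4.7 shares this model type, so without an
                              # updated mapping it never resolves to "glm47"
    }
    return mapping.get(model_type, "default")

print(get_tool_parser_auto("glm4_moe"))  # -> glm45, even for GLM-4.7 weights
```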


Tool calls

Test

import json
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="http://0.0.0.0:8000/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current stock price",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {"type": "string", "description": "The stock symbol, e.g. AAPL"},
            },
            "required": ["symbol"]
        }
    }
}]

messages = [{"role": "user", "content": "What's the price of Apple stock?"}]

response = client.chat.completions.create(
    model="llm",
    messages=messages,
    tools=tools,
    stream=True,
)

# Initialize streaming collection variables
reasoning_content = ""
content = ""
final_tool_calls = {}
reasoning_started = False
content_started = False
# Process streaming response
for chunk in response:
    if not chunk.choices:
        continue

    delta = chunk.choices[0].delta

    # Streaming reasoning process output
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        if not reasoning_started and delta.reasoning_content.strip():
            print("\n🧠 Thinking Process:")
            reasoning_started = True
        reasoning_content += delta.reasoning_content
        print(delta.reasoning_content, end="", flush=True)

    # Streaming answer content output
    if hasattr(delta, 'content') and delta.content:
        if not content_started and delta.content.strip():
            print("\n\n💬 Answer Content:")
            content_started = True
        content += delta.content
        print(delta.content, end="", flush=True)

    # Streaming tool call information (parameter concatenation)
    if delta.tool_calls:
        for tool_call in delta.tool_calls:
            idx = tool_call.index
            if idx not in final_tool_calls:
                final_tool_calls[idx] = tool_call
                final_tool_calls[idx].function.arguments = tool_call.function.arguments or ""
            else:
                final_tool_calls[idx].function.arguments += tool_call.function.arguments or ""

# Output final tool call information
if final_tool_calls:
    print("\n📋 Function Calls Triggered:")
    for idx, tool_call in final_tool_calls.items():
        print(f"  {idx}: Function Name: {tool_call.function.name}, Parameters: {tool_call.function.arguments}")

Before the fix

Launch command:

python -m ftllm.cli serve \
    /mnt/models/glm-4-7-awq \
    --device cuda \
    --moe_device "{'cuda':5,'cpu':15}" \
    --enable_amx True \
    --host 0.0.0.0 \
    --port 8000 \
    --model_name llm \
    --think True \
    --tool_call_parser glm45 \
    --cuda_embedding \
    --cache_fast True \
    --cuda_shared_expert True

Output:

💬 Answer Content:
<think>
The user is asking for the price of Apple stock. I need to use the get_stock_price function to get this information. Apple's stock symbol is AAPL, which is well-known. I have all the required parameters to make this function call.</think>

After the fix

Launch command:

python -m ftllm.cli serve \
    /mnt/models/glm-4-7-awq \
    --device cuda \
    --moe_device "{'cuda':5,'cpu':15}" \
    --enable_amx True \
    --host 0.0.0.0 \
    --port 8000 \
    --model_name llm \
    --think True \
    --tool_call_parser glm47 \
    --cuda_embedding \
    --cache_fast True \
    --cuda_shared_expert True

Output:

💬 Answer Content:
<think>
The user is asking for the price of Apple stock. I have a function called "get_stock_price" that can get the current stock price. The function requires a "symbol" parameter, which should be the stock symbol. For Apple, the stock symbol is commonly known as "AAPL". I have all the required information to make this function call.

Let me call the get_stock_price function with symbol "AAPL".</think>I'll get the current stock price for Apple (AAPL) for you.
📋 Function Calls Triggered:
  0: Function Name: get_stock_price, Parameters: {"symbol": "AAPL"}


Abnormal stop

The character '!' has token id 0 in GLM-4.7's vocabulary, which caused the unexpected stop:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.7", trust_remote_code=True)

tokens = [18, 47524, 115937, 20, 0, 92]

print("=" * 50)
print("Token 分析:")
print("=" * 50)
for i, t in enumerate(tokens):
    decoded = tokenizer.decode([t])
    print(f"  Token[{i}] ID={t:6d} -> '{decoded}' (repr: {repr(decoded)})")

print("\n特殊 Token 信息:")
print(f"  eos_token_id: {tokenizer.eos_token_id}")
print(f"  pad_token_id: {tokenizer.pad_token_id if hasattr(tokenizer, 'pad_token_id') else 'N/A'}")
print(f"  bos_token_id: {tokenizer.bos_token_id if hasattr(tokenizer, 'bos_token_id') else 'N/A'}")

# 检查 token 0 是否是特殊 token
print(f"\nToken 0 解码结果: '{tokenizer.decode([0])}'")

# 检查 eos_token_ids (如果是列表)
if hasattr(tokenizer, 'eos_token_id'):
    eos = tokenizer.eos_token_id
    if isinstance(eos, list):
        print(f"\nEOS token IDs (多个): {eos}")
        for e in eos:
            print(f"  {e} -> '{tokenizer.decode([e])}'")

Output:

==================================================
Token 分析:
==================================================
  Token[0] ID=    18 -> '3' (repr: '3')
  Token[1] ID= 47524 -> '^{' (repr: '^{')
  Token[2] ID=115937 -> '202' (repr: '202')
  Token[3] ID=    20 -> '5' (repr: '5')
  Token[4] ID=     0 -> '!' (repr: '!')
  Token[5] ID=    92 -> '}' (repr: '}')

特殊 Token 信息:
  eos_token_id: 151329
  pad_token_id: 151329
  bos_token_id: None

Token 0 解码结果: '!'
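The effect can be reproduced with a minimal sketch (illustrative only, not the actual fastllm code): if eos_token_id is left at a default of 0 rather than -1, the stop check fires as soon as the model emits '!':

```python
# Minimal illustrative sketch: a generation loop whose stop check compares
# each token against eos_token_id. With eos_token_id mis-set to 0, the
# token sequence for "3^{2025!}" is cut off at '!' (token id 0 in GLM-4.7).
def take_until_eos(token_ids, eos_token_id):
    out = []
    for t in token_ids:
        if t == eos_token_id:  # stop check hits '!' when eos defaults to 0
            break
        out.append(t)
    return out

tokens = [18, 47524, 115937, 20, 0, 92]  # "3", "^{", "202", "5", "!", "}"

print(take_until_eos(tokens, eos_token_id=0))   # -> [18, 47524, 115937, 20]
print(take_until_eos(tokens, eos_token_id=-1))  # -> full sequence
```

This is why initializing eos_token_id to -1 in basellm.h (as this PR does) avoids the collision with a legitimate token id of 0.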

Test

from openai import OpenAI
client = OpenAI(
    api_key="sk-",
    base_url="http://0.0.0.0:8000/v1",
)
MODEL = "llm"

q = r"""
Please reason step by step, and put your final answer within \\boxed{}.
Let $n \geq 6$ be a positive integer. We call a positive integer $n$-Norwegian if it has three distinct positive divisors whose sum is equal to $n$. Let $f(n)$ denote the smallest $n$-Norwegian positive integer. Let $M=3^{2025!}$ and for a non-negative integer $c$ define 
\begin{equation*}
    g(c)=\frac{1}{2025!}\left\lfloor \frac{2025! f(M+c)}{M}\right\rfloor.
\end{equation*}
We can write 
\begin{equation*}
    g(0)+g(4M)+g(1848374)+g(10162574)+g(265710644)+g(44636594)=\frac{p}{q}
\end{equation*}
where $p$ and $q$ are coprime positive integers. What is the remainder when $p+q$ is divided by $99991$?
"""

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": q}],
    stream=True,
    max_tokens=2000
)
for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if hasattr(delta, 'content') and delta.content:
        print(delta.content, end="", flush=True)

Output (two runs, both truncated early before the fix):

<think>
We need to solve a complex problem. Let's parse it carefully.

We have n ≥ 6 positive integer. We call a positive integer "n-Norwegian" if it has three distinct positive divisors whose sum is equal to n.

Define f(n) as the smallest n-Norwegian positive integer.

Then M = 3^{2025
<think>
We are given a problem with many steps. Let's parse and understand it.

We have n ≥ 6 positive integer. A positive integer is called n-Norwegian if it has three distinct positive divisors whose sum is equal to n. f(n) is the smallest n-Norwegian positive integer.

Define M = 3^{2025

Output with a different prompt:

<think>
The user wants me to repeat the string "5!" five times.

1.  **Identify the string to repeat:** "5!"
2.  **Identify the number of repetitions:** 5
3.  **Determine the output format:** Usually, repetition implies concatenation (e.g., "5

After the fix

<think>
We need to understand the problem. Let's parse.

We have n ≥ 6 a positive integer. A positive integer m (maybe they say positive integer n-Norwegian if it has three distinct positive divisors whose sum equals n). So for a given n (≥6), we say a positive integer k is n-Norwegian if there exist three distinct positive divisors of k (i.e., three distinct numbers that divide k) such that their sum equals n.

Define f(n) = smallest n-Norwegian positive integer.

Now M = 3^{2025!}. For a non-negative integer c, define:

g(c) = (1/2025!) * floor( (2025! * f(M + c)) / M ).

We then consider sum: g(0) + g(4M) + g(1848374) + g(10162574) + g(265710644) + g(44636594) = p/q in lowest terms. Compute p+q mod 99991.

We need to compute that.

First, understand f(n). For given n, we want the smallest positive integer k such that there exist three distinct positive divisors
[...]

Copilot AI review requested due to automatic review settings January 3, 2026 10:29

Copilot AI left a comment


Pull request overview

This PR adds a specialized tool parser for the GLM-4.7 MOE model, based on the vLLM project implementation. The main changes include a new Python tool parser class that extends the existing GLM-4.5 parser with updated regex patterns, and C++ modifications to handle EOS tokens when they're specified as arrays.

Key changes:

  • Adds Glm47MoeModelToolParser class with GLM-4.7-specific regex patterns for parsing tool calls
  • Updates C++ code to properly handle array-format eos_token_id configurations
  • Initializes bos_token_id and eos_token_id with default values of -1

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
tools/fastllm_pytools/openai_server/tool_parsers/glm47_moe_tool_parser.py New tool parser for GLM-4.7 that overrides regex patterns from GLM-4.5 parser to handle different tool call format
tools/fastllm_pytools/openai_server/tool_parsers/__init__.py Adds import and export of the new Glm47MoeModelToolParser class
src/model.cpp Adds logic to parse array-format eos_token_id and adds explanatory comments
include/models/basellm.h Initializes bos_token_id and eos_token_id to -1 as default values
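The subclass-override pattern described above can be sketched as follows; the class names follow the PR description, but the regex patterns themselves are illustrative placeholders, not the PR's actual code:

```python
import re

# Illustrative sketch of a parser subclass that only swaps regex patterns;
# the extraction logic is inherited unchanged. The patterns below are made
# up for illustration and do not reflect the real GLM-4.5/4.7 formats.
class Glm45MoeModelToolParser:
    tool_call_regex = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

    def extract(self, text: str) -> list:
        return self.tool_call_regex.findall(text)

class Glm47MoeModelToolParser(Glm45MoeModelToolParser):
    # GLM-4.7 wraps tool calls slightly differently, so only the pattern is
    # overridden here (e.g. tolerating surrounding whitespace).
    tool_call_regex = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

parser = Glm47MoeModelToolParser()
print(parser.extract("<tool_call> get_stock_price </tool_call>"))
# -> ['get_stock_price']
```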


Comment thread src/model.cpp
Comment on lines 105 to +109
for (auto &it : ids.array_items()) {
    this->eos_token_ids.insert(it.int_value());
}
// Don't set eos_token_id integer - leave it as -1

Copilot AI Jan 3, 2026


When eos_token_id is in array format, the code intentionally leaves eos_token_id as -1 (based on the default value from basellm.h). However, this causes a problem in ResetLogitsOfEOS functions (lines 1402, 1418, 1440, 1457 in basellm.cpp) where the code unconditionally accesses logit[this->eos_token_id]. When eos_token_id is -1, this creates an invalid array access with a negative index.

The code should either:

  1. Skip adding eos_token_id to the list when it's -1, or
  2. Only process eos_token_ids set in this case

This is a critical bug that will cause undefined behavior or crashes when using models with array-format eos_token_id.

Suggested change
for (auto &it : ids.array_items()) {
    this->eos_token_ids.insert(it.int_value());
}
// Don't set eos_token_id integer - leave it as -1
bool firstSet = false;
for (auto &it : ids.array_items()) {
    int id = it.int_value();
    this->eos_token_ids.insert(id);
    // Also set eos_token_id to a valid value (first element) to avoid invalid index usage elsewhere
    if (!firstSet) {
        this->eos_token_id = id;
        firstSet = true;
    }
}

@lkevincc0 lkevincc0 changed the title feat: Add GLM-4.7 MOE tool parser fix: resolve abnormal stop and add tool parser in GLM-4.7 Jan 3, 2026
@ztxz16 ztxz16 force-pushed the master branch 3 times, most recently from e1b5c57 to dce3e6c Compare March 2, 2026 08:37
