15 changes: 15 additions & 0 deletions vllm/entrypoints/openai/api_server.py
@@ -131,6 +131,7 @@
logger = init_logger("vllm.entrypoints.openai.api_server")

ENDPOINT_LOAD_METRICS_FORMAT_HEADER_LABEL = "endpoint-load-metrics-format"
FORCED_STOP_TOKENS = []  # can be changed to ["</s>", "\n\n", "User:", ...]

Check failure on line 134 in vllm/entrypoints/openai/api_server.py
GitHub Actions / pre-commit:
Need type annotation for "FORCED_STOP_TOKENS" (hint: "FORCED_STOP_TOKENS: list[<type>] = ...") [var-annotated]


_running_tasks: set[asyncio.Task] = set()

@@ -752,6 +753,9 @@
message="The model does not support Chat Completions API"
)
try:
if request.stop is None and len(FORCED_STOP_TOKENS) > 0:
request.stop = FORCED_STOP_TOKENS
Comment on lines +756 to +757
Contributor (severity: high)

There are two issues with this logic.

First, the condition request.stop is None will only be true if the client explicitly sends "stop": null. If the stop parameter is omitted in the request, request.stop will default to [] (an empty list), and this condition will not be met, causing the feature to not work for the most common use case. To apply the default stop tokens when none are provided by the user, you should check for a falsy value (which covers both None and []).

Second, you are assigning FORCED_STOP_TOKENS directly to request.stop. Since FORCED_STOP_TOKENS is a mutable list, any subsequent modification to request.stop elsewhere in the code could unintentionally alter the global FORCED_STOP_TOKENS list, leading to unpredictable behavior across different requests. You should assign a copy of the list instead.

Suggested change
if request.stop is None and len(FORCED_STOP_TOKENS) > 0:
request.stop = FORCED_STOP_TOKENS
if not request.stop and FORCED_STOP_TOKENS:
request.stop = FORCED_STOP_TOKENS.copy()
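The aliasing concern above can be seen in a standalone sketch (plain Python, not vLLM code; the `Request` class here is a minimal stand-in for the real request model):

```python
# Standalone sketch (not vLLM code): why `is None` misses the common case and
# why assigning the shared list directly would alias the global.
FORCED_STOP_TOKENS = ["</s>"]

class Request:
    def __init__(self, stop=None):
        # mirrors a model where an omitted "stop" defaults to []
        self.stop = [] if stop is None else stop

req = Request()                # client omitted "stop" -> req.stop == []
assert req.stop is not None    # so an `is None` check never fires here

if not req.stop and FORCED_STOP_TOKENS:   # falsy check covers None and []
    req.stop = FORCED_STOP_TOKENS.copy()  # copy avoids aliasing the global

req.stop.append("User:")       # per-request mutation...
print(FORCED_STOP_TOKENS)      # → ['</s>']  ...leaves the global untouched
```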


generator = await handler.create_chat_completion(request, raw_request)
except Exception as e:
raise HTTPException(
@@ -2168,7 +2172,18 @@
description="vLLM OpenAI-Compatible RESTful API server."
)
parser = make_arg_parser(parser)

parser.add_argument(
"--default-stop",
type=str,
nargs="*",
default=[],
help="Default stop tokens to apply to all requests"
)
args = parser.parse_args()

FORCED_STOP_TOKENS = args.default_stop if 'args' in locals() else []
Contributor (severity: critical)

The assignment to FORCED_STOP_TOKENS creates a new local variable within the run_server_worker function, shadowing the global variable of the same name. This means the default stop tokens provided via the command-line will never be applied. To fix this, you need to use the global keyword to modify the module-level variable.

Additionally, the check if 'args' in locals() is redundant because args is defined on the preceding line and will always be in the local scope.

Suggested change
FORCED_STOP_TOKENS = args.default_stop if 'args' in locals() else []
global FORCED_STOP_TOKENS
FORCED_STOP_TOKENS = args.default_stop


validate_parsed_serve_args(args)

uvloop.run(run_server(args))
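As a standalone sketch of how the proposed `--default-stop` flag parses (plain argparse, outside vLLM): `nargs="*"` collects zero or more tokens into a list and falls back to `default` when the flag is omitted.

```python
# Minimal reproduction of the argparse behaviour used by --default-stop.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--default-stop", type=str, nargs="*", default=[])

args = parser.parse_args(["--default-stop", "</s>", "User:"])
print(args.default_stop)  # → ['</s>', 'User:']

args = parser.parse_args([])  # flag omitted entirely
print(args.default_stop)      # → []
```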