add default stop string #29718
base: main
vllm/entrypoints/openai/api_server.py

@@ -131,6 +131,7 @@
 logger = init_logger("vllm.entrypoints.openai.api_server")

 ENDPOINT_LOAD_METRICS_FORMAT_HEADER_LABEL = "endpoint-load-metrics-format"
+FORCED_STOP_TOKENS = []  # can be changed to ["</s>", "\n\n", "User:", ...]
Check failure on line 134 in vllm/entrypoints/openai/api_server.py
 _running_tasks: set[asyncio.Task] = set()
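Since `FORCED_STOP_TOKENS` is introduced as a module-level mutable list, it is worth seeing what happens when request-handling code aliases it instead of copying it. A stand-alone sketch of that hazard (illustrative only, not part of the diff):

```python
FORCED_STOP_TOKENS = ["</s>"]

# Simulate a request that aliases the shared default list instead of copying it.
request_stop = FORCED_STOP_TOKENS      # alias: both names point at the same list
request_stop.append("\n\n")            # a per-request tweak...

# ...silently changes the module-level default seen by every later request.
print(FORCED_STOP_TOKENS)              # ['</s>', '\n\n']

# Taking a copy keeps per-request changes local.
safe_stop = list(FORCED_STOP_TOKENS)
safe_stop.append("User:")
print(FORCED_STOP_TOKENS)              # still ['</s>', '\n\n']
```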
@@ -752,6 +753,9 @@
         message="The model does not support Chat Completions API"
     )
 try:
+    if request.stop is None and len(FORCED_STOP_TOKENS) > 0:
+        request.stop = FORCED_STOP_TOKENS
+
     generator = await handler.create_chat_completion(request, raw_request)
 except Exception as e:
     raise HTTPException(
@@ -2168,7 +2172,18 @@
     description="vLLM OpenAI-Compatible RESTful API server."
 )
 parser = make_arg_parser(parser)
+parser.add_argument(
+    "--default-stop",
+    type=str,
+    nargs="*",
+    default=[],
+    help="Default stop tokens to apply to all requests"
+)
 args = parser.parse_args()
+
+FORCED_STOP_TOKENS = args.default_stop if 'args' in locals() else []
Contributor

The assignment to `FORCED_STOP_TOKENS` here rebinds the module-level name, and it only runs when this block itself executes; entry points that start the server another way will keep the empty default. Additionally, the check `'args' in locals()` is always true at this point, since `args` is assigned unconditionally on the line directly above, so the conditional can be dropped.

Suggested change
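A self-contained sketch of how the `--default-stop` flag parses, with the redundant `locals()` guard removed (a hypothetical standalone script, not the actual server entry point):

```python
import argparse

parser = argparse.ArgumentParser(
    description="sketch of the --default-stop flag from this PR"
)
parser.add_argument(
    "--default-stop",
    type=str,
    nargs="*",
    default=[],
    help="Default stop tokens to apply to all requests",
)

# Parse a sample command line; nargs="*" collects every following token.
args = parser.parse_args(["--default-stop", "</s>", "User:"])

# `args` is always bound here, so no `'args' in locals()` check is needed.
FORCED_STOP_TOKENS = args.default_stop
print(FORCED_STOP_TOKENS)  # ['</s>', 'User:']
```

Omitting the flag entirely leaves `FORCED_STOP_TOKENS` as `[]`, matching the diff's intent without the guard.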
 validate_parsed_serve_args(args)

 uvloop.run(run_server(args))
There are two issues with this logic.

First, the condition `request.stop is None` will only be true if the client explicitly sends `"stop": null`. If the `stop` parameter is omitted from the request, `request.stop` defaults to `[]` (an empty list), so this condition is not met and the feature does not work for the most common use case. To apply the default stop tokens whenever the user provides none, check for a falsy value instead (which covers both `None` and `[]`).

Second, you are assigning `FORCED_STOP_TOKENS` directly to `request.stop`. Since `FORCED_STOP_TOKENS` is a mutable list, any subsequent modification to `request.stop` elsewhere in the code could unintentionally alter the global `FORCED_STOP_TOKENS` list, leading to unpredictable behavior across different requests. Assign a copy of the list instead.
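Both fixes the comment describes can be sketched together (a hypothetical, simplified stand-in for the handler code; `request` here is a dummy object, not vLLM's actual request model):

```python
from types import SimpleNamespace

FORCED_STOP_TOKENS = ["</s>", "User:"]

def apply_default_stop(request):
    """Fill in default stop tokens when the client supplied none."""
    # Falsy check covers both None and the [] default used when the
    # "stop" field is omitted from the request body.
    if not request.stop and FORCED_STOP_TOKENS:
        # list(...) makes a copy, so later mutation of request.stop
        # cannot leak back into the shared FORCED_STOP_TOKENS list.
        request.stop = list(FORCED_STOP_TOKENS)
    return request

# Omitted "stop" field: empty-list default, as the comment describes.
req = apply_default_stop(SimpleNamespace(stop=[]))
print(req.stop)            # ['</s>', 'User:']

req.stop.append("\n\n")    # a per-request edit stays local
print(FORCED_STOP_TOKENS)  # unchanged: ['</s>', 'User:']
```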